* [PATCH 0/4] xfs: more reflink fixes
@ 2017-12-15 17:11 Darrick J. Wong
2017-12-15 17:11 ` [PATCH 1/4] xfs: track cowblocks separately in i_flags Darrick J. Wong
` (4 more replies)
0 siblings, 5 replies; 22+ messages in thread
From: Darrick J. Wong @ 2017-12-15 17:11 UTC (permalink / raw)
To: darrick.wong; +Cc: linux-xfs
Hi all,
Here are more reflink fixes that I've accumulated over the past week.
I added CLONERANGE support to fsstress and reran xfstests, which very
rapidly found problems. This series fixes a number of problems dug up
by xfs/017.
The first two patches disentangle a mess between the incore inode radix
tree tags and the corresponding inode flags. We use the inode eofblocks
flags as a proxy for whether or not the same eofblocks tag is set for
the radix tree, which means that we can't use that single iflag to
represent the cowblocks radix tree tag, so give the cowblocks tag a
corresponding inode flag. This fixes a bug where the cowblocks tag
silently isn't set if the inode already had the eofblocks tag set.
The second patch then fixes the clearing of the cow tag & reflink flag
when truncating the file -- we only want to clear the reflink flag if
both data & cow forks are empty, and we can only clear the cow tag if
the cow fork is totally empty; i_cnextents only tracks real extents.
The third patch flushes all the cow reservations when we remount the
filesystem ro, to put the fs in a clean quiesced state (i.e. no orphaned
cow staging extents) so that we can reduce mount times later.
The fourth patch sets the cow tag when we do direct allocations into the
cow fork so that any hint-aligned remnants are cleaned out at remount-ro
time.
These patches apply directly to the end of 'xfs: reflink fixes'.
--D
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 1/4] xfs: track cowblocks separately in i_flags
2017-12-15 17:11 [PATCH 0/4] xfs: more reflink fixes Darrick J. Wong
@ 2017-12-15 17:11 ` Darrick J. Wong
2017-12-19 0:12 ` Dave Chinner
2017-12-21 13:36 ` Christoph Hellwig
2017-12-15 17:11 ` [PATCH 2/4] xfs: don't be so eager to clear the cowblocks tag on truncate Darrick J. Wong
` (3 subsequent siblings)
4 siblings, 2 replies; 22+ messages in thread
From: Darrick J. Wong @ 2017-12-15 17:11 UTC (permalink / raw)
To: darrick.wong; +Cc: linux-xfs
From: Darrick J. Wong <darrick.wong@oracle.com>
The EOFBLOCKS/COWBLOCKS tags are totally separate things, so track them
with separate i_flags. Right now we're abusing IEOFBLOCKS for both,
which is totally bogus because we won't tag the inode with COWBLOCKS if
IEOFBLOCKS was set by a previous tagging of the inode with EOFBLOCKS.
Found by wiring up clonerange to fsstress in xfs/017.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/xfs_icache.c | 33 ++++++++++++++++++++++++---------
fs/xfs/xfs_inode.h | 1 +
2 files changed, 25 insertions(+), 9 deletions(-)
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 43005fb..58d2d42 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1536,8 +1536,23 @@ xfs_inode_free_quota_eofblocks(
return __xfs_inode_free_quota_eofblocks(ip, xfs_icache_free_eofblocks);
}
+static inline unsigned long
+xfs_iflag_for_tag(
+ int tag)
+{
+ switch (tag) {
+ case XFS_ICI_EOFBLOCKS_TAG:
+ return XFS_IEOFBLOCKS;
+ case XFS_ICI_COWBLOCKS_TAG:
+ return XFS_ICOWBLOCKS;
+ default:
+ ASSERT(0);
+ return 0;
+ }
+}
+
static void
-__xfs_inode_set_eofblocks_tag(
+__xfs_inode_set_blocks_tag(
xfs_inode_t *ip,
void (*execute)(struct xfs_mount *mp),
void (*set_tp)(struct xfs_mount *mp, xfs_agnumber_t agno,
@@ -1552,10 +1567,10 @@ __xfs_inode_set_eofblocks_tag(
* Don't bother locking the AG and looking up in the radix trees
* if we already know that we have the tag set.
*/
- if (ip->i_flags & XFS_IEOFBLOCKS)
+ if (ip->i_flags & xfs_iflag_for_tag(tag))
return;
spin_lock(&ip->i_flags_lock);
- ip->i_flags |= XFS_IEOFBLOCKS;
+ ip->i_flags |= xfs_iflag_for_tag(tag);
spin_unlock(&ip->i_flags_lock);
pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
@@ -1587,13 +1602,13 @@ xfs_inode_set_eofblocks_tag(
xfs_inode_t *ip)
{
trace_xfs_inode_set_eofblocks_tag(ip);
- return __xfs_inode_set_eofblocks_tag(ip, xfs_queue_eofblocks,
+ return __xfs_inode_set_blocks_tag(ip, xfs_queue_eofblocks,
trace_xfs_perag_set_eofblocks,
XFS_ICI_EOFBLOCKS_TAG);
}
static void
-__xfs_inode_clear_eofblocks_tag(
+__xfs_inode_clear_blocks_tag(
xfs_inode_t *ip,
void (*clear_tp)(struct xfs_mount *mp, xfs_agnumber_t agno,
int error, unsigned long caller_ip),
@@ -1603,7 +1618,7 @@ __xfs_inode_clear_eofblocks_tag(
struct xfs_perag *pag;
spin_lock(&ip->i_flags_lock);
- ip->i_flags &= ~XFS_IEOFBLOCKS;
+ ip->i_flags &= ~xfs_iflag_for_tag(tag);
spin_unlock(&ip->i_flags_lock);
pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
@@ -1630,7 +1645,7 @@ xfs_inode_clear_eofblocks_tag(
xfs_inode_t *ip)
{
trace_xfs_inode_clear_eofblocks_tag(ip);
- return __xfs_inode_clear_eofblocks_tag(ip,
+ return __xfs_inode_clear_blocks_tag(ip,
trace_xfs_perag_clear_eofblocks, XFS_ICI_EOFBLOCKS_TAG);
}
@@ -1724,7 +1739,7 @@ xfs_inode_set_cowblocks_tag(
xfs_inode_t *ip)
{
trace_xfs_inode_set_cowblocks_tag(ip);
- return __xfs_inode_set_eofblocks_tag(ip, xfs_queue_cowblocks,
+ return __xfs_inode_set_blocks_tag(ip, xfs_queue_cowblocks,
trace_xfs_perag_set_cowblocks,
XFS_ICI_COWBLOCKS_TAG);
}
@@ -1734,6 +1749,6 @@ xfs_inode_clear_cowblocks_tag(
xfs_inode_t *ip)
{
trace_xfs_inode_clear_cowblocks_tag(ip);
- return __xfs_inode_clear_eofblocks_tag(ip,
+ return __xfs_inode_clear_blocks_tag(ip,
trace_xfs_perag_clear_cowblocks, XFS_ICI_COWBLOCKS_TAG);
}
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index b2136af..d383e39 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -232,6 +232,7 @@ static inline bool xfs_is_reflink_inode(struct xfs_inode *ip)
* log recovery to replay a bmap operation on the inode.
*/
#define XFS_IRECOVERY (1 << 11)
+#define XFS_ICOWBLOCKS (1 << 12)/* has the cowblocks tag set */
/*
* Per-lifetime flags need to be reset when re-using a reclaimable inode during
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 2/4] xfs: don't be so eager to clear the cowblocks tag on truncate
2017-12-15 17:11 [PATCH 0/4] xfs: more reflink fixes Darrick J. Wong
2017-12-15 17:11 ` [PATCH 1/4] xfs: track cowblocks separately in i_flags Darrick J. Wong
@ 2017-12-15 17:11 ` Darrick J. Wong
2017-12-19 0:12 ` Dave Chinner
2017-12-21 13:37 ` Christoph Hellwig
2017-12-15 17:11 ` [PATCH 3/4] xfs: remove leftover CoW reservations when remounting ro Darrick J. Wong
` (2 subsequent siblings)
4 siblings, 2 replies; 22+ messages in thread
From: Darrick J. Wong @ 2017-12-15 17:11 UTC (permalink / raw)
To: darrick.wong; +Cc: linux-xfs
From: Darrick J. Wong <darrick.wong@oracle.com>
Currently, xfs_itruncate_extents clears the cowblocks tag if i_cnextents
is zero. This is wrong, since i_cnextents only tracks real extents in
the CoW fork, which means that we could have some delayed CoW
reservations still in there that will now never get cleaned.
Fix a further bug where we /don't/ clear the reflink iflag if there are
any attribute blocks -- really, it's only safe to clear the reflink flag
if there are no data fork extents and no cow fork extents.
Found by adding clonerange to fsstress in xfs/017.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/xfs_inode.c | 28 +++++++++++++++++++---------
1 file changed, 19 insertions(+), 9 deletions(-)
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index b41952a..6f95bdb 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1487,6 +1487,24 @@ xfs_link(
return error;
}
+/* Clear the reflink flag and the cowblocks tag if possible. */
+static void
+xfs_itruncate_clear_reflink_flags(
+ struct xfs_inode *ip)
+{
+ struct xfs_ifork *dfork;
+ struct xfs_ifork *cfork;
+
+ if (!xfs_is_reflink_inode(ip))
+ return;
+ dfork = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
+ cfork = XFS_IFORK_PTR(ip, XFS_COW_FORK);
+ if (dfork->if_bytes == 0 && cfork->if_bytes == 0)
+ ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
+ if (cfork->if_bytes == 0)
+ xfs_inode_clear_cowblocks_tag(ip);
+}
+
/*
* Free up the underlying blocks past new_size. The new size must be smaller
* than the current size. This routine can be used both for the attribute and
@@ -1583,15 +1601,7 @@ xfs_itruncate_extents(
if (error)
goto out;
- /*
- * Clear the reflink flag if there are no data fork blocks and
- * there are no extents staged in the cow fork.
- */
- if (xfs_is_reflink_inode(ip) && ip->i_cnextents == 0) {
- if (ip->i_d.di_nblocks == 0)
- ip->i_d.di_flags2 &= ~XFS_DIFLAG2_REFLINK;
- xfs_inode_clear_cowblocks_tag(ip);
- }
+ xfs_itruncate_clear_reflink_flags(ip);
/*
* Always re-log the inode so that our permanent transaction can keep
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 3/4] xfs: remove leftover CoW reservations when remounting ro
2017-12-15 17:11 [PATCH 0/4] xfs: more reflink fixes Darrick J. Wong
2017-12-15 17:11 ` [PATCH 1/4] xfs: track cowblocks separately in i_flags Darrick J. Wong
2017-12-15 17:11 ` [PATCH 2/4] xfs: don't be so eager to clear the cowblocks tag on truncate Darrick J. Wong
@ 2017-12-15 17:11 ` Darrick J. Wong
2017-12-19 0:17 ` Dave Chinner
2017-12-19 21:08 ` [PATCH v2 " Darrick J. Wong
2017-12-15 17:11 ` [PATCH 4/4] xfs: set cowblocks tag for direct cow writes too Darrick J. Wong
2017-12-19 21:09 ` [PATCH 5/4] xfs: clean up cow mappings during fs data freeze Darrick J. Wong
4 siblings, 2 replies; 22+ messages in thread
From: Darrick J. Wong @ 2017-12-15 17:11 UTC (permalink / raw)
To: darrick.wong; +Cc: linux-xfs
From: Darrick J. Wong <darrick.wong@oracle.com>
When we're remounting the filesystem readonly, remove all CoW
preallocations prior to going ro. If the fs goes down after the ro
remount, we never clean up the staging extents, which means xfs_check
will trip over them on a subsequent run. Practically speaking, the
next mount will clean them up too, so this is unlikely to be seen.
Found by adding clonerange to fsstress and running xfs/017.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/xfs_super.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index f663022..7b6d150 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1369,6 +1369,14 @@ xfs_fs_remount(
/* rw -> ro */
if (!(mp->m_flags & XFS_MOUNT_RDONLY) && (*flags & MS_RDONLY)) {
+ /* Get rid of any leftover CoW reservations... */
+ cancel_delayed_work_sync(&mp->m_cowblocks_work);
+ error = xfs_icache_free_cowblocks(mp, NULL);
+ if (error) {
+ xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
+ return error;
+ }
+
/* Free the per-AG metadata reservation pool. */
error = xfs_fs_unreserve_ag_blocks(mp);
if (error) {
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 4/4] xfs: set cowblocks tag for direct cow writes too
2017-12-15 17:11 [PATCH 0/4] xfs: more reflink fixes Darrick J. Wong
` (2 preceding siblings ...)
2017-12-15 17:11 ` [PATCH 3/4] xfs: remove leftover CoW reservations when remounting ro Darrick J. Wong
@ 2017-12-15 17:11 ` Darrick J. Wong
2017-12-19 0:18 ` Dave Chinner
2017-12-21 13:39 ` Christoph Hellwig
2017-12-19 21:09 ` [PATCH 5/4] xfs: clean up cow mappings during fs data freeze Darrick J. Wong
4 siblings, 2 replies; 22+ messages in thread
From: Darrick J. Wong @ 2017-12-15 17:11 UTC (permalink / raw)
To: darrick.wong; +Cc: linux-xfs
From: Darrick J. Wong <darrick.wong@oracle.com>
If a user performs a direct CoW write, we end up loading the CoW fork
with preallocated extents. Therefore, we must set the cowblocks tag so
that they can be cleared out if we run low on space.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/xfs_reflink.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index e49e6db..47aea2e 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -454,6 +454,8 @@ xfs_reflink_allocate_cow(
if (error)
goto out_bmap_cancel;
+ xfs_inode_set_cowblocks_tag(ip);
+
/* Finish up. */
error = xfs_defer_finish(&tp, &dfops);
if (error)
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 1/4] xfs: track cowblocks separately in i_flags
2017-12-15 17:11 ` [PATCH 1/4] xfs: track cowblocks separately in i_flags Darrick J. Wong
@ 2017-12-19 0:12 ` Dave Chinner
2017-12-21 13:36 ` Christoph Hellwig
1 sibling, 0 replies; 22+ messages in thread
From: Dave Chinner @ 2017-12-19 0:12 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs
On Fri, Dec 15, 2017 at 09:11:19AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> The EOFBLOCKS/COWBLOCKS tags are totally separate things, so track them
> with separate i_flags. Right now we're abusing IEOFBLOCKS for both,
> which is totally bogus because we won't tag the inode with COWBLOCKS if
> IEOFBLOCKS was set by a previous tagging of the inode with EOFBLOCKS.
> Found by wiring up clonerange to fsstress in xfs/017.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Looks reasonable.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/4] xfs: don't be so eager to clear the cowblocks tag on truncate
2017-12-15 17:11 ` [PATCH 2/4] xfs: don't be so eager to clear the cowblocks tag on truncate Darrick J. Wong
@ 2017-12-19 0:12 ` Dave Chinner
2017-12-21 13:37 ` Christoph Hellwig
1 sibling, 0 replies; 22+ messages in thread
From: Dave Chinner @ 2017-12-19 0:12 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs
On Fri, Dec 15, 2017 at 09:11:25AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> Currently, xfs_itruncate_extents clears the cowblocks tag if i_cnextents
> is zero. This is wrong, since i_cnextents only tracks real extents in
> the CoW fork, which means that we could have some delayed CoW
> reservations still in there that will now never get cleaned.
>
> Fix a further bug where we /don't/ clear the reflink iflag if there are
> any attribute blocks -- really, it's only safe to clear the reflink flag
> if there are no data fork extents and no cow fork extents.
>
> Found by adding clonerange to fsstress in xfs/017.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Looks good.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/4] xfs: remove leftover CoW reservations when remounting ro
2017-12-15 17:11 ` [PATCH 3/4] xfs: remove leftover CoW reservations when remounting ro Darrick J. Wong
@ 2017-12-19 0:17 ` Dave Chinner
2017-12-19 3:49 ` Darrick J. Wong
2017-12-19 21:08 ` [PATCH v2 " Darrick J. Wong
1 sibling, 1 reply; 22+ messages in thread
From: Dave Chinner @ 2017-12-19 0:17 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs
On Fri, Dec 15, 2017 at 09:11:31AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> When we're remounting the filesystem readonly, remove all CoW
> preallocations prior to going ro. If the fs goes down after the ro
> remount, we never clean up the staging extents, which means xfs_check
> will trip over them on a subsequent run. Practically speaking, the
> next mount will clean them up too, so this is unlikely to be seen.
>
> Found by adding clonerange to fsstress and running xfs/017.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> fs/xfs/xfs_super.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
>
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index f663022..7b6d150 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1369,6 +1369,14 @@ xfs_fs_remount(
>
> /* rw -> ro */
> if (!(mp->m_flags & XFS_MOUNT_RDONLY) && (*flags & MS_RDONLY)) {
> + /* Get rid of any leftover CoW reservations... */
> + cancel_delayed_work_sync(&mp->m_cowblocks_work);
> + error = xfs_icache_free_cowblocks(mp, NULL);
> + if (error) {
> + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> + return error;
> + }
On rw->ro do we start the m_cowblocks_work back up?
What about when we freeze the filesystem - shouldn't we clean
up the cow blocks there, too? We've tried hard in the past to make
freeze and rw->ro exactly the same so that if the system is powered
down while frozen it comes up almost entirely clean just like a
ro-remount in shutdown....
And, FWIW, this makes ro->rw and thaw essentially the same, too.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 4/4] xfs: set cowblocks tag for direct cow writes too
2017-12-15 17:11 ` [PATCH 4/4] xfs: set cowblocks tag for direct cow writes too Darrick J. Wong
@ 2017-12-19 0:18 ` Dave Chinner
2017-12-21 13:39 ` Christoph Hellwig
1 sibling, 0 replies; 22+ messages in thread
From: Dave Chinner @ 2017-12-19 0:18 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs
On Fri, Dec 15, 2017 at 09:11:37AM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> If a user performs a direct CoW write, we end up loading the CoW fork
> with preallocated extents. Therefore, we must set the cowblocks tag so
> that they can be cleared out if we run low on space.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Looks good.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/4] xfs: remove leftover CoW reservations when remounting ro
2017-12-19 0:17 ` Dave Chinner
@ 2017-12-19 3:49 ` Darrick J. Wong
2017-12-19 4:37 ` Dave Chinner
0 siblings, 1 reply; 22+ messages in thread
From: Darrick J. Wong @ 2017-12-19 3:49 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs
On Tue, Dec 19, 2017 at 11:17:55AM +1100, Dave Chinner wrote:
> On Fri, Dec 15, 2017 at 09:11:31AM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> >
> > When we're remounting the filesystem readonly, remove all CoW
> > preallocations prior to going ro. If the fs goes down after the ro
> > remount, we never clean up the staging extents, which means xfs_check
> > will trip over them on a subsequent run. Practically speaking, the
> > next mount will clean them up too, so this is unlikely to be seen.
> >
> > Found by adding clonerange to fsstress and running xfs/017.
> >
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> > fs/xfs/xfs_super.c | 8 ++++++++
> > 1 file changed, 8 insertions(+)
> >
> >
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index f663022..7b6d150 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -1369,6 +1369,14 @@ xfs_fs_remount(
> >
> > /* rw -> ro */
> > if (!(mp->m_flags & XFS_MOUNT_RDONLY) && (*flags & MS_RDONLY)) {
> > + /* Get rid of any leftover CoW reservations... */
> > + cancel_delayed_work_sync(&mp->m_cowblocks_work);
> > + error = xfs_icache_free_cowblocks(mp, NULL);
> > + if (error) {
> > + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> > + return error;
> > + }
>
> On rw->ro do we start the m_cowblocks_work back up?
Assuming you meant to ask about ro->rw, then yes it should get started
back up the next time something sets the cowblocks tag. I'm not opposed
to starting it back up directly from the ro->rw handler.
> What about when we freeze the filesystem - shouldn't we clean
> up the cow blocks there, too? We've tried hard in the past to make
> freeze and rw->ro exactly the same so that if the system is powered
> down while frozen it comes up almost entirely clean just like a
> ro-remount in shutdown....
I don't see a hard requirement to clean them up at freeze time, though
we certainly can do it for consistency's sake.
> And, FWIW, this makes ro->rw and thaw essentially the same, too.
Noted.
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/4] xfs: remove leftover CoW reservations when remounting ro
2017-12-19 3:49 ` Darrick J. Wong
@ 2017-12-19 4:37 ` Dave Chinner
2017-12-19 4:53 ` Darrick J. Wong
0 siblings, 1 reply; 22+ messages in thread
From: Dave Chinner @ 2017-12-19 4:37 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs
On Mon, Dec 18, 2017 at 07:49:11PM -0800, Darrick J. Wong wrote:
> On Tue, Dec 19, 2017 at 11:17:55AM +1100, Dave Chinner wrote:
> > On Fri, Dec 15, 2017 at 09:11:31AM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > >
> > > When we're remounting the filesystem readonly, remove all CoW
> > > preallocations prior to going ro. If the fs goes down after the ro
> > > remount, we never clean up the staging extents, which means xfs_check
> > > will trip over them on a subsequent run. Practically speaking, the
> > > next mount will clean them up too, so this is unlikely to be seen.
> > >
> > > Found by adding clonerange to fsstress and running xfs/017.
> > >
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > > fs/xfs/xfs_super.c | 8 ++++++++
> > > 1 file changed, 8 insertions(+)
> > >
> > >
> > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > > index f663022..7b6d150 100644
> > > --- a/fs/xfs/xfs_super.c
> > > +++ b/fs/xfs/xfs_super.c
> > > @@ -1369,6 +1369,14 @@ xfs_fs_remount(
> > >
> > > /* rw -> ro */
> > > if (!(mp->m_flags & XFS_MOUNT_RDONLY) && (*flags & MS_RDONLY)) {
> > > + /* Get rid of any leftover CoW reservations... */
> > > + cancel_delayed_work_sync(&mp->m_cowblocks_work);
> > > + error = xfs_icache_free_cowblocks(mp, NULL);
> > > + if (error) {
> > > + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> > > + return error;
> > > + }
> >
> > On rw->ro do we start the m_cowblocks_work back up?
>
> Assuming you meant to ask about ro->rw, then yes it should get started
> back up the next time something sets the cowblocks tag. I'm not opposed
> to starting it back up directly from the ro->rw handler.
>
> > What about when we freeze the filesystem - shouldn't we clean
> > up the cow blocks there, too? We've tried hard in the past to make
> > freeze and rw->ro exactly the same so that if the system is powered
> > down while frozen it comes up almost entirely clean just like a
> > ro-remount in shutdown....
>
> I don't see a hard requirement to clean them up at freeze time, though
> we certainly can do it for consistency's sake.
can't the background worker come around and attempt to do cleanup
while the fs is frozen? We've had vectors like that in the past that
have written to frozen filesystems (e.g. inode reclaim writing
inodes, memory reclaim shrinkers triggering AIL pushes) so leaving
potentially dirty objects in memory when the filesystem is frozen
is kinda dangerous. That's the reason behind trying to make
freeze/ro states identical - it makes sure we don't accidentally
leave writable objects in memory when frozen...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/4] xfs: remove leftover CoW reservations when remounting ro
2017-12-19 4:37 ` Dave Chinner
@ 2017-12-19 4:53 ` Darrick J. Wong
2017-12-19 6:46 ` Dave Chinner
0 siblings, 1 reply; 22+ messages in thread
From: Darrick J. Wong @ 2017-12-19 4:53 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs
On Tue, Dec 19, 2017 at 03:37:02PM +1100, Dave Chinner wrote:
> On Mon, Dec 18, 2017 at 07:49:11PM -0800, Darrick J. Wong wrote:
> > On Tue, Dec 19, 2017 at 11:17:55AM +1100, Dave Chinner wrote:
> > > On Fri, Dec 15, 2017 at 09:11:31AM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > >
> > > > When we're remounting the filesystem readonly, remove all CoW
> > > > preallocations prior to going ro. If the fs goes down after the ro
> > > > remount, we never clean up the staging extents, which means xfs_check
> > > > will trip over them on a subsequent run. Practically speaking, the
> > > > next mount will clean them up too, so this is unlikely to be seen.
> > > >
> > > > Found by adding clonerange to fsstress and running xfs/017.
> > > >
> > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > ---
> > > > fs/xfs/xfs_super.c | 8 ++++++++
> > > > 1 file changed, 8 insertions(+)
> > > >
> > > >
> > > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > > > index f663022..7b6d150 100644
> > > > --- a/fs/xfs/xfs_super.c
> > > > +++ b/fs/xfs/xfs_super.c
> > > > @@ -1369,6 +1369,14 @@ xfs_fs_remount(
> > > >
> > > > /* rw -> ro */
> > > > if (!(mp->m_flags & XFS_MOUNT_RDONLY) && (*flags & MS_RDONLY)) {
> > > > + /* Get rid of any leftover CoW reservations... */
> > > > + cancel_delayed_work_sync(&mp->m_cowblocks_work);
> > > > + error = xfs_icache_free_cowblocks(mp, NULL);
> > > > + if (error) {
> > > > + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> > > > + return error;
> > > > + }
> > >
> > > On rw->ro do we start the m_cowblocks_work back up?
> >
> > Assuming you meant to ask about ro->rw, then yes it should get started
> > back up the next time something sets the cowblocks tag. I'm not opposed
> > to starting it back up directly from the ro->rw handler.
> >
> > > What about when we freeze the filesystem - shouldn't we clean
> > > up the cow blocks there, too? We've tried hard in the past to make
> > > freeze and rw->ro exactly the same so that if the system is powered
> > > down while frozen it comes up almost entirely clean just like a
> > > ro-remount in shutdown....
> >
> > I don't see a hard requirement to clean them up at freeze time, though
> > we certainly can do it for consistency's sake.
>
> can't the background worker come around and attempt to do cleanup
> while the fs is frozen? We've had vectors like that in the past that
> have written to frozen filesystems (e.g. inode reclaim writing
> inodes, memory reclaim shrinkers triggering AIL pushes) so leaving
> potentially dirty objects in memory when the filesystem is frozen
> is kinda dangerous. That's the reason behind trying to make
> freeze/ro states identical - it makes sure we don't accidentally
> leave writable objects in memory when frozen...
Hmmm, so /me tried making fsfreeze clear out the cow reservations, but
doing so requires allocating a transaction, which blows the assert in
sb_start_write because the fs is already frozen... I could just kill
the thread without cleaning out the cow reservations and let the
post-crash mount clean things up, since we already have the
infrastructure to do that anyway?
(Or create a ->freeze_super and do it there...)
--D
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/4] xfs: remove leftover CoW reservations when remounting ro
2017-12-19 4:53 ` Darrick J. Wong
@ 2017-12-19 6:46 ` Dave Chinner
2017-12-19 19:00 ` Darrick J. Wong
0 siblings, 1 reply; 22+ messages in thread
From: Dave Chinner @ 2017-12-19 6:46 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs
On Mon, Dec 18, 2017 at 08:53:01PM -0800, Darrick J. Wong wrote:
> On Tue, Dec 19, 2017 at 03:37:02PM +1100, Dave Chinner wrote:
> > On Mon, Dec 18, 2017 at 07:49:11PM -0800, Darrick J. Wong wrote:
> > > On Tue, Dec 19, 2017 at 11:17:55AM +1100, Dave Chinner wrote:
> > > > On Fri, Dec 15, 2017 at 09:11:31AM -0800, Darrick J. Wong wrote:
> > > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > >
> > > > > When we're remounting the filesystem readonly, remove all CoW
> > > > > preallocations prior to going ro. If the fs goes down after the ro
> > > > > remount, we never clean up the staging extents, which means xfs_check
> > > > > will trip over them on a subsequent run. Practically speaking, the
> > > > > next mount will clean them up too, so this is unlikely to be seen.
> > > > >
> > > > > Found by adding clonerange to fsstress and running xfs/017.
> > > > >
> > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > ---
> > > > > fs/xfs/xfs_super.c | 8 ++++++++
> > > > > 1 file changed, 8 insertions(+)
> > > > >
> > > > >
> > > > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > > > > index f663022..7b6d150 100644
> > > > > --- a/fs/xfs/xfs_super.c
> > > > > +++ b/fs/xfs/xfs_super.c
> > > > > @@ -1369,6 +1369,14 @@ xfs_fs_remount(
> > > > >
> > > > > /* rw -> ro */
> > > > > if (!(mp->m_flags & XFS_MOUNT_RDONLY) && (*flags & MS_RDONLY)) {
> > > > > + /* Get rid of any leftover CoW reservations... */
> > > > > + cancel_delayed_work_sync(&mp->m_cowblocks_work);
> > > > > + error = xfs_icache_free_cowblocks(mp, NULL);
> > > > > + if (error) {
> > > > > + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> > > > > + return error;
> > > > > + }
> > > >
> > > > On rw->ro do we start the m_cowblocks_work back up?
> > >
> > > Assuming you meant to ask about ro->rw, then yes it should get started
> > > back up the next time something sets the cowblocks tag. I'm not opposed
> > > to starting it back up directly from the ro->rw handler.
> > >
> > > > What about when we freeze the filesystem - shouldn't we clean
> > > > up the cow blocks there, too? We've tried hard in the past to make
> > > > freeze and rw->ro exactly the same so that if the system is powered
> > > > down while frozen it comes up almost entirely clean just like a
> > > > ro-remount in shutdown....
> > >
> > > I don't see a hard requirement to clean them up at freeze time, though
> > > we certainly can do it for consistency's sake.
> >
> > can't the background worker come around and attempt to do cleanup
> > while the fs is frozen? We've had vectors like that in the past that
> > have written to frozen filesystems (e.g. inode reclaim writing
> > inodes, memory reclaim shrinkers triggering AIL pushes) so leaving
> > potentially dirty objects in memory when the filesystem is frozen
> > is kinda dangerous. That's the reason behind trying to make
> > freeze/ro states identical - it makes sure we don't accidentally
> > leave writable objects in memory when frozen...
>
> Hmmm, so /me tried making fsfreeze clear out the cow reservations, but
> doing so requires allocating a transaction, which blows the assert in
> sb_start_write because the fs is already frozen...
Ah, didn't we solve that problem years ago? Ah, yeah,
XFS_TRANS_NO_WRITECOUNT. That'd be a bit of a hack, but the
problem here is we need to run this between freezing data writes and
freezing transactions and we have no hook in the generic freeze
code to do that...
> I could just kill
> the thread without cleaning out the cow reservations and let the
> post-crash mount clean things up, since we already have the
> infrastructure to do that anyway?
Well, we do leave the log dirty on freeze so that we cleanup
unlinked inodes if we crash while frozen, so there is precedence
there. However, we need to balance that with the fairly common
problem of having to run recovery on read-only snapshots on the
first mount because a freeze leaves the log dirty. I don't
think we want to make that problem worse so I'd like to avoid this
solution if at all possible.
> (Or create a ->freeze_super and do it there...)
A ->freeze_data callout from the generic freezing code would be more
appropriate than completely reimplementing our own freeze code.
Right now the generic code just calls sync_filesystem(sb) to do this
before setting SB_FREEZE_FS - we need to do more than just sync data
if we are going to remove cow mappings on freeze....
-Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/4] xfs: remove leftover CoW reservations when remounting ro
2017-12-19 6:46 ` Dave Chinner
@ 2017-12-19 19:00 ` Darrick J. Wong
2017-12-19 20:55 ` Dave Chinner
0 siblings, 1 reply; 22+ messages in thread
From: Darrick J. Wong @ 2017-12-19 19:00 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs
On Tue, Dec 19, 2017 at 05:46:55PM +1100, Dave Chinner wrote:
> On Mon, Dec 18, 2017 at 08:53:01PM -0800, Darrick J. Wong wrote:
> > On Tue, Dec 19, 2017 at 03:37:02PM +1100, Dave Chinner wrote:
> > > On Mon, Dec 18, 2017 at 07:49:11PM -0800, Darrick J. Wong wrote:
> > > > On Tue, Dec 19, 2017 at 11:17:55AM +1100, Dave Chinner wrote:
> > > > > On Fri, Dec 15, 2017 at 09:11:31AM -0800, Darrick J. Wong wrote:
> > > > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > >
> > > > > > When we're remounting the filesystem readonly, remove all CoW
> > > > > > preallocations prior to going ro. If the fs goes down after the ro
> > > > > > remount, we never clean up the staging extents, which means xfs_check
> > > > > > will trip over them on a subsequent run. Practically speaking, the
> > > > > > next mount will clean them up too, so this is unlikely to be seen.
> > > > > >
> > > > > > Found by adding clonerange to fsstress and running xfs/017.
> > > > > >
> > > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > > ---
> > > > > > fs/xfs/xfs_super.c | 8 ++++++++
> > > > > > 1 file changed, 8 insertions(+)
> > > > > >
> > > > > >
> > > > > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > > > > > index f663022..7b6d150 100644
> > > > > > --- a/fs/xfs/xfs_super.c
> > > > > > +++ b/fs/xfs/xfs_super.c
> > > > > > @@ -1369,6 +1369,14 @@ xfs_fs_remount(
> > > > > >
> > > > > > /* rw -> ro */
> > > > > > if (!(mp->m_flags & XFS_MOUNT_RDONLY) && (*flags & MS_RDONLY)) {
> > > > > > + /* Get rid of any leftover CoW reservations... */
> > > > > > + cancel_delayed_work_sync(&mp->m_cowblocks_work);
> > > > > > + error = xfs_icache_free_cowblocks(mp, NULL);
> > > > > > + if (error) {
> > > > > > + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> > > > > > + return error;
> > > > > > + }
> > > > >
> > > > > On rw->ro do we start the m_cowblocks_work back up?
> > > >
> > > > Assuming you meant to ask about ro->rw, then yes it should get started
> > > > back up the next time something sets the cowblocks tag. I'm not opposed
> > > > to starting it back up directly from the ro->rw handler.
> > > >
> > > > > What about when we freeze the filesystem - shouldn't we clean
> > > > > up the cow blocks there, too? We've tried hard in the past to make
> > > > > freeze and rw->ro exactly the same so that if the system is powered
> > > > > down while frozen it comes up almost entirely clean just like a
> > > > > ro-remount in shutdown....
> > > >
> > > > I don't see a hard requirement to clean them up at freeze time, though
> > > > we certainly can do it for consistency's sake.
> > >
> > > can't the background worker come around and attempt to do cleanup
> > > while the fs is frozen? We've had vectors like that in the past that
> > > have written to frozen filesystems (e.g. inode reclaim writing
> > > inodes, memory reclaim shrinkers triggering AIL pushes) so leaving
> > > potentially dirty objects in memory when the filesystem is frozen
> > > is kinda dangerous. That's the reason behind trying to make
> > > freeze/ro states identical - it makes sure we don't accidentally
> > > leave writable objects in memory when frozen...
> >
> > Hmmm, so /me tried making fsfreeze clear out the cow reservations, but
> > doing so requires allocating a transaction, which blows the assert in
> > sb_start_write because the fs is already frozen...
>
> Ah, didn't we solve that problem years ago? Ah, yeah,
> XFS_TRANS_NO_WRITECOUNT. That'd be a bit of a hack, but the
> problem here is we need to run this between freezing data writes and
> freezing transactions and we have no hook in the generic freeze
> code to do that...
>
> > I could just kill
> > the thread without cleaning out the cow reservations and let the
> > post-crash mount clean things up, since we already have the
> > infrastructure to do that anyway?
>
> Well, we do leave the log dirty on freeze so that we cleanup
> unlinked inodes if we crash while frozen, so there is precedence
> there. However, we need to balance that with the fairly common
> problem of having to run recovery on read-only snapshots on the
> first mount because a freeze leaves the log dirty. I don't
> think we want to make that problem worse so I'd like to avoid this
> solution if at all possible.
>
> > (Or create a ->freeze_super and do it there...)
>
> A ->freeze_data callout from the generic freezing code would be more
> appropriate than completely reimplementing our own freeze code.
> Right now the generic code just calls sync_filesystem(sb) to do this
> before setting SB_FREEZE_FS - we need to do more than just sync data
> if we are going to remove cow mappings on freeze....
<nod>
I was thinking of replacing the sync_filesystem() call in freeze_super
with:
if (sb->s_op->freeze_data) {
ret = sb->s_op->freeze_data(sb);
if (ret) {
printk(KERN_ERR
"VFS:Filesystem dta freeze failed\n");
sb->s_writers.frozen = SB_UNFROZEN;
sb_freeze_unlock(sb);
wake_up(&sb->s_writers.wait_unfrozen);
deactivate_locked_super(sb);
return ret;
}
} else {
sync_filesystem(sb);
}
Though at this point I feel that the freeze fix should be a totally
separate patch from the ro<->rw patch.
--D
> -Dave.
> --
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 3/4] xfs: remove leftover CoW reservations when remounting ro
2017-12-19 19:00 ` Darrick J. Wong
@ 2017-12-19 20:55 ` Dave Chinner
0 siblings, 0 replies; 22+ messages in thread
From: Dave Chinner @ 2017-12-19 20:55 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs
On Tue, Dec 19, 2017 at 11:00:26AM -0800, Darrick J. Wong wrote:
> On Tue, Dec 19, 2017 at 05:46:55PM +1100, Dave Chinner wrote:
> > On Mon, Dec 18, 2017 at 08:53:01PM -0800, Darrick J. Wong wrote:
> > > On Tue, Dec 19, 2017 at 03:37:02PM +1100, Dave Chinner wrote:
> > > > On Mon, Dec 18, 2017 at 07:49:11PM -0800, Darrick J. Wong wrote:
> > > > > On Tue, Dec 19, 2017 at 11:17:55AM +1100, Dave Chinner wrote:
> > > > > > On Fri, Dec 15, 2017 at 09:11:31AM -0800, Darrick J. Wong wrote:
> > > > > > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > > >
> > > > > > > When we're remounting the filesystem readonly, remove all CoW
> > > > > > > preallocations prior to going ro. If the fs goes down after the ro
> > > > > > > remount, we never clean up the staging extents, which means xfs_check
> > > > > > > will trip over them on a subsequent run. Practically speaking, the
> > > > > > > next mount will clean them up too, so this is unlikely to be seen.
> > > > > > >
> > > > > > > Found by adding clonerange to fsstress and running xfs/017.
> > > > > > >
> > > > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > > > > > ---
> > > > > > > fs/xfs/xfs_super.c | 8 ++++++++
> > > > > > > 1 file changed, 8 insertions(+)
> > > > > > >
> > > > > > >
> > > > > > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > > > > > > index f663022..7b6d150 100644
> > > > > > > --- a/fs/xfs/xfs_super.c
> > > > > > > +++ b/fs/xfs/xfs_super.c
> > > > > > > @@ -1369,6 +1369,14 @@ xfs_fs_remount(
> > > > > > >
> > > > > > > /* rw -> ro */
> > > > > > > if (!(mp->m_flags & XFS_MOUNT_RDONLY) && (*flags & MS_RDONLY)) {
> > > > > > > + /* Get rid of any leftover CoW reservations... */
> > > > > > > + cancel_delayed_work_sync(&mp->m_cowblocks_work);
> > > > > > > + error = xfs_icache_free_cowblocks(mp, NULL);
> > > > > > > + if (error) {
> > > > > > > + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> > > > > > > + return error;
> > > > > > > + }
> > > > > >
> > > > > > On rw->ro do we start the m_cowblocks_work back up?
> > > > >
> > > > > Assuming you meant to ask about ro->rw, then yes it should get started
> > > > > back up the next time something sets the cowblocks tag. I'm not opposed
> > > > > to starting it back up directly from the ro->rw handler.
> > > > >
> > > > > > What about when we freeze the filesystem - shouldn't we clean
> > > > > > up the cow blocks there, too? We've tried hard in the past to make
> > > > > > freeze and rw->ro exactly the same so that if the system is powered
> > > > > > down while frozen it comes up almost entirely clean just like a
> > > > > > ro-remount in shutdown....
> > > > >
> > > > > I don't see a hard requirement to clean them up at freeze time, though
> > > > > we certainly can do it for consistency's sake.
> > > >
> > > > can't the background worker come around and attempt to do cleanup
> > > > while the fs is frozen? We've had vectors like that in the past that
> > > > have written to frozen filesystems (e.g. inode reclaim writing
> > > > inodes, memory reclaim shrinkers triggering AIL pushes) so leaving
> > > > potentially dirty objects in memory when the filesystem is frozen
> > > > is kinda dangerous. That's the reason behind trying to make
> > > > freeze/ro states identical - it makes sure we don't accidentally
> > > > leave writable objects in memory when frozen...
> > >
> > > Hmmm, so /me tried making fsfreeze clear out the cow reservations, but
> > > doing so requires allocating a transaction, which blows the assert in
> > > sb_start_write because the fs is already frozen...
> >
> > Ah, didn't we solve that problem years ago? Ah, yeah,
> > XFS_TRANS_NO_WRITECOUNT. That'd be a bit of a hack, but the
> > problem here is we need to run this between freezing data writes and
> > freezing transactions and we have no hook in the generic freeze
> > code to do that...
> >
> > > I could just kill
> > > the thread without cleaning out the cow reservations and let the
> > > post-crash mount clean things up, since we already have the
> > > infrastructure to do that anyway?
> >
> > Well, we do leave the log dirty on freeze so that we cleanup
> > unlinked inodes if we crash while frozen, so there is precedence
> > there. However, we need to balance that with the fairly common
> > problem of having to run recovery on read-only snapshots on the
> > first mount because a freeze leaves the log dirty. I don't
> > think we want to make that problem worse so I'd like to avoid this
> > solution if at all possible.
> >
> > > (Or create a ->freeze_super and do it there...)
> >
> > A ->freeze_data callout from the generic freezing code would be more
> > appropriate than completely reimplementing our own freeze code.
> > Right now the generic code just calls sync_filesystem(sb) to do this
> > before setting SB_FREEZE_FS - we need to do more than just sync data
> > if we are going to remove cow mappings on freeze....
>
> <nod>
>
> I was thinking of replacing the sync_filesystem() call in freeze_super
> with:
>
> if (sb->s_op->freeze_data) {
> ret = sb->s_op->freeze_data(sb);
> if (ret) {
> printk(KERN_ERR
> "VFS:Filesystem dta freeze failed\n");
> sb->s_writers.frozen = SB_UNFROZEN;
> sb_freeze_unlock(sb);
> wake_up(&sb->s_writers.wait_unfrozen);
> deactivate_locked_super(sb);
> return ret;
> }
> } else {
> sync_filesystem(sb);
> }
>
> Though at this point I feel that the freeze fix should be a totally
> separate patch from the ro<->rw patch.
Yup, agreed. So consider the original patch
Reviewed-by: Dave Chinner <dchinner@redhat.com>
-Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v2 3/4] xfs: remove leftover CoW reservations when remounting ro
2017-12-15 17:11 ` [PATCH 3/4] xfs: remove leftover CoW reservations when remounting ro Darrick J. Wong
2017-12-19 0:17 ` Dave Chinner
@ 2017-12-19 21:08 ` Darrick J. Wong
2017-12-21 13:38 ` Christoph Hellwig
1 sibling, 1 reply; 22+ messages in thread
From: Darrick J. Wong @ 2017-12-19 21:08 UTC (permalink / raw)
To: linux-xfs
From: Darrick J. Wong <darrick.wong@oracle.com>
When we're remounting the filesystem readonly, remove all CoW
preallocations prior to going ro. If the fs goes down after the ro
remount, we never clean up the staging extents, which means xfs_check
will trip over them on a subsequent run. Practically speaking, the next
mount will clean them up too, so this is unlikely to be seen. Since we
shut down the cowblocks cleaner on remount-ro, we also have to make sure
we start it back up if/when we remount-rw.
Found by adding clonerange to fsstress and running xfs/017.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/xfs_icache.c | 2 +-
fs/xfs/xfs_icache.h | 1 +
fs/xfs/xfs_super.c | 9 +++++++++
3 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 58d2d42..3861d61 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -870,7 +870,7 @@ xfs_eofblocks_worker(
* based on the 'speculative_cow_prealloc_lifetime' tunable (5m by default).
* (We'll just piggyback on the post-EOF prealloc space workqueue.)
*/
-STATIC void
+void
xfs_queue_cowblocks(
struct xfs_mount *mp)
{
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index bff4d85..d4a7758 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -81,6 +81,7 @@ void xfs_inode_clear_cowblocks_tag(struct xfs_inode *ip);
int xfs_icache_free_cowblocks(struct xfs_mount *, struct xfs_eofblocks *);
int xfs_inode_free_quota_cowblocks(struct xfs_inode *ip);
void xfs_cowblocks_worker(struct work_struct *);
+void xfs_queue_cowblocks(struct xfs_mount *);
int xfs_inode_ag_iterator(struct xfs_mount *mp,
int (*execute)(struct xfs_inode *ip, int flags, void *args),
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index f663022..2db6a40 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1360,6 +1360,7 @@ xfs_fs_remount(
xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
return error;
}
+ xfs_queue_cowblocks(mp);
/* Create the per-AG metadata reservation pool .*/
error = xfs_fs_reserve_ag_blocks(mp);
@@ -1369,6 +1370,14 @@ xfs_fs_remount(
/* rw -> ro */
if (!(mp->m_flags & XFS_MOUNT_RDONLY) && (*flags & MS_RDONLY)) {
+ /* Get rid of any leftover CoW reservations... */
+ cancel_delayed_work_sync(&mp->m_cowblocks_work);
+ error = xfs_icache_free_cowblocks(mp, NULL);
+ if (error) {
+ xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
+ return error;
+ }
+
/* Free the per-AG metadata reservation pool. */
error = xfs_fs_unreserve_ag_blocks(mp);
if (error) {
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 5/4] xfs: clean up cow mappings during fs data freeze
2017-12-15 17:11 [PATCH 0/4] xfs: more reflink fixes Darrick J. Wong
` (3 preceding siblings ...)
2017-12-15 17:11 ` [PATCH 4/4] xfs: set cowblocks tag for direct cow writes too Darrick J. Wong
@ 2017-12-19 21:09 ` Darrick J. Wong
2017-12-21 13:39 ` Christoph Hellwig
4 siblings, 1 reply; 22+ messages in thread
From: Darrick J. Wong @ 2017-12-19 21:09 UTC (permalink / raw)
To: linux-xfs
From: Darrick J. Wong <darrick.wong@oracle.com>
As part of the fs freezing process, we want to free the copy-on-write
reservations after all the dirty data has been written to disk but prior
to freezing the filesystem. Currently, only a sync_filesystem call is
hardwired into the VFS code, so provide a new callout, ->freeze_data, so
that filesystems can override with whatever behavior they want.
In the case of XFS, that behavior is sync and then clean the cow
mappings. This reduces the amount of recovery that must happen if the
system goes down while the fs is frozen.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/super.c | 23 +++++++++++++++++------
fs/xfs/xfs_super.c | 21 +++++++++++++++++++++
include/linux/fs.h | 1 +
3 files changed, 39 insertions(+), 6 deletions(-)
diff --git a/fs/super.c b/fs/super.c
index d4e33e8..f0ee186 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1454,7 +1454,16 @@ int freeze_super(struct super_block *sb)
sb_wait_write(sb, SB_FREEZE_PAGEFAULT);
/* All writers are done so after syncing there won't be dirty data */
- sync_filesystem(sb);
+ if (sb->s_op->freeze_data) {
+ ret = sb->s_op->freeze_data(sb);
+ if (ret) {
+ printk(KERN_ERR
+ "VFS:Filesystem data freeze failed\n");
+ goto freeze_fail;
+ }
+ } else {
+ sync_filesystem(sb);
+ }
/* Now wait for internal filesystem counter */
sb->s_writers.frozen = SB_FREEZE_FS;
@@ -1465,11 +1474,7 @@ int freeze_super(struct super_block *sb)
if (ret) {
printk(KERN_ERR
"VFS:Filesystem freeze failed\n");
- sb->s_writers.frozen = SB_UNFROZEN;
- sb_freeze_unlock(sb);
- wake_up(&sb->s_writers.wait_unfrozen);
- deactivate_locked_super(sb);
- return ret;
+ goto freeze_fail;
}
}
/*
@@ -1480,6 +1485,12 @@ int freeze_super(struct super_block *sb)
lockdep_sb_freeze_release(sb);
up_write(&sb->s_umount);
return 0;
+freeze_fail:
+ sb->s_writers.frozen = SB_UNFROZEN;
+ sb_freeze_unlock(sb);
+ wake_up(&sb->s_writers.wait_unfrozen);
+ deactivate_locked_super(sb);
+ return ret;
}
EXPORT_SYMBOL(freeze_super);
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 2db6a40..2817bce 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1408,6 +1408,26 @@ xfs_fs_remount(
}
/*
+ * First stage of a freeze. We need to sync all the dirty data and clean up
+ * all the leftover CoW mappings to make the filesystem as tidy as possible.
+ */
+STATIC int
+xfs_fs_freeze_data(
+ struct super_block *sb)
+{
+ struct xfs_mount *mp = XFS_M(sb);
+ int error;
+
+ error = sync_filesystem(sb);
+ if (error)
+ return error;
+
+ if (!xfs_sb_version_hasreflink(&mp->m_sb))
+ return 0;
+ return xfs_icache_free_cowblocks(mp, NULL);
+}
+
+/*
* Second stage of a freeze. The data is already frozen so we only
* need to take care of the metadata. Once that's done sync the superblock
* to the log to dirty it in case of a crash while frozen. This ensures that we
@@ -1780,6 +1800,7 @@ static const struct super_operations xfs_super_operations = {
.drop_inode = xfs_fs_drop_inode,
.put_super = xfs_fs_put_super,
.sync_fs = xfs_fs_sync_fs,
+ .freeze_data = xfs_fs_freeze_data,
.freeze_fs = xfs_fs_freeze,
.unfreeze_fs = xfs_fs_unfreeze,
.statfs = xfs_fs_statfs,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 2995a27..d53d55f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1811,6 +1811,7 @@ struct super_operations {
void (*put_super) (struct super_block *);
int (*sync_fs)(struct super_block *sb, int wait);
int (*freeze_super) (struct super_block *);
+ int (*freeze_data) (struct super_block *);
int (*freeze_fs) (struct super_block *);
int (*thaw_super) (struct super_block *);
int (*unfreeze_fs) (struct super_block *);
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 1/4] xfs: track cowblocks separately in i_flags
2017-12-15 17:11 ` [PATCH 1/4] xfs: track cowblocks separately in i_flags Darrick J. Wong
2017-12-19 0:12 ` Dave Chinner
@ 2017-12-21 13:36 ` Christoph Hellwig
1 sibling, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2017-12-21 13:36 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs
> +static inline unsigned long
> +xfs_iflag_for_tag(
> + int tag)
> +{
> + switch (tag) {
> + case XFS_ICI_EOFBLOCKS_TAG:
> + return XFS_IEOFBLOCKS;
> + case XFS_ICI_COWBLOCKS_TAG:
> + return XFS_ICOWBLOCKS;
> + default:
> + ASSERT(0);
> + return 0;
> + }
> +}
I'd rather pass the flag explicitly to the functions that already
take the tag value.
Except for that the patch looks fine:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/4] xfs: don't be so eager to clear the cowblocks tag on truncate
2017-12-15 17:11 ` [PATCH 2/4] xfs: don't be so eager to clear the cowblocks tag on truncate Darrick J. Wong
2017-12-19 0:12 ` Dave Chinner
@ 2017-12-21 13:37 ` Christoph Hellwig
1 sibling, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2017-12-21 13:37 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 3/4] xfs: remove leftover CoW reservations when remounting ro
2017-12-19 21:08 ` [PATCH v2 " Darrick J. Wong
@ 2017-12-21 13:38 ` Christoph Hellwig
0 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2017-12-21 13:38 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 4/4] xfs: set cowblocks tag for direct cow writes too
2017-12-15 17:11 ` [PATCH 4/4] xfs: set cowblocks tag for direct cow writes too Darrick J. Wong
2017-12-19 0:18 ` Dave Chinner
@ 2017-12-21 13:39 ` Christoph Hellwig
1 sibling, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2017-12-21 13:39 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs
Looks good,
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 5/4] xfs: clean up cow mappings during fs data freeze
2017-12-19 21:09 ` [PATCH 5/4] xfs: clean up cow mappings during fs data freeze Darrick J. Wong
@ 2017-12-21 13:39 ` Christoph Hellwig
0 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2017-12-21 13:39 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: linux-xfs
This needs to be a separate thread including -fsdevel.
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2017-12-21 13:39 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-12-15 17:11 [PATCH 0/4] xfs: more reflink fixes Darrick J. Wong
2017-12-15 17:11 ` [PATCH 1/4] xfs: track cowblocks separately in i_flags Darrick J. Wong
2017-12-19 0:12 ` Dave Chinner
2017-12-21 13:36 ` Christoph Hellwig
2017-12-15 17:11 ` [PATCH 2/4] xfs: don't be so eager to clear the cowblocks tag on truncate Darrick J. Wong
2017-12-19 0:12 ` Dave Chinner
2017-12-21 13:37 ` Christoph Hellwig
2017-12-15 17:11 ` [PATCH 3/4] xfs: remove leftover CoW reservations when remounting ro Darrick J. Wong
2017-12-19 0:17 ` Dave Chinner
2017-12-19 3:49 ` Darrick J. Wong
2017-12-19 4:37 ` Dave Chinner
2017-12-19 4:53 ` Darrick J. Wong
2017-12-19 6:46 ` Dave Chinner
2017-12-19 19:00 ` Darrick J. Wong
2017-12-19 20:55 ` Dave Chinner
2017-12-19 21:08 ` [PATCH v2 " Darrick J. Wong
2017-12-21 13:38 ` Christoph Hellwig
2017-12-15 17:11 ` [PATCH 4/4] xfs: set cowblocks tag for direct cow writes too Darrick J. Wong
2017-12-19 0:18 ` Dave Chinner
2017-12-21 13:39 ` Christoph Hellwig
2017-12-19 21:09 ` [PATCH 5/4] xfs: clean up cow mappings during fs data freeze Darrick J. Wong
2017-12-21 13:39 ` Christoph Hellwig
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).