[PATCH 3/3] xfs: don't leak quotacheck dquots when cow recovery fails

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 3/3] xfs: don't leak quotacheck dquots when cow recovery fails
  2017-08-09  1:04 Darrick J. Wong
@ 2017-08-09  1:07 ` Darrick J. Wong
  2017-08-09 12:36   ` Brian Foster
  0 siblings, 1 reply; 20+ messages in thread
From: Darrick J. Wong @ 2017-08-09  1:07 UTC (permalink / raw)
  To: xfs

If we fail a mount on account of cow recovery errors, it's possible that
a previous quotacheck left some dquots in memory.  The bailout clause of
xfs_mountfs forgets to purge these, and so we leak them.  Fix that.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_mount.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index d463ab3..8a12118 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1029,6 +1029,8 @@ xfs_mountfs(
  out_agresv:
 	xfs_fs_unreserve_ag_blocks(mp);
  out_quota:
+	/* Clean out dquots that might be in memory after quotacheck. */
+	xfs_qm_unmount(mp);
 	xfs_qm_unmount_quotas(mp);
  out_rtunmount:
 	xfs_rtunmount_inodes(mp);

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/3] xfs: don't leak quotacheck dquots when cow recovery fails
  2017-08-09  1:07 ` [PATCH 3/3] xfs: don't leak quotacheck dquots when cow recovery fails Darrick J. Wong
@ 2017-08-09 12:36   ` Brian Foster
  2017-08-09 16:06     ` Darrick J. Wong
  0 siblings, 1 reply; 20+ messages in thread
From: Brian Foster @ 2017-08-09 12:36 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: xfs

On Tue, Aug 08, 2017 at 06:07:00PM -0700, Darrick J. Wong wrote:
> If we fail a mount on account of cow recovery errors, it's possible that
> a previous quotacheck left some dquots in memory.  The bailout clause of
> xfs_mountfs forgets to purge these, and so we leak them.  Fix that.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/xfs_mount.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index d463ab3..8a12118 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -1029,6 +1029,8 @@ xfs_mountfs(
>   out_agresv:
>  	xfs_fs_unreserve_ag_blocks(mp);
>   out_quota:
> +	/* Clean out dquots that might be in memory after quotacheck. */
> +	xfs_qm_unmount(mp);
>  	xfs_qm_unmount_quotas(mp);

Should these calls be reversed? It looks like qm_unmount() can free
m_quotainfo before qm_unmount_quotas() has a chance to release quota
inodes and whatnot.

Brian

>   out_rtunmount:
>  	xfs_rtunmount_inodes(mp);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/3] xfs: don't leak quotacheck dquots when cow recovery fails
  2017-08-09 12:36   ` Brian Foster
@ 2017-08-09 16:06     ` Darrick J. Wong
  0 siblings, 0 replies; 20+ messages in thread
From: Darrick J. Wong @ 2017-08-09 16:06 UTC (permalink / raw)
  To: Brian Foster; +Cc: xfs

On Wed, Aug 09, 2017 at 08:36:58AM -0400, Brian Foster wrote:
> On Tue, Aug 08, 2017 at 06:07:00PM -0700, Darrick J. Wong wrote:
> > If we fail a mount on account of cow recovery errors, it's possible that
> > a previous quotacheck left some dquots in memory.  The bailout clause of
> > xfs_mountfs forgets to purge these, and so we leak them.  Fix that.
> > 
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/xfs_mount.c |    2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > index d463ab3..8a12118 100644
> > --- a/fs/xfs/xfs_mount.c
> > +++ b/fs/xfs/xfs_mount.c
> > @@ -1029,6 +1029,8 @@ xfs_mountfs(
> >   out_agresv:
> >  	xfs_fs_unreserve_ag_blocks(mp);
> >   out_quota:
> > +	/* Clean out dquots that might be in memory after quotacheck. */
> > +	xfs_qm_unmount(mp);
> >  	xfs_qm_unmount_quotas(mp);
> 
> Should these calls be reversed? It looks like qm_unmount() can free
> m_quotainfo before qm_unmount_quotas() has a chance to release quota
> inodes and whatnot.

xfs_qm_unmount -> xfs_qm_destroy_quotainfo will free the quota inodes,
but you're right that we need (at least in theory) to call
xfs_qm_unmount_quotas first so that we can xfs_qm_dqdetach the
root/rbm/rsum inodes.  I don't think we've actually attached dquots to
those inodes at that point in the mount process (and the slab leak
complaints have stopped) so at a practical level it might not matter.

Will change & resubmit,

--D

> 
> Brian
> 
> >   out_rtunmount:
> >  	xfs_rtunmount_inodes(mp);
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/3] xfs: clear MS_ACTIVE after finishing log recovery to avoid inode leak
@ 2017-08-10  5:23 Darrick J. Wong
  2017-08-10  5:23 ` [PATCH 2/3] xfs: evict all inodes involved with log redo item recovery Darrick J. Wong
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Darrick J. Wong @ 2017-08-10  5:23 UTC (permalink / raw)
  To: linux-xfs; +Cc: Brian Foster

From: Darrick J. Wong <darrick.wong@oracle.com>

Way back when we established inode block-map redo log items, it was
discovered that we needed to prevent the VFS from evicting inodes during
log recovery because any given inode might be have bmap redo items to
replay even if the inode has no link count and is ultimately deleted,
and any eviction of an unlinked inode causes the inode to be truncated
and freed too early.

To make this possible, we set MS_ACTIVE so that inodes would not be torn
down immediately upon release.  Unfortunately, this also results in the
quota inodes not being released at all if a later part of the mount
process should fail, because we never reclaim the inodes.  So, clear
MS_ACTIVE immediately after we finish the log recovery so that the quota
inodes will be torn down properly if we abort the mount.

Fixes: 17c12bcd30 ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
---
v2: wrap the xfs_log_mount_finish more tightly with the MS_ACTIVE mods
---
 fs/xfs/xfs_mount.c |   16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)


diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 40d4e8b..d63a367 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -945,20 +945,21 @@ xfs_mountfs(
 	}
 
 	/*
+	 * Finish recovering the file system.  This part needed to be delayed
+	 * until after the root and real-time bitmap inodes were consistently
+	 * read in.
+	 *
 	 * During the second phase of log recovery, we need iget and
 	 * iput to behave like they do for an active filesystem.
 	 * xfs_fs_drop_inode needs to be able to prevent the deletion
 	 * of inodes before we're done replaying log items on those
-	 * inodes.
+	 * inodes.  Turn it off immediately after xfs_log_mount_finish
+	 * so that we don't leak the quota inodes if subsequent mount
+	 * activities fail.
 	 */
 	mp->m_super->s_flags |= MS_ACTIVE;
-
-	/*
-	 * Finish recovering the file system.  This part needed to be delayed
-	 * until after the root and real-time bitmap inodes were consistently
-	 * read in.
-	 */
 	error = xfs_log_mount_finish(mp);
+	mp->m_super->s_flags &= ~MS_ACTIVE;
 	if (error) {
 		xfs_warn(mp, "log mount finish failed");
 		goto out_rtunmount;
@@ -1028,7 +1029,6 @@ xfs_mountfs(
  out_quota:
 	xfs_qm_unmount_quotas(mp);
  out_rtunmount:
-	mp->m_super->s_flags &= ~MS_ACTIVE;
 	xfs_rtunmount_inodes(mp);
  out_rele_rip:
 	IRELE(rip);


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/3] xfs: evict all inodes involved with log redo item recovery
  2017-08-10  5:23 [PATCH 1/3] xfs: clear MS_ACTIVE after finishing log recovery to avoid inode leak Darrick J. Wong
@ 2017-08-10  5:23 ` Darrick J. Wong
  2017-08-10 14:51   ` Brian Foster
  2017-08-11 19:50   ` [PATCH v2 " Darrick J. Wong
  2017-08-10  5:23 ` [PATCH 3/3] xfs: don't leak quotacheck dquots when cow recovery fails Darrick J. Wong
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 20+ messages in thread
From: Darrick J. Wong @ 2017-08-10  5:23 UTC (permalink / raw)
  To: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

When we introduced the bmap redo log items, we set MS_ACTIVE on the
mountpoint and XFS_IRECOVERY on the inode to prevent unlinked inodes
from being truncated prematurely during log recovery.  This also had the
effect of putting linked inodes on the lru instead of evicting them.

Unfortunately, we neglected to find all those unreferenced lru inodes
and evict them after finishing log recovery, which means that we leak
them if anything goes wrong in the rest of xfs_mountfs, because the lru
is only cleaned out on unmount.

Therefore, look for unreferenced inodes in the lru list immediately
after clearing MS_ACTIVE.  If we find any, igrab/iput them so that
they're evicted from memory.

Fixes: 17c12bcd30 ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_mount.c |   29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)


diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index d63a367..8da5155 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -616,6 +616,34 @@ xfs_default_resblks(xfs_mount_t *mp)
 }
 
 /*
+ * Find inodes that were put on the LRU during the last part of log
+ * recovery that are no longer referenced by anything, and igrab/iput
+ * them so that they'll be evicted now.
+ *
+ * We let all inodes involved in redo item processing end up on the LRU
+ * instead of being evicted immediately so that if we do something to an
+ * unlinked inode, the irele won't cause premature truncation and
+ * freeing of the inode, which results in log recovery failure.  We have
+ * to evict the unreferenced lru inodes here because we don't otherwise
+ * clean up the lru if there's a subsequent failure in xfs_mountfs,
+ * which leads to us leaking the inodes if nothing else (e.g. quotacheck)
+ * references the inodes before the failure occurs.
+ */
+STATIC void
+xfs_mountfs_evict_log_redo_inodes(
+	struct xfs_mount	*mp)
+{
+	struct super_block	*sb = mp->m_super;
+	struct inode		*inode;
+	struct inode		*next;
+
+	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
+		if (!atomic_read(&inode->i_count))
+			iput(igrab(inode));
+	}
+}
+
+/*
  * This function does the following on an initial mount of a file system:
  *	- reads the superblock from disk and init the mount struct
  *	- if we're a 32-bit kernel, do a size check on the superblock
@@ -960,6 +988,7 @@ xfs_mountfs(
 	mp->m_super->s_flags |= MS_ACTIVE;
 	error = xfs_log_mount_finish(mp);
 	mp->m_super->s_flags &= ~MS_ACTIVE;
+	xfs_mountfs_evict_log_redo_inodes(mp);
 	if (error) {
 		xfs_warn(mp, "log mount finish failed");
 		goto out_rtunmount;


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/3] xfs: don't leak quotacheck dquots when cow recovery fails
  2017-08-10  5:23 [PATCH 1/3] xfs: clear MS_ACTIVE after finishing log recovery to avoid inode leak Darrick J. Wong
  2017-08-10  5:23 ` [PATCH 2/3] xfs: evict all inodes involved with log redo item recovery Darrick J. Wong
@ 2017-08-10  5:23 ` Darrick J. Wong
  2017-08-10 14:51   ` Brian Foster
                     ` (2 more replies)
  2017-08-10 18:15 ` [PATCH 1/3] xfs: clear MS_ACTIVE after finishing log recovery to avoid inode leak Allison Henderson
  2017-08-11 11:13 ` Christoph Hellwig
  3 siblings, 3 replies; 20+ messages in thread
From: Darrick J. Wong @ 2017-08-10  5:23 UTC (permalink / raw)
  To: linux-xfs

From: Darrick J. Wong <darrick.wong@oracle.com>

If we fail a mount on account of cow recovery errors, it's possible that
a previous quotacheck left some dquots in memory.  The bailout clause of
xfs_mountfs forgets to purge these, and so we leak them.  Fix that.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v2: unmount quotas, then free all the quota information
---
 fs/xfs/xfs_mount.c |    2 ++
 1 file changed, 2 insertions(+)


diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 8da5155..d50f4a7 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1057,6 +1057,8 @@ xfs_mountfs(
 	xfs_fs_unreserve_ag_blocks(mp);
  out_quota:
 	xfs_qm_unmount_quotas(mp);
+	/* Clean out dquots that might be in memory after quotacheck. */
+	xfs_qm_unmount(mp);
  out_rtunmount:
 	xfs_rtunmount_inodes(mp);
  out_rele_rip:


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/3] xfs: evict all inodes involved with log redo item recovery
  2017-08-10  5:23 ` [PATCH 2/3] xfs: evict all inodes involved with log redo item recovery Darrick J. Wong
@ 2017-08-10 14:51   ` Brian Foster
  2017-08-10 17:18     ` Darrick J. Wong
  2017-08-11 19:50   ` [PATCH v2 " Darrick J. Wong
  1 sibling, 1 reply; 20+ messages in thread
From: Brian Foster @ 2017-08-10 14:51 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Aug 09, 2017 at 10:23:51PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> When we introduced the bmap redo log items, we set MS_ACTIVE on the
> mountpoint and XFS_IRECOVERY on the inode to prevent unlinked inodes
> from being truncated prematurely during log recovery.  This also had the
> effect of putting linked inodes on the lru instead of evicting them.
> 
> Unfortunately, we neglected to find all those unreferenced lru inodes
> and evict them after finishing log recovery, which means that we leak
> them if anything goes wrong in the rest of xfs_mountfs, because the lru
> is only cleaned out on unmount.
> 
> Therefore, look for unreferenced inodes in the lru list immediately
> after clearing MS_ACTIVE.  If we find any, igrab/iput them so that
> they're evicted from memory.
> 
> Fixes: 17c12bcd30 ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped")
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/xfs_mount.c |   29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
> 
> 
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index d63a367..8da5155 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -616,6 +616,34 @@ xfs_default_resblks(xfs_mount_t *mp)
>  }
>  
>  /*
> + * Find inodes that were put on the LRU during the last part of log
> + * recovery that are no longer referenced by anything, and igrab/iput
> + * them so that they'll be evicted now.
> + *
> + * We let all inodes involved in redo item processing end up on the LRU
> + * instead of being evicted immediately so that if we do something to an
> + * unlinked inode, the irele won't cause premature truncation and
> + * freeing of the inode, which results in log recovery failure.  We have
> + * to evict the unreferenced lru inodes here because we don't otherwise
> + * clean up the lru if there's a subsequent failure in xfs_mountfs,
> + * which leads to us leaking the inodes if nothing else (e.g. quotacheck)
> + * references the inodes before the failure occurs.
> + */
> +STATIC void
> +xfs_mountfs_evict_log_redo_inodes(
> +	struct xfs_mount	*mp)
> +{
> +	struct super_block	*sb = mp->m_super;
> +	struct inode		*inode;
> +	struct inode		*next;
> +

Perhaps assert that MS_ACTIVE is cleared here as a bit of
self-documentation..?

> +	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
> +		if (!atomic_read(&inode->i_count))
> +			iput(igrab(inode));
> +	}

I assume we don't need s_inodes_list_lock here since nothing else should
be messing with the list..? If so, this looks fine otherwise. Though any
reason you didn't end up calling evict_inodes() here? Because it's not
exported..?

Brian

> +}
> +
> +/*
>   * This function does the following on an initial mount of a file system:
>   *	- reads the superblock from disk and init the mount struct
>   *	- if we're a 32-bit kernel, do a size check on the superblock
> @@ -960,6 +988,7 @@ xfs_mountfs(
>  	mp->m_super->s_flags |= MS_ACTIVE;
>  	error = xfs_log_mount_finish(mp);
>  	mp->m_super->s_flags &= ~MS_ACTIVE;
> +	xfs_mountfs_evict_log_redo_inodes(mp);
>  	if (error) {
>  		xfs_warn(mp, "log mount finish failed");
>  		goto out_rtunmount;
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/3] xfs: don't leak quotacheck dquots when cow recovery fails
  2017-08-10  5:23 ` [PATCH 3/3] xfs: don't leak quotacheck dquots when cow recovery fails Darrick J. Wong
@ 2017-08-10 14:51   ` Brian Foster
  2017-08-11 11:19   ` Christoph Hellwig
  2017-08-11 19:48   ` [PATCH v3 " Darrick J. Wong
  2 siblings, 0 replies; 20+ messages in thread
From: Brian Foster @ 2017-08-10 14:51 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Aug 09, 2017 at 10:23:58PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> If we fail a mount on account of cow recovery errors, it's possible that
> a previous quotacheck left some dquots in memory.  The bailout clause of
> xfs_mountfs forgets to purge these, and so we leak them.  Fix that.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> v2: unmount quotas, then free all the quota information
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_mount.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> 
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index 8da5155..d50f4a7 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -1057,6 +1057,8 @@ xfs_mountfs(
>  	xfs_fs_unreserve_ag_blocks(mp);
>   out_quota:
>  	xfs_qm_unmount_quotas(mp);
> +	/* Clean out dquots that might be in memory after quotacheck. */
> +	xfs_qm_unmount(mp);
>   out_rtunmount:
>  	xfs_rtunmount_inodes(mp);
>   out_rele_rip:
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/3] xfs: evict all inodes involved with log redo item recovery
  2017-08-10 14:51   ` Brian Foster
@ 2017-08-10 17:18     ` Darrick J. Wong
  2017-08-10 17:54       ` Brian Foster
  2017-08-11 11:22       ` Christoph Hellwig
  0 siblings, 2 replies; 20+ messages in thread
From: Darrick J. Wong @ 2017-08-10 17:18 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Thu, Aug 10, 2017 at 10:51:26AM -0400, Brian Foster wrote:
> On Wed, Aug 09, 2017 at 10:23:51PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > When we introduced the bmap redo log items, we set MS_ACTIVE on the
> > mountpoint and XFS_IRECOVERY on the inode to prevent unlinked inodes
> > from being truncated prematurely during log recovery.  This also had the
> > effect of putting linked inodes on the lru instead of evicting them.
> > 
> > Unfortunately, we neglected to find all those unreferenced lru inodes
> > and evict them after finishing log recovery, which means that we leak
> > them if anything goes wrong in the rest of xfs_mountfs, because the lru
> > is only cleaned out on unmount.
> > 
> > Therefore, look for unreferenced inodes in the lru list immediately
> > after clearing MS_ACTIVE.  If we find any, igrab/iput them so that
> > they're evicted from memory.
> > 
> > Fixes: 17c12bcd30 ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped")
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/xfs_mount.c |   29 +++++++++++++++++++++++++++++
> >  1 file changed, 29 insertions(+)
> > 
> > 
> > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > index d63a367..8da5155 100644
> > --- a/fs/xfs/xfs_mount.c
> > +++ b/fs/xfs/xfs_mount.c
> > @@ -616,6 +616,34 @@ xfs_default_resblks(xfs_mount_t *mp)
> >  }
> >  
> >  /*
> > + * Find inodes that were put on the LRU during the last part of log
> > + * recovery that are no longer referenced by anything, and igrab/iput
> > + * them so that they'll be evicted now.
> > + *
> > + * We let all inodes involved in redo item processing end up on the LRU
> > + * instead of being evicted immediately so that if we do something to an
> > + * unlinked inode, the irele won't cause premature truncation and
> > + * freeing of the inode, which results in log recovery failure.  We have
> > + * to evict the unreferenced lru inodes here because we don't otherwise
> > + * clean up the lru if there's a subsequent failure in xfs_mountfs,
> > + * which leads to us leaking the inodes if nothing else (e.g. quotacheck)
> > + * references the inodes before the failure occurs.
> > + */
> > +STATIC void
> > +xfs_mountfs_evict_log_redo_inodes(
> > +	struct xfs_mount	*mp)
> > +{
> > +	struct super_block	*sb = mp->m_super;
> > +	struct inode		*inode;
> > +	struct inode		*next;
> > +
> 
> Perhaps assert that MS_ACTIVE is cleared here as a bit of
> self-documentation..?

Huh?  That's the previous patch; MS_ACTIVE should already be clear when
we get here.

> > +	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
> > +		if (!atomic_read(&inode->i_count))
> > +			iput(igrab(inode));
> > +	}
> 
> I assume we don't need s_inodes_list_lock here since nothing else should
> be messing with the list..?

Nothing should be messing with the list, though the inode_lru_list_del
in iput_final will take s_inode_list_lock.

> If so, this looks fine otherwise. Though any reason you didn't end up
> calling evict_inodes() here? Because it's not exported..?

I felt there's less messing around with vfs internals by sticking to
igrab and iput (and letting them deal with the internal inode state)
since that's what xfs uses elsewhere to manage inode life cycles.

--D

> 
> Brian
> 
> > +}
> > +
> > +/*
> >   * This function does the following on an initial mount of a file system:
> >   *	- reads the superblock from disk and init the mount struct
> >   *	- if we're a 32-bit kernel, do a size check on the superblock
> > @@ -960,6 +988,7 @@ xfs_mountfs(
> >  	mp->m_super->s_flags |= MS_ACTIVE;
> >  	error = xfs_log_mount_finish(mp);
> >  	mp->m_super->s_flags &= ~MS_ACTIVE;
> > +	xfs_mountfs_evict_log_redo_inodes(mp);
> >  	if (error) {
> >  		xfs_warn(mp, "log mount finish failed");
> >  		goto out_rtunmount;
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/3] xfs: evict all inodes involved with log redo item recovery
  2017-08-10 17:18     ` Darrick J. Wong
@ 2017-08-10 17:54       ` Brian Foster
  2017-08-11 11:22       ` Christoph Hellwig
  1 sibling, 0 replies; 20+ messages in thread
From: Brian Foster @ 2017-08-10 17:54 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Thu, Aug 10, 2017 at 10:18:51AM -0700, Darrick J. Wong wrote:
> On Thu, Aug 10, 2017 at 10:51:26AM -0400, Brian Foster wrote:
> > On Wed, Aug 09, 2017 at 10:23:51PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > > 
> > > When we introduced the bmap redo log items, we set MS_ACTIVE on the
> > > mountpoint and XFS_IRECOVERY on the inode to prevent unlinked inodes
> > > from being truncated prematurely during log recovery.  This also had the
> > > effect of putting linked inodes on the lru instead of evicting them.
> > > 
> > > Unfortunately, we neglected to find all those unreferenced lru inodes
> > > and evict them after finishing log recovery, which means that we leak
> > > them if anything goes wrong in the rest of xfs_mountfs, because the lru
> > > is only cleaned out on unmount.
> > > 
> > > Therefore, look for unreferenced inodes in the lru list immediately
> > > after clearing MS_ACTIVE.  If we find any, igrab/iput them so that
> > > they're evicted from memory.
> > > 
> > > Fixes: 17c12bcd30 ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped")
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > >  fs/xfs/xfs_mount.c |   29 +++++++++++++++++++++++++++++
> > >  1 file changed, 29 insertions(+)
> > > 
> > > 
> > > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > > index d63a367..8da5155 100644
> > > --- a/fs/xfs/xfs_mount.c
> > > +++ b/fs/xfs/xfs_mount.c
> > > @@ -616,6 +616,34 @@ xfs_default_resblks(xfs_mount_t *mp)
> > >  }
> > >  
> > >  /*
> > > + * Find inodes that were put on the LRU during the last part of log
> > > + * recovery that are no longer referenced by anything, and igrab/iput
> > > + * them so that they'll be evicted now.
> > > + *
> > > + * We let all inodes involved in redo item processing end up on the LRU
> > > + * instead of being evicted immediately so that if we do something to an
> > > + * unlinked inode, the irele won't cause premature truncation and
> > > + * freeing of the inode, which results in log recovery failure.  We have
> > > + * to evict the unreferenced lru inodes here because we don't otherwise
> > > + * clean up the lru if there's a subsequent failure in xfs_mountfs,
> > > + * which leads to us leaking the inodes if nothing else (e.g. quotacheck)
> > > + * references the inodes before the failure occurs.
> > > + */
> > > +STATIC void
> > > +xfs_mountfs_evict_log_redo_inodes(
> > > +	struct xfs_mount	*mp)
> > > +{
> > > +	struct super_block	*sb = mp->m_super;
> > > +	struct inode		*inode;
> > > +	struct inode		*next;
> > > +
> > 
> > Perhaps assert that MS_ACTIVE is cleared here as a bit of
> > self-documentation..?
> 
> Huh?  That's the previous patch; MS_ACTIVE should already be clear when
> we get here.
> 

Right.. I mean add something like the following at the top of the
function:

ASSERT(!(mp->m_super->s_flags & MS_ACTIVE));

... so it's clear that this function only serves a purpose when the
filesystem is not active.

> > > +	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
> > > +		if (!atomic_read(&inode->i_count))
> > > +			iput(igrab(inode));
> > > +	}
> > 
> > I assume we don't need s_inodes_list_lock here since nothing else should
> > be messing with the list..?
> 
> Nothing should be messing with the list, though the inode_lru_list_del
> in iput_final will take s_inode_list_lock.
> 
> > If so, this looks fine otherwise. Though any reason you didn't end up
> > calling evict_inodes() here? Because it's not exported..?
> 
> I felt there's less messing around with vfs internals by sticking to
> igrab and iput (and letting them deal with the internal inode state)
> since that's what xfs uses elsewhere to manage inode life cycles.
> 

Ok, sounds good:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> --D
> 
> > 
> > Brian
> > 
> > > +}
> > > +
> > > +/*
> > >   * This function does the following on an initial mount of a file system:
> > >   *	- reads the superblock from disk and init the mount struct
> > >   *	- if we're a 32-bit kernel, do a size check on the superblock
> > > @@ -960,6 +988,7 @@ xfs_mountfs(
> > >  	mp->m_super->s_flags |= MS_ACTIVE;
> > >  	error = xfs_log_mount_finish(mp);
> > >  	mp->m_super->s_flags &= ~MS_ACTIVE;
> > > +	xfs_mountfs_evict_log_redo_inodes(mp);
> > >  	if (error) {
> > >  		xfs_warn(mp, "log mount finish failed");
> > >  		goto out_rtunmount;
> > > 
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/3] xfs: clear MS_ACTIVE after finishing log recovery to avoid inode leak
  2017-08-10  5:23 [PATCH 1/3] xfs: clear MS_ACTIVE after finishing log recovery to avoid inode leak Darrick J. Wong
  2017-08-10  5:23 ` [PATCH 2/3] xfs: evict all inodes involved with log redo item recovery Darrick J. Wong
  2017-08-10  5:23 ` [PATCH 3/3] xfs: don't leak quotacheck dquots when cow recovery fails Darrick J. Wong
@ 2017-08-10 18:15 ` Allison Henderson
  2017-08-11 11:13 ` Christoph Hellwig
  3 siblings, 0 replies; 20+ messages in thread
From: Allison Henderson @ 2017-08-10 18:15 UTC (permalink / raw)
  To: Darrick J. Wong, linux-xfs; +Cc: Brian Foster

On 8/9/2017 10:23 PM, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> Way back when we established inode block-map redo log items, it was
> discovered that we needed to prevent the VFS from evicting inodes during
> log recovery because any given inode might be have bmap redo items to
> replay even if the inode has no link count and is ultimately deleted,
> and any eviction of an unlinked inode causes the inode to be truncated
> and freed too early.
>
> To make this possible, we set MS_ACTIVE so that inodes would not be torn
> down immediately upon release.  Unfortunately, this also results in the
> quota inodes not being released at all if a later part of the mount
> process should fail, because we never reclaim the inodes.  So, clear
> MS_ACTIVE immediately after we finish the log recovery so that the quota
> inodes will be torn down properly if we abort the mount.
>
> Fixes: 17c12bcd30 ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped")
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> Reviewed-by: Brian Foster <bfoster@redhat.com>

These look good to me.  You can add my review to your set.  Thanks!
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>

> ---
> v2: wrap the xfs_log_mount_finish more tightly with the MS_ACTIVE mods
> ---
>  fs/xfs/xfs_mount.c |   16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
>
>
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index 40d4e8b..d63a367 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -945,20 +945,21 @@ xfs_mountfs(
>  	}
>
>  	/*
> +	 * Finish recovering the file system.  This part needed to be delayed
> +	 * until after the root and real-time bitmap inodes were consistently
> +	 * read in.
> +	 *
>  	 * During the second phase of log recovery, we need iget and
>  	 * iput to behave like they do for an active filesystem.
>  	 * xfs_fs_drop_inode needs to be able to prevent the deletion
>  	 * of inodes before we're done replaying log items on those
> -	 * inodes.
> +	 * inodes.  Turn it off immediately after xfs_log_mount_finish
> +	 * so that we don't leak the quota inodes if subsequent mount
> +	 * activities fail.
>  	 */
>  	mp->m_super->s_flags |= MS_ACTIVE;
> -
> -	/*
> -	 * Finish recovering the file system.  This part needed to be delayed
> -	 * until after the root and real-time bitmap inodes were consistently
> -	 * read in.
> -	 */
>  	error = xfs_log_mount_finish(mp);
> +	mp->m_super->s_flags &= ~MS_ACTIVE;
>  	if (error) {
>  		xfs_warn(mp, "log mount finish failed");
>  		goto out_rtunmount;
> @@ -1028,7 +1029,6 @@ xfs_mountfs(
>   out_quota:
>  	xfs_qm_unmount_quotas(mp);
>   out_rtunmount:
> -	mp->m_super->s_flags &= ~MS_ACTIVE;
>  	xfs_rtunmount_inodes(mp);
>   out_rele_rip:
>  	IRELE(rip);
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  https://urldefense.proofpoint.com/v2/url?u=http-3A__vger.kernel.org_majordomo-2Dinfo.html&d=DwICaQ&c=RoP1YumCXCgaWHvlZYR8PQcxBKCX5YTpkKY057SbK10&r=XFp4B05bcXkJ0dhYaFjd3F8telP01COkBp9cI7mKLb4&m=ZRJN-nlKvtwRQMyfqfP0fbTnWqpvu10UW0FgWde9x3s&s=9TJPrWuX_B_5LmwXgPPbM_1TBzmmtjFE3RKmTgccKLk&e=
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/3] xfs: clear MS_ACTIVE after finishing log recovery to avoid inode leak
  2017-08-10  5:23 [PATCH 1/3] xfs: clear MS_ACTIVE after finishing log recovery to avoid inode leak Darrick J. Wong
                   ` (2 preceding siblings ...)
  2017-08-10 18:15 ` [PATCH 1/3] xfs: clear MS_ACTIVE after finishing log recovery to avoid inode leak Allison Henderson
@ 2017-08-11 11:13 ` Christoph Hellwig
  3 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2017-08-11 11:13 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, Brian Foster

This looks much more reasonable than what we had before:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/3] xfs: don't leak quotacheck dquots when cow recovery fails
  2017-08-10  5:23 ` [PATCH 3/3] xfs: don't leak quotacheck dquots when cow recovery fails Darrick J. Wong
  2017-08-10 14:51   ` Brian Foster
@ 2017-08-11 11:19   ` Christoph Hellwig
  2017-08-11 19:48   ` [PATCH v3 " Darrick J. Wong
  2 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2017-08-11 11:19 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Wed, Aug 09, 2017 at 10:23:58PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> If we fail a mount on account of cow recovery errors, it's possible that
> a previous quotacheck left some dquots in memory.  The bailout clause of
> xfs_mountfs forgets to purge these, and so we leak them.  Fix that.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> v2: unmount quotas, then free all the quota information
> ---
>  fs/xfs/xfs_mount.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> 
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index 8da5155..d50f4a7 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -1057,6 +1057,8 @@ xfs_mountfs(
>  	xfs_fs_unreserve_ag_blocks(mp);
>   out_quota:
>  	xfs_qm_unmount_quotas(mp);
> +	/* Clean out dquots that might be in memory after quotacheck. */
> +	xfs_qm_unmount(mp);
>   out_rtunmount:
>  	xfs_rtunmount_inodes(mp);
>   out_rele_rip:

The unmount path calls this much later, e.g. the xfs_rtunmount_inodes
call follows ifirst, then we release the root inode, etc.

I'd suggest we try to follow that order as much as possible.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/3] xfs: evict all inodes involved with log redo item recovery
  2017-08-10 17:18     ` Darrick J. Wong
  2017-08-10 17:54       ` Brian Foster
@ 2017-08-11 11:22       ` Christoph Hellwig
  2017-08-11 16:51         ` Darrick J. Wong
  1 sibling, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2017-08-11 11:22 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Brian Foster, linux-xfs

On Thu, Aug 10, 2017 at 10:18:51AM -0700, Darrick J. Wong wrote:
> I felt there's less messing around with vfs internals by sticking to
> igrab and iput (and letting them deal with the internal inode state)
> since that's what xfs uses elsewhere to manage inode life cycles.

I'd feel much safer calling evict_inodes rather than reimplementing
it poorly.  In fact we might want to just call it in the VFS after
a failed mount?

Either way please also add Al to the Cc for the next version.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/3] xfs: evict all inodes involved with log redo item recovery
  2017-08-11 11:22       ` Christoph Hellwig
@ 2017-08-11 16:51         ` Darrick J. Wong
  0 siblings, 0 replies; 20+ messages in thread
From: Darrick J. Wong @ 2017-08-11 16:51 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Brian Foster, linux-xfs

On Fri, Aug 11, 2017 at 04:22:03AM -0700, Christoph Hellwig wrote:
> On Thu, Aug 10, 2017 at 10:18:51AM -0700, Darrick J. Wong wrote:
> > I felt there's less messing around with vfs internals by sticking to
> > igrab and iput (and letting them deal with the internal inode state)
> > since that's what xfs uses elsewhere to manage inode life cycles.
> 
> I'd feel much safer calling evict_inodes rather than reimplementing
> it poorly.  In fact we might want to just call it in the VFS after
> a failed mount?

I don't think we can call it in the vfs after a failed mount since by
that time we've pulled down all the fs-specific stuff by then, right?

In any case I think it's only xfs that needs this, since (afaict) the
ones I looked at don't set MS_ACTIVE in fill_super and have a small
set of inodes that they explicitly iget and iput.

> Either way please also add Al to the Cc for the next version.

Ok.

--D

> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v3 3/3] xfs: don't leak quotacheck dquots when cow recovery fails
  2017-08-10  5:23 ` [PATCH 3/3] xfs: don't leak quotacheck dquots when cow recovery fails Darrick J. Wong
  2017-08-10 14:51   ` Brian Foster
  2017-08-11 11:19   ` Christoph Hellwig
@ 2017-08-11 19:48   ` Darrick J. Wong
  2017-08-14 12:40     ` Brian Foster
  2 siblings, 1 reply; 20+ messages in thread
From: Darrick J. Wong @ 2017-08-11 19:48 UTC (permalink / raw)
  To: linux-xfs; +Cc: Christoph Hellwig

If we fail a mount on account of cow recovery errors, it's possible that
a previous quotacheck left some dquots in memory.  The bailout clause of
xfs_mountfs forgets to purge these, and so we leak them.  Fix that.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
v3: move call to the same place as it is in unmountfs
v2: move call below quota inode removal
---
 fs/xfs/xfs_mount.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index d63a367..a46c9d7 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1034,6 +1034,8 @@ xfs_mountfs(
 	IRELE(rip);
 	cancel_delayed_work_sync(&mp->m_reclaim_work);
 	xfs_reclaim_inodes(mp, SYNC_WAIT);
+	/* Clean out dquots that might be in memory after quotacheck. */
+	xfs_qm_unmount(mp);
  out_log_dealloc:
 	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
 	xfs_log_mount_cancel(mp);

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 2/3] xfs: evict all inodes involved with log redo item recovery
  2017-08-10  5:23 ` [PATCH 2/3] xfs: evict all inodes involved with log redo item recovery Darrick J. Wong
  2017-08-10 14:51   ` Brian Foster
@ 2017-08-11 19:50   ` Darrick J. Wong
  2017-08-11 23:42     ` Dave Chinner
  1 sibling, 1 reply; 20+ messages in thread
From: Darrick J. Wong @ 2017-08-11 19:50 UTC (permalink / raw)
  To: linux-xfs; +Cc: viro, Christoph Hellwig

When we introduced the bmap redo log items, we set MS_ACTIVE on the
mountpoint and XFS_IRECOVERY on the inode to prevent unlinked inodes
from being truncated prematurely during log recovery.  This also had the
effect of putting linked inodes on the lru instead of evicting them.

Unfortunately, we neglected to find all those unreferenced lru inodes
and evict them after finishing log recovery, which means that we leak
them if anything goes wrong in the rest of xfs_mountfs, because the lru
is only cleaned out on unmount.

Therefore, evict unreferenced inodes in the lru list immediately
after clearing MS_ACTIVE.

Fixes: 17c12bcd30 ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: viro@ZenIV.linux.org.uk
---
v2: use the vfs evict_inodes instead of duplicating it
---
 fs/inode.c         |    1 +
 fs/internal.h      |    1 -
 fs/xfs/xfs_mount.c |   12 ++++++++++++
 include/linux/fs.h |    1 +
 4 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/fs/inode.c b/fs/inode.c
index 5037059..6a1626e 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -637,6 +637,7 @@ void evict_inodes(struct super_block *sb)
 
 	dispose_list(&dispose);
 }
+EXPORT_SYMBOL_GPL(evict_inodes);
 
 /**
  * invalidate_inodes	- attempt to free all inodes on a superblock
diff --git a/fs/internal.h b/fs/internal.h
index 9676fe1..fedfe94 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -132,7 +132,6 @@ static inline bool atime_needs_update_rcu(const struct path *path,
 extern void inode_io_list_del(struct inode *inode);
 
 extern long get_nr_dirty_inodes(void);
-extern void evict_inodes(struct super_block *);
 extern int invalidate_inodes(struct super_block *, bool);
 
 /*
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index a46c9d7..351e2c3 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -956,10 +956,22 @@ xfs_mountfs(
 	 * inodes.  Turn it off immediately after xfs_log_mount_finish
 	 * so that we don't leak the quota inodes if subsequent mount
 	 * activities fail.
+	 *
+	 * We let all inodes involved in redo item processing end up on
+	 * the LRU instead of being evicted immediately so that if we do
+	 * something to an unlinked inode, the irele won't cause
+	 * premature truncation and freeing of the inode, which results
+	 * in log recovery failure.  We have to evict the unreferenced
+	 * lru inodes after clearing MS_ACTIVE because we don't
+	 * otherwise clean up the lru if there's a subsequent failure in
+	 * xfs_mountfs, which leads to us leaking the inodes if nothing
+	 * else (e.g. quotacheck) references the inodes before the
+	 * mount failure occurs.
 	 */
 	mp->m_super->s_flags |= MS_ACTIVE;
 	error = xfs_log_mount_finish(mp);
 	mp->m_super->s_flags &= ~MS_ACTIVE;
+	evict_inodes(mp->m_super);
 	if (error) {
 		xfs_warn(mp, "log mount finish failed");
 		goto out_rtunmount;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7b5d681..e730438 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2830,6 +2830,7 @@ static inline void lockdep_annotate_inode_mutex_key(struct inode *inode) { };
 #endif
 extern void unlock_new_inode(struct inode *);
 extern unsigned int get_next_ino(void);
+extern void evict_inodes(struct super_block *sb);
 
 extern void __iget(struct inode * inode);
 extern void iget_failed(struct inode *);

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/3] xfs: evict all inodes involved with log redo item recovery
  2017-08-11 19:50   ` [PATCH v2 " Darrick J. Wong
@ 2017-08-11 23:42     ` Dave Chinner
  2017-08-11 23:59       ` Darrick J. Wong
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2017-08-11 23:42 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, viro, Christoph Hellwig

On Fri, Aug 11, 2017 at 12:50:10PM -0700, Darrick J. Wong wrote:
> When we introduced the bmap redo log items, we set MS_ACTIVE on the
> mountpoint and XFS_IRECOVERY on the inode to prevent unlinked inodes
> from being truncated prematurely during log recovery.  This also had the
> effect of putting linked inodes on the lru instead of evicting them.
> 
> Unfortunately, we neglected to find all those unreferenced lru inodes
> and evict them after finishing log recovery, which means that we leak
> them if anything goes wrong in the rest of xfs_mountfs, because the lru
> is only cleaned out on unmount.
> 
> Therefore, evict unreferenced inodes in the lru list immediately
> after clearing MS_ACTIVE.
> 
> Fixes: 17c12bcd30 ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped")
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> Cc: viro@ZenIV.linux.org.uk
> ---
> v2: use the vfs evict_inodes instead of duplicating it
> ---
>  fs/inode.c         |    1 +
>  fs/internal.h      |    1 -
>  fs/xfs/xfs_mount.c |   12 ++++++++++++
>  include/linux/fs.h |    1 +
>  4 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 5037059..6a1626e 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -637,6 +637,7 @@ void evict_inodes(struct super_block *sb)
>  
>  	dispose_list(&dispose);
>  }
> +EXPORT_SYMBOL_GPL(evict_inodes);
>  
>  /**
>   * invalidate_inodes	- attempt to free all inodes on a superblock
> diff --git a/fs/internal.h b/fs/internal.h
> index 9676fe1..fedfe94 100644
> --- a/fs/internal.h
> +++ b/fs/internal.h
> @@ -132,7 +132,6 @@ static inline bool atime_needs_update_rcu(const struct path *path,
>  extern void inode_io_list_del(struct inode *inode);
>  
>  extern long get_nr_dirty_inodes(void);
> -extern void evict_inodes(struct super_block *);
>  extern int invalidate_inodes(struct super_block *, bool);
>  
>  /*
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index a46c9d7..351e2c3 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -956,10 +956,22 @@ xfs_mountfs(
>  	 * inodes.  Turn it off immediately after xfs_log_mount_finish
>  	 * so that we don't leak the quota inodes if subsequent mount
>  	 * activities fail.
> +	 *
> +	 * We let all inodes involved in redo item processing end up on
> +	 * the LRU instead of being evicted immediately so that if we do
> +	 * something to an unlinked inode, the irele won't cause
> +	 * premature truncation and freeing of the inode, which results
> +	 * in log recovery failure.  We have to evict the unreferenced
> +	 * lru inodes after clearing MS_ACTIVE because we don't
> +	 * otherwise clean up the lru if there's a subsequent failure in
> +	 * xfs_mountfs, which leads to us leaking the inodes if nothing
> +	 * else (e.g. quotacheck) references the inodes before the
> +	 * mount failure occurs.
>  	 */
>  	mp->m_super->s_flags |= MS_ACTIVE;
>  	error = xfs_log_mount_finish(mp);
>  	mp->m_super->s_flags &= ~MS_ACTIVE;
> +	evict_inodes(mp->m_super);

Shouldn't all this MS_ACTIVE flag and inode eviction stuff be put
inside xfs_log_mount_finish()? Seems to me like wrapping it aroudn
the outside is the wrong place to be putting it...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/3] xfs: evict all inodes involved with log redo item recovery
  2017-08-11 23:42     ` Dave Chinner
@ 2017-08-11 23:59       ` Darrick J. Wong
  0 siblings, 0 replies; 20+ messages in thread
From: Darrick J. Wong @ 2017-08-11 23:59 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, viro, Christoph Hellwig

On Sat, Aug 12, 2017 at 09:42:04AM +1000, Dave Chinner wrote:
> On Fri, Aug 11, 2017 at 12:50:10PM -0700, Darrick J. Wong wrote:
> > When we introduced the bmap redo log items, we set MS_ACTIVE on the
> > mountpoint and XFS_IRECOVERY on the inode to prevent unlinked inodes
> > from being truncated prematurely during log recovery.  This also had the
> > effect of putting linked inodes on the lru instead of evicting them.
> > 
> > Unfortunately, we neglected to find all those unreferenced lru inodes
> > and evict them after finishing log recovery, which means that we leak
> > them if anything goes wrong in the rest of xfs_mountfs, because the lru
> > is only cleaned out on unmount.
> > 
> > Therefore, evict unreferenced inodes in the lru list immediately
> > after clearing MS_ACTIVE.
> > 
> > Fixes: 17c12bcd30 ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped")
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > Cc: viro@ZenIV.linux.org.uk
> > ---
> > v2: use the vfs evict_inodes instead of duplicating it
> > ---
> >  fs/inode.c         |    1 +
> >  fs/internal.h      |    1 -
> >  fs/xfs/xfs_mount.c |   12 ++++++++++++
> >  include/linux/fs.h |    1 +
> >  4 files changed, 14 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/inode.c b/fs/inode.c
> > index 5037059..6a1626e 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -637,6 +637,7 @@ void evict_inodes(struct super_block *sb)
> >  
> >  	dispose_list(&dispose);
> >  }
> > +EXPORT_SYMBOL_GPL(evict_inodes);
> >  
> >  /**
> >   * invalidate_inodes	- attempt to free all inodes on a superblock
> > diff --git a/fs/internal.h b/fs/internal.h
> > index 9676fe1..fedfe94 100644
> > --- a/fs/internal.h
> > +++ b/fs/internal.h
> > @@ -132,7 +132,6 @@ static inline bool atime_needs_update_rcu(const struct path *path,
> >  extern void inode_io_list_del(struct inode *inode);
> >  
> >  extern long get_nr_dirty_inodes(void);
> > -extern void evict_inodes(struct super_block *);
> >  extern int invalidate_inodes(struct super_block *, bool);
> >  
> >  /*
> > diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> > index a46c9d7..351e2c3 100644
> > --- a/fs/xfs/xfs_mount.c
> > +++ b/fs/xfs/xfs_mount.c
> > @@ -956,10 +956,22 @@ xfs_mountfs(
> >  	 * inodes.  Turn it off immediately after xfs_log_mount_finish
> >  	 * so that we don't leak the quota inodes if subsequent mount
> >  	 * activities fail.
> > +	 *
> > +	 * We let all inodes involved in redo item processing end up on
> > +	 * the LRU instead of being evicted immediately so that if we do
> > +	 * something to an unlinked inode, the irele won't cause
> > +	 * premature truncation and freeing of the inode, which results
> > +	 * in log recovery failure.  We have to evict the unreferenced
> > +	 * lru inodes after clearing MS_ACTIVE because we don't
> > +	 * otherwise clean up the lru if there's a subsequent failure in
> > +	 * xfs_mountfs, which leads to us leaking the inodes if nothing
> > +	 * else (e.g. quotacheck) references the inodes before the
> > +	 * mount failure occurs.
> >  	 */
> >  	mp->m_super->s_flags |= MS_ACTIVE;
> >  	error = xfs_log_mount_finish(mp);
> >  	mp->m_super->s_flags &= ~MS_ACTIVE;
> > +	evict_inodes(mp->m_super);
> 
> Shouldn't all this MS_ACTIVE flag and inode eviction stuff be put
> inside xfs_log_mount_finish()? Seems to me like wrapping it aroudn
> the outside is the wrong place to be putting it...

Yeah, I suppose we ought to shove everything into xfs_log_mount_finish
instead of dumping it all here...

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v3 3/3] xfs: don't leak quotacheck dquots when cow recovery fails
  2017-08-11 19:48   ` [PATCH v3 " Darrick J. Wong
@ 2017-08-14 12:40     ` Brian Foster
  0 siblings, 0 replies; 20+ messages in thread
From: Brian Foster @ 2017-08-14 12:40 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, Christoph Hellwig

On Fri, Aug 11, 2017 at 12:48:44PM -0700, Darrick J. Wong wrote:
> If we fail a mount on account of cow recovery errors, it's possible that
> a previous quotacheck left some dquots in memory.  The bailout clause of
> xfs_mountfs forgets to purge these, and so we leak them.  Fix that.
> 
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> v3: move call to the same place as it is in unmountfs
> v2: move call below quota inode removal
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_mount.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index d63a367..a46c9d7 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -1034,6 +1034,8 @@ xfs_mountfs(
>  	IRELE(rip);
>  	cancel_delayed_work_sync(&mp->m_reclaim_work);
>  	xfs_reclaim_inodes(mp, SYNC_WAIT);
> +	/* Clean out dquots that might be in memory after quotacheck. */
> +	xfs_qm_unmount(mp);
>   out_log_dealloc:
>  	mp->m_flags |= XFS_MOUNT_UNMOUNTING;
>  	xfs_log_mount_cancel(mp);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2017-08-14 12:40 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-08-10  5:23 [PATCH 1/3] xfs: clear MS_ACTIVE after finishing log recovery to avoid inode leak Darrick J. Wong
2017-08-10  5:23 ` [PATCH 2/3] xfs: evict all inodes involved with log redo item recovery Darrick J. Wong
2017-08-10 14:51   ` Brian Foster
2017-08-10 17:18     ` Darrick J. Wong
2017-08-10 17:54       ` Brian Foster
2017-08-11 11:22       ` Christoph Hellwig
2017-08-11 16:51         ` Darrick J. Wong
2017-08-11 19:50   ` [PATCH v2 " Darrick J. Wong
2017-08-11 23:42     ` Dave Chinner
2017-08-11 23:59       ` Darrick J. Wong
2017-08-10  5:23 ` [PATCH 3/3] xfs: don't leak quotacheck dquots when cow recovery fails Darrick J. Wong
2017-08-10 14:51   ` Brian Foster
2017-08-11 11:19   ` Christoph Hellwig
2017-08-11 19:48   ` [PATCH v3 " Darrick J. Wong
2017-08-14 12:40     ` Brian Foster
2017-08-10 18:15 ` [PATCH 1/3] xfs: clear MS_ACTIVE after finishing log recovery to avoid inode leak Allison Henderson
2017-08-11 11:13 ` Christoph Hellwig
  -- strict thread matches above, loose matches on Subject: below --
2017-08-09  1:04 Darrick J. Wong
2017-08-09  1:07 ` [PATCH 3/3] xfs: don't leak quotacheck dquots when cow recovery fails Darrick J. Wong
2017-08-09 12:36   ` Brian Foster
2017-08-09 16:06     ` Darrick J. Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox