* [PATCH] xfs: fix premature enospc on inode allocation
@ 2014-12-01 0:32 Dave Chinner
2014-12-01 18:40 ` Brian Foster
0 siblings, 1 reply; 3+ messages in thread
From: Dave Chinner @ 2014-12-01 0:32 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
After growing a filesystem, XFS can fail to allocate inodes even
though there is a large amount of space available in the filesystem
for inodes. The issue is caused by a nearly full allocation group
having enough free space in it to be considered for inode
allocation, but not enough contiguous free space to actually
allocation inodes. This situation results in successful selection
of the AG for allocation, then failure of the allocation resulting
in ENOSPC being reported to the caller.
It is caused by two possible issues. Firstly, we only consider the
lognest free extent and whether it would fit an inode chunk. If the
extent is not correctly aligned, then we can't allocate an inode
chunk in it regardless of the fact that it is large enough. This
tends to be a permanent error until space in the AG is freed.
The second issue is that we don't actually lock the AGI or AGF when
we are doing these checks, and so by the time we get to actually
allocating the inode chunk the space we thought we had in the AG may
have been allocated. This tends to be a spurious error as it
requires a race to trigger. Hence this case is ignored in this patch
as the reported problem is for permanent errors.
The first issue could be addressed by simply taking into account the
alignment when checking the longest extent. This, however, would
prevent allocation in AGs that have aligned, exact sized extents
free. However, this case should be fairly rare compared to the
number of allocations that occur near ENOSPC that would trigger this
condition.
Hence, when selecting the inode AG, take into account the inode
cluster alignment when checking the lognest free extent in the AG.
If we can't find any AGs with a contiguous free space large
enough to be aligned, drop the alignment addition and just try for
an AG that has enough contiguous free space available for an inode
chunk. This won't prevent issues from occurring, but should avoid
situations where other AGs have lots of free space but the selected
AG can't allocate due to alignment constraints.
Reported-by: Arkadiusz Mi¿kiewicz <arekm@maven.pl>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/libxfs/xfs_ialloc.c | 34 ++++++++++++++++++++++++----------
1 file changed, 24 insertions(+), 10 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index d1dc590..53757bd 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -45,12 +45,12 @@
*/
static inline int
xfs_ialloc_cluster_alignment(
- xfs_alloc_arg_t *args)
+ struct xfs_mount *mp)
{
- if (xfs_sb_version_hasalign(&args->mp->m_sb) &&
- args->mp->m_sb.sb_inoalignmt >=
- XFS_B_TO_FSBT(args->mp, args->mp->m_inode_cluster_size))
- return args->mp->m_sb.sb_inoalignmt;
+ if (xfs_sb_version_hasalign(&mp->m_sb) &&
+ mp->m_sb.sb_inoalignmt >=
+ XFS_B_TO_FSBT(mp, mp->m_inode_cluster_size))
+ return mp->m_sb.sb_inoalignmt;
return 1;
}
@@ -409,7 +409,7 @@ xfs_ialloc_ag_alloc(
* but not to use them in the actual exact allocation.
*/
args.alignment = 1;
- args.minalignslop = xfs_ialloc_cluster_alignment(&args) - 1;
+ args.minalignslop = xfs_ialloc_cluster_alignment(args.mp) - 1;
/* Allow space for the inode btree to split. */
args.minleft = args.mp->m_in_maxlevels - 1;
@@ -445,7 +445,7 @@ xfs_ialloc_ag_alloc(
args.alignment = args.mp->m_dalign;
isaligned = 1;
} else
- args.alignment = xfs_ialloc_cluster_alignment(&args);
+ args.alignment = xfs_ialloc_cluster_alignment(args.mp);
/*
* Need to figure out where to allocate the inode blocks.
* Ideally they should be spaced out through the a.g.
@@ -474,7 +474,7 @@ xfs_ialloc_ag_alloc(
args.type = XFS_ALLOCTYPE_NEAR_BNO;
args.agbno = be32_to_cpu(agi->agi_root);
args.fsbno = XFS_AGB_TO_FSB(args.mp, agno, args.agbno);
- args.alignment = xfs_ialloc_cluster_alignment(&args);
+ args.alignment = xfs_ialloc_cluster_alignment(args.mp);
if ((error = xfs_alloc_vextent(&args)))
return error;
}
@@ -629,10 +629,24 @@ xfs_ialloc_ag_select(
}
/*
- * Is there enough free space for the file plus a block of
- * inodes? (if we need to allocate some)?
+ * Check that there enough free space for the file plus a chunk
+ * of inodes if we need to allocate some. If this is the first
+ * pass across the AGs, take into account the potential space
+ * needed for alignment of inode chunks when checking the
+ * longest contiguous free space in the AG - this prevents us
+ * from getting ENOSPC because we have free space larger than
+ * m_ialloc_blks but alignment constraints prevent us from using
+ * it.
+ *
+ * If we can't find an AG with space for full alignment slack to
+ * be taken into account, we must be near ENOSPC in all AGs.
+ * Hence we don't include alignment for the second pass and so
+ * if we fail allocation due to alignment issues then it is most
+ * likely a real ENOSPC condition.
*/
ineed = mp->m_ialloc_blks;
+ if (flags && ineed > 1)
+ ineed += xfs_ialloc_cluster_alignment(mp);
longest = pag->pagf_longest;
if (!longest)
longest = pag->pagf_flcount > 0;
--
2.0.0
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] xfs: fix premature enospc on inode allocation
2014-12-01 0:32 [PATCH] xfs: fix premature enospc on inode allocation Dave Chinner
@ 2014-12-01 18:40 ` Brian Foster
2014-12-01 21:51 ` Dave Chinner
0 siblings, 1 reply; 3+ messages in thread
From: Brian Foster @ 2014-12-01 18:40 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Mon, Dec 01, 2014 at 11:32:08AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> After growing a filesystem, XFS can fail to allocate inodes even
> though there is a large amount of space available in the filesystem
> for inodes. The issue is caused by a nearly full allocation group
> having enough free space in it to be considered for inode
> allocation, but not enough contiguous free space to actually
> allocation inodes. This situation results in successful selection
> of the AG for allocation, then failure of the allocation resulting
> in ENOSPC being reported to the caller.
>
> It is caused by two possible issues. Firstly, we only consider the
> lognest free extent and whether it would fit an inode chunk. If the
> extent is not correctly aligned, then we can't allocate an inode
> chunk in it regardless of the fact that it is large enough. This
> tends to be a permanent error until space in the AG is freed.
>
> The second issue is that we don't actually lock the AGI or AGF when
> we are doing these checks, and so by the time we get to actually
> allocating the inode chunk the space we thought we had in the AG may
> have been allocated. This tends to be a spurious error as it
> requires a race to trigger. Hence this case is ignored in this patch
> as the reported problem is for permanent errors.
>
> The first issue could be addressed by simply taking into account the
> alignment when checking the longest extent. This, however, would
> prevent allocation in AGs that have aligned, exact sized extents
> free. However, this case should be fairly rare compared to the
> number of allocations that occur near ENOSPC that would trigger this
> condition.
>
I think the allocation of aligned and exact size extents is already
prevented in the inode chunk allocation code (where it sets the
alignment requirement of the alloc.) and the associated checks in
xfs_alloc_fix_freelist() (in particular, the couple bits of logic there
that consider args->alignment and args->minalignslop).
This is an interesting observation made from early tests of the sparse
inode chunk work. At the point where an AG no longer satisfies inode
chunk allocations, the sparse chunk mechanism actually leads to behavior
where full inode chunks are allocated a sparse chunk at a time in
regions of free space that were technically capable of supporting inode
chunks before resorting to sparse allocs, but that the pre-allocation
checks were not granular enough to allow to proceed. Note that this
behavior assumes a workload that sequentially allocates inodes so as to
not compete with allocations for other purposes, of course.
This patch seems to make the higher level AG selection more consistent
with the lower level allocation attempt, which seems reasonable to me
because it shouldn't introduce any allocation failures that aren't going
to end up as failures anyways.
> Hence, when selecting the inode AG, take into account the inode
> cluster alignment when checking the lognest free extent in the AG.
> If we can't find any AGs with a contiguous free space large
> enough to be aligned, drop the alignment addition and just try for
> an AG that has enough contiguous free space available for an inode
> chunk. This won't prevent issues from occurring, but should avoid
> situations where other AGs have lots of free space but the selected
> AG can't allocate due to alignment constraints.
>
> Reported-by: Arkadiusz Mi¿kiewicz <arekm@maven.pl>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
Reviewed-by: Brian Foster <bfoster@redhat.com>
(minor comment nit in the last hunk)
> fs/xfs/libxfs/xfs_ialloc.c | 34 ++++++++++++++++++++++++----------
> 1 file changed, 24 insertions(+), 10 deletions(-)
>
> diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> index d1dc590..53757bd 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.c
> +++ b/fs/xfs/libxfs/xfs_ialloc.c
> @@ -45,12 +45,12 @@
> */
> static inline int
> xfs_ialloc_cluster_alignment(
> - xfs_alloc_arg_t *args)
> + struct xfs_mount *mp)
> {
> - if (xfs_sb_version_hasalign(&args->mp->m_sb) &&
> - args->mp->m_sb.sb_inoalignmt >=
> - XFS_B_TO_FSBT(args->mp, args->mp->m_inode_cluster_size))
> - return args->mp->m_sb.sb_inoalignmt;
> + if (xfs_sb_version_hasalign(&mp->m_sb) &&
> + mp->m_sb.sb_inoalignmt >=
> + XFS_B_TO_FSBT(mp, mp->m_inode_cluster_size))
> + return mp->m_sb.sb_inoalignmt;
> return 1;
> }
>
> @@ -409,7 +409,7 @@ xfs_ialloc_ag_alloc(
> * but not to use them in the actual exact allocation.
> */
> args.alignment = 1;
> - args.minalignslop = xfs_ialloc_cluster_alignment(&args) - 1;
> + args.minalignslop = xfs_ialloc_cluster_alignment(args.mp) - 1;
>
> /* Allow space for the inode btree to split. */
> args.minleft = args.mp->m_in_maxlevels - 1;
> @@ -445,7 +445,7 @@ xfs_ialloc_ag_alloc(
> args.alignment = args.mp->m_dalign;
> isaligned = 1;
> } else
> - args.alignment = xfs_ialloc_cluster_alignment(&args);
> + args.alignment = xfs_ialloc_cluster_alignment(args.mp);
> /*
> * Need to figure out where to allocate the inode blocks.
> * Ideally they should be spaced out through the a.g.
> @@ -474,7 +474,7 @@ xfs_ialloc_ag_alloc(
> args.type = XFS_ALLOCTYPE_NEAR_BNO;
> args.agbno = be32_to_cpu(agi->agi_root);
> args.fsbno = XFS_AGB_TO_FSB(args.mp, agno, args.agbno);
> - args.alignment = xfs_ialloc_cluster_alignment(&args);
> + args.alignment = xfs_ialloc_cluster_alignment(args.mp);
> if ((error = xfs_alloc_vextent(&args)))
> return error;
> }
> @@ -629,10 +629,24 @@ xfs_ialloc_ag_select(
> }
>
> /*
> - * Is there enough free space for the file plus a block of
> - * inodes? (if we need to allocate some)?
> + * Check that there enough free space for the file plus a chunk
is
> + * of inodes if we need to allocate some. If this is the first
> + * pass across the AGs, take into account the potential space
> + * needed for alignment of inode chunks when checking the
> + * longest contiguous free space in the AG - this prevents us
> + * from getting ENOSPC because we have free space larger than
> + * m_ialloc_blks but alignment constraints prevent us from using
> + * it.
> + *
> + * If we can't find an AG with space for full alignment slack to
> + * be taken into account, we must be near ENOSPC in all AGs.
> + * Hence we don't include alignment for the second pass and so
> + * if we fail allocation due to alignment issues then it is most
> + * likely a real ENOSPC condition.
> */
> ineed = mp->m_ialloc_blks;
> + if (flags && ineed > 1)
> + ineed += xfs_ialloc_cluster_alignment(mp);
> longest = pag->pagf_longest;
> if (!longest)
> longest = pag->pagf_flcount > 0;
> --
> 2.0.0
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] xfs: fix premature enospc on inode allocation
2014-12-01 18:40 ` Brian Foster
@ 2014-12-01 21:51 ` Dave Chinner
0 siblings, 0 replies; 3+ messages in thread
From: Dave Chinner @ 2014-12-01 21:51 UTC (permalink / raw)
To: Brian Foster; +Cc: xfs
On Mon, Dec 01, 2014 at 01:40:00PM -0500, Brian Foster wrote:
> On Mon, Dec 01, 2014 at 11:32:08AM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > After growing a filesystem, XFS can fail to allocate inodes even
> > though there is a large amount of space available in the filesystem
> > for inodes. The issue is caused by a nearly full allocation group
> > having enough free space in it to be considered for inode
> > allocation, but not enough contiguous free space to actually
> > allocation inodes. This situation results in successful selection
> > of the AG for allocation, then failure of the allocation resulting
> > in ENOSPC being reported to the caller.
> >
> > It is caused by two possible issues. Firstly, we only consider the
> > lognest free extent and whether it would fit an inode chunk. If the
> > extent is not correctly aligned, then we can't allocate an inode
> > chunk in it regardless of the fact that it is large enough. This
> > tends to be a permanent error until space in the AG is freed.
> >
> > The second issue is that we don't actually lock the AGI or AGF when
> > we are doing these checks, and so by the time we get to actually
> > allocating the inode chunk the space we thought we had in the AG may
> > have been allocated. This tends to be a spurious error as it
> > requires a race to trigger. Hence this case is ignored in this patch
> > as the reported problem is for permanent errors.
> >
> > The first issue could be addressed by simply taking into account the
> > alignment when checking the longest extent. This, however, would
> > prevent allocation in AGs that have aligned, exact sized extents
> > free. However, this case should be fairly rare compared to the
> > number of allocations that occur near ENOSPC that would trigger this
> > condition.
> >
>
> I think the allocation of aligned and exact size extents is already
> prevented in the inode chunk allocation code (where it sets the
> alignment requirement of the alloc.) and the associated checks in
> xfs_alloc_fix_freelist() (in particular, the couple bits of logic there
> that consider args->alignment and args->minalignslop).
Right.
> This is an interesting observation made from early tests of the sparse
> inode chunk work. At the point where an AG no longer satisfies inode
> chunk allocations, the sparse chunk mechanism actually leads to behavior
> where full inode chunks are allocated a sparse chunk at a time in
> regions of free space that were technically capable of supporting inode
> chunks before resorting to sparse allocs, but that the pre-allocation
> checks were not granular enough to allow to proceed. Note that this
> behavior assumes a workload that sequentially allocates inodes so as to
> not compete with allocations for other purposes, of course.
Yeah, sparse inode chunks change this logic because we can do single
block allocation for a partial inode chunk. However, for all the
existing filesystems that can't do partial inode chunks we still
need to handle this problem case.
> This patch seems to make the higher level AG selection more consistent
> with the lower level allocation attempt, which seems reasonable to me
> because it shouldn't introduce any allocation failures that aren't going
> to end up as failures anyways.
That's pretty much the conclusion I came to - I thought about trying
to combine the selection and allocation loops, but that's a lot more
work than is needed to fix the extent size requirements between the
two loops.
> > Hence, when selecting the inode AG, take into account the inode
> > cluster alignment when checking the lognest free extent in the AG.
> > If we can't find any AGs with a contiguous free space large
> > enough to be aligned, drop the alignment addition and just try for
> > an AG that has enough contiguous free space available for an inode
> > chunk. This won't prevent issues from occurring, but should avoid
> > situations where other AGs have lots of free space but the selected
> > AG can't allocate due to alignment constraints.
> >
> > Reported-by: Arkadiusz Mi¿kiewicz <arekm@maven.pl>
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
>
> Reviewed-by: Brian Foster <bfoster@redhat.com>
Thanks!
> (minor comment nit in the last hunk)
Fixed.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2014-12-01 21:51 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-12-01 0:32 [PATCH] xfs: fix premature enospc on inode allocation Dave Chinner
2014-12-01 18:40 ` Brian Foster
2014-12-01 21:51 ` Dave Chinner
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox