linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH, RFC] xfs: take indirect blocks into accounting when selecting an AG
@ 2016-12-09 17:15 Christoph Hellwig
  2016-12-09 17:32 ` Darrick J. Wong
  2016-12-10  6:16 ` Eryu Guan
  0 siblings, 2 replies; 9+ messages in thread
From: Christoph Hellwig @ 2016-12-09 17:15 UTC (permalink / raw)
  To: linux-xfs; +Cc: eguan, darrick.wong

We need to make sure that the indirect blocks (e.g. bmap btree blocks)
can be allocated from the same AG [1] when comitting to an AG for a
file data block allocation.  To do that we calculate the worst possible
indirect len and subtract that from the free space in the AG.

I don't really like how this makes the space allocator call back into
the bmap code (even if only for a trivial helper), but I can't think
of a better idea.

[1] strictly speaking the same AG or one with a higher AG number, but
that is so incredibly hard to express that we settle for the same AG.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_alloc.c |  7 ++++++-
 fs/xfs/libxfs/xfs_bmap.c  | 17 +++++++++++------
 fs/xfs/libxfs/xfs_bmap.h  |  5 +++++
 3 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index effb64c..037cbd8 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -38,6 +38,7 @@
 #include "xfs_buf_item.h"
 #include "xfs_log.h"
 #include "xfs_ag_resv.h"
+#include "xfs_bmap.h"
 
 struct workqueue_struct *xfs_alloc_wq;
 
@@ -2058,6 +2059,7 @@ xfs_alloc_space_available(
 	struct xfs_perag	*pag = args->pag;
 	xfs_extlen_t		longest;
 	xfs_extlen_t		reservation; /* blocks that are still reserved */
+	xfs_extlen_t		indlen = 0;
 	int			available;
 
 	if (flags & XFS_ALLOC_FLAG_FREEING)
@@ -2071,9 +2073,12 @@ xfs_alloc_space_available(
 	if ((args->minlen + args->alignment + args->minalignslop - 1) > longest)
 		return false;
 
+	if (xfs_alloc_is_userdata(args->datatype))
+		indlen = __xfs_bmap_worst_indlen(args->mp, max(args->minlen, args->maxlen));
+
 	/* do we have enough free space remaining for the allocation? */
 	available = (int)(pag->pagf_freeblks + pag->pagf_flcount -
-			  reservation - min_free - args->total);
+			  reservation - min_free - indlen - args->total);
 	if (available < (int)args->minleft || available <= 0)
 		return false;
 
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 266f641..b52a32d 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -183,19 +183,16 @@ xfs_bmbt_update(
  * Compute the worst-case number of indirect blocks that will be used
  * for ip's delayed extent of length "len".
  */
-STATIC xfs_filblks_t
-xfs_bmap_worst_indlen(
-	xfs_inode_t	*ip,		/* incore inode pointer */
+xfs_filblks_t
+__xfs_bmap_worst_indlen(
+	xfs_mount_t	*mp,		/* mount structure */
 	xfs_filblks_t	len)		/* delayed extent length */
 {
 	int		level;		/* btree level number */
 	int		maxrecs;	/* maximum record count at this level */
-	xfs_mount_t	*mp;		/* mount structure */
 	xfs_filblks_t	rval;		/* return value */
 	xfs_filblks_t   orig_len;
 
-	mp = ip->i_mount;
-
 	/* Calculate the worst-case size of the bmbt. */
 	orig_len = len;
 	maxrecs = mp->m_bmap_dmxr[0];
@@ -222,6 +219,14 @@ xfs_bmap_worst_indlen(
 	return rval;
 }
 
+STATIC xfs_filblks_t
+xfs_bmap_worst_indlen(
+	xfs_inode_t	*ip,		/* incore inode pointer */
+	xfs_filblks_t	len)		/* delayed extent length */
+{
+	return __xfs_bmap_worst_indlen(ip->i_mount, len);
+}
+
 /*
  * Calculate the default attribute fork offset for newly created inodes.
  */
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index cecd094..aa2964a 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -263,4 +263,9 @@ int	xfs_bmap_map_extent(struct xfs_mount *mp, struct xfs_defer_ops *dfops,
 int	xfs_bmap_unmap_extent(struct xfs_mount *mp, struct xfs_defer_ops *dfops,
 		struct xfs_inode *ip, struct xfs_bmbt_irec *imap);
 
+xfs_filblks_t
+__xfs_bmap_worst_indlen(
+	xfs_mount_t	*mp,		/* mount structure */
+	xfs_filblks_t	len);		/* delayed extent length */
+
 #endif	/* __XFS_BMAP_H__ */
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH, RFC] xfs: take indirect blocks into accounting when selecting an AG
  2016-12-09 17:15 [PATCH, RFC] xfs: take indirect blocks into accounting when selecting an AG Christoph Hellwig
@ 2016-12-09 17:32 ` Darrick J. Wong
  2016-12-09 17:41   ` Christoph Hellwig
  2016-12-10  6:16 ` Eryu Guan
  1 sibling, 1 reply; 9+ messages in thread
From: Darrick J. Wong @ 2016-12-09 17:32 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, eguan

On Fri, Dec 09, 2016 at 06:15:09PM +0100, Christoph Hellwig wrote:
> We need to make sure that the indirect blocks (e.g. bmap btree blocks)
> can be allocated from the same AG [1] when comitting to an AG for a
> file data block allocation.  To do that we calculate the worst possible
> indirect len and subtract that from the free space in the AG.
> 
> I don't really like how this makes the space allocator call back into
> the bmap code (even if only for a trivial helper), but I can't think
> of a better idea.
> 
> [1] strictly speaking the same AG or one with a higher AG number, but
> that is so incredibly hard to express that we settle for the same AG.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/libxfs/xfs_alloc.c |  7 ++++++-
>  fs/xfs/libxfs/xfs_bmap.c  | 17 +++++++++++------
>  fs/xfs/libxfs/xfs_bmap.h  |  5 +++++
>  3 files changed, 22 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index effb64c..037cbd8 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -38,6 +38,7 @@
>  #include "xfs_buf_item.h"
>  #include "xfs_log.h"
>  #include "xfs_ag_resv.h"
> +#include "xfs_bmap.h"
>  
>  struct workqueue_struct *xfs_alloc_wq;
>  
> @@ -2058,6 +2059,7 @@ xfs_alloc_space_available(
>  	struct xfs_perag	*pag = args->pag;
>  	xfs_extlen_t		longest;
>  	xfs_extlen_t		reservation; /* blocks that are still reserved */
> +	xfs_extlen_t		indlen = 0;
>  	int			available;
>  
>  	if (flags & XFS_ALLOC_FLAG_FREEING)
> @@ -2071,9 +2073,12 @@ xfs_alloc_space_available(
>  	if ((args->minlen + args->alignment + args->minalignslop - 1) > longest)
>  		return false;
>  
> +	if (xfs_alloc_is_userdata(args->datatype))
> +		indlen = __xfs_bmap_worst_indlen(args->mp, max(args->minlen, args->maxlen));

/me wonders, when is it the case that minlen > maxlen?

I'm also wondering why we can't just increase args->minleft to require
that we leave enough space in whichever AG we pick to expand to bmbt?
AFAICT that's the purpose of the minleft field.

(That said, I'd been working on a patch to do exactly this and it hasn't
solved the problem yet...)

It also would be useful to add a comment as to why we're doing this,
/* Make sure we leave enough space in this AG for a bmbt expansion. */

> +
>  	/* do we have enough free space remaining for the allocation? */
>  	available = (int)(pag->pagf_freeblks + pag->pagf_flcount -
> -			  reservation - min_free - args->total);
> +			  reservation - min_free - indlen - args->total);
>  	if (available < (int)args->minleft || available <= 0)
>  		return false;
>  
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index 266f641..b52a32d 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -183,19 +183,16 @@ xfs_bmbt_update(
>   * Compute the worst-case number of indirect blocks that will be used
>   * for ip's delayed extent of length "len".
>   */
> -STATIC xfs_filblks_t
> -xfs_bmap_worst_indlen(
> -	xfs_inode_t	*ip,		/* incore inode pointer */
> +xfs_filblks_t
> +__xfs_bmap_worst_indlen(
> +	xfs_mount_t	*mp,		/* mount structure */

struct xfs_mount?

>  	xfs_filblks_t	len)		/* delayed extent length */
>  {
>  	int		level;		/* btree level number */
>  	int		maxrecs;	/* maximum record count at this level */
> -	xfs_mount_t	*mp;		/* mount structure */
>  	xfs_filblks_t	rval;		/* return value */
>  	xfs_filblks_t   orig_len;
>  
> -	mp = ip->i_mount;
> -
>  	/* Calculate the worst-case size of the bmbt. */
>  	orig_len = len;
>  	maxrecs = mp->m_bmap_dmxr[0];
> @@ -222,6 +219,14 @@ xfs_bmap_worst_indlen(
>  	return rval;
>  }
>  
> +STATIC xfs_filblks_t
> +xfs_bmap_worst_indlen(
> +	xfs_inode_t	*ip,		/* incore inode pointer */
> +	xfs_filblks_t	len)		/* delayed extent length */
> +{
> +	return __xfs_bmap_worst_indlen(ip->i_mount, len);
> +}
> +
>  /*
>   * Calculate the default attribute fork offset for newly created inodes.
>   */
> diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
> index cecd094..aa2964a 100644
> --- a/fs/xfs/libxfs/xfs_bmap.h
> +++ b/fs/xfs/libxfs/xfs_bmap.h
> @@ -263,4 +263,9 @@ int	xfs_bmap_map_extent(struct xfs_mount *mp, struct xfs_defer_ops *dfops,
>  int	xfs_bmap_unmap_extent(struct xfs_mount *mp, struct xfs_defer_ops *dfops,
>  		struct xfs_inode *ip, struct xfs_bmbt_irec *imap);
>  
> +xfs_filblks_t
> +__xfs_bmap_worst_indlen(
> +	xfs_mount_t	*mp,		/* mount structure */

struct xfs_mount?

--D

> +	xfs_filblks_t	len);		/* delayed extent length */
> +
>  #endif	/* __XFS_BMAP_H__ */
> -- 
> 2.1.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH, RFC] xfs: take indirect blocks into accounting when selecting an AG
  2016-12-09 17:32 ` Darrick J. Wong
@ 2016-12-09 17:41   ` Christoph Hellwig
  2016-12-09 21:46     ` Dave Chinner
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2016-12-09 17:41 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs, eguan

On Fri, Dec 09, 2016 at 09:32:13AM -0800, Darrick J. Wong wrote:
> > +	if (xfs_alloc_is_userdata(args->datatype))
> > +		indlen = __xfs_bmap_worst_indlen(args->mp, max(args->minlen, args->maxlen));
> 
> /me wonders, when is it the case that minlen > maxlen?

Good question.  I just added that when I noticed minlen alone doesn't
work as we might need the bigger calculation based on maxlen.  I'll
do a quick audit and move to maxlen only.

> 
> I'm also wondering why we can't just increase args->minleft to require
> that we leave enough space in whichever AG we pick to expand to bmbt?
> AFAICT that's the purpose of the minleft field.

Not sure what the original intentions was, but as-is it seems pretty
b0rked.

E.g. xfs_bmap_btalloc, xfs_bmapi_allocate or xfs_alloc_vextentjust set
minleft to 0 when when we are low on space which make it a bit pointless.

Also in the bmap code where we set minleft we don't really know how
much we'll need as we'll only decide on the actual final allocation
size deep down in the allocator.  I'll do a little archeology session
now to figure out how we got the current minleft semantics, as they seem
really weird.

> /* Make sure we leave enough space in this AG for a bmbt expansion. */

Sure.

> > +xfs_filblks_t
> > +__xfs_bmap_worst_indlen(
> > +	xfs_mount_t	*mp,		/* mount structure */
> 
> struct xfs_mount?

Yeah, minimum changes for now.  If we end up going down this route I'd
probably just always pass the mount to xfs_bmap_worst_indlen, but
I wanted the minimal amount of change for now.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH, RFC] xfs: take indirect blocks into accounting when selecting an AG
  2016-12-09 17:41   ` Christoph Hellwig
@ 2016-12-09 21:46     ` Dave Chinner
  2016-12-10 16:22       ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Dave Chinner @ 2016-12-09 21:46 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Darrick J. Wong, linux-xfs, eguan

On Fri, Dec 09, 2016 at 06:41:45PM +0100, Christoph Hellwig wrote:
> On Fri, Dec 09, 2016 at 09:32:13AM -0800, Darrick J. Wong wrote:
> > > +	if (xfs_alloc_is_userdata(args->datatype))
> > > +		indlen = __xfs_bmap_worst_indlen(args->mp, max(args->minlen, args->maxlen));
> > 
> > /me wonders, when is it the case that minlen > maxlen?
> 
> Good question.  I just added that when I noticed minlen alone doesn't
> work as we might need the bigger calculation based on maxlen.  I'll
> do a quick audit and move to maxlen only.
> 
> > 
> > I'm also wondering why we can't just increase args->minleft to require
> > that we leave enough space in whichever AG we pick to expand to bmbt?
> > AFAICT that's the purpose of the minleft field.
> 
> Not sure what the original intentions was, but as-is it seems pretty
> b0rked.
> 
> E.g. xfs_bmap_btalloc, xfs_bmapi_allocate or xfs_alloc_vextentjust set
> minleft to 0 when when we are low on space which make it a bit pointless.

I'm pretty sure these cases were added as a mechanism to prevent
ENOSPC occurrences during delalloc conversion when, in the general
case, the allocation would have succeeded. i.e. "in most cases the
BMBT is not going to change shape, so let's just ignore the blocks
that might be needed for that and hope we don't need them".

> Also in the bmap code where we set minleft we don't really know how
> much we'll need as we'll only decide on the actual final allocation
> size deep down in the allocator.  I'll do a little archeology session
> now to figure out how we got the current minleft semantics, as they seem
> really weird.

I'd just set it according to maxlen. I don't really care if we can't
allocate the last few blocks in an AG for certain types of
allocation - the failure fallbacks should drop back to an allocation
attempt of maxlen = minlen and we can set minleft accordingly
for those. If it still fails, then we really have ENOSPC...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH, RFC] xfs: take indirect blocks into accounting when selecting an AG
  2016-12-09 17:15 [PATCH, RFC] xfs: take indirect blocks into accounting when selecting an AG Christoph Hellwig
  2016-12-09 17:32 ` Darrick J. Wong
@ 2016-12-10  6:16 ` Eryu Guan
  2016-12-10 16:23   ` Christoph Hellwig
  2016-12-10 16:42   ` Christoph Hellwig
  1 sibling, 2 replies; 9+ messages in thread
From: Eryu Guan @ 2016-12-10  6:16 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, darrick.wong

On Fri, Dec 09, 2016 at 06:15:09PM +0100, Christoph Hellwig wrote:
> We need to make sure that the indirect blocks (e.g. bmap btree blocks)
> can be allocated from the same AG [1] when comitting to an AG for a
> file data block allocation.  To do that we calculate the worst possible
> indirect len and subtract that from the free space in the AG.
> 
> I don't really like how this makes the space allocator call back into
> the bmap code (even if only for a trivial helper), but I can't think
> of a better idea.
> 
> [1] strictly speaking the same AG or one with a higher AG number, but
> that is so incredibly hard to express that we settle for the same AG.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

I hit "Internal error xfs_trans_cancel" and force shutdown while running
xfs/109 in a loop with xfs_2k_reflink test config, first hit at 62nd
iteration then hit at 32nd iteration. Detailed log appended at the end.

Thanks,
Eryu

[225344.831364] run fstests xfs/109 at 2016-12-10 13:57:00
[225345.157908] XFS (dm-4): Unmounting Filesystem
[225345.181257] XFS (dm-4): EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!
[225345.190230] XFS (dm-4): EXPERIMENTAL reflink feature enabled. Use at your own risk!
[225345.198400] XFS (dm-4): Mounting V5 Filesystem
[225345.237656] XFS (dm-4): Ending clean mount
[225345.257859] XFS (dm-4): Unmounting Filesystem
[225345.396752] XFS (dm-4): EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!
[225345.405718] XFS (dm-4): EXPERIMENTAL reflink feature enabled. Use at your own risk!
[225345.413849] XFS (dm-4): Mounting V5 Filesystem
[225345.454561] XFS (dm-4): Ending clean mount
[225351.615787] XFS (dm-4): Unmounting Filesystem
[225351.644122] XFS (dm-4): EXPERIMENTAL reverse mapping btree feature enabled. Use at your own risk!
[225351.653083] XFS (dm-4): EXPERIMENTAL reflink feature enabled. Use at your own risk!
[225351.661217] XFS (dm-4): Mounting V5 Filesystem
[225351.834617] XFS (dm-4): Ending clean mount
[225355.113963] XFS (dm-4): Internal error XFS_WANT_CORRUPTED_GOTO at line 3504 of file fs/xfs/libxfs/xfs_btree.c.  Caller xfs_bmap_add_extent_delay_real+0x659/0x2090 [xfs]
[225355.129263] CPU: 0 PID: 14779 Comm: kworker/u49:2 Tainted: G           OE   4.9.0-rc1.xfs+ #7
[225355.137864] Hardware name: IBM System x3550 M3 -[7944OEJ]-/90Y4784     , BIOS -[D6E150CUS-1.11]- 02/08/2011
[225355.147684] Workqueue: writeback wb_workfn (flush-253:4)
[225355.153153]  ffffc90003d874b0 ffffffff81363bec ffff88016f71c2b8 ffff88016f71c2b8
[225355.160704]  ffffc90003d874c8 ffffffffa07d018b ffffffffa0790fd9 ffffc90003d87578
[225355.168257]  ffffffffa079fba0 ffffc90003d874ec ffffc90003d875e4 0000000003d875e4
[225355.175811] Call Trace:
[225355.178352]  [<ffffffff81363bec>] dump_stack+0x63/0x87
[225355.183602]  [<ffffffffa07d018b>] xfs_error_report+0x3b/0x40 [xfs]
[225355.189885]  [<ffffffffa0790fd9>] ? xfs_bmap_add_extent_delay_real+0x659/0x2090 [xfs]
[225355.197818]  [<ffffffffa079fba0>] xfs_btree_insert+0x1b0/0x1c0 [xfs]
[225355.204279]  [<ffffffffa07b701b>] ? xfs_iext_insert+0x7b/0x110 [xfs]
[225355.210736]  [<ffffffffa0790fd9>] xfs_bmap_add_extent_delay_real+0x659/0x2090 [xfs]
[225355.218496]  [<ffffffffa079670b>] xfs_bmapi_write+0x78b/0xba0 [xfs]
[225355.224875]  [<ffffffffa07dc876>] xfs_iomap_write_allocate+0x196/0x3a0 [xfs]
[225355.232030]  [<ffffffffa07c5cbb>] xfs_map_blocks+0x1eb/0x260 [xfs]
[225355.238318]  [<ffffffffa07c6d7c>] xfs_do_writepage+0x1cc/0x6e0 [xfs]
[225355.244755]  [<ffffffff811ad8df>] write_cache_pages+0x26f/0x510
[225355.250761]  [<ffffffff81339daa>] ? blk_queue_bio+0x17a/0x3a0
[225355.256616]  [<ffffffffa07c6bb0>] ? xfs_vm_writepages+0xe0/0xe0 [xfs]
[225355.263168]  [<ffffffffa07c6b86>] xfs_vm_writepages+0xb6/0xe0 [xfs]
[225355.269516]  [<ffffffff811ae8fe>] do_writepages+0x1e/0x30
[225355.275000]  [<ffffffff81260795>] __writeback_single_inode+0x45/0x330
[225355.281523]  [<ffffffff81260fd0>] writeback_sb_inodes+0x280/0x570
[225355.287699]  [<ffffffff8126148f>] wb_writeback+0x10f/0x320
[225355.293266]  [<ffffffff81261e29>] wb_workfn+0x109/0x3f0
[225355.298577]  [<ffffffff810a8be2>] process_one_work+0x152/0x400
[225355.304494]  [<ffffffff810a94d5>] worker_thread+0x125/0x4b0
[225355.310149]  [<ffffffff810a93b0>] ? rescuer_thread+0x380/0x380
[225355.316064]  [<ffffffff810af039>] kthread+0xd9/0xf0
[225355.321026]  [<ffffffff810aef60>] ? kthread_park+0x60/0x60
[225355.326597]  [<ffffffff8170ff95>] ret_from_fork+0x25/0x30
[225355.332179] XFS (dm-4): Internal error xfs_trans_cancel at line 983 of file fs/xfs/xfs_trans.c.  Caller xfs_iomap_write_allocate+0x2ed/0x3a0 [xfs]
[225355.345433] CPU: 0 PID: 14779 Comm: kworker/u49:2 Tainted: G           OE   4.9.0-rc1.xfs+ #7
[225355.354033] Hardware name: IBM System x3550 M3 -[7944OEJ]-/90Y4784     , BIOS -[D6E150CUS-1.11]- 02/08/2011
[225355.363849] Workqueue: writeback wb_workfn (flush-253:4)
[225355.369277]  ffffc90003d877b8 ffffffff81363bec ffff88016f71cae0 0000000000000001
[225355.376830]  ffffc90003d877d0 ffffffffa07d018b ffffffffa07dc9cd ffffc90003d877f8
[225355.384385]  ffffffffa07ee876 ffff88025244c380 ffff8802795d6000 ffffc90003d87878
[225355.391938] Call Trace:
[225355.394475]  [<ffffffff81363bec>] dump_stack+0x63/0x87
[225355.399724]  [<ffffffffa07d018b>] xfs_error_report+0x3b/0x40 [xfs]
[225355.406015]  [<ffffffffa07dc9cd>] ? xfs_iomap_write_allocate+0x2ed/0x3a0 [xfs]
[225355.413345]  [<ffffffffa07ee876>] xfs_trans_cancel+0xb6/0xe0 [xfs]
[225355.419637]  [<ffffffffa07dc9cd>] xfs_iomap_write_allocate+0x2ed/0x3a0 [xfs]
[225355.426792]  [<ffffffffa07c5cbb>] xfs_map_blocks+0x1eb/0x260 [xfs]
[225355.433080]  [<ffffffffa07c6d7c>] xfs_do_writepage+0x1cc/0x6e0 [xfs]
[225355.439515]  [<ffffffff811ad8df>] write_cache_pages+0x26f/0x510
[225355.445518]  [<ffffffff81339daa>] ? blk_queue_bio+0x17a/0x3a0
[225355.451375]  [<ffffffffa07c6bb0>] ? xfs_vm_writepages+0xe0/0xe0 [xfs]
[225355.457927]  [<ffffffffa07c6b86>] xfs_vm_writepages+0xb6/0xe0 [xfs]
[225355.464278]  [<ffffffff811ae8fe>] do_writepages+0x1e/0x30
[225355.469762]  [<ffffffff81260795>] __writeback_single_inode+0x45/0x330
[225355.476286]  [<ffffffff81260fd0>] writeback_sb_inodes+0x280/0x570
[225355.482463]  [<ffffffff8126148f>] wb_writeback+0x10f/0x320
[225355.488030]  [<ffffffff81261e29>] wb_workfn+0x109/0x3f0
[225355.493340]  [<ffffffff810a8be2>] process_one_work+0x152/0x400
[225355.499254]  [<ffffffff810a94d5>] worker_thread+0x125/0x4b0
[225355.504908]  [<ffffffff810a93b0>] ? rescuer_thread+0x380/0x380
[225355.510823]  [<ffffffff810af039>] kthread+0xd9/0xf0
[225355.515784]  [<ffffffff810aef60>] ? kthread_park+0x60/0x60
[225355.521352]  [<ffffffff8170ff95>] ret_from_fork+0x25/0x30
[225355.526901] XFS (dm-4): xfs_do_force_shutdown(0x8) called from line 984 of file fs/xfs/xfs_trans.c.  Return address = 0xffffffffa07ee88f
[225355.540790] XFS (dm-4): Corruption of in-memory data detected.  Shutting down filesystem
[225355.548976] XFS (dm-4): Please umount the filesystem and rectify the problem(s)
[225355.556445] buffer_io_error: 502 callbacks suppressed
[225355.561593] Buffer I/O error on dev dm-4, logical block 71048, lost async page write
[225355.569420] Buffer I/O error on dev dm-4, logical block 71049, lost async page write
[225355.577250] Buffer I/O error on dev dm-4, logical block 71050, lost async page write
[225355.585077] Buffer I/O error on dev dm-4, logical block 71051, lost async page write
[225355.592906] Buffer I/O error on dev dm-4, logical block 71052, lost async page write
[225355.600733] Buffer I/O error on dev dm-4, logical block 71053, lost async page write
[225355.608561] Buffer I/O error on dev dm-4, logical block 71054, lost async page write
[225355.616387] Buffer I/O error on dev dm-4, logical block 71055, lost async page write
[225355.624218] Buffer I/O error on dev dm-4, logical block 71056, lost async page write
[225355.632048] Buffer I/O error on dev dm-4, logical block 71057, lost async page write
[225355.967859] XFS (dm-4): xfs_log_force: error -5 returned.
[225355.991059] XFS (dm-4): Unmounting Filesystem
[225355.995525] XFS (dm-4): xfs_log_force: error -5 returned.
[225356.001435] XFS (dm-4): xfs_log_force: error -5 returned.
[225356.374195] XFS (dm-3): Unmounting Filesystem

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH, RFC] xfs: take indirect blocks into accounting when selecting an AG
  2016-12-09 21:46     ` Dave Chinner
@ 2016-12-10 16:22       ` Christoph Hellwig
  0 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2016-12-10 16:22 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, Darrick J. Wong, linux-xfs, eguan

On Sat, Dec 10, 2016 at 08:46:51AM +1100, Dave Chinner wrote:
> I'm pretty sure these cases were added as a mechanism to prevent
> ENOSPC occurrences during delalloc conversion when, in the general
> case, the allocation would have succeeded. i.e. "in most cases the
> BMBT is not going to change shape, so let's just ignore the blocks
> that might be needed for that and hope we don't need them".

... and it it doesn't work we'll shut down the filesystem due to an
error inside a dirty transaction.  Anyway, the history and git-blame
say that this behavior has been there since day one.

> I'd just set it according to maxlen. I don't really care if we can't
> allocate the last few blocks in an AG for certain types of
> allocation - the failure fallbacks should drop back to an allocation
> attempt of maxlen = minlen and we can set minleft accordingly
> for those. If it still fails, then we really have ENOSPC...

I tried a patch doing that yesterday, but we still quickly hit
the failure in xfs/109. I must have messed up something and need
to review the whole are a bit more.  Here is that patch for reference:
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index effb64c..2722c66 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2634,12 +2634,10 @@ xfs_alloc_vextent(
 	xfs_agblock_t	agsize;	/* allocation group size */
 	int		error;
 	int		flags;	/* XFS_ALLOC_FLAG_... locking flags */
-	xfs_extlen_t	minleft;/* minimum left value, temp copy */
 	xfs_mount_t	*mp;	/* mount structure pointer */
 	xfs_agnumber_t	sagno;	/* starting allocation group number */
 	xfs_alloctype_t	type;	/* input allocation type */
 	int		bump_rotor = 0;
-	int		no_min = 0;
 	xfs_agnumber_t	rotorstep = xfs_rotorstep; /* inode32 agf stepper */
 
 	mp = args->mp;
@@ -2668,7 +2666,6 @@ xfs_alloc_vextent(
 		trace_xfs_alloc_vextent_badargs(args);
 		return 0;
 	}
-	minleft = args->minleft;
 
 	switch (type) {
 	case XFS_ALLOCTYPE_THIS_AG:
@@ -2679,9 +2676,7 @@ xfs_alloc_vextent(
 		 */
 		args->agno = XFS_FSB_TO_AGNO(mp, args->fsbno);
 		args->pag = xfs_perag_get(mp, args->agno);
-		args->minleft = 0;
 		error = xfs_alloc_fix_freelist(args, 0);
-		args->minleft = minleft;
 		if (error) {
 			trace_xfs_alloc_vextent_nofix(args);
 			goto error0;
@@ -2746,9 +2741,7 @@ xfs_alloc_vextent(
 		 */
 		for (;;) {
 			args->pag = xfs_perag_get(mp, args->agno);
-			if (no_min) args->minleft = 0;
 			error = xfs_alloc_fix_freelist(args, flags);
-			args->minleft = minleft;
 			if (error) {
 				trace_xfs_alloc_vextent_nofix(args);
 				goto error0;
@@ -2788,20 +2781,17 @@ xfs_alloc_vextent(
 			 * or switch to non-trylock mode.
 			 */
 			if (args->agno == sagno) {
-				if (no_min == 1) {
+				if (flags == 0) {
 					args->agbno = NULLAGBLOCK;
 					trace_xfs_alloc_vextent_allfailed(args);
 					break;
 				}
-				if (flags == 0) {
-					no_min = 1;
-				} else {
-					flags = 0;
-					if (type == XFS_ALLOCTYPE_START_BNO) {
-						args->agbno = XFS_FSB_TO_AGBNO(mp,
-							args->fsbno);
-						args->type = XFS_ALLOCTYPE_NEAR_BNO;
-					}
+
+				flags = 0;
+				if (type == XFS_ALLOCTYPE_START_BNO) {
+					args->agbno = XFS_FSB_TO_AGBNO(mp,
+						args->fsbno);
+					args->type = XFS_ALLOCTYPE_NEAR_BNO;
 				}
 			}
 			xfs_perag_put(args->pag);
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 6f28814..b6aaa26 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -3758,7 +3758,8 @@ xfs_bmap_btalloc(
 		args.alignment = 1;
 		args.minalignslop = 0;
 	}
-	args.minleft = ap->minleft;
+	ASSERT(args.total);
+	args.minleft = xfs_bmap_worst_indlen(ap->ip, args.total);
 	args.wasdel = ap->wasdel;
 	args.resv = XFS_AG_RESV_NONE;
 	args.datatype = ap->datatype;
@@ -3806,7 +3807,8 @@ xfs_bmap_btalloc(
 		args.fsbno = 0;
 		args.type = XFS_ALLOCTYPE_FIRST_AG;
 		args.total = ap->minlen;
-		args.minleft = 0;
+		ASSERT(args.total);
+		args.minleft = xfs_bmap_worst_indlen(ap->ip, args.total);
 		if ((error = xfs_alloc_vextent(&args)))
 			return error;
 		ap->dfops->dop_low = true;
@@ -4338,8 +4340,6 @@ xfs_bmapi_allocate(
 	if (error)
 		return error;
 
-	if (bma->dfops->dop_low)
-		bma->minleft = 0;
 	if (bma->cur)
 		bma->cur->bc_private.b.firstblock = *bma->firstblock;
 	if (bma->blkno == NULLFSBLOCK)
@@ -4569,15 +4569,6 @@ xfs_bmapi_write(
 
 	XFS_STATS_INC(mp, xs_blk_mapw);
 
-	if (*firstblock == NULLFSBLOCK) {
-		if (XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_BTREE)
-			bma.minleft = be16_to_cpu(ifp->if_broot->bb_level) + 1;
-		else
-			bma.minleft = 1;
-	} else {
-		bma.minleft = 0;
-	}
-
 	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
 		error = xfs_iread_extents(tp, ip, whichfork);
 		if (error)
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index cecd094..92e6755 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -49,7 +49,6 @@ struct xfs_bmalloca {
 
 	xfs_extlen_t		total;	/* total blocks needed for xaction */
 	xfs_extlen_t		minlen;	/* minimum allocation size (blocks) */
-	xfs_extlen_t		minleft; /* amount must be left after alloc */
 	bool			eof;	/* set if allocating past last extent */
 	bool			wasdel;	/* replacing a delayed allocation */
 	bool			aeof;	/* allocated space at eof */
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c
index 8007d2b..c33eddb 100644
--- a/fs/xfs/libxfs/xfs_bmap_btree.c
+++ b/fs/xfs/libxfs/xfs_bmap_btree.c
@@ -455,18 +455,6 @@ xfs_bmbt_alloc_block(
 		args.fsbno = be64_to_cpu(start->l);
 try_another_ag:
 		args.type = XFS_ALLOCTYPE_START_BNO;
-		/*
-		 * Make sure there is sufficient room left in the AG to
-		 * complete a full tree split for an extent insert.  If
-		 * we are converting the middle part of an extent then
-		 * we may need space for two tree splits.
-		 *
-		 * We are relying on the caller to make the correct block
-		 * reservation for this operation to succeed.  If the
-		 * reservation amount is insufficient then we may fail a
-		 * block allocation here and corrupt the filesystem.
-		 */
-		args.minleft = args.tp->t_blk_res;
 	} else if (cur->bc_private.b.dfops->dop_low) {
 		args.type = XFS_ALLOCTYPE_START_BNO;
 	} else {
@@ -499,20 +487,6 @@ try_another_ag:
 		goto try_another_ag;
 	}
 
-	if (args.fsbno == NULLFSBLOCK && args.minleft) {
-		/*
-		 * Could not find an AG with enough free space to satisfy
-		 * a full btree split.  Try again without minleft and if
-		 * successful activate the lowspace algorithm.
-		 */
-		args.fsbno = 0;
-		args.type = XFS_ALLOCTYPE_FIRST_AG;
-		args.minleft = 0;
-		error = xfs_alloc_vextent(&args);
-		if (error)
-			goto error0;
-		cur->bc_private.b.dfops->dop_low = true;
-	}
 	if (args.fsbno == NULLFSBLOCK) {
 		XFS_BTREE_TRACE_CURSOR(cur, XBT_EXIT);
 		*stat = 0;

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH, RFC] xfs: take indirect blocks into accounting when selecting an AG
  2016-12-10  6:16 ` Eryu Guan
@ 2016-12-10 16:23   ` Christoph Hellwig
  2016-12-10 16:42   ` Christoph Hellwig
  1 sibling, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2016-12-10 16:23 UTC (permalink / raw)
  To: Eryu Guan; +Cc: Christoph Hellwig, linux-xfs, darrick.wong

On Sat, Dec 10, 2016 at 02:16:41PM +0800, Eryu Guan wrote:
> I hit "Internal error xfs_trans_cancel" and force shutdown while running
> xfs/109 in a loop with xfs_2k_reflink test config, first hit at 62nd
> iteration then hit at 32nd iteration. Detailed log appended at the end.

That's the bmap btree insert, so we're still having a problem figuring
out the right indlen somehow.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH, RFC] xfs: take indirect blocks into accounting when selecting an AG
  2016-12-10  6:16 ` Eryu Guan
  2016-12-10 16:23   ` Christoph Hellwig
@ 2016-12-10 16:42   ` Christoph Hellwig
  2016-12-11  5:19     ` Eryu Guan
  1 sibling, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2016-12-10 16:42 UTC (permalink / raw)
  To: Eryu Guan; +Cc: Christoph Hellwig, linux-xfs, darrick.wong

Btw, is this on 4.9-rc or for-next?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH, RFC] xfs: take indirect blocks into accounting when selecting an AG
  2016-12-10 16:42   ` Christoph Hellwig
@ 2016-12-11  5:19     ` Eryu Guan
  0 siblings, 0 replies; 9+ messages in thread
From: Eryu Guan @ 2016-12-11  5:19 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, darrick.wong

On Sat, Dec 10, 2016 at 05:42:36PM +0100, Christoph Hellwig wrote:
> Btw, is this on 4.9-rc or for-next?

It's for-next. Sorry, I should've included this info.

Thanks,
Eryu

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-12-11  5:19 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-12-09 17:15 [PATCH, RFC] xfs: take indirect blocks into accounting when selecting an AG Christoph Hellwig
2016-12-09 17:32 ` Darrick J. Wong
2016-12-09 17:41   ` Christoph Hellwig
2016-12-09 21:46     ` Dave Chinner
2016-12-10 16:22       ` Christoph Hellwig
2016-12-10  6:16 ` Eryu Guan
2016-12-10 16:23   ` Christoph Hellwig
2016-12-10 16:42   ` Christoph Hellwig
2016-12-11  5:19     ` Eryu Guan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).