All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH v2] xfs: extent size hints can round up extents past MAXEXTLEN
Date: Fri, 17 Apr 2015 10:03:08 +1000	[thread overview]
Message-ID: <20150417000308.GD15810@dastard> (raw)
In-Reply-To: <20150416222829.GE21261@dastard>

On Fri, Apr 17, 2015 at 08:28:29AM +1000, Dave Chinner wrote:
> On Thu, Apr 16, 2015 at 01:32:38PM -0400, Brian Foster wrote:
> > On Thu, Apr 16, 2015 at 03:00:50PM +1000, Dave Chinner wrote:
> > > + * Calculate the maximum extent length we can ask to allocate after taking into
> > > + * account the on-disk size limitations, the extent size hints and the size
> > > + * being requested. We have to deal with the extent size hint here because the
> > > + * allocation will attempt alignment and hence grow the length outwards by up to
> > > + * @extsz on either side.
> > > + */
> > > +static inline xfs_extlen_t
> > > +xfs_bmapi_max_extlen(
> > > +	struct xfs_inode	*ip,
> > > +	xfs_extlen_t		length)
> > > +{
> > > +	xfs_extlen_t		extsz = xfs_get_extsz_hint(ip);
> > > +	xfs_extlen_t		max_length = MAXEXTLEN;
> > > +
> > > +	if (extsz)
> > > +		max_length -= 2 * extsz - 1;
> > 
> > This can underflow or cause other issues if set to just the right value
> > (with smaller block sizes such that length can be trimmed to 0):
> 
> But I assumed the existing code was correct for this context. My
> bad. :/
> 
> > $ mkfs.xfs -f -bsize=1k <dev>
> > $ mount <dev> /mnt
> > $ xfs_io -f -c "extsize 1g" -c "pwrite 0 4k" -c fsync /mnt/file
> > pwrite64: No space left on device
> 
> Yup, because it 2^21 = 2G, and extsize = 1g puts max_length < 0.
> 
> Ok. So, the problem is that it is overestimating the amount of space
> that alignment will need, and that alignment cannot be guaranteed
> for extsz hints of over (MAXEXTLEN / 2) in size.
> 
> i.e. given an alignment (A[0-2]) and an extent (E[01]):
> 
>    A0                  A1                  A2
>    +-------------------+-------------------+
> 		     +ooo+
> 		     E0  E1
> 
> The problem is that the alignment done by xfs_bmap_extsize_align()
> only extends outwards (i.e. increases extent size). Hence E0 gets
> rounded down to A0-A2, and E1 gets extended to A2, which means we
> are adding almost 2 entire extent size hints to the allocation.
> That's where the reduction in length by two extsz values came from.
> 
> Now, for delayed allocation, this is just fine, because real
> allocation will break this delalloc extent up into two separate
> extents, and underflow wouldn't be noticed as delalloc extents are
> not physically limited to MAXEXTLEN and so nothing would have
> broken. Still, it's not the intended behaviour.
> 
> I'm not sure what the solution is yet - the fundamental problem here
> is the outwards alignment of both ends of the extent, and this
> MAXEXTLEN twiddling is just an encoding of that behaviour. I need to
> spend some time looking at xfs_bmap_extsize_align() and determining
> if there is something we can do differently here.

Ok, so the callers of xfs_bmap_extsize_align() are:

	xfs_bmapi_reserve_delalloc()
	xfs_bmap_btalloc()
	xfs_bmap_rtalloc().

For xfs_bmapi_reserve_delalloc(), the alignment does not need grow
outwards; it can be truncated mid-range, and the code should still
work. i.e.

   A0                  A1                  A2
   +-------------------+-------------------+
		     +ooo+
		     E0  E1
   +-------------------+
   R0                  R1

R[01] is a valid alignment and will result in a second allocation
occurring for this:

   A0                  A1                  A2
   +-------------------+-------------------+
		       +o+
		      E2  E1
                       +-------------------+
                       R1                  R2

And so the range we need allocation for (E[01]) will be allocated
and correctly extent size aligned.

For xfs_bmap_btalloc() - the problem case here - the code is a
little more complex. We do:

xfs_bmapi_write
  loop until all allocated {
    xfs_bmapi_allocate(bma)
      calc off/len
        xfs_bmap_btalloc(bma)
	  xfs_bmap_extsize_align(bma)
	  xfs_alloc_vextent
	  update bma->length
      BMBT insert
    trim returned map
  }

So we are doing alignment two steps removed from the off/len
calculation (xfs_bmap_rtalloc() is in the same boat). Hence the
question is whether xfs_bmap_extsize_align() can trim the range
being allocated and still have everything work....

Ok, upon further reading, the xfs_bmalloc structure (bma) that is
passed between these functions to track the allocation being done is
updated after allocation with the length of the extent allocated.
IOWs:

	bma->length = E(len)
	xfs_bmap_btalloc(bma)
	  A(len) = xfs_bmap_extsize_align(bma->length)
	  R(len) = xfs_alloc_vextent(A(len))
	  bma->length = R(len)

Hence this:

   A0                  A1                  A2
   +-------------------+-------------------+
		     +ooo+
		     E0  E1
   +-------------------+
   R0                  R1

Is a valid result from xfs_bmap_btalloc() and the loop in
xfs_bmapi_write() will do a second allocation and alignment as per
the above delalloc case. xfs_bmap_rtalloc() appears to mirror this
same structure, so should also have the same behaviour.

What this means is that we can actually reduce the requested
allocation to be only a partial overlap when aligning it, and
everything should still work. Let's now see how complex that makes
the code...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  reply	other threads:[~2015-04-17  0:09 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-16  5:00 [PATCH v2] xfs: extent size hints can round up extents past MAXEXTLEN Dave Chinner
2015-04-16 17:32 ` Brian Foster
2015-04-16 22:28   ` Dave Chinner
2015-04-17  0:03     ` Dave Chinner [this message]
2015-04-17 13:01       ` Brian Foster
2015-04-18 23:17         ` Dave Chinner
2015-04-19 13:33           ` Brian Foster
2015-04-19 23:59             ` Dave Chinner
2015-04-17 12:58     ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150417000308.GD15810@dastard \
    --to=david@fromorbit.com \
    --cc=bfoster@redhat.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.