Re: [RESEND PATCH v2] xfs: fix the entry condition of exact EOF block allocation optimization

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Dave Chinner <david@fromorbit.com>
To: Jinliang Zheng <alexjlzheng@gmail.com>
Cc: alexjlzheng@tencent.com, cem@kernel.org, chandanbabu@kernel.org,
	dchinner@redhat.com, djwong@kernel.org, hch@infradead.org,
	linux-kernel@vger.kernel.org, linux-xfs@vger.kernel.org
Subject: Re: [RESEND PATCH v2] xfs: fix the entry condition of exact EOF block allocation optimization
Date: Fri, 6 Dec 2024 10:48:48 +1100	[thread overview]
Message-ID: <Z1I74KeyZRv2pBBT@dread.disaster.area> (raw)
In-Reply-To: <20241205121802.1232223-1-alexjlzheng@tencent.com>

On Thu, Dec 05, 2024 at 08:18:02PM +0800, Jinliang Zheng wrote:
> On Wed, 4 Dec 2024 07:40:20 +1100, Dave Chinner wrote:
> > On Sat, Nov 31, 2024 at 07:11:32PM +0800, Jinliang Zheng wrote:
> > > When we call create(), lseek() and write() sequentially, offset != 0
> > > cannot be used as a judgment condition for whether the file already
> > > has extents.
> > > 
> > > Furthermore, when xfs_bmap_adjacent() has not given a better blkno,
> > > it is not necessary to use exact EOF block allocation.
> > > 
> > > Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com>
> > > ---
> > > Changelog:
> > > - V2: Fix the entry condition
> > > - V1: https://lore.kernel.org/linux-xfs/ZyFJm7xg7Msd6eVr@dread.disaster.area/T/#t
> > > ---
> > >  fs/xfs/libxfs/xfs_bmap.c | 12 +++++++-----
> > >  1 file changed, 7 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> > > index 36dd08d13293..c1e5372b6b2e 100644
> > > --- a/fs/xfs/libxfs/xfs_bmap.c
> > > +++ b/fs/xfs/libxfs/xfs_bmap.c
> > > @@ -3531,12 +3531,14 @@ xfs_bmap_btalloc_at_eof(
> > >  	int			error;
> > >  
> > >  	/*
> > > -	 * If there are already extents in the file, try an exact EOF block
> > > -	 * allocation to extend the file as a contiguous extent. If that fails,
> > > -	 * or it's the first allocation in a file, just try for a stripe aligned
> > > -	 * allocation.
> > > +	 * If there are already extents in the file, and xfs_bmap_adjacent() has
> > > +	 * given a better blkno, try an exact EOF block allocation to extend the
> > > +	 * file as a contiguous extent. If that fails, or it's the first
> > > +	 * allocation in a file, just try for a stripe aligned allocation.
> > >  	 */
> > > -	if (ap->offset) {
> > > +	if (ap->prev.br_startoff != NULLFILEOFF &&
> > > +	     !isnullstartblock(ap->prev.br_startblock) &&
> > > +	     xfs_bmap_adjacent_valid(ap, ap->blkno, ap->prev.br_startblock)) {
> > 
> > There's no need for calling xfs_bmap_adjacent_valid() here -
> > we know that ap->blkno is valid because the
> > bounds checking has already been done by xfs_bmap_adjacent().
> 
> I'm sorry that I didn't express it clearly, what I meant here is: if we want
> to extend the file as a contiguous extent, then ap->blkno must be a better
> choice given by xfs_bmap_adjacent() than other default values.

Yes, but xfs_bmap_adjacent_valid() does not tell us that.

> /*
>  * If allocating at eof, and there's a previous real block,
>  * try to use its last block as our starting point.
>  */
> if (ap->eof && ap->prev.br_startoff != NULLFILEOFF &&
>     !isnullstartblock(ap->prev.br_startblock) &&
>     xfs_bmap_adjacent_valid(ap,
> 		ap->prev.br_startblock + ap->prev.br_blockcount,
> 		ap->prev.br_startblock)) {
> 	ap->blkno = ap->prev.br_startblock + ap->prev.br_blockcount; <--- better A

For people reading along: This sets the allocation target to the
end of the previous physical extent.

> 	/*
> 	 * Adjust for the gap between prevp and us.
> 	 */
> 	adjust = ap->offset -
> 		(ap->prev.br_startoff + ap->prev.br_blockcount);
> 	if (adjust && xfs_bmap_adjacent_valid(ap, ap->blkno + adjust,
> 			ap->prev.br_startblock))
> 		ap->blkno += adjust;                                 <--- better B

And this adjusts for the file offset of the new EOF allocation
being a distance beyond the previous extent. i.e.

file offset:	0	EOF	    ap->offset
layout:		+--prev--+-----hole-----+--new EOF allocation--+

After allocation:
file offset:	0	oEOF	      offset		      EOF
layout:		+--prev--+-----hole-----+--new EOF allocation--+
physical:	+--used--+-----free-----+-------used-----------+

And now when the write to fill the file offset hole (e.g. because of
racing concurrent extending AIO+DIO writes being issued out of
order), we end up with this non-EOF NEAR allocation being set up
over the hole in the file:

file offset:	0      ap->offset      			      EOF
layout:		+--prev--+-----hole-----+--------next----------+
                       ap->blkno

And the NEAR allocation will find the exact free space we left to
fill that hole, resulting in a file that looks like this:

file offset:	0					      EOF
layout:		+----------------------------------------------+
physical:	+----------------------------------------------+

i.e. a single contiguous extent.

> 	return true;

And it's important to note that xfs_bmap_adjacent returns true if
it selects a new target for exact allocation.

> }
> 
> Only when we reach 'better A' or 'better B' of xfs_bmap_adjacent() above, it
> is worth trying to use xfs_alloc_vextent_EXACT_bno(). Otherwise, NEAR is
> more suitable than EXACT.

Well, yes, that is exactly what the code was -trying- to do.
It was using ap->offset as a proxy for "there is a previous extent"
rather than an explicit check for "do we need exact allocation"

As you've rightly pointed out - this code is not correct in all
situations, nor optimal for all situations.

What I've been trying to point out to you is that your solution is
not optimal, either. 

> Therefore, we need xfs_bmap_adjacent() to determine whether xfs_bmap_adjacent()
> has indeed modified ap->blkno.

It already does, but we ignore it. If we want use exact allocation
only when we are doing EOF allocation:

Perhaps:

-	xfs_bmap_adjacent(ap);
+	if (!xfs_bmap_adjacent(ap))
+		ap->eof = false;

And then in xfs_bmap_btalloc_at_eof() all we need is

-	if (ap->offset) {
+	if (ap->eof) {

i.e. we only do exact allocation at EOF when xfs_bmap_adjacent has
set a target we want exact allocation for.

(note: don't confuse ap->eof and ap->aeof)

> > Actually, for another patch, the bounds checking in
> > xfs_bmap_adjacent_valid() is incorrect. What happens if the last AG
> > is a runt? i.e. it open codes xfs_verify_fsbno() and gets it wrong.
> 
> For general scenarios, I agree.

This *is* a general scenario. Every single extending allocation goes
through this path.

> But here, the parameters x and y of xfs_bmap_adjacent_valid() are both derived
> from ap->prev. Is it possible that it exceeds mp->m_sb.sb_agcount or
> mp->m_sb.sb_agblocks?

I think you missed the significance of the gap (file offset)
adjustment.

Write a couple of TB beyond EOF and see what happens. Then allocate
a file in the last AG that is a runt, and try to write a distance
beyond EOF that will land the target blkno between the size of
the runt AG and mp->m_sb.sb_agcount....

Hint: runt AG length < AGBNO(ap->blkno) < mp->m_sb.sb_agcount.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

     prev parent reply	other threads:[~2024-12-05 23:48 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-30 11:11 [RESEND PATCH v2] xfs: fix the entry condition of exact EOF block allocation optimization Jinliang Zheng
2024-12-03 20:40 ` Dave Chinner
2024-12-05 12:18   ` Jinliang Zheng
2024-12-05 23:48     ` Dave Chinner [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z1I74KeyZRv2pBBT@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=alexjlzheng@gmail.com \
    --cc=alexjlzheng@tencent.com \
    --cc=cem@kernel.org \
    --cc=chandanbabu@kernel.org \
    --cc=dchinner@redhat.com \
    --cc=djwong@kernel.org \
    --cc=hch@infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.