Re: xfs_alloc_ag_vextent_near() takes about 30ms to complete

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Mao Cheng <chengmao2010@gmail.com>, linux-xfs@vger.kernel.org
Subject: Re: xfs_alloc_ag_vextent_near() takes about 30ms to complete
Date: Fri, 26 Oct 2018 09:03:35 -0400	[thread overview]
Message-ID: <20181026130334.GA16468@bfoster> (raw)
In-Reply-To: <20181026010344.GD19305@dastard>

On Fri, Oct 26, 2018 at 12:03:44PM +1100, Dave Chinner wrote:
> On Thu, Oct 25, 2018 at 09:21:30AM -0400, Brian Foster wrote:
> > On Thu, Oct 25, 2018 at 09:35:23AM +1100, Dave Chinner wrote:
> > > On Wed, Oct 24, 2018 at 08:09:27AM -0400, Brian Foster wrote:
> > > > I'm wondering
> > > > if we could implement a smarter location based search using information
> > > > available in the by-size tree. For example, suppose we could identify
> > > > the closest minimally sized extents to agbno in order to better seed the
> > > > left/right starting points of the location based search. This of course
> > > > would require careful heuristics/tradeoffs to make sure we don't just
> > > > replace a bnobt scan with a cntbt scan.
> > > 
> > > I wouldn't bother. I'd just take the "last block" algorithm and make
> > > it search all the >= contiguous free space extents for best locality
> > > before dropping back to the minlen search.
> > > 
> > 
> > Ok, that makes sense. The caveat seems to be though that the "last
> > block" algorithm searches all of the applicable records to discover the
> > best locality. We could open up this search as such, but if free space
> > happens to be completely fragmented to >= requested extents, that could
> > mean every allocation falls into a full cntbt scan where a bnobt lookup
> > would result in a much faster allocation.
> 
> Yup, we'll need to bound it so it doesn't do stupid things. :P
> 

Yep.

> > So ISTM that we still need some kind of intelligent heuristic to fall
> > back to the second algorithm to cover the "too many" case. What exactly
> > that is may take some more thought, experimentation and testing.
> 
> Yeah, that's the difficulty with making core allocator algorithm
> changes - how to characterise the effect of the change. I'm not sure
> that's a huge problem in this case, though, because selecting a
> matching contig freespace is almost always going to be better for
> filesystem longetivty and freespace fragmentation resistance than
> slecting a shorter freespace and doing lots more small allocations.
> it's the 'lots of small allocations' that really makes the freespace
> framgmentation spiral out of control, so if we can avoid that until
> we've used all the matching contig free spaces we'll be better off
> in the long run.
> 

Ok, so I ran fs_mark against the metadump with your patch and a quick
hack to unconditionally scan the cntbt if maxlen extents are available
(up to mxr[0] records similar to your patch, to avoid excessive scans).
The xfs_alloc_find_best_extent() patch alone didn't have much of a
noticeable effect, but that is an isolated optimization and I'm only
doing coarse measurements atm that probably hide it in the noise.

The write workload improves quite a bit with the addition of the cntbt
change. Both throughput (via iostat 60s intervals) and fs_mark files/sec
change from a slow high/low sweeping behavior to much more consistent
and faster results. I see a sweep between 3-30 MB/s and ~30-250 f/sec
change to a much more consistent 27-39MB/s and ~200-300 f/s.

A 5 minute tracepoint sample consists of 100% xfs_alloc_near_first
events which means we never fell back to the bnobt based search. I'm not
sure the mxr thing is the right approach necessarily, I just wanted
something quick that would demonstrate the potential upside gains
without going off the rails. One related concern I have with restricting
the locality of the search too much, for example, is that we use
NEAR_BNO allocs for other things like inode allocation locality that
might not be represented in this simple write only workload.

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

next prev parent reply	other threads:[~2018-10-26 21:40 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-23  7:56 xfs_alloc_ag_vextent_near() takes about 30ms to complete Mao Cheng
2018-10-23 14:53 ` Brian Foster
2018-10-24  3:01   ` Mao Cheng
2018-10-24  4:34     ` Dave Chinner
2018-10-24  9:02       ` Mao Cheng
2018-10-24 12:11         ` Brian Foster
2018-10-25  4:01           ` Mao Cheng
2018-10-25 14:55             ` Brian Foster
2018-10-24 12:09       ` Brian Foster
2018-10-24 22:35         ` Dave Chinner
2018-10-25 13:21           ` Brian Foster
2018-10-26  1:03             ` Dave Chinner
2018-10-26 13:03               ` Brian Foster [this message]
2018-10-27  3:16                 ` Dave Chinner
2018-10-28 14:09                   ` Brian Foster
2018-10-29  0:17                     ` Dave Chinner
2018-10-29  9:53                       ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181026130334.GA16468@bfoster \
    --to=bfoster@redhat.com \
    --cc=chengmao2010@gmail.com \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.