From: Dave Chinner <david@fromorbit.com>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH] xfs: speed up directory bestfree block scanning
Date: Fri, 5 Jan 2018 08:52:56 +1100 [thread overview]
Message-ID: <20180104215256.GD32627@dastard> (raw)
In-Reply-To: <20180104140452.GA23499@bfoster.bfoster>
On Thu, Jan 04, 2018 at 09:04:53AM -0500, Brian Foster wrote:
> On Thu, Jan 04, 2018 at 08:00:15AM +1100, Dave Chinner wrote:
> > On Wed, Jan 03, 2018 at 08:41:37AM -0500, Brian Foster wrote:
> > > On Wed, Jan 03, 2018 at 10:59:10PM +1100, Dave Chinner wrote:
> > > > In writing this, I think I can see a quick and simple change that
> > > > will fix this case and improve most other directory grow workloads
> > > > without affecting normal random directory insert/remove performance.
> > > > That is, do a reverse order search starting at the last block rather
> > > > than increasing order search starting at the first block.....
> > > >
> > > > Ok, now were are talking - performance and scalability improvements!
> > > >
> > > > create time(sec) / rate (files/s)
> > > > File count vanilla loop-fix +reverse
> > > > 10k 0.54 / 18.5k 0.53 / 18.9k 0.52 / 19.3k
> > > > 20k 1.10 / 18.1k 1.05 / 19.0k 1.00 / 20.0k
> > > > 100k 4.21 / 23.8k 3.91 / 25.6k 3.58 / 27.9k
> > > > 200k 9.66 / 20,7k 7.37 / 27.1k 7.08 / 28.3k
> > > > 1M 86.61 / 11.5k 48.26 / 20.7k 38.33 / 26.1k
> > > > 2M 206.13 / 9.7k 129.71 / 15.4k 82.20 / 24.3k
> > > > 10M 2843.57 / 3.5k 1817.39 / 5.5k 591.78 / 16.9k
> > > >
> > > > Theres still some non-linearity as we approach the 10M number, but
> > > > it's still 5x faster to 10M inodes than the existing code....
> > > >
> > >
> > > Nice improvement.. I still need to look at the code, but a quick first
> > > thought is that I wonder if there's somewhere we could stash a 'most
> > > recent freeblock' once we have to grow the directory, even if just as an
> > > in-core hint. Then we could jump straight to the latest block regardless
> > > of the workload.
.....
> > After sleeping on it, I suspect that there's a simple on-disk mod to
> > the dir3 header that will improve the search function for all
> > workloads. The dir3 free block header:
> >
> > struct xfs_dir3_free_hdr {
> > struct xfs_dir3_blk_hdr hdr;
> > __be32 firstdb; /* db of first entry */
> > __be32 nvalid; /* count of valid entries */
> > __be32 nused; /* count of used entries */
> > __be32 pad; /* 64 bit alignment */
> > };
> >
> > has 32 bits of padding in it, and the most entries a free block can
> > have is just under 2^15. Hence we can turn that into a "bestfree"
> > entry that tracks the largest freespace indexed by the block.
> >
> > Then the free block scan can start by checking the required length
> > against the largest freespace in the bestfree entry and skip the
> > block search altogether if there isn't an indexed block with enough
> > free space inside the freespace index block we are searching.
>
> So with this we'd still need to walk the range of free blocks, right?
> I.e., we'd just be able to shortcut individual free block processing
> (bests[] iteration) for those that obviously don't satisfy the request.
Right. In the case of the 5M entry directory I was looking at, there
were a total of 33 free index blocks. All the overhead was in
scanning the 2016 entries in each block, not in iterating over the
blocks. So if we can skip the entry scanning in a given block, then
we have a general search improvement. i.e. it makes the free block
index structure more like a skip list than a linear array.
Also, if we keep the index into the free index block of the largest
free space we have in the header, we can jump straight to it without
needing to scan. Then when we modify the free index during the
insert, we can do the "best free" scan at that point, similar to how
we keep other bestfree indexes up to date.
> Assuming I follow that correctly, that sounds reasonable from the
> perspective that it certainly eliminates work. I suppose how worthwhile
> it is probably depends on how much of an effect it has on the higher
> level directory workload. IOW, is performance measurably improved by
> skipping individual free block processing or is that cost mostly
> amortized by having to read/walk all free blocks in the first place?
According to the profile, the CPU cost of the read/walk of the free
index blocks was within the noise floor. It didn't stand out as a
major contributor, but I'd need to re-measure the overhead on a
random insert/delete workload. However, my gut feel is that if we
combined reverse order searching (for sequential insert) with block
skipping (for random insert) we'll get get substantial improvements
across all insert operations on large directories...
> It may very well be a worthwhile optimization. I think the larger point
> is that more isolated testing is probably required to confirm. The
> improvement from this patch alone doesn't necessarily translate to an
> answer one way or another because this patch creates conditions that
> allow us to skip most of the scan altogether (by jumping right to the
> most recently added block).
Yeah, lots of validation work, but given the overhead for a
non-trivial search distance I was measuring (>40% of CPU time @10M
inodes) I think it's worth persuing.
CHeers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2018-01-04 21:53 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-01-03 6:27 [PATCH] xfs: speed up directory bestfree block scanning Dave Chinner
2018-01-03 11:59 ` Dave Chinner
2018-01-03 13:41 ` Brian Foster
2018-01-03 21:00 ` Dave Chinner
2018-01-04 14:04 ` Brian Foster
2018-01-04 21:52 ` Dave Chinner [this message]
2018-01-03 13:28 ` Brian Foster
2018-01-03 20:38 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180104215256.GD32627@dastard \
--to=david@fromorbit.com \
--cc=bfoster@redhat.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).