From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 4/7] repair: parallelise phase 6
Date: Tue, 27 Oct 2020 16:10:44 +1100
Message-ID: <20201027051044.GX7391@dread.disaster.area>
In-Reply-To: <20201022061100.GP9832@magnolia>
On Wed, Oct 21, 2020 at 11:11:00PM -0700, Darrick J. Wong wrote:
> On Thu, Oct 22, 2020 at 04:15:34PM +1100, Dave Chinner wrote:
> > +static void
> > +traverse_function(
> > +	struct workqueue	*wq,
> > +	xfs_agnumber_t		agno,
> > +	void			*arg)
> > +{
> > +	struct ino_tree_node	*irec;
> >  	prefetch_args_t		*pf_args = arg;
> > +	struct workqueue	lwq;
> > +	struct xfs_mount	*mp = wq->wq_ctx;
> > +
> > 
> >  	wait_for_inode_prefetch(pf_args);
> > 
> >  	if (verbose)
> >  		do_log(_(" - agno = %d\n"), agno);
> > 
> > +	/*
> > +	 * The more AGs we have in flight at once, the fewer processing threads
> > +	 * per AG. This means we don't overwhelm the machine with hundreds of
> > +	 * threads when we start acting on lots of AGs at once. We just want
> > +	 * enough that we can keep multiple CPUs busy across multiple AGs.
> > +	 */
> > +	workqueue_create_bound(&lwq, mp, ag_stride, 1000);
>
> Eeeeee, magic number! :)
>
> /me tosses in obligatory hand-wringing about 2000 CPU systems running
> out of work. How about ag_stride * 50 or something? :P

ag_stride already determines concurrency via how many AGs are being
scanned at once. However, it provides no insight into the depth of
the queue we need to use per AG.

What this magic number does is bound how deep the work queue gets
before we ask another worker thread to start also processing the
queue. We've already got async threads doing inode prefetch, so the
bound here throttles the rate at which inodes are
prefetched into the buffer cache. In general, we're going to be IO
bound and waiting on readahead rather than throttling on processing
the inodes, so all this bound is doing is preventing readahead from
running too far ahead of processing and potentially causing cache
thrashing.

However, we don't want to go using lots of threads to parallelise
the work within the AG when we have already parallelised across AGs.
We want the initial worker thread per AG to just keep working away
burning CPU while the prefetch code is blocking waiting for more
inodes from disk. Then we get another burst of work being queued,
and so on.

Hence the queue needs to be quite deep so that we can soak up the
bursts of processing that readahead triggers without asking lots of
worker threads to do work. However, if the worker thread hits some
big directories and starts falling behind readahead, that's when it
will hit the maximum queue depth and kick another thread to do work.
IOWs, the queue depth needs to be deep enough that bursts don't
trigger extra workers, but shallow enough that extra workers are
scheduled as soon as processing falls behind readahead.

I really don't have a good way to automatically calculate this
depth. I just figure that if we have 1000 inodes queued up for
processing, we really should kick another thread to start working on
them. It's a simple solution, so I'd like to see if we have problems
with this simple threshold before we try to replace the magic number
with a magic heuristic....

Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com