From: Brian Foster <bfoster@redhat.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 14/15] xfs: multithreaded iwalk implementation
Date: Tue, 2 Jul 2019 12:24:54 -0400 [thread overview]
Message-ID: <20190702162454.GF2866@bfoster> (raw)
In-Reply-To: <20190702155138.GR1404256@magnolia>
On Tue, Jul 02, 2019 at 08:51:38AM -0700, Darrick J. Wong wrote:
> On Tue, Jul 02, 2019 at 10:33:20AM -0400, Brian Foster wrote:
> > On Wed, Jun 26, 2019 at 01:45:25PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <darrick.wong@oracle.com>
> > >
> > > Create a parallel iwalk implementation and switch quotacheck to use it.
> > >
> > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > > ---
> > > fs/xfs/Makefile | 1
> > > fs/xfs/xfs_globals.c | 3 +
> > > fs/xfs/xfs_iwalk.c | 82 +++++++++++++++++++++++++++++++++
> > > fs/xfs/xfs_iwalk.h | 2 +
> > > fs/xfs/xfs_pwork.c | 126 ++++++++++++++++++++++++++++++++++++++++++++++++++
> > > fs/xfs/xfs_pwork.h | 58 +++++++++++++++++++++++
> > > fs/xfs/xfs_qm.c | 2 -
> > > fs/xfs/xfs_sysctl.h | 6 ++
> > > fs/xfs/xfs_sysfs.c | 40 ++++++++++++++++
> > > fs/xfs/xfs_trace.h | 18 +++++++
> > > 10 files changed, 337 insertions(+), 1 deletion(-)
> > > create mode 100644 fs/xfs/xfs_pwork.c
> > > create mode 100644 fs/xfs/xfs_pwork.h
> > >
> > >
> > > diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> > > index 74d30ef0dbce..48940a27d4aa 100644
> > > --- a/fs/xfs/Makefile
> > > +++ b/fs/xfs/Makefile
> > > @@ -84,6 +84,7 @@ xfs-y += xfs_aops.o \
> > > xfs_message.o \
> > > xfs_mount.o \
> > > xfs_mru_cache.o \
> > > + xfs_pwork.o \
> > > xfs_reflink.o \
> > > xfs_stats.o \
> > > xfs_super.o \
> > > diff --git a/fs/xfs/xfs_globals.c b/fs/xfs/xfs_globals.c
> > > index d0d377384120..a44b564871b5 100644
> > > --- a/fs/xfs/xfs_globals.c
> > > +++ b/fs/xfs/xfs_globals.c
> > > @@ -31,6 +31,9 @@ xfs_param_t xfs_params = {
> > > .fstrm_timer = { 1, 30*100, 3600*100},
> > > .eofb_timer = { 1, 300, 3600*24},
> > > .cowb_timer = { 1, 1800, 3600*24},
> > > +#ifdef DEBUG
> > > + .pwork_threads = { -1, -1, NR_CPUS },
> >
> > So I noticed that /sys/fs/xfs/debug/pwork_threads was still 0 by default
> > with this patch, then realized that we've only initialized it in
> > xfs_params and not xfs_globals. The sysfs handlers added by this patch
> > use the latter whereas the former looks related to /proc. It looks like
> > this should be fixed up to init the global field to -1 and probably drop
> > the xfs_params bits since we don't expose the value via /proc.
>
> Oh, you're right. I'll drop the xfs_params bits.
>
> >
> > > +#endif
> > > };
> > >
> > > struct xfs_globals xfs_globals = {
> > ...
> > > diff --git a/fs/xfs/xfs_pwork.c b/fs/xfs/xfs_pwork.c
> > > new file mode 100644
> > > index 000000000000..779596ed9432
> > > --- /dev/null
> > > +++ b/fs/xfs/xfs_pwork.c
> > > @@ -0,0 +1,126 @@
> > ...
> > > +/*
> > > + * Return the amount of parallelism that the data device can handle, or 0 for
> > > + * no limit.
> > > + */
> > > +unsigned int
> > > +xfs_pwork_guess_datadev_parallelism(
> > > + struct xfs_mount *mp)
> > > +{
> > > + struct xfs_buftarg *btp = mp->m_ddev_targp;
> > > + int iomin;
> > > + int ioopt;
> > > +
> > > + if (blk_queue_nonrot(btp->bt_bdev->bd_queue))
> > > + return num_online_cpus();
> > > +
> > > + if (mp->m_sb.sb_width && mp->m_sb.sb_unit)
> > > + return mp->m_sb.sb_width / mp->m_sb.sb_unit;
> > > +
> > > + iomin = bdev_io_min(btp->bt_bdev);
> > > + ioopt = bdev_io_opt(btp->bt_bdev);
> > > + if (iomin && ioopt)
> > > + return ioopt / iomin;
> > > +
> > > + return 1;
> > > +}
> >
> > I may have lost track of where we left off with this but IIRC we still
> > needed some numbers with regard to multi-device storage where
> > sb_width/sb_unit doesn't necessarily describe parallelism (i.e.
> > RAID5/6). I'm not sure what your goal is for this patch vs. the rest of
> > the series, but I'd be fine with merging this with just the
> > non-rotational bit for now if we add some kind of conservative maximum
> > (4? 8?) to the heuristic. It seems kind of risky to me to parallelize
> > 100s of AGs just because we might have that many CPUs/AGs on a
> > particular storage setup (we may also generate warning messages if a
> > system has a CPU count that exceeds the workqueue limit).
>
> Hm, how about the most conservative threading that I can think of?
>
> unsigned int
> xfs_pwork_guess_datadev_parallelism(
> struct xfs_mount *mp)
> {
> return blk_queue_nonrot(...) ? 2 : 1;
> }
>
> Since that did seem to yield speedups for everything except the dumb USB
> mass storage case. Then I can work on building a case for more threads
> (and figuring out the maximums) for 5.4...
>
Ack, that sounds good to me. Then we can get this code in sooner and
worry about opening up the parallelization incrementally based on
more focused testing/analysis.
Brian
> --D
>
> >
> > ...
> > > diff --git a/fs/xfs/xfs_sysfs.c b/fs/xfs/xfs_sysfs.c
> > > index cabda13f3c64..a146a2e61be2 100644
> > > --- a/fs/xfs/xfs_sysfs.c
> > > +++ b/fs/xfs/xfs_sysfs.c
> > > @@ -206,11 +206,51 @@ always_cow_show(
> > > }
> > > XFS_SYSFS_ATTR_RW(always_cow);
> > >
> > > +#ifdef DEBUG
> > > +/*
> > > + * Override how many threads the parallel work queue is allowed to create.
> > > + * This has to be a debug-only global (instead of an errortag) because one of
> > > + * the main users of parallel workqueues is mount time quotacheck.
> > > + */
> > > +STATIC ssize_t
> > > +pwork_threads_store(
> > > + struct kobject *kobject,
> > > + const char *buf,
> > > + size_t count)
> > > +{
> > > + int ret;
> > > + int val;
> > > +
> > > + ret = kstrtoint(buf, 0, &val);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + if (val < 0 || val > num_possible_cpus())
> > > + return -EINVAL;
> > > +
> >
> > We need to allow assignment of -1 now too.
>
> <nod>
>
> --D
>
> > Brian
> >
> > > + xfs_globals.pwork_threads = val;
> > > +
> > > + return count;
> > > +}
> > > +
> > > +STATIC ssize_t
> > > +pwork_threads_show(
> > > + struct kobject *kobject,
> > > + char *buf)
> > > +{
> > > + return snprintf(buf, PAGE_SIZE, "%d\n", xfs_globals.pwork_threads);
> > > +}
> > > +XFS_SYSFS_ATTR_RW(pwork_threads);
> > > +#endif /* DEBUG */
> > > +
> > > static struct attribute *xfs_dbg_attrs[] = {
> > > ATTR_LIST(bug_on_assert),
> > > ATTR_LIST(log_recovery_delay),
> > > ATTR_LIST(mount_delay),
> > > ATTR_LIST(always_cow),
> > > +#ifdef DEBUG
> > > + ATTR_LIST(pwork_threads),
> > > +#endif
> > > NULL,
> > > };
> > >
> > > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > > index f9bb1d50bc0e..658cbade1998 100644
> > > --- a/fs/xfs/xfs_trace.h
> > > +++ b/fs/xfs/xfs_trace.h
> > > @@ -3556,6 +3556,24 @@ TRACE_EVENT(xfs_iwalk_ag_rec,
> > > __entry->startino, __entry->freemask)
> > > )
> > >
> > > +TRACE_EVENT(xfs_pwork_init,
> > > + TP_PROTO(struct xfs_mount *mp, unsigned int nr_threads, pid_t pid),
> > > + TP_ARGS(mp, nr_threads, pid),
> > > + TP_STRUCT__entry(
> > > + __field(dev_t, dev)
> > > + __field(unsigned int, nr_threads)
> > > + __field(pid_t, pid)
> > > + ),
> > > + TP_fast_assign(
> > > + __entry->dev = mp->m_super->s_dev;
> > > + __entry->nr_threads = nr_threads;
> > > + __entry->pid = pid;
> > > + ),
> > > + TP_printk("dev %d:%d nr_threads %u pid %u",
> > > + MAJOR(__entry->dev), MINOR(__entry->dev),
> > > + __entry->nr_threads, __entry->pid)
> > > +)
> > > +
> > > #endif /* _TRACE_XFS_H */
> > >
> > > #undef TRACE_INCLUDE_PATH
> > >
next prev parent reply other threads:[~2019-07-02 16:24 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-06-26 20:43 [PATCH v6 00/15] xfs: refactor and improve inode iteration Darrick J. Wong
2019-06-26 20:44 ` [PATCH 01/15] xfs: create iterator error codes Darrick J. Wong
2019-06-26 20:44 ` [PATCH 02/15] xfs: create simplified inode walk function Darrick J. Wong
2019-07-02 14:23 ` Brian Foster
2019-06-26 20:44 ` [PATCH 03/15] xfs: convert quotacheck to use the new iwalk functions Darrick J. Wong
2019-06-26 20:44 ` [PATCH 04/15] xfs: bulkstat should copy lastip whenever userspace supplies one Darrick J. Wong
2019-06-26 20:44 ` [PATCH 05/15] xfs: remove unnecessary includes of xfs_itable.h Darrick J. Wong
2019-06-26 20:44 ` [PATCH 06/15] xfs: convert bulkstat to new iwalk infrastructure Darrick J. Wong
2019-06-26 20:44 ` [PATCH 07/15] xfs: calculate inode walk prefetch more carefully Darrick J. Wong
2019-07-02 14:24 ` Brian Foster
2019-07-02 14:49 ` Darrick J. Wong
2019-06-26 20:44 ` [PATCH 08/15] xfs: move bulkstat ichunk helpers to iwalk code Darrick J. Wong
2019-06-26 20:44 ` [PATCH 09/15] xfs: change xfs_iwalk_grab_ichunk to use startino, not lastino Darrick J. Wong
2019-06-26 20:44 ` [PATCH 10/15] xfs: clean up long conditionals in xfs_iwalk_ichunk_ra Darrick J. Wong
2019-06-26 20:45 ` [PATCH 11/15] xfs: refactor xfs_iwalk_grab_ichunk Darrick J. Wong
2019-06-26 20:45 ` [PATCH 12/15] xfs: refactor iwalk code to handle walking inobt records Darrick J. Wong
2019-06-26 20:45 ` [PATCH 13/15] xfs: refactor INUMBERS to use iwalk functions Darrick J. Wong
2019-06-26 20:45 ` [PATCH 14/15] xfs: multithreaded iwalk implementation Darrick J. Wong
2019-07-02 14:33 ` Brian Foster
2019-07-02 15:51 ` Darrick J. Wong
2019-07-02 16:24 ` Brian Foster [this message]
2019-07-02 16:53 ` [PATCH v2 " Darrick J. Wong
2019-07-02 17:40 ` Brian Foster
2019-06-26 20:45 ` [PATCH 15/15] xfs: poll waiting for quotacheck Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190702162454.GF2866@bfoster \
--to=bfoster@redhat.com \
--cc=darrick.wong@oracle.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).