Date: Fri, 29 Jan 2021 11:30:56 -0500
From: Brian Foster
To: "Darrick J. Wong"
Cc: linux-xfs@vger.kernel.org, hch@infradead.org, david@fromorbit.com
Subject: Re: [PATCH 1/2] xfs: increase the default parallelism levels of pwork clients
Message-ID: <20210129163056.GI2665284@bfoster>
References: <161181380539.1525344.442839530784024643.stgit@magnolia>
 <161181381108.1525344.3612916862426784356.stgit@magnolia>
In-Reply-To: <161181381108.1525344.3612916862426784356.stgit@magnolia>

On Wed, Jan 27, 2021 at 10:03:31PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong
>
> Increase the parallelism level for pwork clients to the workqueue
> defaults so that we can take advantage of computers with a lot of CPUs
> and a lot of hardware. On fast systems this will speed up quotacheck by
> a large factor, and the following posteof/cowblocks cleanup series will
> use the functionality presented in this patch to run garbage collection
> as quickly as possible.
>
> We do this by switching the pwork workqueue to unbounded, since the
> current user (quotacheck) runs lengthy scans for each work item and we
> don't care about dispatching the work on a warm cpu cache or anything
> like that. Also set WQ_SYSFS so that we can monitor where the wq is
> running.
>
> Signed-off-by: Darrick J. Wong
> ---

I feel like this should be two patches, but otherwise:

Reviewed-by: Brian Foster

>  Documentation/admin-guide/xfs.rst |   38 +++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_iwalk.c                |    5 +----
>  fs/xfs/xfs_pwork.c                |   25 +++++------------------
>  fs/xfs/xfs_pwork.h                |    4 +---
>  4 files changed, 45 insertions(+), 27 deletions(-)
>
>
> diff --git a/Documentation/admin-guide/xfs.rst b/Documentation/admin-guide/xfs.rst
> index 86de8a1ad91c..b00b1eece9de 100644
> --- a/Documentation/admin-guide/xfs.rst
> +++ b/Documentation/admin-guide/xfs.rst
> @@ -495,3 +495,41 @@ the class and error context. For example, the default values for
>  "metadata/ENODEV" are "0" rather than "-1" so that this error handler defaults
>  to "fail immediately" behaviour. This is done because ENODEV is a fatal,
>  unrecoverable error no matter how many times the metadata IO is retried.
> +
> +Workqueue Concurrency
> +=====================
> +
> +XFS uses kernel workqueues to parallelize metadata update processes. This
> +enables it to take advantage of storage hardware that can service many IO
> +operations simultaneously. This interface exposes internal implementation
> +details of XFS, and as such is explicitly not part of any userspace API/ABI
> +guarantee the kernel may give userspace. These are undocumented features of
> +the generic workqueue implementation XFS uses for concurrency, and they are
> +provided here purely for diagnostic and tuning purposes and may change at any
> +time in the future.
> +
> +The control knobs for a filesystem's workqueues are organized by task at hand
> +and the short name of the data device. They all can be found in:
> +
> +  /sys/bus/workqueue/devices/${task}!${device}
> +
> +================ ===========
> + Task            Description
> +================ ===========
> + xfs_iwalk-$pid  Inode scans of the entire filesystem. Currently limited to
> +                 mount time quotacheck.
> +================ ===========
> +
> +For example, the knobs for the quotacheck workqueue for /dev/nvme0n1 would be
> +found in /sys/bus/workqueue/devices/xfs_iwalk-1111!nvme0n1/.
> +
> +The interesting knobs for XFS workqueues are as follows:
> +
> +============ ===========
> + Knob        Description
> +============ ===========
> + max_active  Maximum number of background threads that can be started to
> +             run the work.
> + cpumask     CPUs upon which the threads are allowed to run.
> + nice        Relative priority of scheduling the threads. These are the
> +             same nice levels that can be applied to userspace processes.
> +============ ===========
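
(Aside, not part of the patch: a minimal sketch of the alloc_workqueue()
pattern that the knobs above control, for anyone tuning this later. The
example_wq name, example_wq_init() function, and "example-%d" tag are
made up for illustration; the flags and the max_active argument, where 0
means the workqueue default, are the standard workqueue API.)

	#include <linux/workqueue.h>
	#include <linux/sched.h>	/* for current */

	static struct workqueue_struct *example_wq;

	static int example_wq_init(void)
	{
		/*
		 * WQ_UNBOUND: work items may run on any CPU rather than
		 * being bound to the submitting CPU's worker pool.
		 * WQ_SYSFS: exports the max_active/cpumask/nice knobs
		 * under /sys/bus/workqueue/devices/example-$pid/.
		 * WQ_FREEZABLE: the queue quiesces across suspend.
		 * A max_active of 0 selects the workqueue default limit.
		 */
		example_wq = alloc_workqueue("example-%d",
				WQ_UNBOUND | WQ_SYSFS | WQ_FREEZABLE, 0,
				current->pid);
		if (!example_wq)
			return -ENOMEM;
		return 0;
	}
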
> diff --git a/fs/xfs/xfs_iwalk.c b/fs/xfs/xfs_iwalk.c
> index eae3aff9bc97..c4a340f1f1e1 100644
> --- a/fs/xfs/xfs_iwalk.c
> +++ b/fs/xfs/xfs_iwalk.c
> @@ -618,15 +618,12 @@ xfs_iwalk_threaded(
>  {
>  	struct xfs_pwork_ctl	pctl;
>  	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, startino);
> -	unsigned int		nr_threads;
>  	int			error;
>  
>  	ASSERT(agno < mp->m_sb.sb_agcount);
>  	ASSERT(!(flags & ~XFS_IWALK_FLAGS_ALL));
>  
> -	nr_threads = xfs_pwork_guess_datadev_parallelism(mp);
> -	error = xfs_pwork_init(mp, &pctl, xfs_iwalk_ag_work, "xfs_iwalk",
> -			nr_threads);
> +	error = xfs_pwork_init(mp, &pctl, xfs_iwalk_ag_work, "xfs_iwalk");
>  	if (error)
>  		return error;
>  
> diff --git a/fs/xfs/xfs_pwork.c b/fs/xfs/xfs_pwork.c
> index b03333f1c84a..c283b801cc5d 100644
> --- a/fs/xfs/xfs_pwork.c
> +++ b/fs/xfs/xfs_pwork.c
> @@ -61,16 +61,18 @@ xfs_pwork_init(
>  	struct xfs_mount	*mp,
>  	struct xfs_pwork_ctl	*pctl,
>  	xfs_pwork_work_fn	work_fn,
> -	const char		*tag,
> -	unsigned int		nr_threads)
> +	const char		*tag)
>  {
> +	unsigned int		nr_threads = 0;
> +
>  #ifdef DEBUG
>  	if (xfs_globals.pwork_threads >= 0)
>  		nr_threads = xfs_globals.pwork_threads;
>  #endif
>  	trace_xfs_pwork_init(mp, nr_threads, current->pid);
>  
> -	pctl->wq = alloc_workqueue("%s-%d", WQ_FREEZABLE, nr_threads, tag,
> +	pctl->wq = alloc_workqueue("%s-%d",
> +			WQ_UNBOUND | WQ_SYSFS | WQ_FREEZABLE, nr_threads, tag,
>  			current->pid);
>  	if (!pctl->wq)
>  		return -ENOMEM;
> @@ -117,20 +119,3 @@ xfs_pwork_poll(
>  			atomic_read(&pctl->nr_work) == 0, HZ) == 0)
>  		touch_softlockup_watchdog();
>  }
> -
> -/*
> - * Return the amount of parallelism that the data device can handle, or 0 for
> - * no limit.
> - */
> -unsigned int
> -xfs_pwork_guess_datadev_parallelism(
> -	struct xfs_mount	*mp)
> -{
> -	struct xfs_buftarg	*btp = mp->m_ddev_targp;
> -
> -	/*
> -	 * For now we'll go with the most conservative setting possible,
> -	 * which is two threads for an SSD and 1 thread everywhere else.
> -	 */
> -	return blk_queue_nonrot(btp->bt_bdev->bd_disk->queue) ? 2 : 1;
> -}
> diff --git a/fs/xfs/xfs_pwork.h b/fs/xfs/xfs_pwork.h
> index 8133124cf3bb..c0ef81fc85dd 100644
> --- a/fs/xfs/xfs_pwork.h
> +++ b/fs/xfs/xfs_pwork.h
> @@ -51,11 +51,9 @@ xfs_pwork_want_abort(
>  }
>  
>  int xfs_pwork_init(struct xfs_mount *mp, struct xfs_pwork_ctl *pctl,
> -		xfs_pwork_work_fn work_fn, const char *tag,
> -		unsigned int nr_threads);
> +		xfs_pwork_work_fn work_fn, const char *tag);
>  void xfs_pwork_queue(struct xfs_pwork_ctl *pctl, struct xfs_pwork *pwork);
>  int xfs_pwork_destroy(struct xfs_pwork_ctl *pctl);
>  void xfs_pwork_poll(struct xfs_pwork_ctl *pctl);
> -unsigned int xfs_pwork_guess_datadev_parallelism(struct xfs_mount *mp);
>  
>  #endif /* __XFS_PWORK_H__ */
> 