From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 91331C2BA19 for ; Wed, 15 Apr 2020 11:05:47 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6DD9C20737 for ; Wed, 15 Apr 2020 11:05:47 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="MgvJQXmp" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2896713AbgDOLFm (ORCPT ); Wed, 15 Apr 2020 07:05:42 -0400 Received: from us-smtp-1.mimecast.com ([205.139.110.61]:52208 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2896715AbgDOLFO (ORCPT ); Wed, 15 Apr 2020 07:05:14 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1586948709; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=hvFAYLafm/ryaS+FMXc167MaWsY1MvAzp7mjyFLWf8E=; b=MgvJQXmpNi7UR+ddKn+zRuWhO6Ml+gvZXwS65N3Bb/yj9LJ2hmY9xsR3yKsmdfonde8Uop EEFGyjBOmu7VCvSTT1WhwCsR1tyq69nlV3q713q/0/wJVtEfPAYW/4E5MIBUuASGt3n0RE 7S+av64rzPn+40qTDPcCUURkmOFc+BE= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-2-cd4msewYNu26LjaVY-HEnA-1; Wed, 15 Apr 2020 07:05:07 -0400 X-MC-Unique: cd4msewYNu26LjaVY-HEnA-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id C5C771005513; Wed, 15 Apr 2020 11:05:06 +0000 (UTC) Received: from bfoster (dhcp-41-2.bos.redhat.com [10.18.41.2]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 642AE19C70; Wed, 15 Apr 2020 11:05:06 +0000 (UTC) Date: Wed, 15 Apr 2020 07:05:04 -0400 From: Brian Foster To: "Darrick J. Wong" Cc: linux-xfs@vger.kernel.org, Dave Chinner Subject: Re: [PATCH v2] xfs: move inode flush to the sync workqueue Message-ID: <20200415110504.GA2140@bfoster> References: <20200415041529.GL6742@magnolia> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200415041529.GL6742@magnolia> X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org On Tue, Apr 14, 2020 at 09:15:29PM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong > > Move the inode dirty data flushing to a workqueue so that multiple > threads can take advantage of a single thread's flushing work. The > ratelimiting technique used in bdd4ee4 was not successful, because > threads that skipped the inode flush scan due to ratelimiting would > ENOSPC early, which caused occasional (but noticeable) changes in > behavior and sporadic fstest regressions. > > Therfore, make all the writer threads wait on a single inode flush, Therefore > which eliminates both the stampeding hoards of flushers and the small hordes ? :) > window in which a write could fail with ENOSPC because it lost the > ratelimit race after even another thread freed space. > > Fixes: bdd4ee4f8407 ("xfs: ratelimit inode flush on buffered write ENOSPC") > Signed-off-by: Darrick J. Wong > --- > v2: run it on the sync workqueue > --- Reviewed-by: Brian Foster > fs/xfs/xfs_mount.h | 6 +++++- > fs/xfs/xfs_super.c | 40 ++++++++++++++++++++++------------------ > 2 files changed, 27 insertions(+), 19 deletions(-) > > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h > index 50c43422fa17..b2e4598fdf7d 100644 > --- a/fs/xfs/xfs_mount.h > +++ b/fs/xfs/xfs_mount.h > @@ -167,8 +167,12 @@ typedef struct xfs_mount { > struct xfs_kobj m_error_meta_kobj; > struct xfs_error_cfg m_error_cfg[XFS_ERR_CLASS_MAX][XFS_ERR_ERRNO_MAX]; > struct xstats m_stats; /* per-fs stats */ > - struct ratelimit_state m_flush_inodes_ratelimit; > > + /* > + * Workqueue item so that we can coalesce multiple inode flush attempts > + * into a single flush. > + */ > + struct work_struct m_flush_inodes_work; > struct workqueue_struct *m_buf_workqueue; > struct workqueue_struct *m_unwritten_workqueue; > struct workqueue_struct *m_cil_workqueue; > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c > index abf06bf9c3f3..424bb9a2d532 100644 > --- a/fs/xfs/xfs_super.c > +++ b/fs/xfs/xfs_super.c > @@ -516,6 +516,20 @@ xfs_destroy_mount_workqueues( > destroy_workqueue(mp->m_buf_workqueue); > } > > +static void > +xfs_flush_inodes_worker( > + struct work_struct *work) > +{ > + struct xfs_mount *mp = container_of(work, struct xfs_mount, > + m_flush_inodes_work); > + struct super_block *sb = mp->m_super; > + > + if (down_read_trylock(&sb->s_umount)) { > + sync_inodes_sb(sb); > + up_read(&sb->s_umount); > + } > +} > + > /* > * Flush all dirty data to disk. Must not be called while holding an XFS_ILOCK > * or a page lock. We use sync_inodes_sb() here to ensure we block while waiting > @@ -526,15 +540,15 @@ void > xfs_flush_inodes( > struct xfs_mount *mp) > { > - struct super_block *sb = mp->m_super; > - > - if (!__ratelimit(&mp->m_flush_inodes_ratelimit)) > + /* > + * If flush_work() returns true then that means we waited for a flush > + * which was already in progress. Don't bother running another scan. > + */ > + if (flush_work(&mp->m_flush_inodes_work)) > return; > > - if (down_read_trylock(&sb->s_umount)) { > - sync_inodes_sb(sb); > - up_read(&sb->s_umount); > - } > + queue_work(mp->m_sync_workqueue, &mp->m_flush_inodes_work); > + flush_work(&mp->m_flush_inodes_work); > } > > /* Catch misguided souls that try to use this interface on XFS */ > @@ -1369,17 +1383,6 @@ xfs_fc_fill_super( > if (error) > goto out_free_names; > > - /* > - * Cap the number of invocations of xfs_flush_inodes to 16 for every > - * quarter of a second. The magic numbers here were determined by > - * observation neither to cause stalls in writeback when there are a > - * lot of IO threads and the fs is near ENOSPC, nor cause any fstest > - * regressions. YMMV. > - */ > - ratelimit_state_init(&mp->m_flush_inodes_ratelimit, HZ / 4, 16); > - ratelimit_set_flags(&mp->m_flush_inodes_ratelimit, > - RATELIMIT_MSG_ON_RELEASE); > - > error = xfs_init_mount_workqueues(mp); > if (error) > goto out_close_devices; > @@ -1752,6 +1755,7 @@ static int xfs_init_fs_context( > spin_lock_init(&mp->m_perag_lock); > mutex_init(&mp->m_growlock); > atomic_set(&mp->m_active_trans, 0); > + INIT_WORK(&mp->m_flush_inodes_work, xfs_flush_inodes_worker); > INIT_DELAYED_WORK(&mp->m_reclaim_work, xfs_reclaim_worker); > INIT_DELAYED_WORK(&mp->m_eofblocks_work, xfs_eofblocks_worker); > INIT_DELAYED_WORK(&mp->m_cowblocks_work, xfs_cowblocks_worker); >