Linux NFS development
 help / color / mirror / Atom feed
From: Jeff Layton <jlayton@kernel.org>
To: Jan Kara <jack@suse.cz>,
	Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: linux-nfs@vger.kernel.org, Neil Brown <neilb@suse.de>
Subject: Re: [PATCH 3/3] nfs: Block on write congestion
Date: Thu, 13 Jun 2024 16:04:12 -0400	[thread overview]
Message-ID: <fe224e6691204c200f9dced6aa56380cb4ddb3e6.camel@kernel.org> (raw)
In-Reply-To: <20240613082821.849-3-jack@suse.cz>

On Thu, 2024-06-13 at 10:28 +0200, Jan Kara wrote:
> Commit 6df25e58532b ("nfs: remove reliance on bdi congestion")
> introduced NFS-private solution for limiting number of writes
> outstanding against a particular server. Unlike previous bdi congestion
> this algorithm actually works and limits number of outstanding writeback
> pages to nfs_congestion_kb which scales with amount of client's memory
> and is capped at 256 MB. As a result some workloads such as random
> buffered writes over NFS got slower (from ~170 MB/s to ~126 MB/s). The
> fio command to reproduce is:
> 

That 256M cap was set back in 2007. I wonder if we ought to reconsider
that. It might be worth experimenting these days with a higher cap on
larger machines. Might that improve throughput even more?

> fio --direct=0 --ioengine=sync --thread --invalidate=1 --group_reporting=1
>   --runtime=300 --fallocate=posix --ramp_time=10 --new_group --rw=randwrite
>   --size=64256m --numjobs=4 --bs=4k --fsync_on_close=1 --end_fsync=1
> 
> This happens because the client sends ~256 MB worth of dirty pages to
> the server and any further background writeback request is ignored until
> the number of writeback pages gets below the threshold of 192 MB. By the
> time this happens and clients decides to trigger another round of
> writeback, the server often has no pages to write and the disk is idle.
> 
> To fix this problem and make the client react faster to eased congestion
> of the server by blocking waiting for congestion to resolve instead of
> aborting writeback. This improves the random 4k buffered write
> throughput to 184 MB/s.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/nfs/client.c           |  1 +
>  fs/nfs/write.c            | 12 ++++++++----
>  include/linux/nfs_fs_sb.h |  1 +
>  3 files changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index 3b252dceebf5..8286edd6062d 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -994,6 +994,7 @@ struct nfs_server *nfs_alloc_server(void)
>  
>  	server->change_attr_type = NFS4_CHANGE_TYPE_IS_UNDEFINED;
>  
> +	init_waitqueue_head(&server->write_congestion_wait);
>  	atomic_long_set(&server->writeback, 0);
>  
>  	ida_init(&server->openowner_id);
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index c6255d7edd3c..21a5f1e90859 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -423,8 +423,10 @@ static void nfs_folio_end_writeback(struct folio *folio)
>  
>  	folio_end_writeback(folio);
>  	if (atomic_long_dec_return(&nfss->writeback) <
> -	    NFS_CONGESTION_OFF_THRESH)
> +	    NFS_CONGESTION_OFF_THRESH) {
>  		nfss->write_congested = 0;
> +		wake_up_all(&nfss->write_congestion_wait);
> +	}
>  }
>  
>  static void nfs_page_end_writeback(struct nfs_page *req)
> @@ -698,12 +700,14 @@ int nfs_writepages(struct address_space *mapping, struct writeback_control *wbc)
>  	struct nfs_pageio_descriptor pgio;
>  	struct nfs_io_completion *ioc = NULL;
>  	unsigned int mntflags = NFS_SERVER(inode)->flags;
> +	struct nfs_server *nfss = NFS_SERVER(inode);
>  	int priority = 0;
>  	int err;
>  
> -	if (wbc->sync_mode == WB_SYNC_NONE &&
> -	    NFS_SERVER(inode)->write_congested)
> -		return 0;
> +	/* Wait with writeback until write congestion eases */
> +	if (wbc->sync_mode == WB_SYNC_NONE && nfss->write_congested)
> +		wait_event(nfss->write_congestion_wait,
> +			   nfss->write_congested == 0);
>  

It seems odd to block here, but if that helps throughput then so be it.

>  	nfs_inc_stats(inode, NFSIOS_VFSWRITEPAGES);
>  
> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> index 92de074e63b9..38a128796a76 100644
> --- a/include/linux/nfs_fs_sb.h
> +++ b/include/linux/nfs_fs_sb.h
> @@ -140,6 +140,7 @@ struct nfs_server {
>  	struct rpc_clnt *	client_acl;	/* ACL RPC client handle */
>  	struct nlm_host		*nlm_host;	/* NLM client handle */
>  	struct nfs_iostats __percpu *io_stats;	/* I/O statistics */
> +	wait_queue_head_t	write_congestion_wait;	/* wait until write congestion eases */
>  	atomic_long_t		writeback;	/* number of writeback pages */
>  	unsigned int		write_congested;/* flag set when writeback gets too high */
>  	unsigned int		flags;		/* various flags */

Reviewed-by: Jeff Layton <jlayton@kernel.org>

  parent reply	other threads:[~2024-06-13 20:04 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-13  8:28 [PATCH 0/3] nfs: Improve throughput for random buffered writes Jan Kara
2024-06-13  8:28 ` [PATCH 1/3] nfs: Drop pointless check from nfs_commit_release_pages() Jan Kara
2024-06-13  9:17   ` Sagi Grimberg
2024-06-13 19:55   ` Jeff Layton
2024-06-13  8:28 ` [PATCH 2/3] nfs: Properly initialize server->writeback Jan Kara
2024-06-13  9:17   ` Sagi Grimberg
2024-06-13 19:56   ` Jeff Layton
2024-06-14 11:33     ` Jan Kara
2024-06-13  8:28 ` [PATCH 3/3] nfs: Block on write congestion Jan Kara
2024-06-13  9:18   ` Sagi Grimberg
2024-06-13 20:04   ` Jeff Layton [this message]
2024-06-14 11:44     ` Jan Kara
2024-06-13 22:56   ` NeilBrown
2024-06-14 11:57     ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=fe224e6691204c200f9dced6aa56380cb4ddb3e6.camel@kernel.org \
    --to=jlayton@kernel.org \
    --cc=jack@suse.cz \
    --cc=linux-nfs@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=trond.myklebust@hammerspace.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox