From: Jeff Layton <jlayton@kernel.org>
To: Jan Kara <jack@suse.cz>,
Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: linux-nfs@vger.kernel.org, Neil Brown <neilb@suse.de>
Subject: Re: [PATCH 3/3] nfs: Block on write congestion
Date: Thu, 13 Jun 2024 16:04:12 -0400 [thread overview]
Message-ID: <fe224e6691204c200f9dced6aa56380cb4ddb3e6.camel@kernel.org> (raw)
In-Reply-To: <20240613082821.849-3-jack@suse.cz>
On Thu, 2024-06-13 at 10:28 +0200, Jan Kara wrote:
> Commit 6df25e58532b ("nfs: remove reliance on bdi congestion")
> introduced NFS-private solution for limiting number of writes
> outstanding against a particular server. Unlike previous bdi congestion
> this algorithm actually works and limits number of outstanding writeback
> pages to nfs_congestion_kb which scales with amount of client's memory
> and is capped at 256 MB. As a result some workloads such as random
> buffered writes over NFS got slower (from ~170 MB/s to ~126 MB/s). The
> fio command to reproduce is:
>
That 256M cap was set back in 2007. I wonder if we ought to reconsider
that. It might be worth experimenting these days with a higher cap on
larger machines. Might that improve throughput even more?
> fio --direct=0 --ioengine=sync --thread --invalidate=1 --group_reporting=1
> --runtime=300 --fallocate=posix --ramp_time=10 --new_group --rw=randwrite
> --size=64256m --numjobs=4 --bs=4k --fsync_on_close=1 --end_fsync=1
>
> This happens because the client sends ~256 MB worth of dirty pages to
> the server and any further background writeback request is ignored until
> the number of writeback pages gets below the threshold of 192 MB. By the
> time this happens and clients decides to trigger another round of
> writeback, the server often has no pages to write and the disk is idle.
>
> To fix this problem and make the client react faster to eased congestion
> of the server by blocking waiting for congestion to resolve instead of
> aborting writeback. This improves the random 4k buffered write
> throughput to 184 MB/s.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
> fs/nfs/client.c | 1 +
> fs/nfs/write.c | 12 ++++++++----
> include/linux/nfs_fs_sb.h | 1 +
> 3 files changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index 3b252dceebf5..8286edd6062d 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -994,6 +994,7 @@ struct nfs_server *nfs_alloc_server(void)
>
> server->change_attr_type = NFS4_CHANGE_TYPE_IS_UNDEFINED;
>
> + init_waitqueue_head(&server->write_congestion_wait);
> atomic_long_set(&server->writeback, 0);
>
> ida_init(&server->openowner_id);
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index c6255d7edd3c..21a5f1e90859 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -423,8 +423,10 @@ static void nfs_folio_end_writeback(struct folio *folio)
>
> folio_end_writeback(folio);
> if (atomic_long_dec_return(&nfss->writeback) <
> - NFS_CONGESTION_OFF_THRESH)
> + NFS_CONGESTION_OFF_THRESH) {
> nfss->write_congested = 0;
> + wake_up_all(&nfss->write_congestion_wait);
> + }
> }
>
> static void nfs_page_end_writeback(struct nfs_page *req)
> @@ -698,12 +700,14 @@ int nfs_writepages(struct address_space *mapping, struct writeback_control *wbc)
> struct nfs_pageio_descriptor pgio;
> struct nfs_io_completion *ioc = NULL;
> unsigned int mntflags = NFS_SERVER(inode)->flags;
> + struct nfs_server *nfss = NFS_SERVER(inode);
> int priority = 0;
> int err;
>
> - if (wbc->sync_mode == WB_SYNC_NONE &&
> - NFS_SERVER(inode)->write_congested)
> - return 0;
> + /* Wait with writeback until write congestion eases */
> + if (wbc->sync_mode == WB_SYNC_NONE && nfss->write_congested)
> + wait_event(nfss->write_congestion_wait,
> + nfss->write_congested == 0);
>
It seems odd to block here, but if that helps throughput then so be it.
> nfs_inc_stats(inode, NFSIOS_VFSWRITEPAGES);
>
> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> index 92de074e63b9..38a128796a76 100644
> --- a/include/linux/nfs_fs_sb.h
> +++ b/include/linux/nfs_fs_sb.h
> @@ -140,6 +140,7 @@ struct nfs_server {
> struct rpc_clnt * client_acl; /* ACL RPC client handle */
> struct nlm_host *nlm_host; /* NLM client handle */
> struct nfs_iostats __percpu *io_stats; /* I/O statistics */
> + wait_queue_head_t write_congestion_wait; /* wait until write congestion eases */
> atomic_long_t writeback; /* number of writeback pages */
> unsigned int write_congested;/* flag set when writeback gets too high */
> unsigned int flags; /* various flags */
Reviewed-by: Jeff Layton <jlayton@kernel.org>
next prev parent reply other threads:[~2024-06-13 20:04 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-13 8:28 [PATCH 0/3] nfs: Improve throughput for random buffered writes Jan Kara
2024-06-13 8:28 ` [PATCH 1/3] nfs: Drop pointless check from nfs_commit_release_pages() Jan Kara
2024-06-13 9:17 ` Sagi Grimberg
2024-06-13 19:55 ` Jeff Layton
2024-06-13 8:28 ` [PATCH 2/3] nfs: Properly initialize server->writeback Jan Kara
2024-06-13 9:17 ` Sagi Grimberg
2024-06-13 19:56 ` Jeff Layton
2024-06-14 11:33 ` Jan Kara
2024-06-13 8:28 ` [PATCH 3/3] nfs: Block on write congestion Jan Kara
2024-06-13 9:18 ` Sagi Grimberg
2024-06-13 20:04 ` Jeff Layton [this message]
2024-06-14 11:44 ` Jan Kara
2024-06-13 22:56 ` NeilBrown
2024-06-14 11:57 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=fe224e6691204c200f9dced6aa56380cb4ddb3e6.camel@kernel.org \
--to=jlayton@kernel.org \
--cc=jack@suse.cz \
--cc=linux-nfs@vger.kernel.org \
--cc=neilb@suse.de \
--cc=trond.myklebust@hammerspace.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox