public inbox for linux-nfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH RESEND v2 0/3] nfs: Improve throughput for random buffered writes
@ 2024-07-01 10:50 Jan Kara
  2024-07-01 10:50 ` [PATCH v2 1/3] nfs: Drop pointless check from nfs_commit_release_pages() Jan Kara
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Jan Kara @ 2024-07-01 10:50 UTC (permalink / raw)
  To: Anna Schumaker
  Cc: Trond Myklebust, linux-nfs, Sagi Grimberg, Jeff Layton, Jan Kara

[Resending because of messed up mailing list address]

Hello,

this is second revision of my patch series improving NFS throughput for
buffered writes.

Changes since v1:
* Added Reviewed-by tags
* Made sleep waiting for congestion to resolve killable

Original cover letter below:

I was thinking how to best address the performance regression coming from
NFS write congestion. After considering various options and concerns raised
in the previous discussion, I've got an idea for a simple option that could
help to keep the server more busy - just mimick what block devices do and
block the flush worker waiting for congestion to resolve instead of aborting
the writeback. And it actually helps enough that I don't think more complex
solutions are warranted at this point.

This patch series has two preparatory cleanups and then a patch implementing
this idea.

								Honza

Previous versions:
Link: http://lore.kernel.org/r/20240612153022.25454-1-jack@suse.cz # v1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v2 1/3] nfs: Drop pointless check from nfs_commit_release_pages()
  2024-07-01 10:50 [PATCH RESEND v2 0/3] nfs: Improve throughput for random buffered writes Jan Kara
@ 2024-07-01 10:50 ` Jan Kara
  2024-07-01 10:50 ` [PATCH v2 2/3] nfs: Properly initialize server->writeback Jan Kara
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Jan Kara @ 2024-07-01 10:50 UTC (permalink / raw)
  To: Anna Schumaker
  Cc: Trond Myklebust, linux-nfs, Sagi Grimberg, Jeff Layton, Jan Kara

nfss->writeback is updated only when we are ending page writeback and at
that moment we also clear nfss->write_congested. So there's no point in
rechecking congestion state in nfs_commit_release_pages(). Drop the
pointless check.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/nfs/write.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 5fc12a721ec3..c6255d7edd3c 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1837,7 +1837,6 @@ static void nfs_commit_release_pages(struct nfs_commit_data *data)
 	struct nfs_page	*req;
 	int status = data->task.tk_status;
 	struct nfs_commit_info cinfo;
-	struct nfs_server *nfss;
 	struct folio *folio;
 
 	while (!list_empty(&data->pages)) {
@@ -1880,9 +1879,6 @@ static void nfs_commit_release_pages(struct nfs_commit_data *data)
 		/* Latency breaker */
 		cond_resched();
 	}
-	nfss = NFS_SERVER(data->inode);
-	if (atomic_long_read(&nfss->writeback) < NFS_CONGESTION_OFF_THRESH)
-		nfss->write_congested = 0;
 
 	nfs_init_cinfo(&cinfo, data->inode, data->dreq);
 	nfs_commit_end(cinfo.mds);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 2/3] nfs: Properly initialize server->writeback
  2024-07-01 10:50 [PATCH RESEND v2 0/3] nfs: Improve throughput for random buffered writes Jan Kara
  2024-07-01 10:50 ` [PATCH v2 1/3] nfs: Drop pointless check from nfs_commit_release_pages() Jan Kara
@ 2024-07-01 10:50 ` Jan Kara
  2024-07-01 11:23   ` Jeff Layton
  2024-07-01 10:50 ` [PATCH v2 3/3] nfs: Block on write congestion Jan Kara
  2024-07-17 15:58 ` [PATCH RESEND v2 0/3] nfs: Improve throughput for random buffered writes Jan Kara
  3 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2024-07-01 10:50 UTC (permalink / raw)
  To: Anna Schumaker
  Cc: Trond Myklebust, linux-nfs, Sagi Grimberg, Jeff Layton, Jan Kara

Atomic types should better be initialized with atomic_long_set() instead
of relying on zeroing done by kzalloc(). Clean this up.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/nfs/client.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index de77848ae654..3b252dceebf5 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -994,6 +994,8 @@ struct nfs_server *nfs_alloc_server(void)
 
 	server->change_attr_type = NFS4_CHANGE_TYPE_IS_UNDEFINED;
 
+	atomic_long_set(&server->writeback, 0);
+
 	ida_init(&server->openowner_id);
 	ida_init(&server->lockowner_id);
 	pnfs_init_server(server);
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 3/3] nfs: Block on write congestion
  2024-07-01 10:50 [PATCH RESEND v2 0/3] nfs: Improve throughput for random buffered writes Jan Kara
  2024-07-01 10:50 ` [PATCH v2 1/3] nfs: Drop pointless check from nfs_commit_release_pages() Jan Kara
  2024-07-01 10:50 ` [PATCH v2 2/3] nfs: Properly initialize server->writeback Jan Kara
@ 2024-07-01 10:50 ` Jan Kara
  2024-07-17 15:58 ` [PATCH RESEND v2 0/3] nfs: Improve throughput for random buffered writes Jan Kara
  3 siblings, 0 replies; 9+ messages in thread
From: Jan Kara @ 2024-07-01 10:50 UTC (permalink / raw)
  To: Anna Schumaker
  Cc: Trond Myklebust, linux-nfs, Sagi Grimberg, Jeff Layton, Jan Kara

Commit 6df25e58532b ("nfs: remove reliance on bdi congestion")
introduced NFS-private solution for limiting number of writes
outstanding against a particular server. Unlike previous bdi congestion
this algorithm actually works and limits number of outstanding writeback
pages to nfs_congestion_kb which scales with amount of client's memory
and is capped at 256 MB. As a result some workloads such as random
buffered writes over NFS got slower (from ~170 MB/s to ~126 MB/s). The
fio command to reproduce is:

fio --direct=0 --ioengine=sync --thread --invalidate=1 --group_reporting=1
  --runtime=300 --fallocate=posix --ramp_time=10 --new_group --rw=randwrite
  --size=64256m --numjobs=4 --bs=4k --fsync_on_close=1 --end_fsync=1

This happens because the client sends ~256 MB worth of dirty pages to
the server and any further background writeback request is ignored until
the number of writeback pages gets below the threshold of 192 MB. By the
time this happens and clients decides to trigger another round of
writeback, the server often has no pages to write and the disk is idle.

To fix this problem and make the client react faster to eased congestion
of the server by blocking waiting for congestion to resolve instead of
aborting writeback. This improves the random 4k buffered write
throughput to 184 MB/s.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/nfs/client.c           |  1 +
 fs/nfs/write.c            | 15 +++++++++++----
 include/linux/nfs_fs_sb.h |  1 +
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 3b252dceebf5..8286edd6062d 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -994,6 +994,7 @@ struct nfs_server *nfs_alloc_server(void)
 
 	server->change_attr_type = NFS4_CHANGE_TYPE_IS_UNDEFINED;
 
+	init_waitqueue_head(&server->write_congestion_wait);
 	atomic_long_set(&server->writeback, 0);
 
 	ida_init(&server->openowner_id);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index c6255d7edd3c..d5f8f77a2352 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -423,8 +423,10 @@ static void nfs_folio_end_writeback(struct folio *folio)
 
 	folio_end_writeback(folio);
 	if (atomic_long_dec_return(&nfss->writeback) <
-	    NFS_CONGESTION_OFF_THRESH)
+	    NFS_CONGESTION_OFF_THRESH) {
 		nfss->write_congested = 0;
+		wake_up_all(&nfss->write_congestion_wait);
+	}
 }
 
 static void nfs_page_end_writeback(struct nfs_page *req)
@@ -698,12 +700,17 @@ int nfs_writepages(struct address_space *mapping, struct writeback_control *wbc)
 	struct nfs_pageio_descriptor pgio;
 	struct nfs_io_completion *ioc = NULL;
 	unsigned int mntflags = NFS_SERVER(inode)->flags;
+	struct nfs_server *nfss = NFS_SERVER(inode);
 	int priority = 0;
 	int err;
 
-	if (wbc->sync_mode == WB_SYNC_NONE &&
-	    NFS_SERVER(inode)->write_congested)
-		return 0;
+	/* Wait with writeback until write congestion eases */
+	if (wbc->sync_mode == WB_SYNC_NONE && nfss->write_congested) {
+		err = wait_event_killable(nfss->write_congestion_wait,
+					  nfss->write_congested == 0);
+		if (err)
+			return err;
+	}
 
 	nfs_inc_stats(inode, NFSIOS_VFSWRITEPAGES);
 
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 92de074e63b9..38a128796a76 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -140,6 +140,7 @@ struct nfs_server {
 	struct rpc_clnt *	client_acl;	/* ACL RPC client handle */
 	struct nlm_host		*nlm_host;	/* NLM client handle */
 	struct nfs_iostats __percpu *io_stats;	/* I/O statistics */
+	wait_queue_head_t	write_congestion_wait;	/* wait until write congestion eases */
 	atomic_long_t		writeback;	/* number of writeback pages */
 	unsigned int		write_congested;/* flag set when writeback gets too high */
 	unsigned int		flags;		/* various flags */
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/3] nfs: Properly initialize server->writeback
  2024-07-01 10:50 ` [PATCH v2 2/3] nfs: Properly initialize server->writeback Jan Kara
@ 2024-07-01 11:23   ` Jeff Layton
  0 siblings, 0 replies; 9+ messages in thread
From: Jeff Layton @ 2024-07-01 11:23 UTC (permalink / raw)
  To: Jan Kara, Anna Schumaker; +Cc: Trond Myklebust, linux-nfs, Sagi Grimberg

On Mon, 2024-07-01 at 12:50 +0200, Jan Kara wrote:
> Atomic types should better be initialized with atomic_long_set()
> instead
> of relying on zeroing done by kzalloc(). Clean this up.
> 
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/nfs/client.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index de77848ae654..3b252dceebf5 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -994,6 +994,8 @@ struct nfs_server *nfs_alloc_server(void)
>  
>  	server->change_attr_type = NFS4_CHANGE_TYPE_IS_UNDEFINED;
>  
> +	atomic_long_set(&server->writeback, 0);
> +
>  	ida_init(&server->openowner_id);
>  	ida_init(&server->lockowner_id);
>  	pnfs_init_server(server);

It seems a little pointless, but ok.

Acked-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RESEND v2 0/3] nfs: Improve throughput for random buffered writes
  2024-07-01 10:50 [PATCH RESEND v2 0/3] nfs: Improve throughput for random buffered writes Jan Kara
                   ` (2 preceding siblings ...)
  2024-07-01 10:50 ` [PATCH v2 3/3] nfs: Block on write congestion Jan Kara
@ 2024-07-17 15:58 ` Jan Kara
  2024-07-17 17:44   ` Anna Schumaker
  3 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2024-07-17 15:58 UTC (permalink / raw)
  To: Anna Schumaker
  Cc: Trond Myklebust, linux-nfs, Sagi Grimberg, Jeff Layton, Jan Kara

Ping? I don't see these patches being in NFS git tree. Did they fall
through the cracks?

								Honza

On Mon 01-07-24 12:50:45, Jan Kara wrote:
> [Resending because of messed up mailing list address]
> 
> Hello,
> 
> this is second revision of my patch series improving NFS throughput for
> buffered writes.
> 
> Changes since v1:
> * Added Reviewed-by tags
> * Made sleep waiting for congestion to resolve killable
> 
> Original cover letter below:
> 
> I was thinking how to best address the performance regression coming from
> NFS write congestion. After considering various options and concerns raised
> in the previous discussion, I've got an idea for a simple option that could
> help to keep the server more busy - just mimick what block devices do and
> block the flush worker waiting for congestion to resolve instead of aborting
> the writeback. And it actually helps enough that I don't think more complex
> solutions are warranted at this point.
> 
> This patch series has two preparatory cleanups and then a patch implementing
> this idea.
> 
> 								Honza
> 
> Previous versions:
> Link: http://lore.kernel.org/r/20240612153022.25454-1-jack@suse.cz # v1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RESEND v2 0/3] nfs: Improve throughput for random buffered writes
  2024-07-17 15:58 ` [PATCH RESEND v2 0/3] nfs: Improve throughput for random buffered writes Jan Kara
@ 2024-07-17 17:44   ` Anna Schumaker
  2024-07-18  9:47     ` Jan Kara
  0 siblings, 1 reply; 9+ messages in thread
From: Anna Schumaker @ 2024-07-17 17:44 UTC (permalink / raw)
  To: Jan Kara; +Cc: Trond Myklebust, linux-nfs, Sagi Grimberg, Jeff Layton

Hi Jan,

On Wed, Jul 17, 2024 at 11:58 AM Jan Kara <jack@suse.cz> wrote:
>
> Ping? I don't see these patches being in NFS git tree. Did they fall
> through the cracks?
>

I have these patches in my tree starting with this commit:
http://git.linux-nfs.org/?p=anna/linux-nfs.git;a=commit;h=37d4159dd25ade59ce0fecc75984240e5f7abc14

I hope this helps!
Anna

>                                                                 Honza
>
> On Mon 01-07-24 12:50:45, Jan Kara wrote:
> > [Resending because of messed up mailing list address]
> >
> > Hello,
> >
> > this is second revision of my patch series improving NFS throughput for
> > buffered writes.
> >
> > Changes since v1:
> > * Added Reviewed-by tags
> > * Made sleep waiting for congestion to resolve killable
> >
> > Original cover letter below:
> >
> > I was thinking how to best address the performance regression coming from
> > NFS write congestion. After considering various options and concerns raised
> > in the previous discussion, I've got an idea for a simple option that could
> > help to keep the server more busy - just mimick what block devices do and
> > block the flush worker waiting for congestion to resolve instead of aborting
> > the writeback. And it actually helps enough that I don't think more complex
> > solutions are warranted at this point.
> >
> > This patch series has two preparatory cleanups and then a patch implementing
> > this idea.
> >
> >                                                               Honza
> >
> > Previous versions:
> > Link: http://lore.kernel.org/r/20240612153022.25454-1-jack@suse.cz # v1
> >
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RESEND v2 0/3] nfs: Improve throughput for random buffered writes
  2024-07-17 17:44   ` Anna Schumaker
@ 2024-07-18  9:47     ` Jan Kara
  2024-07-18 19:26       ` Anna Schumaker
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2024-07-18  9:47 UTC (permalink / raw)
  To: Anna Schumaker
  Cc: Jan Kara, Trond Myklebust, linux-nfs, Sagi Grimberg, Jeff Layton

Hi Anna!

On Wed 17-07-24 13:44:57, Anna Schumaker wrote:
> On Wed, Jul 17, 2024 at 11:58 AM Jan Kara <jack@suse.cz> wrote:
> >
> > Ping? I don't see these patches being in NFS git tree. Did they fall
> > through the cracks?
> 
> I have these patches in my tree starting with this commit:
> http://git.linux-nfs.org/?p=anna/linux-nfs.git;a=commit;h=37d4159dd25ade59ce0fecc75984240e5f7abc14

Aha, great! I was checking the tree mentioned in MAINTAINERS file which is
Trond's one and because it seemed fairly active it didn't occur to me you
are maintaining your tree as well. Thanks for the link!

One more question: Do you plan to push these changes for 6.11 or 6.12?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH RESEND v2 0/3] nfs: Improve throughput for random buffered writes
  2024-07-18  9:47     ` Jan Kara
@ 2024-07-18 19:26       ` Anna Schumaker
  0 siblings, 0 replies; 9+ messages in thread
From: Anna Schumaker @ 2024-07-18 19:26 UTC (permalink / raw)
  To: Jan Kara; +Cc: Trond Myklebust, linux-nfs, Sagi Grimberg, Jeff Layton

On Thu, Jul 18, 2024 at 5:47 AM Jan Kara <jack@suse.cz> wrote:
>
> Hi Anna!
>
> On Wed 17-07-24 13:44:57, Anna Schumaker wrote:
> > On Wed, Jul 17, 2024 at 11:58 AM Jan Kara <jack@suse.cz> wrote:
> > >
> > > Ping? I don't see these patches being in NFS git tree. Did they fall
> > > through the cracks?
> >
> > I have these patches in my tree starting with this commit:
> > http://git.linux-nfs.org/?p=anna/linux-nfs.git;a=commit;h=37d4159dd25ade59ce0fecc75984240e5f7abc14
>
> Aha, great! I was checking the tree mentioned in MAINTAINERS file which is
> Trond's one and because it seemed fairly active it didn't occur to me you
> are maintaining your tree as well. Thanks for the link!

And this is the first time I've thought about adding my tree to the
MAINTAINERS file. I should probably do that to avoid confusion in the
future!

>
> One more question: Do you plan to push these changes for 6.11 or 6.12?

They'll be in my pull request for 6.11 that I hope to have out in the
next hour or so.

Anna

>
>                                                                 Honza
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-07-18 19:26 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-01 10:50 [PATCH RESEND v2 0/3] nfs: Improve throughput for random buffered writes Jan Kara
2024-07-01 10:50 ` [PATCH v2 1/3] nfs: Drop pointless check from nfs_commit_release_pages() Jan Kara
2024-07-01 10:50 ` [PATCH v2 2/3] nfs: Properly initialize server->writeback Jan Kara
2024-07-01 11:23   ` Jeff Layton
2024-07-01 10:50 ` [PATCH v2 3/3] nfs: Block on write congestion Jan Kara
2024-07-17 15:58 ` [PATCH RESEND v2 0/3] nfs: Improve throughput for random buffered writes Jan Kara
2024-07-17 17:44   ` Anna Schumaker
2024-07-18  9:47     ` Jan Kara
2024-07-18 19:26       ` Anna Schumaker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox