NFS (pNFS) and VM dirty bytes

linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* NFS (pNFS) and VM dirty bytes
@ 2019-06-03 15:07 Mkrtchyan, Tigran
  2019-06-03 16:13 ` Trond Myklebust
  0 siblings, 1 reply; 2+ messages in thread
From: Mkrtchyan, Tigran @ 2019-06-03 15:07 UTC (permalink / raw)
  To: linux-nfs

Dear NFS fellows,

though this is not directly NFS issue, I post this question
here as we mostly affected by NFS clients (and you have enough
kernel connection to route it to the right people).

We have 25 new data processing nodes with 32 cores, 256 GB RAM and 25 Gb/s NIC.
They run CentOS 7 (but this is irrelevant, I think).

When each node runs 24 parallel write incentive (75% write, 25% read) workloads, we see a spike of
IO errors on close. Client runs into timeout due to slow network or IO starvation on the NFS servers.
It stumbles, disconnects, establishes a new connection and stumbled again...

As default values for dirty pages is

vm.dirty_background_bytes = 0
vm.dirty_background_ratio = 10
vm.dirty_bytes = 0
vm.dirty_ratio = 30

the first data get sent when at least 25GB of data is accumulated.

To get the full deployment more responsive, we have reduced default numbers to something more reasonable:

vm.dirty_background_ratio = 0
vm.dirty_ratio = 0
vm.dirty_background_bytes = 67108864
vm.dirty_bytes = 536870912

IOW, we force client to start to send data as soon as 64MB is written. The question is how get this
values optimal and how make them file system/mount point specific.

Thanks in advance,
   Tigran.

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: NFS (pNFS) and VM dirty bytes
  2019-06-03 15:07 NFS (pNFS) and VM dirty bytes Mkrtchyan, Tigran
@ 2019-06-03 16:13 ` Trond Myklebust
  0 siblings, 0 replies; 2+ messages in thread
From: Trond Myklebust @ 2019-06-03 16:13 UTC (permalink / raw)
  To: tigran.mkrtchyan@desy.de, linux-nfs@vger.kernel.org

On Mon, 2019-06-03 at 17:07 +0200, Mkrtchyan, Tigran wrote:
> 
> Dear NFS fellows,
> 
> though this is not directly NFS issue, I post this question
> here as we mostly affected by NFS clients (and you have enough
> kernel connection to route it to the right people).
> 
> We have 25 new data processing nodes with 32 cores, 256 GB RAM and 25
> Gb/s NIC.
> They run CentOS 7 (but this is irrelevant, I think).
> 
> When each node runs 24 parallel write incentive (75% write, 25% read)
> workloads, we see a spike of
> IO errors on close. Client runs into timeout due to slow network or
> IO starvation on the NFS servers.
> It stumbles, disconnects, establishes a new connection and stumbled
> again...

You can adjust the pNFS timeout behaviour using the 'dataserver_timeo'
and 'dataserver_retrans' module parameters on both the files and
flexfiles pNFS driver modules.

> 
> As default values for dirty pages is
> 
> vm.dirty_background_bytes = 0
> vm.dirty_background_ratio = 10
> vm.dirty_bytes = 0
> vm.dirty_ratio = 30
> 
> the first data get sent when at least 25GB of data is accumulated.
> 
> To get the full deployment more responsive, we have reduced default
> numbers to something more reasonable:
> 
> vm.dirty_background_ratio = 0
> vm.dirty_ratio = 0
> vm.dirty_background_bytes = 67108864
> vm.dirty_bytes = 536870912
> 
> IOW, we force client to start to send data as soon as 64MB is
> written. The question is how get this
> values optimal and how make them file system/mount point specific.

The memory management system knows nothing about mount points, and the
filesystems know nothing about the memory management limits. That is by
design.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2019-06-03 16:13 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-06-03 15:07 NFS (pNFS) and VM dirty bytes Mkrtchyan, Tigran
2019-06-03 16:13 ` Trond Myklebust

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).