Tuning NFS client write pagecache


* Tuning NFS client write pagecache
@ 2010-08-06 12:21 Matthew Hodgson
  2010-08-06 13:26 ` Jim Rees
  0 siblings, 1 reply; 19+ messages in thread
From: Matthew Hodgson @ 2010-08-06 12:21 UTC (permalink / raw)
  To: linux-nfs

Hi all,

Is there any way to tune the Linux NFSv3 client to prefer to write data 
straight to an async-mounted server, rather than having large writes to 
a file stack up in the local pagecache before being synced on close()?

I have an application which (stupidly) expects system calls to return 
fairly rapidly, otherwise an application-layer timeout occurs.  If I 
write (say) 100MB of data to an NFS share with the app, the write()s 
return almost immediately as the local pagecache is filled up - but then 
close() blocks for several minutes as the data is synced to the server 
over a slowish link.  Mounting the share as -o sync fixes this, as does 
opening the file O_SYNC or O_DIRECT - but ideally I want to generally 
encourage the client to flush a bit more aggressively to the server 
without the performance hit of making every write explicitly synchronous.
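
A minimal sketch of that middle ground, assuming the application's write
path can be changed: write in chunks and call fsync() every few megabytes,
so that no single call has to push the whole file over the link. The
function name and the flush_every threshold are illustrative and do not
come from the application. On an NFS mount, fsync() is expected to send the
dirty pages to the server and wait for them, so each call should be bounded
by roughly flush_every bytes of transfer.

```python
import os


def write_with_periodic_flush(path, chunks, flush_every=8 * 1024 * 1024):
    """Write chunks to path, calling fsync() whenever about flush_every
    bytes of unflushed data have accumulated.

    Returns the number of fsync() calls made, including the final one.
    flush_every is a hypothetical tuning knob, not a kernel parameter.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    pending = 0   # bytes written since the last fsync()
    flushes = 0
    try:
        for chunk in chunks:
            view = memoryview(chunk)
            while len(view) > 0:          # handle short writes
                written = os.write(fd, view)
                view = view[written:]
                pending += written
            if pending >= flush_every:
                os.fsync(fd)              # bounded amount of dirty data
                flushes += 1
                pending = 0
        os.fsync(fd)                      # flush the tail before close()
        flushes += 1
    finally:
        os.close(fd)
    return flushes
```

With most of the data already flushed, the final close() should have
little left to send.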

Is there a way to cap the size of pagecache that the NFS client uses?

This is currently on a 2.6.18 kernel (CentOS 5.5), although I'm more 
than happy to use something less prehistoric if that's what it takes.

M.

-- 
Matthew Hodgson
Development Program Manager
OpenMarket | www.openmarket.com/europe
matthew.hodgson@openmarket.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Tuning NFS client write pagecache
  2010-08-06 12:21 Tuning NFS client write pagecache Matthew Hodgson
@ 2010-08-06 13:26 ` Jim Rees
  2010-08-06 14:05   ` Peter Chacko
  2010-08-06 16:29   ` Matthew Hodgson
  0 siblings, 2 replies; 19+ messages in thread
From: Jim Rees @ 2010-08-06 13:26 UTC (permalink / raw)
  To: Matthew Hodgson; +Cc: linux-nfs

Matthew Hodgson wrote:

  Is there any way to tune the Linux NFSv3 client to prefer to write
  data straight to an async-mounted server, rather than having large
  writes to a file stack up in the local pagecache before being synced
  on close()?

It's been a while since I've done this, but I think you can tune this with
vm.dirty_writeback_centisecs and vm.dirty_background_ratio sysctls.  The
data will still go through the page cache but you can reduce the amount that
stacks up.

There are other places where the data can get buffered, like the RPC layer,
but it won't sit there any longer than it takes for it to go out on the wire.
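
A rough sketch of inspecting and lowering those two sysctls through
/proc/sys/vm. The helper names are made up for illustration, the values in
the usage comments are examples rather than recommendations, and writing
requires root. Note that both knobs are system-wide and affect writeback
for every filesystem, not only NFS.

```python
import os

VM_SYSCTL_DIR = "/proc/sys/vm"


def read_vm_sysctl(name, base=VM_SYSCTL_DIR):
    """Return the integer value of a vm.* sysctl, e.g. 'dirty_background_ratio'."""
    with open(os.path.join(base, name)) as f:
        return int(f.read().strip())


def write_vm_sysctl(name, value, base=VM_SYSCTL_DIR):
    """Set a vm.* sysctl. Needs root when base is the real /proc/sys/vm."""
    with open(os.path.join(base, name), "w") as f:
        f.write(f"{int(value)}\n")


# Example usage (illustrative values only):
#   read_vm_sysctl("dirty_background_ratio")        # percent of memory
#   read_vm_sysctl("dirty_writeback_centisecs")     # hundredths of a second
#   write_vm_sysctl("dirty_background_ratio", 1)    # start writeback earlier
#   write_vm_sysctl("dirty_writeback_centisecs", 100)
```

Lower values make background writeback start sooner and run more often,
which reduces how much dirty data can pile up before close().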


* Re: Tuning NFS client write pagecache
  2010-08-06 13:26 ` Jim Rees
@ 2010-08-06 14:05   ` Peter Chacko
  2010-08-06 17:37     ` Trond Myklebust
  2010-08-06 16:29   ` Matthew Hodgson
  1 sibling, 1 reply; 19+ messages in thread
From: Peter Chacko @ 2010-08-06 14:05 UTC (permalink / raw)
  To: Jim Rees; +Cc: Matthew Hodgson, linux-nfs

Some distributed file systems, such as IBM's SANFS, support direct I/O
to the target storage without going through a cache. (This feature is
useful for write-only workloads, say, when we are backing up huge
amounts of data to an NFS share.)

If this is not already available, I think we should add a DIO mount
option that tells the VFS not to cache any data, so that the close
operation does not stall.

With NFS's close-to-open cache coherence protocol, an aggressively
caching client is a performance downer for many write-mostly
workloads.




* Re: Tuning NFS client write pagecache
  2010-08-06 14:05   ` Peter Chacko
@ 2010-08-06 17:37     ` Trond Myklebust
  2010-08-06 19:29       ` Peter Chacko
  0 siblings, 1 reply; 19+ messages in thread
From: Trond Myklebust @ 2010-08-06 17:37 UTC (permalink / raw)
  To: Peter Chacko; +Cc: Jim Rees, Matthew Hodgson, linux-nfs

On Fri, 2010-08-06 at 15:05 +0100, Peter Chacko wrote:
> Some distributed file systems, such as IBM's SANFS, support direct I/O
> to the target storage without going through a cache. (This feature is
> useful for write-only workloads, say, when we are backing up huge
> amounts of data to an NFS share.)
> 
> If this is not already available, I think we should add a DIO mount
> option that tells the VFS not to cache any data, so that the close
> operation does not stall.

Ugh no! Applications that need direct IO should be using open(O_DIRECT),
not relying on hacks like mount options.

> With NFS's close-to-open cache coherence protocol, an aggressively
> caching client is a performance downer for many write-mostly
> workloads.

We already have full support for vectored aio/dio in the NFS client for
those applications that want to use it.

Trond
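
For an application that can be changed, the open(O_DIRECT) route might look
roughly like the sketch below. It is an illustration only: write_direct and
the 4096-byte alignment are assumptions, not anything prescribed in this
thread. O_DIRECT generally expects aligned buffers and transfer lengths, so
the sketch copies the data into a page-aligned anonymous mapping, pads it to
a block multiple, and truncates the file back to its real size afterwards.
Some filesystems (tmpfs on older kernels, for instance) reject O_DIRECT
with EINVAL.

```python
import mmap
import os


def write_direct(path, data, align=4096):
    """Write data to path using O_DIRECT, bypassing the page cache.

    align is an assumed block size; the real requirement depends on the
    filesystem. Returns the number of payload bytes written.
    """
    size = len(data)
    # Round the transfer length up to a multiple of align (at least one block).
    padded = max(align, -(-size // align) * align)
    buf = mmap.mmap(-1, padded)        # anonymous, page-aligned, zero-filled
    try:
        buf[:size] = data
        flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_DIRECT
        fd = os.open(path, flags, 0o644)
        try:
            written = os.write(fd, buf)
            if written != padded:
                raise OSError("short O_DIRECT write")
            os.ftruncate(fd, size)     # drop the zero padding
        finally:
            os.close(fd)
    finally:
        buf.close()
    return size
```

Because nothing is left dirty in the page cache, close() should return
without a long flush, at the cost of each write() waiting on the server.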


* Re: Tuning NFS client write pagecache
  2010-08-06 17:37     ` Trond Myklebust
@ 2010-08-06 19:29       ` Peter Chacko
  2010-08-06 19:39         ` Trond Myklebust
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Chacko @ 2010-08-06 19:29 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Jim Rees, Matthew Hodgson, linux-nfs

Imagine a third-party backup app for which the customer has no source
code, and which does not open files with O_DIRECT, backing up millions
of files over NFS. How can we do non-cached I/O to the target server?
We cannot use O_DIRECT here because we don't have the source code. With
a mount option it would just work. If we can have read-only mounts, why
not a dio-only mount?

A truly application-aware storage system (in this case the NFS client),
which is what next-generation storage systems should be, should absorb
application needs that apply to the whole FS.

I am not saying the O_DIRECT flag is a bad idea, but it only works for
an ordinary application that does I/O to a few files. It is not the best
solution when an NFS server is used as storage for secondary data and
the NFS client runs third-party applications that would otherwise run
best on local storage, where there are no caching issues.

What do you think ?


* Re: Tuning NFS client write pagecache
  2010-08-06 19:29       ` Peter Chacko
@ 2010-08-06 19:39         ` Trond Myklebust
  2010-08-07  3:15           ` Peter Chacko
  0 siblings, 1 reply; 19+ messages in thread
From: Trond Myklebust @ 2010-08-06 19:39 UTC (permalink / raw)
  To: Peter Chacko; +Cc: Jim Rees, Matthew Hodgson, linux-nfs

On Sat, 2010-08-07 at 00:59 +0530, Peter Chacko wrote:
> What do you think ?

I think that we've had O_DIRECT support in the kernel for more than six
years now. If there are backup vendors out there that haven't been
paying attention, then I'd suggest looking at other vendors.

Trond


* Re: Tuning NFS client write pagecache
  2010-08-06 19:39         ` Trond Myklebust
@ 2010-08-07  3:15           ` Peter Chacko
  2010-08-10 16:27             ` Chuck Lever
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Chacko @ 2010-08-07  3:15 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Jim Rees, Matthew Hodgson, linux-nfs

I think you are not understanding the use case for file-system-wide,
non-cached I/O on NFS.

Imagine a Unix shell programmer who writes a backup script and doesn't
know C programming or system calls. He just wants to run
cp -R sourcedir /targetDir, where targetDir is an NFS-mounted share.

How can we use a programmatic, per-file-session interface like the
O_DIRECT flag here?

We need a file-system-wide direct I/O mechanism, and the best place for
it is at mount time. We cannot tell all sysadmins to go and learn
programming, or backup vendors to change code they wrote 10 - 12 years
ago. Operating system functionality should serve a large audience with
different levels of training and skill.

I hope you got my point here.


* Re: Tuning NFS client write pagecache
  2010-08-07  3:15           ` Peter Chacko
@ 2010-08-10 16:27             ` Chuck Lever
  2010-08-10 17:52               ` Peter Chacko
  2010-08-10 20:50               ` Gilliam, PaulX J
  0 siblings, 2 replies; 19+ messages in thread
From: Chuck Lever @ 2010-08-10 16:27 UTC (permalink / raw)
  To: Peter Chacko; +Cc: Trond Myklebust, Jim Rees, Matthew Hodgson, linux-nfs


On Aug 6, 2010, at 9:15 PM, Peter Chacko wrote:

> I hope you got my point here....

The reason Linux doesn't support a filesystem wide option is that direct I/O has as much potential to degrade performance as it does to improve it.  The performance degradation can affect other applications on the same file system and other clients connected to the same server.  So it can be an exceptionally unfriendly thing to do for your neighbors if an application is stupid or malicious.

To make direct I/O work well, applications have to use it sparingly and appropriately.  They usually maintain their own buffer cache in lieu of the client's generic page cache.  Applications like shells and editors depend on an NFS client's local page cache to work well.

So, we have chosen to support direct I/O only when each file is opened, not as a file system wide option.  This is a much narrower application of this feature, and has a better chance of helping performance in special cases while not destroying it broadly.

So far I haven't read anything here that clearly states a requirement we have overlooked in the past.

For your "cp" example, the NFS community is looking at ways to reduce the overhead of file copy operations by offloading them to the server.  The file data doesn't have to travel over the network to the client.  Someone recently said when you leave this kind of choice up to users, they will usually choose exactly the wrong option.  This is a clear case where the system and application developers will choose better than users who have no programming skills.



--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





* Re: Tuning NFS client write pagecache
  2010-08-10 16:27             ` Chuck Lever
@ 2010-08-10 17:52               ` Peter Chacko
  2010-08-10 18:19                 ` David Brodbeck
  2010-08-10 19:16                 ` Chuck Lever
  2010-08-10 20:50               ` Gilliam, PaulX J
  1 sibling, 2 replies; 19+ messages in thread
From: Peter Chacko @ 2010-08-10 17:52 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Trond Myklebust, Jim Rees, Matthew Hodgson, linux-nfs

Dear Chuck,

Yes, if we perform a bulk cp operation, the data need not go over the
network if both source and destination are on the NFS server. If that's
not the case, we have to move the data across the network.

Most of the time, NFS (or NAS for that matter) best serves the
enterprise as a D2D backup destination. Either the backup server is NFS
or the media server is an NFS client.

It would be very beneficial if NFS could operate in DIO mode, so that
backup admins could write simple scripts to move terabytes of data
without buying any exotic backup software.

And caching itself is not useful for any streaming data path (be it an
NFS cache, a memory cache, a CPU cache, or even a web cache). Backup is
a write-only operation for all file objects.

If an application needs it, we should have a mechanism to mount the NFS
client FS without enabling client caching.

Veritas VxFS, for example, avoids disk caching for databases through
its QuickIO option. We should have a similar mechanism for NFS.

What are your thoughts? What architectural or design-level issues would
we encounter if we brought this feature to NFS? Is there a patch
available for this?

How does V4 fare here?

On Tue, Aug 10, 2010 at 9:57 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
>
> On Aug 6, 2010, at 9:15 PM, Peter Chacko wrote:
>
>> I think you are not understanding the use case of a  file-system wide,
>> non-cached IO for NFS.
>>
>> Imagine a case when a unix shell programmer  create a backup
>> script,who doesn't know C programming or system calls....he just wants
>> to use a  cp -R sourcedir  /targetDir.  Where targetDir is an NFS
>> mounted share.
>>
>> How can we use programmatical , per file-session interface to O_DIRECT
>> flag here ?
>>
>> We need a file-system wide direct IO mechanisms ,the best place to
>> have is at the mount time. We cannot tell all sysadmins to go and
>> learn programming....or backup vendors to change their code that they
>> wrote 10 - 12 years ago...... Operating system functionalities should
>> cover a large audience, with different levels of  training/skills.
>>
>> I hope you got my point here....
>
> The reason Linux doesn't support a filesystem wide option is that direct I/O has as much potential to degrade performance as it does to improve it.  The performance degradation can affect other applications on the same file system and other clients connected to the same server.  So it can be an exceptionally unfriendly thing to do for your neighbors if an application is stupid or malicious.
>
> To make direct I/O work well, applications have to use it sparingly and appropriately.  They usually maintain their own buffer cache in lieu of the client's generic page cache.  Applications like shells and editors depend on an NFS client's local page cache to work well.
>
> So, we have chosen to support direct I/O only when each file is opened, not as a file system wide option.  This is a much narrower application of this feature, and has a better chance of helping performance in special cases while not destroying it broadly.
>
> So far I haven't read anything here that clearly states a requirement we have overlooked in the past.
>
> For your "cp" example, the NFS community is looking at ways to reduce the overhead of file copy operations by offloading them to the server.  The file data doesn't have to travel over the network to the client.  Someone recently said when you leave this kind of choice up to users, they will usually choose exactly the wrong option.  This is a clear case where the system and application developers will choose better than users who have no programming skills.
>
>
>> On Sat, Aug 7, 2010 at 1:09 AM, Trond Myklebust
>> <trond.myklebust@fys.uio.no> wrote:
>>> On Sat, 2010-08-07 at 00:59 +0530, Peter Chacko wrote:
>>>> Imagine a third party backup app for which a customer has no source
>>>> code. (that doesn't use open system call O_DIRECT mode) backing up
>>>> millions of files through NFS....How can we do a non-cached IO to the
>>>> target server ?  we cannot use O_DIRECT option here as we don't have
>>>> the source code....If we have mount option, its works just right
>>>> ....if we can have read-only mounts, why not have a dio-only mount ?
>>>>
>>>> A true application-Yaware storage systems(in this case NFS client) ,
>>>> which is the next generation storage systems should do, should absorb
>>>> the application needs that may apply to the whole FS....
>>>>
>>>> i don't say O_DIRECT flag is a bad idea, but it will only work with a
>>>> regular application that do IO to some files.....this is not the best
>>>> solution when NFS server is used as the storage for secondary data,
>>>> where NFS client runs third party applications thats otherwise run
>>>> best in a local storage as there is no caching issues....
>>>>
>>>> What do you think ?
>>>
>>> I think that we've had O_DIRECT support in the kernel for more than six
>>> years now. If there are backup vendors out there that haven't been
>>> paying attention, then I'd suggest looking at other vendors.
>>>
>>> Trond
>>>
>>>> On Fri, Aug 6, 2010 at 11:07 PM, Trond Myklebust
>>>> <trond.myklebust@fys.uio.no> wrote:
>>>>> On Fri, 2010-08-06 at 15:05 +0100, Peter Chacko wrote:
>>>>>> Some distributed file systems such as IBM's SANFS, support direct IO
>>>>>> to the target storage....without going through a cache... ( This
>>>>>> feature is useful, for write only work load....say, we are backing up
>>>>>> huge data to an NFS share....).
>>>>>>
>>>>>> I think if not available, we should add a DIO mount option, that tell
>>>>>> the VFS not to cache any data, so that close operation will not stall.
>>>>>
>>>>> Ugh no! Applications that need direct IO should be using open(O_DIRECT),
>>>>> not relying on hacks like mount options.
>>>>>
>>>>>> With the open-to-close , cache coherence protocol of NFS, an
>>>>>> aggressive caching client, is a performance downer for many work-loads
>>>>>> that is write-mostly.
>>>>>
>>>>> We already have full support for vectored aio/dio in the NFS for those
>>>>> applications that want to use it.
>>>>>
>>>>> Trond
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Aug 6, 2010 at 2:26 PM, Jim Rees <rees@umich.edu> wrote:
>>>>>>> Matthew Hodgson wrote:
>>>>>>>
>>>>>>>  Is there any way to tune the linux NFSv3 client to prefer to write
>>>>>>>  data straight to an async-mounted server, rather than having large
>>>>>>>  writes to a file stack up in the local pagecache before being synced
>>>>>>>  on close()?
>>>>>>>
>>>>>>> It's been a while since I've done this, but I think you can tune this with
>>>>>>> vm.dirty_writeback_centisecs and vm.dirty_background_ratio sysctls.  The
>>>>>>> data will still go through the page cache but you can reduce the amount that
>>>>>>> stacks up.
>>>>>>>
>>>>>>> There are other places where the data can get buffered, like the rpc layer,
>>>>>>> but it won't sit there any longer than it takes for it to go out the wire.
>>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>>
>>>
>>>
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
>
>
>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Tuning NFS client write pagecache
  2010-08-10 17:52               ` Peter Chacko
@ 2010-08-10 18:19                 ` David Brodbeck
  2010-08-10 19:16                 ` Chuck Lever
  1 sibling, 0 replies; 19+ messages in thread
From: David Brodbeck @ 2010-08-10 18:19 UTC (permalink / raw)
  To: linux-nfs

On Aug 10, 2010, at 10:52 AM, Peter Chacko wrote:
> And caching itself is not useful  for any streaming datapath.(Be it
> NFS cache,or memory cache or cpu cache or even a web cache).. backup
> is write-only operation, for all file objects...

It seems to me this is only true if you're talking strictly about full backups.  Any kind of incremental or differential backup is likely to do a significant amount of reading in order to determine what files need to be sent, unless it's storing data about each backup locally.  rsync, for example, needs to read from both filesystems to build up its file list.

-- 

David Brodbeck
System Administrator, Linguistics
University of Washington

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Tuning NFS client write pagecache
  2010-08-10 17:52               ` Peter Chacko
  2010-08-10 18:19                 ` David Brodbeck
@ 2010-08-10 19:16                 ` Chuck Lever
  1 sibling, 0 replies; 19+ messages in thread
From: Chuck Lever @ 2010-08-10 19:16 UTC (permalink / raw)
  To: Peter Chacko; +Cc: Trond Myklebust, Jim Rees, Matthew Hodgson, linux-nfs


On Aug 10, 2010, at 11:52 AM, Peter Chacko wrote:

> Dear chuck,
> 
> Yes, if we perform a bulk cp operations, data need not go through
> network, if both source and destination are on the NFS...if thats not
> the case, we have to move data across network...
> 
> Most of the time, NFS (or NAS for that matter) best serve the
> enterprise as a D2D backup destination. Either backup server is NFS or
> media server is NFS client.
> 
> Its very beneficial if NFS can start its business in DIO mode.....so
> that backup admins can just write simple scripts to move terabytes of
> data ...without buying any exotic backup software....

I believe there is a command line flag on the common utilities to operate in direct I/O mode.  I'm not in front of Linux right now, so I can't check if this is still true.  If that's the case, it would be simple to modify scripts to specify that flag when doing data copies.

> And caching itself is not useful  for any streaming datapath.(Be it
> NFS cache,or memory cache or cpu cache or even a web cache).. backup
> is write-only operation, for all file objects...

No one is suggesting otherwise.  Our user space file system interfaces allow plenty of flexibility here.  You can specify O_DIRECT, or use the madvise(2) or posix_fadvise(2) family of calls, to make the kernel behave as needed.

The problem here is there really is no good way to get the kernel to guess what an application needs.  It will almost always guess wrong in some important cases.
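As a rough illustration of the per-file control described above, here is a minimal Python sketch of an application that keeps using the page cache but flushes on its own schedule, so that little dirty data is left when close() is called.  It is not taken from any message in this thread.  The function name write_with_bounded_dirty_data and the flush_every parameter are hypothetical; os.fsync and os.posix_fadvise are the standard library wrappers for fsync(2) and posix_fadvise(2), and the fadvise hint is advisory only.

```python
import os


def write_with_bounded_dirty_data(path, chunks, flush_every=8 * 1024 * 1024):
    """Write chunks to path, flushing every flush_every bytes.

    Returns the total number of bytes written. flush_every is an
    illustrative threshold, not a recommended value.
    """
    total = 0
    unflushed = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for chunk in chunks:
            view = memoryview(chunk)
            while view:  # os.write may write fewer bytes than requested
                n = os.write(fd, view)
                view = view[n:]
                total += n
                unflushed += n
            if unflushed >= flush_every:
                os.fsync(fd)  # push the dirty pages out now, not at close()
                try:
                    # Hint that the flushed pages will not be read back.
                    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
                except (AttributeError, OSError):
                    pass  # the hint is optional; ignore platforms without it
                unflushed = 0
        os.fsync(fd)  # final flush so close() has little left to do
    finally:
        os.close(fd)
    return total
```

Each individual fsync then blocks for roughly the time needed to send at most flush_every bytes, which is the trade-off an application with short call timeouts would need to size for its own link.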

> if application needs, we should have a mechanism to mount NFS client
> FS, without enabling client caching...

We have a mechanism for disabling caching on a per-file basis.  This is fine-grained control.  I've never found a compelling reason to disable caching across a whole file system at once, and there are good reasons not to allow such a thing and to focus only on individual files and applications.

> See veritas VxFS avoids disk caching for Databases, through QuickIO
> option.....We should have a similar mechanisms for NFS....

Database scalability is exactly why I wrote the Linux NFS client's O_DIRECT support.

> Whats your thoughts ? what are the architectural/design level issues
> we will encounter,
> if we bring this feature to NFS? Is there any patch available for this ?

Support for uncached I/O has been in the Linux NFS client since RHAS 2.1, and available upstream since roughly 2.4.20 (yes, 2.4, not 2.6).

> How does V4 fare here ?

NFSv4 supports direct I/O just like the other versions of the protocol.  Direct I/O is version agnostic.

> 
> On Tue, Aug 10, 2010 at 9:57 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
>> 
>> On Aug 6, 2010, at 9:15 PM, Peter Chacko wrote:
>> 
>>> I think you are not understanding the use case of a  file-system wide,
>>> non-cached IO for NFS.
>>> 
>>> Imagine a case when a unix shell programmer  create a backup
>>> script,who doesn't know C programming or system calls....he just wants
>>> to use a  cp -R sourcedir  /targetDir.  Where targetDir is an NFS
>>> mounted share.
>>> 
>>> How can we use programmatical , per file-session interface to O_DIRECT
>>> flag here ?
>>> 
>>> We need a file-system wide direct IO mechanisms ,the best place to
>>> have is at the mount time. We cannot tell all sysadmins to go and
>>> learn programming....or backup vendors to change their code that they
>>> wrote 10 - 12 years ago...... Operating system functionalities should
>>> cover a large audience, with different levels of  training/skills.
>>> 
>>> I hope you got my point here....
>> 
>> The reason Linux doesn't support a filesystem wide option is that direct I/O has as much potential to degrade performance as it does to improve it.  The performance degradation can affect other applications on the same file system and other clients connected to the same server.  So it can be an exceptionally unfriendly thing to do for your neighbors if an application is stupid or malicious.
>> 
>> To make direct I/O work well, applications have to use it sparingly and appropriately.  They usually maintain their own buffer cache in lieu of the client's generic page cache.  Applications like shells and editors depend on an NFS client's local page cache to work well.
>> 
>> So, we have chosen to support direct I/O only when each file is opened, not as a file system wide option.  This is a much narrower application of this feature, and has a better chance of helping performance in special cases while not destroying it broadly.
>> 
>> So far I haven't read anything here that clearly states a requirement we have overlooked in the past.
>> 
>> For your "cp" example, the NFS community is looking at ways to reduce the overhead of file copy operations by offloading them to the server.  The file data doesn't have to travel over the network to the client.  Someone recently said when you leave this kind of choice up to users, they will usually choose exactly the wrong option.  This is a clear case where the system and application developers will choose better than users who have no programming skills.
>> 
>> 
>>> On Sat, Aug 7, 2010 at 1:09 AM, Trond Myklebust
>>> <trond.myklebust@fys.uio.no> wrote:
>>>> On Sat, 2010-08-07 at 00:59 +0530, Peter Chacko wrote:
>>>>> Imagine a third party backup app for which a customer has no source
>>>>> code. (that doesn't use open system call O_DIRECT mode) backing up
>>>>> millions of files through NFS....How can we do a non-cached IO to the
>>>>> target server ?  we cannot use O_DIRECT option here as we don't have
>>>>> the source code....If we have mount option, its works just right
>>>>> ....if we can have read-only mounts, why not have a dio-only mount ?
>>>>> 
>>>>> A true application-aware storage systems(in this case NFS client) ,
>>>>> which is the next generation storage systems should do, should absorb
>>>>> the application needs that may apply to the whole FS....
>>>>> 
>>>>> i don't say O_DIRECT flag is a bad idea, but it will only work with a
>>>>> regular application that do IO to some files.....this is not the best
>>>>> solution when NFS server is used as the storage for secondary data,
>>>>> where NFS client runs third party applications thats otherwise run
>>>>> best in a local storage as there is no caching issues....
>>>>> 
>>>>> What do you think ?
>>>> 
>>>> I think that we've had O_DIRECT support in the kernel for more than six
>>>> years now. If there are backup vendors out there that haven't been
>>>> paying attention, then I'd suggest looking at other vendors.
>>>> 
>>>> Trond
>>>> 
>>>>> On Fri, Aug 6, 2010 at 11:07 PM, Trond Myklebust
>>>>> <trond.myklebust@fys.uio.no> wrote:
>>>>>> On Fri, 2010-08-06 at 15:05 +0100, Peter Chacko wrote:
>>>>>>> Some distributed file systems such as IBM's SANFS, support direct IO
>>>>>>> to the target storage....without going through a cache... ( This
>>>>>>> feature is useful, for write only work load....say, we are backing up
>>>>>>> huge data to an NFS share....).
>>>>>>> 
>>>>>>> I think if not available, we should add a DIO mount option, that tell
>>>>>>> the VFS not to cache any data, so that close operation will not stall.
>>>>>> 
>>>>>> Ugh no! Applications that need direct IO should be using open(O_DIRECT),
>>>>>> not relying on hacks like mount options.
>>>>>> 
>>>>>>> With the open-to-close , cache coherence protocol of NFS, an
>>>>>>> aggressive caching client, is a performance downer for many work-loads
>>>>>>> that is write-mostly.
>>>>>> 
>>>>>> We already have full support for vectored aio/dio in the NFS for those
>>>>>> applications that want to use it.
>>>>>> 
>>>>>> Trond
>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Fri, Aug 6, 2010 at 2:26 PM, Jim Rees <rees@umich.edu> wrote:
>>>>>>>> Matthew Hodgson wrote:
>>>>>>>> 
>>>>>>>>  Is there any way to tune the linux NFSv3 client to prefer to write
>>>>>>>>  data straight to an async-mounted server, rather than having large
>>>>>>>>  writes to a file stack up in the local pagecache before being synced
>>>>>>>>  on close()?
>>>>>>>> 
>>>>>>>> It's been a while since I've done this, but I think you can tune this with
>>>>>>>> vm.dirty_writeback_centisecs and vm.dirty_background_ratio sysctls.  The
>>>>>>>> data will still go through the page cache but you can reduce the amount that
>>>>>>>> stacks up.
>>>>>>>> 
>>>>>>>> There are other places where the data can get buffered, like the rpc layer,
>>>>>>>> but it won't sit there any longer than it takes for it to go out the wire.
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>> 
>> --
>> Chuck Lever
>> chuck[dot]lever[at]oracle[dot]com
>> 
>> 
>> 
>> 

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: Tuning NFS client write pagecache
  2010-08-10 16:27             ` Chuck Lever
  2010-08-10 17:52               ` Peter Chacko
@ 2010-08-10 20:50               ` Gilliam, PaulX J
  2010-08-10 21:47                 ` Chuck Lever
  1 sibling, 1 reply; 19+ messages in thread
From: Gilliam, PaulX J @ 2010-08-10 20:50 UTC (permalink / raw)
  To: Chuck Lever, Peter Chacko
  Cc: Trond Myklebust, Jim Rees, Matthew Hodgson,
	linux-nfs@vger.kernel.org



>-----Original Message-----
>From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs-
>owner@vger.kernel.org] On Behalf Of Chuck Lever
>Sent: Tuesday, August 10, 2010 9:27 AM
>To: Peter Chacko
>Cc: Trond Myklebust; Jim Rees; Matthew Hodgson; linux-nfs@vger.kernel.org
>Subject: Re: Tuning NFS client write pagecache
>
>
>On Aug 6, 2010, at 9:15 PM, Peter Chacko wrote:
>
>> I think you are not understanding the use case of a  file-system wide,
>> non-cached IO for NFS.
>>
>> Imagine a case when a unix shell programmer  create a backup
>> script,who doesn't know C programming or system calls....he just wants
>> to use a  cp -R sourcedir  /targetDir.  Where targetDir is an NFS
>> mounted share.
>>
>> How can we use programmatical , per file-session interface to O_DIRECT
>> flag here ?
>>
>> We need a file-system wide direct IO mechanisms ,the best place to
>> have is at the mount time. We cannot tell all sysadmins to go and
>> learn programming....or backup vendors to change their code that they
>> wrote 10 - 12 years ago...... Operating system functionalities should
>> cover a large audience, with different levels of  training/skills.
>>
>> I hope you got my point here....
>
>The reason Linux doesn't support a filesystem wide option is that direct
>I/O has as much potential to degrade performance as it does to improve it.
>The performance degradation can affect other applications on the same file
>system and other clients connected to the same server.  So it can be an
>exceptionally unfriendly thing to do for your neighbors if an application
>is stupid or malicious.

Please forgive my ignorance, but could you give an example or two?  I can understand how direct I/O can degrade the performance of the application that is using it.  But I can't see how other applications' performance would be affected.  Unless maybe it would increase the network traffic due to the lack of write consolidation.  I can see that:  many small writes instead of one larger one.

I don't need details, just a couple of sketchy examples so I can visualize what you are referring to.

Thanks for increasing my understanding,

-=# Paul Gilliam #=-


>To make direct I/O work well, applications have to use it sparingly and
>appropriately.  They usually maintain their own buffer cache in lieu of the
>client's generic page cache.  Applications like shells and editors depend
>on an NFS client's local page cache to work well.
>
>So, we have chosen to support direct I/O only when each file is opened, not
>as a file system wide option.  This is a much narrower application of this
>feature, and has a better chance of helping performance in special cases
>while not destroying it broadly.
>
>So far I haven't read anything here that clearly states a requirement we
>have overlooked in the past.
>
>For your "cp" example, the NFS community is looking at ways to reduce the
>overhead of file copy operations by offloading them to the server.  The
>file data doesn't have to travel over the network to the client.  Someone
>recently said when you leave this kind of choice up to users, they will
>usually choose exactly the wrong option.  This is a clear case where the
>system and application developers will choose better than users who have no
>programming skills.
>
>
>> On Sat, Aug 7, 2010 at 1:09 AM, Trond Myklebust
>> <trond.myklebust@fys.uio.no> wrote:
>>> On Sat, 2010-08-07 at 00:59 +0530, Peter Chacko wrote:
>>>> Imagine a third party backup app for which a customer has no source
>>>> code. (that doesn't use open system call O_DIRECT mode) backing up
>>>> millions of files through NFS....How can we do a non-cached IO to the
>>>> target server ?  we cannot use O_DIRECT option here as we don't have
>>>> the source code....If we have mount option, its works just right
>>>> ....if we can have read-only mounts, why not have a dio-only mount ?
>>>>
>>>> A true application-aware storage systems(in this case NFS client) ,
>>>> which is the next generation storage systems should do, should absorb
>>>> the application needs that may apply to the whole FS....
>>>>
>>>> i don't say O_DIRECT flag is a bad idea, but it will only work with a
>>>> regular application that do IO to some files.....this is not the best
>>>> solution when NFS server is used as the storage for secondary data,
>>>> where NFS client runs third party applications thats otherwise run
>>>> best in a local storage as there is no caching issues....
>>>>
>>>> What do you think ?
>>>
>>> I think that we've had O_DIRECT support in the kernel for more than six
>>> years now. If there are backup vendors out there that haven't been
>>> paying attention, then I'd suggest looking at other vendors.
>>>
>>> Trond
>>>
>>>> On Fri, Aug 6, 2010 at 11:07 PM, Trond Myklebust
>>>> <trond.myklebust@fys.uio.no> wrote:
>>>>> On Fri, 2010-08-06 at 15:05 +0100, Peter Chacko wrote:
>>>>>> Some distributed file systems such as IBM's SANFS, support direct IO
>>>>>> to the target storage....without going through a cache... ( This
>>>>>> feature is useful, for write only work load....say, we are backing up
>>>>>> huge data to an NFS share....).
>>>>>>
>>>>>> I think if not available, we should add a DIO mount option, that tell
>>>>>> the VFS not to cache any data, so that close operation will not stall.
>>>>>
>>>>> Ugh no! Applications that need direct IO should be using open(O_DIRECT),
>>>>> not relying on hacks like mount options.
>>>>>
>>>>>> With the open-to-close , cache coherence protocol of NFS, an
>>>>>> aggressive caching client, is a performance downer for many work-loads
>>>>>> that is write-mostly.
>>>>>
>>>>> We already have full support for vectored aio/dio in the NFS for those
>>>>> applications that want to use it.
>>>>>
>>>>> Trond
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Aug 6, 2010 at 2:26 PM, Jim Rees <rees@umich.edu> wrote:
>>>>>>> Matthew Hodgson wrote:
>>>>>>>
>>>>>>>  Is there any way to tune the linux NFSv3 client to prefer to write
>>>>>>>  data straight to an async-mounted server, rather than having large
>>>>>>>  writes to a file stack up in the local pagecache before being synced
>>>>>>>  on close()?
>>>>>>>
>>>>>>> It's been a while since I've done this, but I think you can tune this with
>>>>>>> vm.dirty_writeback_centisecs and vm.dirty_background_ratio sysctls.  The
>>>>>>> data will still go through the page cache but you can reduce the amount that
>>>>>>> stacks up.
>>>>>>>
>>>>>>> There are other places where the data can get buffered, like the rpc layer,
>>>>>>> but it won't sit there any longer than it takes for it to go out the wire.
>>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>>
>>>
>>>
>
>--
>Chuck Lever
>chuck[dot]lever[at]oracle[dot]com
>
>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Tuning NFS client write pagecache
  2010-08-10 20:50               ` Gilliam, PaulX J
@ 2010-08-10 21:47                 ` Chuck Lever
  2010-08-11  2:09                   ` Peter Chacko
  0 siblings, 1 reply; 19+ messages in thread
From: Chuck Lever @ 2010-08-10 21:47 UTC (permalink / raw)
  To: Gilliam, PaulX J
  Cc: Peter Chacko, Trond Myklebust, Jim Rees, Matthew Hodgson,
	linux-nfs@vger.kernel.org


On Aug 10, 2010, at 2:50 PM, Gilliam, PaulX J wrote:

> 
> 
>> -----Original Message-----
>> From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs-
>> owner@vger.kernel.org] On Behalf Of Chuck Lever
>> Sent: Tuesday, August 10, 2010 9:27 AM
>> To: Peter Chacko
>> Cc: Trond Myklebust; Jim Rees; Matthew Hodgson; linux-nfs@vger.kernel.org
>> Subject: Re: Tuning NFS client write pagecache
>> 
>> 
>> On Aug 6, 2010, at 9:15 PM, Peter Chacko wrote:
>> 
>>> I think you are not understanding the use case of a  file-system wide,
>>> non-cached IO for NFS.
>>> 
>>> Imagine a case when a unix shell programmer  create a backup
>>> script,who doesn't know C programming or system calls....he just wants
>>> to use a  cp -R sourcedir  /targetDir.  Where targetDir is an NFS
>>> mounted share.
>>> 
>>> How can we use programmatical , per file-session interface to O_DIRECT
>>> flag here ?
>>> 
>>> We need a file-system wide direct IO mechanisms ,the best place to
>>> have is at the mount time. We cannot tell all sysadmins to go and
>>> learn programming....or backup vendors to change their code that they
>>> wrote 10 - 12 years ago...... Operating system functionalities should
>>> cover a large audience, with different levels of  training/skills.
>>> 
>>> I hope you got my point here....
>> 
>> The reason Linux doesn't support a filesystem wide option is that direct
>> I/O has as much potential to degrade performance as it does to improve it.
>> The performance degradation can affect other applications on the same file
>> system and other clients connected to the same server.  So it can be an
>> exceptionally unfriendly thing to do for your neighbors if an application
>> is stupid or malicious.
> 
> Please forgive my ignorance, but could you give an example or two?  I can understand how direct I/O can degrade the performance of the application that is using it.  But I can't see how other applications' performance would be affected.  Unless maybe it would increase the network traffic due to the lack of write consolidation.  I can see that:  many small writes instead of one larger one.

Most typical desktop applications perform small writes, a lot of rereads of the same data, and depend on read-ahead for good performance.  Application developers assume a local data cache in order to keep their programs simple.  To get good performance, even on local file systems, their applications would have to maintain their own data cache (in fact, that is what direct I/O-enabled applications do already).

Having no data cache on the NFS client means that all of this I/O would be exposed to the network and the NFS server.  That's an opportunity cost paid by all other users of the network and NFS server.  Exposing that excess I/O activity will have a broad effect on the amount of I/O the system as a whole (clients, network, server) can perform.

If you have one NFS client running just a few apps, you may not notice the difference (unless you have a low bandwidth network).  But NFS pretty much requires good client-side caching to scale in the number of clients and amount of I/O.
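To put rough numbers on the argument above, here is a deliberately simplified Python model that counts WRITE requests for a run of small sequential application writes, with and without client-side coalescing.  It is only a sketch under stated assumptions: the writes are contiguous, every uncached write goes out on its own, and the cached case is sent in full wsize-sized requests.  The function names and the 32768-byte wsize are hypothetical, and COMMITs, rereads and read-ahead are ignored.

```python
import math


def write_rpcs_uncached(write_sizes, wsize):
    """Each application write is sent on its own, split if larger than wsize."""
    return sum(math.ceil(size / wsize) for size in write_sizes if size > 0)


def write_rpcs_cached(write_sizes, wsize):
    """Contiguous writes are coalesced in the page cache into wsize requests."""
    return math.ceil(sum(write_sizes) / wsize)


sizes = [512] * 2048  # 2048 small writes, 1 MiB in total
wsize = 32768         # assumed transfer size, for illustration only
print(write_rpcs_uncached(sizes, wsize))  # 2048
print(write_rpcs_cached(sizes, wsize))    # 32
```

In this model the uncached client issues 64 times as many requests for the same data, and that extra load is what the other users of the network and server would see.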

> I don't need details, just a couple of sketchy examples so I can visualize what you are referring to.
> 
> Thanks for increasing my understanding,
> 
> -=# Paul Gilliam #=-
> 
> 
>> To make direct I/O work well, applications have to use it sparingly and
>> appropriately.  They usually maintain their own buffer cache in lieu of the
>> client's generic page cache.  Applications like shells and editors depend
>> on an NFS client's local page cache to work well.
>> 
>> So, we have chosen to support direct I/O only when each file is opened, not
>> as a file system wide option.  This is a much narrower application of this
>> feature, and has a better chance of helping performance in special cases
>> while not destroying it broadly.
>> 
>> So far I haven't read anything here that clearly states a requirement we
>> have overlooked in the past.
>> 
>> For your "cp" example, the NFS community is looking at ways to reduce the
>> overhead of file copy operations by offloading them to the server.  The
>> file data doesn't have to travel over the network to the client.  Someone
>> recently said when you leave this kind of choice up to users, they will
>> usually choose exactly the wrong option.  This is a clear case where the
>> system and application developers will choose better than users who have no
>> programming skills.
>> 
>> 
>>> On Sat, Aug 7, 2010 at 1:09 AM, Trond Myklebust
>>> <trond.myklebust@fys.uio.no> wrote:
>>>> On Sat, 2010-08-07 at 00:59 +0530, Peter Chacko wrote:
>>>>> Imagine a third party backup app for which a customer has no source
>>>>> code. (that doesn't use open system call O_DIRECT mode) backing up
>>>>> millions of files through NFS....How can we do a non-cached IO to the
>>>>> target server ?  we cannot use O_DIRECT option here as we don't have
>>>>> the source code....If we have mount option, its works just right
>>>>> ....if we can have read-only mounts, why not have a dio-only mount ?
>>>>> 
>>>>> A true application-aware storage systems(in this case NFS client) ,
>>>>> which is the next generation storage systems should do, should absorb
>>>>> the application needs that may apply to the whole FS....
>>>>> 
>>>>> i don't say O_DIRECT flag is a bad idea, but it will only work with a
>>>>> regular application that do IO to some files.....this is not the best
>>>>> solution when NFS server is used as the storage for secondary data,
>>>>> where NFS client runs third party applications thats otherwise run
>>>>> best in a local storage as there is no caching issues....
>>>>> 
>>>>> What do you think ?
>>>> 
>>>> I think that we've had O_DIRECT support in the kernel for more than six
>>>> years now. If there are backup vendors out there that haven't been
>>>> paying attention, then I'd suggest looking at other vendors.
>>>> 
>>>> Trond
>>>> 
>>>>> On Fri, Aug 6, 2010 at 11:07 PM, Trond Myklebust
>>>>> <trond.myklebust@fys.uio.no> wrote:
>>>>>> On Fri, 2010-08-06 at 15:05 +0100, Peter Chacko wrote:
>>>>>>> Some distributed file systems such as IBM's SANFS, support direct IO
>>>>>>> to the target storage....without going through a cache... ( This
>>>>>>> feature is useful, for write only work load....say, we are backing up
>>>>>>> huge data to an NFS share....).
>>>>>>> 
>>>>>>> I think if not available, we should add a DIO mount option, that tell
>>>>>>> the VFS not to cache any data, so that close operation will not stall.
>>>>>> 
>>>>>> Ugh no! Applications that need direct IO should be using open(O_DIRECT),
>>>>>> not relying on hacks like mount options.
>>>>>> 
>>>>>>> With the open-to-close , cache coherence protocol of NFS, an
>>>>>>> aggressive caching client, is a performance downer for many work-loads
>>>>>>> that is write-mostly.
>>>>>> 
>>>>>> We already have full support for vectored aio/dio in the NFS for those
>>>>>> applications that want to use it.
>>>>>> 
>>>>>> Trond
>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Fri, Aug 6, 2010 at 2:26 PM, Jim Rees <rees@umich.edu> wrote:
>>>>>>>> Matthew Hodgson wrote:
>>>>>>>> 
>>>>>>>> Is there any way to tune the linux NFSv3 client to prefer to write
>>>>>>>> data straight to an async-mounted server, rather than having large
>>>>>>>> writes to a file stack up in the local pagecache before being synced
>>>>>>>> on close()?
>>>>>>>> 
>>>>>>>> It's been a while since I've done this, but I think you can tune this with
>>>>>>>> vm.dirty_writeback_centisecs and vm.dirty_background_ratio sysctls.  The
>>>>>>>> data will still go through the page cache but you can reduce the amount that
>>>>>>>> stacks up.
>>>>>>>> 
>>>>>>>> There are other places where the data can get buffered, like the rpc layer,
>>>>>>>> but it won't sit there any longer than it takes for it to go out the wire.
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>> 
>> --
>> Chuck Lever
>> chuck[dot]lever[at]oracle[dot]com
>> 
>> 
>> 

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Tuning NFS client write pagecache
  2010-08-10 21:47                 ` Chuck Lever
@ 2010-08-11  2:09                   ` Peter Chacko
  2010-08-11 16:05                     ` Chuck Lever
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Chacko @ 2010-08-11  2:09 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Gilliam, PaulX J, Trond Myklebust, Jim Rees, Matthew Hodgson,
	linux-nfs@vger.kernel.org

Thanks, Gilliam, for your message, and Chuck, for your detailed
explanation, drawn from your long-term work with NFS.

Gilliam, most incremental backup systems use hashes/checksums to
determine the new data (deltas), not by reading all the data from the
server (or the data they wrote to the cache) but from a local
database that the backup agent keeps. rsync only requires fixed-length
block checksums from the server; it computes rolling checksums (weak
and strong) on the client and detects duplication. It also doesn't
re-read the data at the NFS level.
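For readers unfamiliar with the rolling checksum mentioned above, here is a minimal Python sketch of an rsync-style weak checksum, written from the commonly published description of the algorithm rather than from rsync's source.  The names weak_checksum and roll are hypothetical, the modulus 2^16 is the usual choice, and the strong per-block hash that rsync pairs with it is omitted.

```python
M = 1 << 16  # both sums are kept modulo 2^16


def weak_checksum(block):
    """Return (a, b) for a block: a is the byte sum, b the position-weighted sum."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * byte for i, byte in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte: drop out_byte at the front, add in_byte at the end."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b


data = bytes(range(256)) * 3
n = 64
a, b = weak_checksum(data[:n])
for k in range(1, len(data) - n + 1):
    a, b = roll(a, b, data[k - 1], data[k - 1 + n], n)
    assert (a, b) == weak_checksum(data[k:k + n])  # rolled value matches a recompute
```

The point of the rolling form is that each one-byte shift costs a constant amount of work instead of a pass over the whole block, which is why a checksum can be evaluated at every offset of a file.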


Chuck,

OK, I will then check for the command-line option to request DIO
mode for NFS, as you suggested.

Yes, otherwise I fully understand the need for client caching for
desktop-bound or any general-purpose applications. AFS and CacheFS are
both good products in their own right. The only problem in such
cases is cache coherence (I mean other application clients
are not guaranteed to get the latest data on their reads), as NFS
honors only open-to-close session semantics.

The situation I have is this:

We have a data protection product that has agents on individual
servers and a storage gateway (which is an NFS-mounted box). The only
purpose of this box is to store all data, in a streaming write
mode, for the data coming from tens of agents. Essentially
this acts like a VTL target. From this node to the NFS server node,
there is no data travelling in the reverse path (or from the client
cache to the application).

This is the only use we put NFS to.

For recovery, it's again a streamed read. We are never updating the
read data or re-reading the updated data. This is a special,
single-function box.

What do you think are the best mount options for this scenario?

I greatly appreciate your time explaining.

Thanks
peter.




* Re: Tuning NFS client write pagecache
  2010-08-11  2:09                   ` Peter Chacko
@ 2010-08-11 16:05                     ` Chuck Lever
  2010-08-11 17:14                       ` Peter Chacko
  0 siblings, 1 reply; 19+ messages in thread
From: Chuck Lever @ 2010-08-11 16:05 UTC (permalink / raw)
  To: Peter Chacko; +Cc: linux-nfs@vger.kernel.org Mailing list

[ Trimming CC: list ]

On Aug 10, 2010, at 8:09 PM, Peter Chacko wrote:

> Chuck,
> 
> OK, I will then look for the command-line option to request DIO mode
> for NFS, as you suggested.
> 
> Yes, otherwise I fully understand the need for client caching for
> desktop-bound or general-purpose applications. AFS and cacheFS are
> good products in their own right. The only problem in such cases is
> cache coherence (I mean that other application clients are not
> guaranteed to get the latest data on their reads), as NFS honors only
> close-to-open session semantics.
> 
> The situation I have is this:
> 
> We have a data protection product that has agents on individual
> servers and a storage gateway (which is an NFS-mounted box). The only
> purpose of this box is to store all data, in a streaming write mode,
> for all the data coming from tens of agents. Essentially this acts
> like a VTL target. From this node to the NFS server node, there is no
> data travelling in the reverse path (or from the client cache to the
> application).
> 
> This is the only use we put NFS to.
> 
> For recovery, it's again a streamed read. We never update the read
> data or re-read the updated data. This is a special, single-function
> box.
> 
> What do you think are the best mount options for this scenario?

What is the data rate (both IOPS and data throughput) of both the read and write cases?  How large are application read and write ops, on average?  What kind of networking is deployed?  What are the server and clients (hardware and OS)?

And, I assume you are asking because the environment is not performing as you expect.  Can you detail your performance issues?

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





* Re: Tuning NFS client write pagecache
  2010-08-11 16:05                     ` Chuck Lever
@ 2010-08-11 17:14                       ` Peter Chacko
  2010-08-11 20:51                         ` Chuck Lever
  0 siblings, 1 reply; 19+ messages in thread
From: Peter Chacko @ 2010-08-11 17:14 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs@vger.kernel.org Mailing list

We typically use 100MB/1GbE networking, and the server storage is
SATA/SCSI. For IOPS, I have not really measured the NFS client
performance well enough to give you an exact number. We use write
sizes of 4k/8k, and the MTU of the link is 1500 bytes.

But we got noticeably uniform throughput (without bursty traffic),
and better overall performance, when we hand-coded the NFS RPC
operations (including MOUNT to get the root file handle) and sent them
to the server, writing all data to the server at the NFS interface (a
sort of directNFS from user space) without going through the
kernel-mode VFS interface of the NFS client driver. I was just
wondering how to get the same performance from the native NFS client.

It's still a matter of opinion what control we should give to
applications and what the OS should control!

As we test more, I can send you more test data about this.

In the end, applications will end up reinventing the wheel to suit
their special needs :-)

How does Oracle's directNFS deal with this?

Thanks, Chuck, for your thoughts!


* Re: Tuning NFS client write pagecache
  2010-08-11 17:14                       ` Peter Chacko
@ 2010-08-11 20:51                         ` Chuck Lever
  0 siblings, 0 replies; 19+ messages in thread
From: Chuck Lever @ 2010-08-11 20:51 UTC (permalink / raw)
  To: Peter Chacko; +Cc: linux-nfs@vger.kernel.org Mailing list

On Aug 11, 2010, at 11:14 AM, Peter Chacko wrote:
> We typically use 100MB/1GbE networking, and the server storage is
> SATA/SCSI. For IOPS, I have not really measured the NFS client
> performance well enough to give you an exact number. We use write
> sizes of 4k/8k, and the MTU of the link is 1500 bytes.

> But we got noticeably uniform throughput (without bursty traffic),
> and better overall performance, when we hand-coded the NFS RPC
> operations (including MOUNT to get the root file handle) and sent them
> to the server, writing all data to the server at the NFS interface (a
> sort of directNFS from user space) without going through the
> kernel-mode VFS interface of the NFS client driver. I was just
> wondering how to get the same performance from the native NFS client.

Again, I'm not hearing a clearly stated performance issue.

It doesn't sound like anything that can't easily be handled by the default mount options in any late model Linux distribution.  NFSv3 over TCP with the largest rsize and wsize negotiated with the server should easily handle this workload.
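
One way to check what rsize and wsize were actually negotiated is to
read the options recorded for the mount in /proc/mounts. The Python
sketch below parses text in that format; the function name and the
sample line in the comment are made up for illustration and do not
come from this thread.

```python
def nfs_mount_options(mounts_text):
    """Return {mountpoint: {option: value}} for NFS entries.

    mounts_text is text in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        opts = {}
        for item in fields[3].split(","):
            # Flags such as "rw" have no "=" and get an empty value.
            key, _, value = item.partition("=")
            opts[key] = value
        result[fields[1]] = opts
    return result


# Hypothetical usage on a live system:
#   with open("/proc/mounts") as f:
#       for mountpoint, opts in nfs_mount_options(f.read()).items():
#           print(mountpoint, opts.get("proto"), opts.get("rsize"), opts.get("wsize"))
```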

> It's still a matter of opinion what control we should give to
> applications and what the OS should control!
> 
> As we test more, I can send you more test data about this.
> 
> In the end, applications will end up reinventing the wheel to suit
> their special needs :-)

Given just this information, I don't see anything that suggests you can't implement all of this with POSIX system calls and the kernel NFS client.  Client-side data caching may waste a few resources for write-only and read-once workloads, but the kernel will reclaim memory when needed.  Your application can also use standard system calls to control the cached data, if it is a real concern.
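
As a sketch of what that could look like for a streaming writer, the
Python example below flushes dirty data with fsync() every few
megabytes and then hints with posix_fadvise(POSIX_FADV_DONTNEED) that
the cached pages are no longer needed. Which system calls to use is an
assumption of this example rather than something named above; the
function name and the 8 MB threshold are likewise illustrative, and
the fadvise call is only a hint that the kernel is free to ignore.

```python
import os


def write_streaming(path, chunks, flush_every=8 * 1024 * 1024):
    """Write chunks to path, flushing to the server periodically.

    Returns the total number of bytes written.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    total = 0
    pending = 0  # bytes written since the last fsync
    try:
        for chunk in chunks:
            view = memoryview(chunk)
            while len(view) > 0:
                # os.write may write fewer bytes than requested.
                n = os.write(fd, view)
                view = view[n:]
                total += n
                pending += n
            if pending >= flush_every:
                os.fsync(fd)  # push dirty pages out now, not at close()
                if hasattr(os, "posix_fadvise"):
                    # Hint that the now-clean pages can be dropped.
                    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
                pending = 0
        os.fsync(fd)
    finally:
        os.close(fd)
    return total
```

Flushing in bounded steps spreads the write-back cost across the
write() calls, so the final close() has little left to send.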

> How does Oracle's directNFS deal with this?

I work on the Linux kernel NFS client, so I can't really give dNFS specifics with any kind of authority.

dNFS is useful because the database already has its own built-in buffer cache, manages a very large resident set, and often needs tight cache coherency with other nodes in a database cluster (which is achieved via a separate cache protocol, rather than relying on NFS and OS caching behavior).

dNFS is logically quite similar to doing direct I/O through the kernel's NFS client.  The advantages of dNFS over direct I/O via the kernel are:

  1.  dNFS is a part of the Oracle database application, and thus the internal APIs and NFS behavior are always the same across all operating systems, and

  2.  dNFS allows a somewhat shorter code path and fewer context switches per I/O.  This is usually only critical on systems that require immense scaling.

I haven't heard anything, so far, that suggests your workload has these requirements.


--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com


* Re: Tuning NFS client write pagecache
  2010-08-06 13:26 ` Jim Rees
  2010-08-06 14:05   ` Peter Chacko
@ 2010-08-06 16:29   ` Matthew Hodgson
  2010-08-07  0:25     ` Matthew Hodgson
  1 sibling, 1 reply; 19+ messages in thread
From: Matthew Hodgson @ 2010-08-06 16:29 UTC (permalink / raw)
  To: Jim Rees; +Cc: linux-nfs

Hi,

Jim Rees wrote:
> Matthew Hodgson wrote:
> 
>   Is there any way to tune the linux NFSv3 client to prefer to write
>   data straight to an async-mounted server, rather than having large
>   writes to a file stack up in the local pagecache before being synced
>   on close()?
> 
> It's been a while since I've done this, but I think you can tune this with
> vm.dirty_writeback_centisecs and vm.dirty_background_ratio sysctls.  The
> data will still go through the page cache but you can reduce the amount
> that stacks up.

Yup, that does the trick - I'd tried this earlier, but hadn't gone far 
enough - seemingly I need to drop vm.dirty_writeback_centisecs down to 1 
(and vm.dirty_background_ratio to 1) for the back-pressure to propagate 
correctly for this use case.  Thanks for the pointer!
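
For reference, a rough Python sketch of applying these settings. On
Linux each sysctl is exposed as a file under /proc/sys, so
vm.dirty_background_ratio maps to /proc/sys/vm/dirty_background_ratio.
The helper names are made up for illustration, the values are the ones
reported above, and writing them needs root.

```python
import os

PROC_SYS = "/proc/sys"


def sysctl_path(name, root=PROC_SYS):
    """Map a dotted sysctl name to its file under root."""
    return os.path.join(root, *name.split("."))


def read_sysctl(name, root=PROC_SYS):
    """Return the current value of a sysctl as a string."""
    with open(sysctl_path(name, root)) as f:
        return f.read().strip()


def write_sysctl(name, value, root=PROC_SYS):
    """Set a sysctl; on a real system this requires root privileges."""
    with open(sysctl_path(name, root), "w") as f:
        f.write(f"{value}\n")


# Hypothetical usage, run as root:
#   write_sysctl("vm.dirty_writeback_centisecs", 1)
#   write_sysctl("vm.dirty_background_ratio", 1)
```

Values written this way do not survive a reboot.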

In other news, whilst saturating the ~10Mb/s pipe during the big write 
to the server, I'm seeing huge delays of >10 seconds on trying to do 
trivial operations such as ls'ing small directories.  Is this normal, or 
is there some kind of tunable scheduling on the client to avoid a single 
big transfer wedging the machine?

thanks,

Matthew

-- 
Matthew Hodgson
Development Program Manager
OpenMarket | www.openmarket.com/europe
matthew.hodgson@openmarket.com


* Re: Tuning NFS client write pagecache
  2010-08-06 16:29   ` Matthew Hodgson
@ 2010-08-07  0:25     ` Matthew Hodgson
  0 siblings, 0 replies; 19+ messages in thread
From: Matthew Hodgson @ 2010-08-07  0:25 UTC (permalink / raw)
  To: Jim Rees; +Cc: linux-nfs

On 06/08/2010 17:29, Matthew Hodgson wrote:
>> Matthew Hodgson wrote:
>>
>> Is there any way to tune the linux NFSv3 client to prefer to write
>> data straight to an async-mounted server, rather than having large
>> writes to a file stack up in the local pagecache before being synced
>> on close()?
>
> In other news, whilst saturating the ~10Mb/s pipe during the big write
> to the server, I'm seeing huge delays of >10 seconds on trying to do
> trivial operations such as ls'ing small directories. Is this normal, or
> is there some kind of tunable scheduling on the client to avoid a single
> big transfer wedging the machine?

Hm, on reading the archives, it seems that this is a fairly common 
complaint when dealing with large sequential workloads - a side effect of 
the write pagecache not writing out smoothly.

What is the status of the '[PATCH] improve the performance of large 
sequential write NFS workloads' patchset at 
http://www.spinics.net/lists/linux-nfs/msg11131.html?  It seems that it, 
and its predecessors, are intended to fix precisely this issue.  It 
doesn't seem to have landed in mainline, though, and I can't find any 
mention of it since http://lwn.net/Articles/373868/.

thanks,

Matthew

-- 
Matthew Hodgson
Development Program Manager
OpenMarket | www.openmarket.com/europe
matthew.hodgson@openmarket.com


end of thread, other threads:[~2010-08-11 20:53 UTC | newest]

Thread overview: 19+ messages
2010-08-06 12:21 Tuning NFS client write pagecache Matthew Hodgson
2010-08-06 13:26 ` Jim Rees
2010-08-06 14:05   ` Peter Chacko
2010-08-06 17:37     ` Trond Myklebust
2010-08-06 19:29       ` Peter Chacko
2010-08-06 19:39         ` Trond Myklebust
2010-08-07  3:15           ` Peter Chacko
2010-08-10 16:27             ` Chuck Lever
2010-08-10 17:52               ` Peter Chacko
2010-08-10 18:19                 ` David Brodbeck
2010-08-10 19:16                 ` Chuck Lever
2010-08-10 20:50               ` Gilliam, PaulX J
2010-08-10 21:47                 ` Chuck Lever
2010-08-11  2:09                   ` Peter Chacko
2010-08-11 16:05                     ` Chuck Lever
2010-08-11 17:14                       ` Peter Chacko
2010-08-11 20:51                         ` Chuck Lever
2010-08-06 16:29   ` Matthew Hodgson
2010-08-07  0:25     ` Matthew Hodgson
