* Tuning NFS client write pagecache
From: Matthew Hodgson @ 2010-08-06 12:21 UTC
To: linux-nfs

Hi all,

Is there any way to tune the Linux NFSv3 client to prefer to write data straight to an async-mounted server, rather than having large writes to a file stack up in the local pagecache before being synced on close()?

I have an application which (stupidly) expects system calls to return fairly rapidly; otherwise an application-layer timeout occurs. If I write (say) 100MB of data to an NFS share with the app, the write()s return almost immediately as the local pagecache fills up - but then close() blocks for several minutes as the data is synced to the server over a slowish link.

Mounting the share with -o sync fixes this, as does opening the file with O_SYNC or O_DIRECT - but ideally I want to generally encourage the client to flush a bit more aggressively to the server without the performance hit of making every write explicitly synchronous.

Is there a way to cap the size of pagecache that the NFS client uses?

This is currently on a 2.6.18 kernel (CentOS 5.5), although I'm more than happy to use something less prehistoric if that's what it takes.

M.

--
Matthew Hodgson
Development Program Manager
OpenMarket | www.openmarket.com/europe
matthew.hodgson@openmarket.com
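A minimal application-side sketch of the middle ground Matthew describes (flushing more aggressively without making every write synchronous), assuming Linux and Python's os module: write in chunks and call fdatasync() whenever a few megabytes have accumulated, so close() never has to push a huge backlog over the slow link. The helper name write_with_periodic_flush and the 4 MiB threshold are hypothetical choices for illustration, not something proposed in the thread.

```python
import os

FLUSH_EVERY = 4 * 1024 * 1024  # hypothetical threshold: ~4 MiB between flushes

def write_with_periodic_flush(path, chunks, flush_every=FLUSH_EVERY):
    """Write an iterable of bytes chunks, calling fdatasync() whenever at
    least flush_every bytes have been written since the previous flush.

    Returns how many fdatasync() calls were made before close().
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    flushes = 0
    pending = 0  # bytes written since the last fdatasync()
    try:
        for chunk in chunks:
            view = memoryview(chunk)
            while view:  # os.write() may write less than asked
                n = os.write(fd, view)
                view = view[n:]
                pending += n
            if pending >= flush_every:
                os.fdatasync(fd)  # push this batch to the server now
                flushes += 1
                pending = 0
    finally:
        os.close(fd)  # on NFS, close() still flushes whatever remains
    return flushes
```

Each fdatasync() stall is bounded by roughly flush_every bytes of transfer, trading one multi-minute close() for many shorter pauses; a sensible threshold depends on the link speed and the application's timeout.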
* Re: Tuning NFS client write pagecache
From: Jim Rees @ 2010-08-06 13:26 UTC
To: Matthew Hodgson; Cc: linux-nfs

Matthew Hodgson wrote:

> Is there any way to tune the Linux NFSv3 client to prefer to write data straight to an async-mounted server, rather than having large writes to a file stack up in the local pagecache before being synced on close()?

It's been a while since I've done this, but I think you can tune this with the vm.dirty_writeback_centisecs and vm.dirty_background_ratio sysctls. The data will still go through the page cache, but you can reduce the amount that stacks up.

There are other places where the data can get buffered, like the RPC layer, but it won't sit there any longer than it takes for it to go out on the wire.
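To put Jim's suggestion into numbers, here is a small sketch (Python, assuming the Linux /proc/sys/vm layout) that reads the two sysctls and converts them. The helper names are hypothetical. Per the kernel's sysctl documentation, dirty_background_ratio is a percentage of dirtyable memory (free plus reclaimable pages, not total RAM) at which the background flusher threads start writeback, and dirty_writeback_centisecs is the flusher wake-up interval in hundredths of a second. Both settings are system-wide, so lowering them affects every filesystem on the client, not just the NFS mount.

```python
from pathlib import Path

VM_SYSCTL_DIR = Path("/proc/sys/vm")

def read_vm_sysctl(name, root=VM_SYSCTL_DIR):
    """Return the integer value of a vm sysctl such as
    "dirty_background_ratio", or None if it is not exposed here."""
    try:
        return int((root / name).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None

def background_threshold_bytes(dirtyable_bytes, ratio_percent):
    """Rough amount of dirty data at which background writeback starts.

    The kernel applies the ratio to dirtyable memory (free plus
    reclaimable pages), so pass an estimate of that, not MemTotal.
    """
    return dirtyable_bytes * ratio_percent // 100

def centisecs_to_seconds(centisecs):
    """dirty_writeback_centisecs counts hundredths of a second."""
    return centisecs / 100
```

For example, with about 4 GiB of dirtyable memory and a ratio of 10, background_threshold_bytes(4 * 1024**3, 10) gives 429496729 bytes, so background writeback would begin at roughly 410 MiB of dirty data. The values are changed as root with sysctl -w (e.g. sysctl -w vm.dirty_background_ratio=1); kernels newer than 2.6.18 also offer vm.dirty_background_bytes for finer control than a whole percentage.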
* Re: Tuning NFS client write pagecache
From: Peter Chacko @ 2010-08-06 14:05 UTC
To: Jim Rees; Cc: Matthew Hodgson, linux-nfs

Some distributed file systems, such as IBM's SANFS, support direct I/O to the target storage without going through a cache. (This feature is useful for write-only workloads: say, backing up huge amounts of data to an NFS share.)

If it isn't already available, I think we should add a DIO mount option that tells the VFS not to cache any data, so that the close operation will not stall.

With NFS's close-to-open cache coherence protocol, an aggressively caching client is a performance downer for many write-mostly workloads.
* Re: Tuning NFS client write pagecache
From: Trond Myklebust @ 2010-08-06 17:37 UTC
To: Peter Chacko; Cc: Jim Rees, Matthew Hodgson, linux-nfs

On Fri, 2010-08-06 at 15:05 +0100, Peter Chacko wrote:
> Some distributed file systems, such as IBM's SANFS, support direct I/O to the target storage without going through a cache. (This feature is useful for write-only workloads: say, backing up huge amounts of data to an NFS share.)
>
> If it isn't already available, I think we should add a DIO mount option that tells the VFS not to cache any data, so that the close operation will not stall.

Ugh no! Applications that need direct IO should be using open(O_DIRECT), not relying on hacks like mount options.

> With NFS's close-to-open cache coherence protocol, an aggressively caching client is a performance downer for many write-mostly workloads.

We already have full support for vectored aio/dio in the NFS client for those applications that want to use it.

Trond
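Trond's recommendation is for the application itself to request direct I/O at open() time. A minimal sketch of that in Python (Linux-only, since it relies on os.O_DIRECT): the buffer comes from an anonymous mmap so it is page-aligned, and the length is padded to a whole block, then trimmed back with ftruncate(). The 4096-byte block size, the helper name write_direct, and the fallback to O_SYNC when a filesystem rejects O_DIRECT with EINVAL are assumptions of this sketch; actual alignment rules differ between filesystems.

```python
import errno
import mmap
import os

BLOCK = 4096  # assumed alignment; real O_DIRECT requirements vary by filesystem

def _write_all(fd, buf):
    """Call os.write() until every byte of buf has been written."""
    view = memoryview(buf)
    done = 0
    while done < len(view):
        done += os.write(fd, view[done:])
    return done

def write_direct(path, data):
    """Write data to path with O_DIRECT, bypassing the client page cache.

    Returns "O_DIRECT" on success, or "O_SYNC" if the filesystem rejected
    direct I/O with EINVAL and the data was written synchronously instead
    (still through the page cache).
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    padded = -(-len(data) // BLOCK) * BLOCK  # round up to whole blocks
    # An anonymous mmap is page-aligned and zero-filled, which satisfies
    # the buffer alignment that O_DIRECT usually demands.
    aligned = mmap.mmap(-1, max(padded, BLOCK))
    aligned.write(data)
    try:
        fd = os.open(path, flags | os.O_DIRECT, 0o644)
        try:
            if padded:
                _write_all(fd, memoryview(aligned)[:padded])
            os.ftruncate(fd, len(data))  # trim the zero padding
        finally:
            os.close(fd)
        return "O_DIRECT"
    except OSError as e:
        if e.errno != errno.EINVAL:
            raise
    # Fallback: synchronous writes, which still go through the page cache.
    fd = os.open(path, flags | os.O_SYNC, 0o644)
    try:
        _write_all(fd, data)
    finally:
        os.close(fd)
    return "O_SYNC"
```

The return value makes the fallback visible to the caller. Silently degrading from O_DIRECT to cached I/O would hide exactly the behaviour this thread is about.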
* Re: Tuning NFS client write pagecache
From: Peter Chacko @ 2010-08-06 19:29 UTC
To: Trond Myklebust; Cc: Jim Rees, Matthew Hodgson, linux-nfs

Imagine a third-party backup app for which the customer has no source code (one that doesn't call open() with O_DIRECT) backing up millions of files over NFS. How can we do non-cached I/O to the target server? We cannot use the O_DIRECT option here, as we don't have the source code. A mount option would work just right: if we can have read-only mounts, why not a dio-only mount?

A truly application-aware storage system (in this case, the NFS client), which is what next-generation storage systems should be, should absorb the application needs that may apply to the whole file system.

I'm not saying the O_DIRECT flag is a bad idea, but it only works for a regular application that does I/O to particular files. It is not the best solution when the NFS server is used as storage for secondary data, where the NFS client runs third-party applications that would otherwise run best on local storage, where there are no caching issues.

What do you think?
* Re: Tuning NFS client write pagecache
From: Trond Myklebust @ 2010-08-06 19:39 UTC
To: Peter Chacko; Cc: Jim Rees, Matthew Hodgson, linux-nfs

On Sat, 2010-08-07 at 00:59 +0530, Peter Chacko wrote:
> Imagine a third-party backup app for which the customer has no source code (one that doesn't call open() with O_DIRECT) backing up millions of files over NFS. How can we do non-cached I/O to the target server? We cannot use the O_DIRECT option here, as we don't have the source code. A mount option would work just right: if we can have read-only mounts, why not a dio-only mount?
>
> A truly application-aware storage system (in this case, the NFS client), which is what next-generation storage systems should be, should absorb the application needs that may apply to the whole file system.
>
> I'm not saying the O_DIRECT flag is a bad idea, but it only works for a regular application that does I/O to particular files. It is not the best solution when the NFS server is used as storage for secondary data, where the NFS client runs third-party applications that would otherwise run best on local storage, where there are no caching issues.
>
> What do you think?

I think that we've had O_DIRECT support in the kernel for more than six years now. If there are backup vendors out there that haven't been paying attention, then I'd suggest looking at other vendors.

Trond
* Re: Tuning NFS client write pagecache
From: Peter Chacko @ 2010-08-07 03:15 UTC
To: Trond Myklebust; Cc: Jim Rees, Matthew Hodgson, linux-nfs

I think you are missing the use case for file-system-wide, non-cached I/O on NFS.

Imagine a Unix shell programmer who writes a backup script but doesn't know C programming or system calls. He just wants to use cp -R sourcedir /targetDir, where targetDir is an NFS-mounted share.

How can a programmatic, per-file-session interface like the O_DIRECT flag help here?

We need a file-system-wide direct I/O mechanism, and the best place for it is at mount time. We cannot tell all sysadmins to go and learn programming, or tell backup vendors to change code that they wrote 10-12 years ago. Operating system functionality should cover a large audience with different levels of training and skill.

I hope you see my point here.
* Re: Tuning NFS client write pagecache
From: Chuck Lever @ 2010-08-10 16:27 UTC
To: Peter Chacko; Cc: Trond Myklebust, Jim Rees, Matthew Hodgson, linux-nfs

On Aug 6, 2010, at 9:15 PM, Peter Chacko wrote:

> I think you are missing the use case for file-system-wide, non-cached I/O on NFS.
>
> Imagine a Unix shell programmer who writes a backup script but doesn't know C programming or system calls. He just wants to use cp -R sourcedir /targetDir, where targetDir is an NFS-mounted share.
>
> How can a programmatic, per-file-session interface like the O_DIRECT flag help here?
>
> We need a file-system-wide direct I/O mechanism, and the best place for it is at mount time. We cannot tell all sysadmins to go and learn programming, or tell backup vendors to change code that they wrote 10-12 years ago. Operating system functionality should cover a large audience with different levels of training and skill.
>
> I hope you see my point here.

The reason Linux doesn't support a filesystem-wide option is that direct I/O has as much potential to degrade performance as it does to improve it. The performance degradation can affect other applications on the same file system and other clients connected to the same server. So it can be an exceptionally unfriendly thing to do to your neighbors if an application is stupid or malicious.

To make direct I/O work well, applications have to use it sparingly and appropriately. They usually maintain their own buffer cache in lieu of the client's generic page cache. Applications like shells and editors depend on an NFS client's local page cache to work well.

So, we have chosen to support direct I/O only when each file is opened, not as a filesystem-wide option. This is a much narrower application of this feature, and it has a better chance of helping performance in special cases while not destroying it broadly.

So far I haven't read anything here that clearly states a requirement we have overlooked in the past.

For your "cp" example, the NFS community is looking at ways to reduce the overhead of file copy operations by offloading them to the server. The file data doesn't have to travel over the network to the client. Someone recently said that when you leave this kind of choice up to users, they will usually choose exactly the wrong option. This is a clear case where the system and application developers will choose better than users who have no programming skills.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
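Chuck mentions offloading copies to the server. On current Linux kernels, the user-space entry point for in-kernel copies is copy_file_range(2), which Python exposes as os.copy_file_range() on Linux (Python 3.8+). With NFSv4.2 and a server that supports its COPY operation, the copy may be done server-side. The sketch below uses a hypothetical helper, copy_file_offloaded, that tries copy_file_range first and falls back to an ordinary read/write loop. Whether any offload actually happens depends on the kernel, the NFS version and the server, none of which this code can see.

```python
import errno
import os

# errno values after which a plain user-space copy is a reasonable fallback
_FALLBACK_ERRNOS = {errno.EXDEV, errno.ENOSYS, errno.EINVAL, errno.EOPNOTSUPP}

def copy_file_offloaded(src, dst, chunk=1 << 20):
    """Copy src to dst, trying copy_file_range(2) before read()/write().

    Returns "copy_file_range" or "read/write" to show which path was used.
    """
    fin = os.open(src, os.O_RDONLY)
    try:
        fout = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            cfr = getattr(os, "copy_file_range", None)  # Linux, Python 3.8+
            if cfr is not None:
                remaining = os.fstat(fin).st_size
                try:
                    while remaining > 0:
                        n = cfr(fin, fout, min(remaining, chunk))
                        if n == 0:  # source shrank while copying
                            break
                        remaining -= n
                    return "copy_file_range"
                except OSError as e:
                    if e.errno not in _FALLBACK_ERRNOS:
                        raise
            # copy_file_range() advances both file offsets, so this loop
            # resumes wherever an interrupted in-kernel copy stopped.
            while True:
                block = os.read(fin, chunk)
                if not block:
                    break
                view = memoryview(block)
                while view:
                    view = view[os.write(fout, view):]
            return "read/write"
        finally:
            os.close(fout)
    finally:
        os.close(fin)
```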
* Re: Tuning NFS client write pagecache 2010-08-10 16:27 ` Chuck Lever @ 2010-08-10 17:52 ` Peter Chacko 2010-08-10 18:19 ` David Brodbeck 2010-08-10 19:16 ` Chuck Lever 2010-08-10 20:50 ` Gilliam, PaulX J 1 sibling, 2 replies; 19+ messages in thread From: Peter Chacko @ 2010-08-10 17:52 UTC (permalink / raw) To: Chuck Lever; +Cc: Trond Myklebust, Jim Rees, Matthew Hodgson, linux-nfs Dear chuck, Yes, if we perform a bulk cp operations, data need not go through network, if both source and destination are on the NFS...if thats not the case, we have to move data across network... Most of the time, NFS (or NAS for that matter) best serve the enterprise as a D2D backup destination. Either backup server is NFS or media server is NFS client. Its very beneficial if NFS can start its business in DIO mode.....so that backup admins can just write simple scripts to move terabytes of data ...without buying any exotic backup software.... And caching itself is not useful for any streaming datapath.(Be it NFS cache,or memory cache or cpu cache or even a web cache).. backup is write-only operation, for all file objects... if application needs, we should have a mechanism to mount NFS client FS, without enabling client caching... See veritas VxFS avoids disk caching for Databases, through QuickIO option.....We should have a similar mechanisms for NFS.... Whats your thoughts ? what are the architectural/design level issues we will encounter, if we bring this feature to NFS? Is there any patch available for this ? How does V4 fare here ? On Tue, Aug 10, 2010 at 9:57 PM, Chuck Lever <chuck.lever@oracle.com> wrote: > > On Aug 6, 2010, at 9:15 PM, Peter Chacko wrote: > >> I think you are not understanding the use case of a file-system wide, >> non-cached IO for NFS. >> >> Imagine a case when a unix shell programmer create a backup >> script,who doesn't know C programming or system calls....he just wants >> to use a cp -R sourcedir /targetDir. Where targetDir is an NFS >> mounted share. 
>> >> How can we use programmatical , per file-session interface to O_DIRECT >> flag here ? >> >> We need a file-system wide direct IO mechanisms ,the best place to >> have is at the mount time. We cannot tell all sysadmins to go and >> learn programming....or backup vendors to change their code that they >> wrote 10 - 12 years ago...... Operating system functionalities should >> cover a large audience, with different levels of training/skills. >> >> I hope you got my point here.... > > The reason Linux doesn't support a filesystem wide option is that direct I/O has as much potential to degrade performance as it does to improve it. The performance degradation can affect other applications on the same file system and other clients connected to the same server. So it can be an exceptionally unfriendly thing to do for your neighbors if an application is stupid or malicious. > > To make direct I/O work well, applications have to use it sparingly and appropriately. They usually maintain their own buffer cache in lieu of the client's generic page cache. Applications like shells and editors depend on an NFS client's local page cache to work well. > > So, we have chosen to support direct I/O only when each file is opened, not as a file system wide option. This is a much narrower application of this feature, and has a better chance of helping performance in special cases while not destroying it broadly. > > So far I haven't read anything here that clearly states a requirement we have overlooked in the past. > > For your "cp" example, the NFS community is looking at ways to reduce the overhead of file copy operations by offloading them to the server. The file data doesn't have to travel over the network to the client. Someone recently said when you leave this kind of choice up to users, they will usually choose exactly the wrong option. This is a clear case where the system and application developers will choose better than users who have no programming skills. 
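Trond and Chuck take the position in this thread that direct I/O should be requested per file, with open(O_DIRECT), rather than through a mount option. Below is a minimal Python sketch of what that looks like for a simple copy. It is an illustration under stated assumptions, not code from the thread. It assumes Linux, where `os.O_DIRECT` exists and local filesystems generally require the buffer, file offset and length to be aligned. The sketch assumes 4096 bytes; the real requirement depends on the filesystem. The names `copy_direct` and `_write_all` are hypothetical. So is the fallback to buffered writes plus fsync() when the filesystem rejects O_DIRECT with EINVAL, which is a design choice of this sketch.

```python
import errno
import mmap
import os

ALIGN = 4096  # assumed O_DIRECT alignment; the real value depends on the filesystem


def _write_all(fd, src_path, chunk):
    """Copy src_path into fd using an aligned buffer; returns bytes copied."""
    buf = mmap.mmap(-1, chunk)  # anonymous mappings are page-aligned
    total = 0
    try:
        with open(src_path, "rb") as src, memoryview(buf) as view:
            while True:
                n = src.readinto(buf)
                if not n:
                    break
                padded = -(-n // ALIGN) * ALIGN  # round the length up to ALIGN
                view[n:padded] = bytes(padded - n)  # zero-fill the last block
                if os.write(fd, view[:padded]) != padded:
                    raise OSError(errno.EIO, "short write")
                total += n
    finally:
        buf.close()
    os.ftruncate(fd, total)  # drop the zero padding written after the data
    return total


def copy_direct(src_path, dst_path, chunk=1 << 20):
    """Copy a file, bypassing the page cache with O_DIRECT if possible.

    Returns True if O_DIRECT was used, False if it fell back to buffered I/O.
    """
    chunk = max(ALIGN, chunk // ALIGN * ALIGN)  # keep every full write aligned
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    o_direct = getattr(os, "O_DIRECT", 0)  # only defined on some platforms
    if o_direct:
        try:
            fd = os.open(dst_path, flags | o_direct, 0o644)
        except OSError as e:
            if e.errno != errno.EINVAL:  # EINVAL: filesystem refuses O_DIRECT
                raise
        else:
            try:
                _write_all(fd, src_path, chunk)
                return True
            except OSError as e:
                if e.errno != errno.EINVAL:  # e.g. stricter alignment than ALIGN
                    raise
            finally:
                os.close(fd)
    # Fallback: ordinary cached writes, flushed before returning.
    fd = os.open(dst_path, flags, 0o644)
    try:
        _write_all(fd, src_path, chunk)
        os.fsync(fd)
    finally:
        os.close(fd)
    return False
```

The zero-pad-then-ftruncate step keeps every write aligned, so the O_DIRECT path never issues an odd-sized tail write.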
* Re: Tuning NFS client write pagecache
From: David Brodbeck @ 2010-08-10 18:19 UTC
To: linux-nfs

On Aug 10, 2010, at 10:52 AM, Peter Chacko wrote:

> And caching itself is not useful for any streaming datapath.(Be it
> NFS cache,or memory cache or cpu cache or even a web cache).. backup
> is write-only operation, for all file objects...

It seems to me this is only true if you're talking strictly about full backups. Any kind of incremental or differential backup is likely to do a significant amount of reading in order to determine what files need to be sent, unless it's storing data about each backup locally. rsync, for example, needs to read from both filesystems to build up its file list.

--
David Brodbeck
System Administrator, Linguistics
University of Washington
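David's point is that an incremental backup has two options. It can read data back to find what changed, or it can keep its own record of earlier runs. Below is a minimal Python sketch of the second approach. It assumes the agent stores a manifest of (size, mtime) per file from the previous run. The function names are hypothetical. Comparing only size and mtime is a simplification: rsync, for instance, can also compare checksums, and mtimes can be unreliable.

```python
import os
import stat


def scan(root):
    """Map each regular file under root (relative path) to (size, mtime_ns)."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)  # do not follow symlinks
            if stat.S_ISREG(st.st_mode):
                manifest[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return manifest


def changed_since(previous, current):
    """Paths that are new, or whose size or mtime differs from the last run."""
    return sorted(path for path, meta in current.items() if previous.get(path) != meta)


def deleted_since(previous, current):
    """Paths recorded in the last run that no longer exist."""
    return sorted(path for path in previous if path not in current)
```

Between runs, the manifest would be saved locally (as JSON, say). Only the paths returned by `changed_since()` then need to be read and streamed to the NFS target. Deciding what to send costs stat() calls on the source, not reads from the server.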
* Re: Tuning NFS client write pagecache
From: Chuck Lever @ 2010-08-10 19:16 UTC
To: Peter Chacko
Cc: Trond Myklebust, Jim Rees, Matthew Hodgson, linux-nfs

On Aug 10, 2010, at 11:52 AM, Peter Chacko wrote:

> Dear chuck,
>
> Yes, if we perform a bulk cp operations, data need not go through
> network, if both source and destination are on the NFS...if thats not
> the case, we have to move data across network...
>
> Most of the time, NFS (or NAS for that matter) best serve the
> enterprise as a D2D backup destination. Either backup server is NFS or
> media server is NFS client.
>
> Its very beneficial if NFS can start its business in DIO mode.....so
> that backup admins can just write simple scripts to move terabytes of
> data ...without buying any exotic backup software....

I believe there is a command line flag on the common utilities to operate in direct I/O mode. I'm not in front of Linux right now, so I can't check if this is still true. If that's the case, it would be simple to modify scripts to specify that flag when doing data copies.

> And caching itself is not useful for any streaming datapath.(Be it
> NFS cache,or memory cache or cpu cache or even a web cache).. backup
> is write-only operation, for all file objects...

No one is suggesting otherwise. Our user space file system interfaces allow plenty of flexibility here. You can specify O_DIRECT or use madvise(2) or posix_fadvise(2) to make the kernel behave as needed.

The problem here is there really is no good way to get the kernel to guess what an application needs. It will almost always guess wrong in some important cases.

> if application needs, we should have a mechanism to mount NFS client
> FS, without enabling client caching...

We have a mechanism for disabling caching on a per-file basis. This is fine-grained control.
I've never found a compelling reason to enable it at once across a whole file system, yet there are good reasons not to allow such a thing, and focus only on individual files and applications.

> See veritas VxFS avoids disk caching for Databases, through QuickIO
> option.....We should have a similar mechanisms for NFS....

Database scalability is exactly why I wrote the Linux NFS client's O_DIRECT support.

> Whats your thoughts ? what are the architectural/design level issues
> we will encounter,
> if we bring this feature to NFS? Is there any patch available for this ?

Support for uncached I/O has been in the Linux NFS client since RHAS 2.1, and available upstream since roughly 2.4.20 (yes, 2.4, not 2.6).

> How does V4 fare here ?

NFSv4 supports direct I/O just like the other versions of the protocol. Direct I/O is version agnostic.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
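Chuck's point is that the existing per-file interfaces already let an application control caching. The sketch below illustrates the posix_fadvise(2) route for a streaming writer like the one in Matthew's original question. It is a minimal Python sketch; the function name is hypothetical, and the 8 MiB default is an arbitrary example, not a tuned value. It bounds how much dirty data can pile up before close() by calling fsync() periodically. It then uses POSIX_FADV_DONTNEED to tell the kernel the now-clean pages will not be needed again. `os.posix_fadvise` is not available on every platform (macOS lacks it), so the hint is skipped there.

```python
import os


def stream_write(path, chunks, flush_every=8 << 20):
    """Write an iterable of bytes chunks to path, bounding dirty page-cache data.

    Every flush_every bytes the file is fsync()ed and the now-clean pages are
    dropped with POSIX_FADV_DONTNEED where available.
    Returns (bytes_written, number_of_flushes).
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    written = unflushed = flushes = 0
    try:
        for chunk in chunks:
            view = memoryview(chunk)
            while view:  # os.write may write less than requested
                n = os.write(fd, view)
                view = view[n:]
                written += n
                unflushed += n
            if unflushed >= flush_every:
                # On NFS, fsync() writes dirty pages back to the server and commits them.
                os.fsync(fd)
                if hasattr(os, "posix_fadvise"):
                    # offset=0, len=0 means "from the start to the end of the file"
                    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
                unflushed = 0
                flushes += 1
        os.fsync(fd)  # final flush, so close() has little left to do
        flushes += 1
    finally:
        os.close(fd)
    return written, flushes
```

The trade-off is that each fsync() waits for the server. The writer therefore sees many short stalls instead of one long one at close(), which may suit an application with per-call timeouts.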
* RE: Tuning NFS client write pagecache
From: Gilliam, PaulX J @ 2010-08-10 20:50 UTC
To: Chuck Lever, Peter Chacko
Cc: Trond Myklebust, Jim Rees, Matthew Hodgson, linux-nfs@vger.kernel.org

>-----Original Message-----
>From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs-
>owner@vger.kernel.org] On Behalf Of Chuck Lever
>Sent: Tuesday, August 10, 2010 9:27 AM
>To: Peter Chacko
>Cc: Trond Myklebust; Jim Rees; Matthew Hodgson; linux-nfs@vger.kernel.org
>Subject: Re: Tuning NFS client write pagecache
>
>
>On Aug 6, 2010, at 9:15 PM, Peter Chacko wrote:
>
>> I think you are not understanding the use case of a file-system wide,
>> non-cached IO for NFS.
>>
>> Imagine a case when a unix shell programmer create a backup
>> script,who doesn't know C programming or system calls....he just wants
>> to use a cp -R sourcedir /targetDir. Where targetDir is an NFS
>> mounted share.
>>
>> How can we use programmatical , per file-session interface to O_DIRECT
>> flag here ?
>>
>> We need a file-system wide direct IO mechanisms ,the best place to
>> have is at the mount time. We cannot tell all sysadmins to go and
>> learn programming....or backup vendors to change their code that they
>> wrote 10 - 12 years ago...... Operating system functionalities should
>> cover a large audience, with different levels of training/skills.
>>
>> I hope you got my point here....
>
>The reason Linux doesn't support a filesystem wide option is that direct
>I/O has as much potential to degrade performance as it does to improve it.
>The performance degradation can affect other applications on the same file
>system and other clients connected to the same server. So it can be an
>exceptionally unfriendly thing to do for your neighbors if an application
>is stupid or malicious.
Please forgive my ignorance, but could you give an example or two? I can understand how direct I/O can degrade the performance of the application that is using it. But I can't see how other applications' performance would be affected, unless maybe it would increase the network traffic due to the lack of write consolidation. I can see that: many small writes instead of one larger one.

I don't need details, just a couple of sketchy examples so I can visualize what you are referring to.

Thanks for increasing my understanding,

-=# Paul Gilliam #=-
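Paul's guess about lost write consolidation can be made concrete with a deliberately crude model. This is an illustration, not a description of how the Linux client actually schedules RPCs. In the model, a caching client gathers an application's small sequential writes into WRITE RPCs of up to wsize bytes. A direct I/O client sends each write() call on its own. The 32 KiB wsize is only an example; the real value is set at mount time. The function names are hypothetical.

```python
def _ceil_div(a, b):
    return -(-a // b)


def write_rpcs(write_sizes, wsize, cached):
    """Rough number of NFS WRITE RPCs for sequential write() calls of the
    given sizes (bytes).

    cached=True:  dirty pages are coalesced, so the stream goes out in
                  wsize-sized RPCs.
    cached=False: direct I/O; each write() is sent on its own, split into
                  wsize-sized pieces if it is larger than wsize.
    COMMITs, partial pages and concurrency are ignored.
    """
    if cached:
        return _ceil_div(sum(write_sizes), wsize)
    return sum(_ceil_div(size, wsize) for size in write_sizes)


# 1 MiB written with 4 KiB write() calls, assuming wsize = 32 KiB.
calls = [4096] * 256
print(write_rpcs(calls, 32768, cached=True))   # 32
print(write_rpcs(calls, 32768, cached=False))  # 256
```

In this model, the same megabyte costs eight times as many round trips without the cache. Each of those RPCs also consumes server and network capacity that other clients share.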
* Re: Tuning NFS client write pagecache
From: Chuck Lever @ 2010-08-10 21:47 UTC
To: Gilliam, PaulX J
Cc: Peter Chacko, Trond Myklebust, Jim Rees, Matthew Hodgson, linux-nfs@vger.kernel.org

On Aug 10, 2010, at 2:50 PM, Gilliam, PaulX J wrote:

> Please forgive my ignorance, but could you give a example or two? I can understand how direct I/O can degrade the performance of the application that is using it. But I can't see how other applications' performance would be affected. Unless maybe it would increase the network traffic due to the lack of write consolidation. I can see that: many small writes instead of one larger one.

Most typical desktop applications perform small writes, a lot of rereads of the same data, and depend on read-ahead for good performance. Application developers assume a local data cache in order to keep their programs simple. To get good performance, even on local file systems, their applications would have to maintain their own data cache (in fact, that is what direct I/O-enabled applications do already).

Having no data cache on the NFS client means that all of this I/O would be exposed to the network and the NFS server. That's an opportunity cost paid for by all other users of the network and NFS server. Exposing that excess I/O activity will have a broad effect on the amount of I/O the system as a whole (clients, network, server) can perform.

If you have one NFS client running just a few apps, you may not notice the difference (unless you have a low bandwidth network). But NFS pretty much requires good client-side caching to scale in the number of clients and amount of I/O.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
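Chuck's reread argument can be illustrated with another toy model. It is hypothetical, not a model of the real client, and it ignores read-ahead. The model replays a trace of the block numbers an application reads against a client-side LRU cache. It counts the reads that must go to the server. A capacity of 0 stands for a client with no data cache. The function name `server_reads` is illustrative.

```python
from collections import OrderedDict


def server_reads(trace, cache_blocks):
    """Count reads that miss a client-side LRU cache of cache_blocks blocks.

    trace is a sequence of block numbers read by the application;
    cache_blocks=0 models a client with no data cache at all.
    """
    cache = OrderedDict()
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)  # mark as most recently used
            continue
        misses += 1  # must be fetched from the NFS server
        if cache_blocks > 0:
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return misses


# A 4-block file read five times over.
trace = [0, 1, 2, 3] * 5
print(server_reads(trace, 0))  # 20: every read crosses the network
print(server_reads(trace, 4))  # 4: only the first pass goes to the server
```

In this model, removing the client cache multiplies the server's read load by the number of rereads. That extra load lands on a server every other client depends on.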
* Re: Tuning NFS client write pagecache
From: Peter Chacko @ 2010-08-11 2:09 UTC
To: Chuck Lever
Cc: Gilliam, PaulX J, Trond Myklebust, Jim Rees, Matthew Hodgson, linux-nfs@vger.kernel.org

Thanks, Gilliam, for your message, and Chuck for your detailed explanation drawn from your long-term work with NFS.

Gilliam, most incremental backup systems use hashes/checksums to determine the new data (deltas). They do this not by reading all the data back from the server (or the data they wrote to the cache) but from a local database that the backup agent keeps. rsync only requires fixed-length block checksums from the server; it computes rolling checksums (weak and strong) on the client and detects duplicates. It also doesn't re-read the data at the NFS level.

Chuck, OK, I will check for the command-line option to request DIO mode for NFS, as you suggested. Otherwise, yes, I fully understand the need for client caching for desktop-bound or other general-purpose applications. AFS and CacheFS are good products in their own right. The problem in such cases is cache coherence: other application clients are not guaranteed to get the latest data on their reads, as NFS honors only close-to-open semantics.

The situation I have is this: we have a data protection product that has agents on individual servers and a storage gateway (which is an NFS-mounted box). The only purpose of this box is to store all the data coming from tens of agents, in a streaming write mode; essentially it acts like a VTL target. From this node to the NFS server node, no data travels in the reverse path (or from the client cache to the application). This is the only use we put NFS to. For recovery, it's again a streamed read.
We never update the data we read, or re-read data we have updated. This is a special, single-function box.

What do you think are the best mount options for this scenario? I greatly appreciate your time explaining.

Thanks,
Peter

On Wed, Aug 11, 2010 at 3:17 AM, Chuck Lever <chuck.lever@oracle.com> wrote: > > On Aug 10, 2010, at 2:50 PM, Gilliam, PaulX J wrote: > >> >> >>> -----Original Message----- >>> From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs- >>> owner@vger.kernel.org] On Behalf Of Chuck Lever >>> Sent: Tuesday, August 10, 2010 9:27 AM >>> To: Peter Chacko >>> Cc: Trond Myklebust; Jim Rees; Matthew Hodgson; linux-nfs@vger.kernel.org >>> Subject: Re: Tuning NFS client write pagecache >>> >>> >>> On Aug 6, 2010, at 9:15 PM, Peter Chacko wrote: >>> >>>> I think you are not understanding the use case of a file-system wide, >>>> non-cached IO for NFS. >>>> >>>> Imagine a case when a unix shell programmer create a backup >>>> script,who doesn't know C programming or system calls....he just wants >>>> to use a cp -R sourcedir /targetDir. Where targetDir is an NFS >>>> mounted share. >>>> >>>> How can we use programmatical , per file-session interface to O_DIRECT >>>> flag here ? >>>> >>>> We need a file-system wide direct IO mechanisms ,the best place to >>>> have is at the mount time. We cannot tell all sysadmins to go and >>>> learn programming....or backup vendors to change their code that they >>>> wrote 10 - 12 years ago...... Operating system functionalities should >>>> cover a large audience, with different levels of training/skills. >>>> >>>> I hope you got my point here.... >>> >>> The reason Linux doesn't support a filesystem wide option is that direct >>> I/O has as much potential to degrade performance as it does to improve it. >>> The performance degradation can affect other applications on the same file >>> system and other clients connected to the same server.
So it can be an >>> exceptionally unfriendly thing to do for your neighbors if an application >>> is stupid or malicious. >> >> Please forgive my ignorance, but could you give a example or two? I can understand how direct I/O can degrade the performance of the application that is using it. But I can't see how other applications' performance would be affected. Unless maybe it would increase the network traffic due to the lack of write consolidation. I can see that: many small writes instead of one larger one. > > Most typical desktop applications perform small writes, a lot of rereads of the same data, and depend on read-ahead for good performance. Application developers assume a local data cache in order to keep their programs simple. To get good performance, even on local file systems, their applications would have to maintain their own data cache (in fact, that is what direct I/O-enabled applications do already). > > Having no data cache on the NFS client means that all of this I/O would be exposed to the network and the NFS server. That's an opportunity cost paid for by all other users of the network and NFS server. Exposing that excess I/O activity will have a broad effect on the amount of I/O the system as whole (clients, network, server) can perform. > > If you have one NFS client running just a few apps, you may not notice the difference (unless you have a low bandwidth network). But NFS pretty much requires good client-side caching to scale in the number of clients and amount of I/O. > >> I don't need details, just a couple of sketchy examples so I can visualize what you are referring to. >> >> Thanks for increasing my understanding, >> >> -=# Paul Gilliam #=- >> >> >>> To make direct I/O work well, applications have to use it sparingly and >>> appropriately. They usually maintain their own buffer cache in lieu of the >>> client's generic page cache. Applications like shells and editors depend >>> on an NFS client's local page cache to work well. 
>>> >>> So, we have chosen to support direct I/O only when each file is opened, not >>> as a file-system-wide option. This is a much narrower application of this >>> feature, and has a better chance of helping performance in special cases >>> while not destroying it broadly. >>> >>> So far I haven't read anything here that clearly states a requirement we >>> have overlooked in the past. >>> >>> For your "cp" example, the NFS community is looking at ways to reduce the >>> overhead of file copy operations by offloading them to the server. The >>> file data doesn't have to travel over the network to the client. Someone >>> recently said that when you leave this kind of choice up to users, they will >>> usually choose exactly the wrong option. This is a clear case where the >>> system and application developers will choose better than users who have no >>> programming skills. >>> >>> >>>> On Sat, Aug 7, 2010 at 1:09 AM, Trond Myklebust >>>> <trond.myklebust@fys.uio.no> wrote: >>>>> On Sat, 2010-08-07 at 00:59 +0530, Peter Chacko wrote: >>>>>> Imagine a third-party backup app for which a customer has no source >>>>>> code (one that doesn't open files in O_DIRECT mode), backing up >>>>>> millions of files through NFS. How can we do non-cached I/O to the >>>>>> target server? We cannot use the O_DIRECT option here as we don't have >>>>>> the source code. If we had a mount option, it would work just right. >>>>>> If we can have read-only mounts, why not have a DIO-only mount? >>>>>> >>>>>> A truly application-aware storage system (in this case the NFS client), >>>>>> which is what next-generation storage systems should be, should absorb >>>>>> the application needs that may apply to the whole FS.
>>>>>> >>>>>> I don't say the O_DIRECT flag is a bad idea, but it will only work with a >>>>>> regular application that does I/O to some files. This is not the best >>>>>> solution when the NFS server is used as storage for secondary data, >>>>>> where the NFS client runs third-party applications that otherwise run >>>>>> best on local storage, as there are no caching issues there. >>>>>> >>>>>> What do you think? >>>>> >>>>> I think that we've had O_DIRECT support in the kernel for more than six >>>>> years now. If there are backup vendors out there that haven't been >>>>> paying attention, then I'd suggest looking at other vendors. >>>>> >>>>> Trond >>>>> >>>>>> On Fri, Aug 6, 2010 at 11:07 PM, Trond Myklebust >>>>>> <trond.myklebust@fys.uio.no> wrote: >>>>>>> On Fri, 2010-08-06 at 15:05 +0100, Peter Chacko wrote: >>>>>>>> Some distributed file systems, such as IBM's SANFS, support direct I/O >>>>>>>> to the target storage without going through a cache. (This >>>>>>>> feature is useful for write-only workloads, say, when we are backing up >>>>>>>> huge data to an NFS share.) >>>>>>>> >>>>>>>> I think if it is not available, we should add a DIO mount option that tells >>>>>>>> the VFS not to cache any data, so that the close operation will not stall. >>>>>>> >>>>>>> Ugh no! Applications that need direct IO should be using open(O_DIRECT), >>>>>>> not relying on hacks like mount options. >>>>>>> >>>>>>>> With the close-to-open cache coherence protocol of NFS, an >>>>>>>> aggressively caching client is a performance downer for many workloads >>>>>>>> that are write-mostly. >>>>>>> >>>>>>> We already have full support for vectored aio/dio in the NFS for those >>>>>>> applications that want to use it.
>>>>>>> >>>>>>> Trond >>>>>>> >>>>>>>> On Fri, Aug 6, 2010 at 2:26 PM, Jim Rees <rees@umich.edu> wrote: >>>>>>>>> Matthew Hodgson wrote: >>>>>>>>> >>>>>>>>> Is there any way to tune the linux NFSv3 client to prefer to write >>>>>>>>> data straight to an async-mounted server, rather than having large >>>>>>>>> writes to a file stack up in the local pagecache before being synced >>>>>>>>> on close()? >>>>>>>>> >>>>>>>>> It's been a while since I've done this, but I think you can tune this with >>>>>>>>> vm.dirty_writeback_centisecs and vm.dirty_background_ratio sysctls. The >>>>>>>>> data will still go through the page cache but you can reduce the amount that >>>>>>>>> stacks up. >>>>>>>>> >>>>>>>>> There are other places where the data can get buffered, like the rpc layer, >>>>>>>>> but it won't sit there any longer than it takes for it to go out the wire. >>>>>>>>> -- >>>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >>>>>>>>> the body of a message to majordomo@vger.kernel.org >>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >>> -- >>> Chuck Lever >>> chuck[dot]lever[at]oracle[dot]com > > -- > Chuck Lever >
chuck[dot]lever[at]oracle[dot]com
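Peter's point about rsync rests on its weak rolling checksum: one side sends a checksum per fixed-size block, and the other slides a window over its own file, updating the checksum in constant time per byte instead of recomputing it. The following is a minimal Python sketch of that weak checksum as described in the published rsync algorithm (two sums kept modulo 2^16). The function names are hypothetical, this is not rsync's source, and real rsync confirms every weak hit with a strong hash before trusting it.

```python
M = 1 << 16  # both partial sums are kept modulo 2**16


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute the (a, b) pair for one block from scratch."""
    n = len(block)
    a = sum(block) % M
    # b weights each byte by its distance from the end of the window: (n - i) * x_i
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def combine(a: int, b: int) -> int:
    """Pack the two 16-bit sums into a single 32-bit value, s = a + 2**16 * b."""
    return a + (b << 16)


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide an n-byte window by one byte: drop out_byte, append in_byte. O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def find_matches(data: bytes, block_sums: dict[int, int], n: int) -> list[tuple[int, int]]:
    """Scan data with a rolling n-byte window.

    block_sums maps a combined weak checksum to the index of the other side's
    block. Returns (offset, block_index) for every weak hit; a real
    implementation would verify each hit with a strong checksum.
    """
    hits = []
    if len(data) < n:
        return hits
    a, b = weak_checksum(data[:n])
    for k in range(len(data) - n + 1):
        s = combine(a, b)
        if s in block_sums:
            hits.append((k, block_sums[s]))
        if k + n < len(data):
            a, b = roll(a, b, data[k], data[k + n], n)
    return hits
```

Only the rolling update touches the new and departing bytes, which is why scanning a whole file costs roughly one pass over the local data rather than a re-read of anything on the NFS server.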
* Re: Tuning NFS client write pagecache 2010-08-11 2:09 ` Peter Chacko @ 2010-08-11 16:05 ` Chuck Lever 2010-08-11 17:14 ` Peter Chacko 0 siblings, 1 reply; 19+ messages in thread From: Chuck Lever @ 2010-08-11 16:05 UTC To: Peter Chacko; +Cc: linux-nfs@vger.kernel.org Mailing list [ Trimming CC: list ] On Aug 10, 2010, at 8:09 PM, Peter Chacko wrote: > Chuck, > > OK, I will then check the command-line option to request the DIO > mode for NFS, as you suggested. > > Otherwise, yes, I fully understand the need for client caching for > desktop-bound or any general-purpose applications. AFS and CacheFS are > all good products in their own right, but the only problem in such > cases is cache coherence (I mean other application clients > are not guaranteed to get the latest data on their reads), as NFS > honors only close-to-open semantics. > > The situation I have is this: > > we have a data protection product that has agents on individual > servers and a storage gateway (which is an NFS-mounted box). The only > purpose of this box is to store all data in streaming-write > mode, for all the data coming from tens of agents; essentially > this acts like a VTL target. From this node to the NFS server node, > there is no data traveling in the reverse path (or from the client > cache to the application). > > This is the only use we put NFS to. > > For recovery, it's again a streamed read; we never update the > read data or re-read the updated data. This is a special, > single-function box. > > What do you think are the best mount options for this scenario? What is the data rate (both IOPS and data throughput) of both the read and write cases? How large are application read and write ops, on average? What kind of networking is deployed? What are the server and clients (hardware and OS)? And I assume you are asking because the environment is not performing as you expect.
Can you detail your performance issues? -- Chuck Lever chuck[dot]lever[at]oracle[dot]com
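Chuck and Trond both steer Peter toward per-file direct I/O, requested with open(O_DIRECT), rather than a mount option. A minimal Python sketch of what that looks like from an application follows. It assumes a platform that defines os.O_DIRECT (Linux does) and assumes 4 KiB alignment is acceptable to the target file system; the helper names and the 1 MiB chunk size are hypothetical. The open(2) man page notes that with NFS, O_DIRECT bypasses the page cache only on the client; the server may still cache the I/O.

```python
import mmap
import os

ALIGN = 4096  # assumption: 4 KiB satisfies the target's O_DIRECT alignment rules


def padded_len(n: int, align: int = ALIGN) -> int:
    """Round n up to the next multiple of align."""
    return -(-n // align) * align


def write_direct(path: str, chunks, chunk_size: int = 1 << 20) -> int:
    """Write an iterable of byte strings to path with O_DIRECT.

    Every chunk except the last must be exactly chunk_size bytes, so each
    write starts at an aligned file offset. A short final chunk is zero-padded
    to ALIGN and the file is then truncated back to its true length.
    Returns the number of payload bytes written.
    """
    if chunk_size % ALIGN:
        raise ValueError("chunk_size must be a multiple of ALIGN")
    buf = mmap.mmap(-1, chunk_size)  # anonymous mappings are page-aligned
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_DIRECT, 0o644)
        try:
            total = 0
            short_seen = False
            for chunk in chunks:
                n = len(chunk)
                if short_seen or n > chunk_size:
                    raise ValueError("only the last chunk may be shorter than chunk_size")
                short_seen = n < chunk_size
                size = padded_len(n)
                buf[:n] = chunk                # copy the payload into the aligned buffer
                buf[n:size] = bytes(size - n)  # zero-pad the unaligned tail
                with memoryview(buf) as mv:
                    done = 0
                    while done < size:  # loop in case of a short write
                        done += os.write(fd, mv[done:size])
                total += n
            os.ftruncate(fd, total)  # trim any padding written after the last chunk
        finally:
            os.close(fd)
    finally:
        buf.close()
    return total
```

The application, not the mount, decides which files skip the client cache, which is the narrower scope Chuck argues for; a streaming backup writer could call write_direct on each output file while shells and editors on the same mount keep their normal caching.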
* Re: Tuning NFS client write pagecache 2010-08-11 16:05 ` Chuck Lever @ 2010-08-11 17:14 ` Peter Chacko 2010-08-11 20:51 ` Chuck Lever 0 siblings, 1 reply; 19+ messages in thread From: Peter Chacko @ 2010-08-11 17:14 UTC To: Chuck Lever; +Cc: linux-nfs@vger.kernel.org Mailing list We typically use 100Mb/s or 1GbE, and the server storage is SATA/SCSI. For IOPS, I have not really measured the NFS client performance to tell you the exact number; we use write sizes of 4K/8K, and the MTU of the link is 1500 bytes. But we got noticeably more uniform throughput (without bursty traffic) and better overall performance when we hand-coded NFS RPC operations (including MOUNT to get the root file handle) and sent them to the server, writing all data to the server at the NFS interface (a sort of direct NFS from user space) without going through the kernel-mode VFS interface of the NFS client driver. I was just wondering how to get the same performance from the native NFS client. It's still a matter of opinion what control we should give to applications and what the OS should control! As we test more, I can send you more test data about this. Eventually applications will end up re-inventing the wheel to suit their special needs :-) How does Oracle's directNFS deal with this? Thanks, Chuck, for your thoughts! On Wed, Aug 11, 2010 at 9:35 PM, Chuck Lever <chuck.lever@oracle.com> wrote: > [ Trimming CC: list ] > > On Aug 10, 2010, at 8:09 PM, Peter Chacko wrote: > >> Chuck, >> >> OK, I will then check the command-line option to request the DIO >> mode for NFS, as you suggested. >> >> Otherwise, yes, I fully understand the need for client caching for >> desktop-bound or any general-purpose applications. AFS and CacheFS are >> all good products in their own right, but the only problem in such >> cases is cache coherence (I mean other application clients >> are not guaranteed to get the latest data on their reads), as NFS >> honors only close-to-open semantics.
>> >> The situation I have is this: >> >> we have a data protection product that has agents on individual >> servers and a storage gateway (which is an NFS-mounted box). The only >> purpose of this box is to store all data in streaming-write >> mode, for all the data coming from tens of agents; essentially >> this acts like a VTL target. From this node to the NFS server node, >> there is no data traveling in the reverse path (or from the client >> cache to the application). >> >> This is the only use we put NFS to. >> >> For recovery, it's again a streamed read; we never update the >> read data or re-read the updated data. This is a special, >> single-function box. >> >> What do you think are the best mount options for this scenario? > > What is the data rate (both IOPS and data throughput) of both the read and write cases? How large are application read and write ops, on average? What kind of networking is deployed? What are the server and clients (hardware and OS)? > > And I assume you are asking because the environment is not performing as you expect. Can you detail your performance issues? > > -- > Chuck Lever > chuck[dot]lever[at]oracle[dot]com
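To put Peter's numbers (4K/8K writes over gigabit Ethernet) in perspective, here is a back-of-the-envelope Python sketch of how many WRITE RPCs a 100 MiB stream costs at different transfer sizes, and how many RPCs per second it takes to keep a 1 GbE link busy at each size. It assumes one WRITE RPC per wsize bytes and an idealized 125,000,000 bytes/s for 1 GbE with no Ethernet, IP, TCP, or RPC overhead, so the figures are upper bounds for illustration, not measurements.

```python
GIGABIT_BYTES_PER_S = 125_000_000  # idealized 1 Gbit/s payload rate, no protocol overhead


def write_rpcs(total_bytes: int, wsize: int) -> int:
    """Number of WRITE RPCs needed if each RPC carries at most wsize bytes."""
    return -(-total_bytes // wsize)  # ceiling division


def rpcs_per_second(link_bytes_per_s: float, wsize: int) -> float:
    """RPC rate needed to keep the link full with wsize-byte WRITEs."""
    return link_bytes_per_s / wsize


if __name__ == "__main__":
    total = 100 * 2**20  # a 100 MiB stream
    for wsize in (4096, 8192, 32768, 2**20):
        print(f"wsize={wsize:>8}: {write_rpcs(total, wsize):>6} WRITE RPCs, "
              f"{rpcs_per_second(GIGABIT_BYTES_PER_S, wsize):>9.1f} RPC/s to fill 1 GbE")
```

At 8 KiB per RPC, 100 MiB takes 12,800 WRITEs and filling the link needs about 15,259 RPCs per second; if the server negotiates 1 MiB, the same stream takes 100 WRITEs at about 119 per second. With buffered I/O the kernel client gathers contiguous dirty pages into WRITE requests of up to wsize bytes, so the application's 4K/8K write size need not dictate the RPC size.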
* Re: Tuning NFS client write pagecache 2010-08-11 17:14 ` Peter Chacko @ 2010-08-11 20:51 ` Chuck Lever 0 siblings, 0 replies; 19+ messages in thread From: Chuck Lever @ 2010-08-11 20:51 UTC To: Peter Chacko; +Cc: linux-nfs@vger.kernel.org Mailing list On Aug 11, 2010, at 11:14 AM, Peter Chacko wrote: > We typically use 100Mb/s or 1GbE, and the server storage is > SATA/SCSI. For IOPS, I have not really measured the NFS client > performance to tell you the exact number; we use write sizes of > 4K/8K, and the MTU of the link is 1500 bytes. > But we got noticeably more uniform throughput (without bursty traffic) > and better overall performance when we hand-coded NFS RPC > operations (including MOUNT to get the root file handle) and sent them to > the server, writing all data to the server at the NFS interface (a sort of > direct NFS from user space) without going through the kernel-mode VFS > interface of the NFS client driver. I was just wondering how to get the same > performance from the native NFS client. Again, I'm not hearing a clearly stated performance issue. It doesn't sound like anything that can't easily be handled by the default mount options in any late-model Linux distribution. NFSv3 over TCP with the largest rsize and wsize negotiated with the server should easily handle this workload. > It's still a matter of opinion what control we should give to > applications and what the OS should control! > > As we test more, I can send you more test data about this. > > Eventually applications will end up re-inventing the wheel to suit their > special needs :-) Given just this information, I don't see anything that suggests you can't implement all of this with POSIX system calls and the kernel NFS client. Client-side data caching may waste a few resources for write-only and read-once workloads, but the kernel will reclaim memory when needed. Your application can also use standard system calls to control the cached data, if it is a real concern.
> How does Oracle's directNFS deal with this? I work on the Linux kernel NFS client, so I can't really give dNFS specifics with any kind of authority. dNFS is useful because the database already has its own built-in buffer cache, manages a very large resident set, and often needs tight cache coherency with other nodes in a database cluster (which is achieved via a separate cache protocol, rather than relying on NFS and OS caching behavior). dNFS is logically quite similar to doing direct I/O through the kernel's NFS client. The advantages of dNFS over direct I/O via the kernel are: 1. dNFS is a part of the Oracle database application, and thus the internal APIs and NFS behavior are always the same across all operating systems, and 2. dNFS allows a somewhat shorter code path and fewer context switches per I/O. This is usually only critical on systems that require immense scaling. I haven't heard anything, so far, that suggests your workload has these requirements. > Thanks, Chuck, for your thoughts! > > On Wed, Aug 11, 2010 at 9:35 PM, Chuck Lever <chuck.lever@oracle.com> wrote: >> [ Trimming CC: list ] >> >> On Aug 10, 2010, at 8:09 PM, Peter Chacko wrote: >> >>> Chuck, >>> >>> OK, I will then check the command-line option to request the DIO >>> mode for NFS, as you suggested. >>> >>> Otherwise, yes, I fully understand the need for client caching for >>> desktop-bound or any general-purpose applications. AFS and CacheFS are >>> all good products in their own right, but the only problem in such >>> cases is cache coherence (I mean other application clients >>> are not guaranteed to get the latest data on their reads), as NFS >>> honors only close-to-open semantics. >>> >>> The situation I have is this: >>> >>> we have a data protection product that has agents on individual >>> servers and a storage gateway (which is an NFS-mounted box).
The only >>> purpose of this box is to store all data in streaming-write >>> mode, for all the data coming from tens of agents; essentially >>> this acts like a VTL target. From this node to the NFS server node, >>> there is no data traveling in the reverse path (or from the client >>> cache to the application). >>> >>> This is the only use we put NFS to. >>> >>> For recovery, it's again a streamed read; we never update the >>> read data or re-read the updated data. This is a special, >>> single-function box. >>> >>> What do you think are the best mount options for this scenario? >> >> What is the data rate (both IOPS and data throughput) of both the read and write cases? How large are application read and write ops, on average? What kind of networking is deployed? What are the server and clients (hardware and OS)? >> >> And I assume you are asking because the environment is not performing as you expect. Can you detail your performance issues? >> >> -- >> Chuck Lever >> chuck[dot]lever[at]oracle[dot]com -- Chuck Lever chuck[dot]lever[at]oracle[dot]com
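Chuck notes that an application can use standard system calls to control its cached data, without naming them. One common combination, an assumption here rather than something stated in the thread, is to flush dirty pages periodically with fdatasync() and then call posix_fadvise(POSIX_FADV_DONTNEED) on the flushed range, so a write-once stream does not pile up in the page cache and close() has little left to push. A minimal Python sketch for Linux; stream_write and the 8 MiB interval are hypothetical. DONTNEED only drops clean pages, which is why the flush comes first.

```python
import os
import tempfile

FLUSH_EVERY = 8 << 20  # hypothetical: push data out every 8 MiB


def stream_write(path: str, chunks, flush_every: int = FLUSH_EVERY) -> int:
    """Buffered streaming write that keeps little dirty data in the page cache.

    After every flush_every bytes, fdatasync() pushes the dirty pages out
    (for an NFS file: to the server), then POSIX_FADV_DONTNEED asks the kernel
    to drop the now-clean pages, since this workload never re-reads them.
    Returns the number of bytes written.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    written = 0
    flushed = 0  # everything below this offset has been flushed with fdatasync()
    try:
        for chunk in chunks:
            view = memoryview(chunk)
            while len(view):  # os.write may write fewer bytes than requested
                n = os.write(fd, view)
                view = view[n:]
                written += n
            if written - flushed >= flush_every:
                os.fdatasync(fd)
                os.posix_fadvise(fd, flushed, written - flushed, os.POSIX_FADV_DONTNEED)
                flushed = written
        os.fdatasync(fd)  # flush the tail so close() has nothing left to wait for
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)  # len 0 means "to end of file"
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "stream.bin")
        print(stream_write(target, (bytes(1 << 20) for _ in range(20))))
```

Unlike O_DIRECT, this keeps the normal buffered path (and the client's write coalescing) while bounding how much unflushed data can accumulate; the flush interval trades throughput against how long any single fdatasync() can block.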
* Re: Tuning NFS client write pagecache 2010-08-06 13:26 ` Jim Rees 2010-08-06 14:05 ` Peter Chacko @ 2010-08-06 16:29 ` Matthew Hodgson 2010-08-07 0:25 ` Matthew Hodgson 1 sibling, 1 reply; 19+ messages in thread From: Matthew Hodgson @ 2010-08-06 16:29 UTC To: Jim Rees; +Cc: linux-nfs Hi, Jim Rees wrote: > Matthew Hodgson wrote: > > Is there any way to tune the linux NFSv3 client to prefer to write > data straight to an async-mounted server, rather than having large > writes to a file stack up in the local pagecache before being synced > on close()? > > It's been a while since I've done this, but I think you can tune this with > vm.dirty_writeback_centisecs and vm.dirty_background_ratio sysctls. The > data will still go through the page cache but you can reduce the amount > that stacks up. Yup, that does the trick. I'd tried this earlier but hadn't gone far enough: it seems I need to drop vm.dirty_writeback_centisecs down to 1 (and vm.dirty_background_ratio to 1) for the back-pressure to propagate correctly for this use case. Thanks for the pointer! In other news, whilst saturating the ~10Mb/s pipe during the big write to the server, I'm seeing huge delays of >10 seconds when trying to do trivial operations such as ls'ing small directories. Is this normal, or is there some kind of tunable scheduling on the client to avoid a single big transfer wedging the machine? Thanks, Matthew -- Matthew Hodgson Development Program Manager OpenMarket | www.openmarket.com/europe matthew.hodgson@openmarket.com
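Matthew reports that lowering vm.dirty_writeback_centisecs and vm.dirty_background_ratio to 1 stopped close() from stalling. The Python sketch below inspects those knobs through /proc/sys and, when run as root, sets them to the values from his message; the helper names are hypothetical. These are system-wide settings that affect writeback for every file system, not only NFS, and a 1-centisecond interval wakes background writeback every 10 ms, so treat the values as a starting point to measure rather than a recommendation. Later kernels also provide vm.dirty_background_bytes for a byte-granular threshold.

```python
import os

PROC_SYS = "/proc/sys"

# Values Matthew reported working for his slow-link, large-write case.
# System-wide: these affect writeback for every file system, not only NFS.
TUNING = {
    "vm.dirty_writeback_centisecs": 1,  # wake background writeback every 10 ms
    "vm.dirty_background_ratio": 1,     # start background writeback at 1% dirty memory
}


def sysctl_path(name: str) -> str:
    """Map a dotted sysctl name to its /proc/sys file, e.g. vm.x -> /proc/sys/vm/x."""
    return os.path.join(PROC_SYS, *name.split("."))


def read_sysctl(name: str) -> str:
    """Return the current value of a sysctl as a string."""
    with open(sysctl_path(name)) as f:
        return f.read().strip()


def write_sysctl(name: str, value) -> None:
    """Set a sysctl (requires root); same effect as `sysctl -w name=value`."""
    with open(sysctl_path(name), "w") as f:
        f.write(f"{value}\n")


if __name__ == "__main__":
    for name, wanted in TUNING.items():
        current = read_sysctl(name)
        print(f"{name} = {current} (suggested {wanted})")
        if os.geteuid() == 0 and current != str(wanted):
            write_sysctl(name, wanted)
```

Writing through /proc/sys does not persist across reboots; the same settings would normally go in /etc/sysctl.conf once they have been shown to help.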
* Re: Tuning NFS client write pagecache 2010-08-06 16:29 ` Matthew Hodgson @ 2010-08-07 0:25 ` Matthew Hodgson 0 siblings, 0 replies; 19+ messages in thread From: Matthew Hodgson @ 2010-08-07 0:25 UTC To: Jim Rees; +Cc: linux-nfs On 06/08/2010 17:29, Matthew Hodgson wrote: >> Matthew Hodgson wrote: >> >> Is there any way to tune the linux NFSv3 client to prefer to write >> data straight to an async-mounted server, rather than having large >> writes to a file stack up in the local pagecache before being synced >> on close()? > > In other news, whilst saturating the ~10Mb/s pipe during the big write > to the server, I'm seeing huge delays of >10 seconds when trying to do > trivial operations such as ls'ing small directories. Is this normal, or > is there some kind of tunable scheduling on the client to avoid a single > big transfer wedging the machine? Hm, on reading the archives, it seems that this is a fairly common complaint when dealing with large sequential workloads, a side effect of the write pagecache not writing out smoothly. What is the status of the '[PATCH] improve the performance of large sequential write NFS workloads' patchset at http://www.spinics.net/lists/linux-nfs/msg11131.html? It seems that it and its predecessors are intended to fix precisely this issue. It doesn't seem to have landed in mainline, though, and I can't find any mention of it since http://lwn.net/Articles/373868/. Thanks, Matthew -- Matthew Hodgson Development Program Manager OpenMarket | www.openmarket.com/europe matthew.hodgson@openmarket.com
Thread overview: 19+ messages
2010-08-06 12:21 Tuning NFS client write pagecache Matthew Hodgson
2010-08-06 13:26 ` Jim Rees
2010-08-06 14:05 ` Peter Chacko
2010-08-06 17:37 ` Trond Myklebust
2010-08-06 19:29 ` Peter Chacko
2010-08-06 19:39 ` Trond Myklebust
2010-08-07 3:15 ` Peter Chacko
2010-08-10 16:27 ` Chuck Lever
2010-08-10 17:52 ` Peter Chacko
2010-08-10 18:19 ` David Brodbeck
2010-08-10 19:16 ` Chuck Lever
2010-08-10 20:50 ` Gilliam, PaulX J
2010-08-10 21:47 ` Chuck Lever
2010-08-11 2:09 ` Peter Chacko
2010-08-11 16:05 ` Chuck Lever
2010-08-11 17:14 ` Peter Chacko
2010-08-11 20:51 ` Chuck Lever
2010-08-06 16:29 ` Matthew Hodgson
2010-08-07 0:25 ` Matthew Hodgson