* NFS Mount Option 'nofsc'
From: Derek McEachern @ 2012-02-08  2:45 UTC
To: linux-nfs

I joined the mailing list shortly after Neil sent out a request for
volunteers to update the nfs man page documenting the 'fsc'/'nofsc'
options. I suspect this may stem from a ticket we opened with Suse
inquiring about these options.

Coming from a Solaris background we typically use the 'forcedirectio'
option for certain mounts and I was looking for the same thing in Linux.
The typical advice seems to be to use 'noac', but the description in the
man page doesn't seem to match what I would expect from 'forcedirectio',
namely no buffering on the client.

Poking around the kernel I found the 'fsc'/'nofsc' options and my
question is: does 'nofsc' provide 'forcedirectio' functionality?

Thanks,

Derek
* Re: NFS Mount Option 'nofsc'
From: Myklebust, Trond @ 2012-02-08  4:55 UTC
To: Derek McEachern; +Cc: linux-nfs@vger.kernel.org

On Tue, 2012-02-07 at 20:45 -0600, Derek McEachern wrote:
> I joined the mailing list shortly after Neil sent out a request for
> volunteer to update the nfs man page documenting the 'fsc'/'nofsc'
> options. I suspect this may stem from a ticket we opened with Suse
> inquiring about these options.
>
> Coming from a Solaris background we typically use the 'forcedirectio'
> option for certain mounts and I was looking for the same thing in Linux.
> The typically advice seems to be use 'noac' but the description in the
> man page doesn't seem to match what I would expect from 'forcedirectio',
> namely no buffering on the client.
>
> Poking around the kernel I found the 'fsc'/'nofsc' options and my
> question is does 'nofsc' provide 'forcedirectio' functionality?

No. There is no equivalent to the Solaris "forcedirectio" mount option
in Linux.
Applications that need to use uncached i/o are required to use the
O_DIRECT open() mode instead, since pretty much all of them need to be
rewritten to deal with the subtleties involved anyway.

Trond
--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
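For illustration, a minimal sketch of the O_DIRECT usage Trond refers to.
The path, transfer size and 4 KiB alignment are assumptions for the
example; actual alignment requirements depend on the kernel and the
underlying filesystem.

    #define _GNU_SOURCE             /* for O_DIRECT */
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        void *buf;
        /* Bypass the client page cache: no readahead, no write-behind. */
        int fd = open("/mnt/nfs/output.dat", O_WRONLY | O_CREAT | O_DIRECT, 0644);

        if (fd < 0)
            return 1;
        /* Aligned buffers/lengths/offsets are the safe choice for O_DIRECT. */
        if (posix_memalign(&buf, 4096, 1 << 20))
            return 1;
        memset(buf, 0, 1 << 20);                /* stand-in for real data */
        if (write(fd, buf, 1 << 20) < 0)        /* goes straight to the server */
            return 1;
        free(buf);
        return close(fd);
    }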
* Re: NFS Mount Option 'nofsc'
From: Harshula @ 2012-02-08  7:43 UTC
To: Myklebust, Trond; +Cc: Derek McEachern, linux-nfs@vger.kernel.org

Hi Trond,

On Wed, 2012-02-08 at 04:55 +0000, Myklebust, Trond wrote:

> Applications that need to use uncached i/o are required to use the
> O_DIRECT open() mode instead, since pretty much all of them need to be
> rewritten to deal with the subtleties involved anyway.

Could you please expand on the subtleties involved that require an
application to be rewritten if forcedirectio mount option was available?

A scenario where forcedirectio would be useful is when an application
reads nearly a TB of data from local disks, processes that data and then
dumps it to an NFS mount. All that happens while other processes are
reading/writing to the local disks. The application does not have an
O_DIRECT option nor is the source code available.

With paged I/O the problem we see is that the NFS client system reaches
dirty_bytes/dirty_ratio threshold and then blocks/forces all the
processes to flush dirty pages. This effectively 'locks' up the NFS
client system while the NFS dirty pages are pushed slowly over the wire
to the NFS server. Some of the processes that have nothing to do with
writing to the NFS mount are badly impacted. A forcedirectio mount
option would be very helpful in this scenario. Do you have any advice on
alleviating such problems on the NFS client by only using existing
tunables?

Thanks,
#
* Re: NFS Mount Option 'nofsc'
From: Chuck Lever @ 2012-02-08 15:40 UTC
To: Harshula; +Cc: Myklebust, Trond, Derek McEachern, linux-nfs@vger.kernel.org

On Feb 8, 2012, at 2:43 AM, Harshula wrote:

> Hi Trond,
>
> On Wed, 2012-02-08 at 04:55 +0000, Myklebust, Trond wrote:
>
>> Applications that need to use uncached i/o are required to use the
>> O_DIRECT open() mode instead, since pretty much all of them need to be
>> rewritten to deal with the subtleties involved anyway.
>
> Could you please expand on the subtleties involved that require an
> application to be rewritten if forcedirectio mount option was available?
>
> A scenario where forcedirectio would be useful is when an application
> reads nearly a TB of data from local disks, processes that data and then
> dumps it to an NFS mount. All that happens while other processes are
> reading/writing to the local disks. The application does not have an
> O_DIRECT option nor is the source code available.
>
> With paged I/O the problem we see is that the NFS client system reaches
> dirty_bytes/dirty_ratio threshold and then blocks/forces all the
> processes to flush dirty pages. This effectively 'locks' up the NFS
> client system while the NFS dirty pages are pushed slowly over the wire
> to the NFS server. Some of the processes that have nothing to do with
> writing to the NFS mount are badly impacted. A forcedirectio mount
> option would be very helpful in this scenario. Do you have any advice on
> alleviating such problems on the NFS client by only using existing
> tunables?

Using direct I/O would be a work-around.  The fundamental problem is
the architecture of the VM system, and over time we have been making
improvements there.

Instead of a mount option, you can fix your application to use direct
I/O.  Or you can change it to provide the kernel with (better) hints
about the disposition of the data it is generating (madvise and
fadvise system calls).  (On Linux we assume you have source code and
can make such changes.  I realize this is not true for proprietary
applications).

You could try using the "sync" mount option to cause the NFS client to
push writes to the server immediately rather than delaying them.  This
would also slow down applications that aggressively dirties pages on
the client.

Meanwhile, you can dial down the dirty_ratio and especially the
dirty_background_ratio settings to trigger earlier writeback.  We've
also found increasing min_free_bytes has positive effects.  The exact
settings depend on how much memory your client has.  Experimenting
yourself is pretty harmless, so I won't give exact settings here.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
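A concrete sketch of the fadvise-style hinting Chuck mentions, assuming
the application source can be changed. The 8 MiB chunk size and the use
of fdatasync() plus POSIX_FADV_DONTNEED are illustrative choices, not
taken from this thread.

    #include <fcntl.h>
    #include <sys/types.h>
    #include <unistd.h>

    /*
     * Write in bounded chunks so dirty NFS pages never accumulate up to
     * the global dirty_ratio limit: flush each chunk to the server, then
     * tell the kernel the now-clean pages are no longer needed.
     */
    static int write_chunked(int fd, const char *buf, size_t len)
    {
        const size_t chunk = 8 << 20;           /* 8 MiB between flushes */
        size_t done = 0;

        while (done < len) {
            size_t n = len - done < chunk ? len - done : chunk;
            ssize_t w = write(fd, buf + done, n);

            if (w < 0)
                return -1;
            done += (size_t)w;
            if (fdatasync(fd))                  /* push WRITE + COMMIT now */
                return -1;
            /* Drop the cached pages so they no longer count against dirty limits. */
            posix_fadvise(fd, 0, (off_t)done, POSIX_FADV_DONTNEED);
        }
        return 0;
    }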
* Re: NFS Mount Option 'nofsc'
From: Harshula @ 2012-02-09  3:56 UTC
To: Chuck Lever; +Cc: Myklebust, Trond, Derek McEachern, linux-nfs@vger.kernel.org

Hi Chuck,

On Wed, 2012-02-08 at 10:40 -0500, Chuck Lever wrote:
> On Feb 8, 2012, at 2:43 AM, Harshula wrote:
> > Could you please expand on the subtleties involved that require an
> > application to be rewritten if forcedirectio mount option was available?
> >
> > A scenario where forcedirectio would be useful is when an application
> > reads nearly a TB of data from local disks, processes that data and then
> > dumps it to an NFS mount. All that happens while other processes are
> > reading/writing to the local disks. The application does not have an
> > O_DIRECT option nor is the source code available.
> >
> > With paged I/O the problem we see is that the NFS client system reaches
> > dirty_bytes/dirty_ratio threshold and then blocks/forces all the
> > processes to flush dirty pages. This effectively 'locks' up the NFS
> > client system while the NFS dirty pages are pushed slowly over the wire
> > to the NFS server. Some of the processes that have nothing to do with
> > writing to the NFS mount are badly impacted. A forcedirectio mount
> > option would be very helpful in this scenario. Do you have any advice on
> > alleviating such problems on the NFS client by only using existing
> > tunables?
>
> Using direct I/O would be a work-around.  The fundamental problem is
> the architecture of the VM system, and over time we have been making
> improvements there.
>
> Instead of a mount option, you can fix your application to use direct
> I/O.  Or you can change it to provide the kernel with (better) hints
> about the disposition of the data it is generating (madvise and
> fadvise system calls).  (On Linux we assume you have source code and
> can make such changes.  I realize this is not true for proprietary
> applications).
>
> You could try using the "sync" mount option to cause the NFS client to
> push writes to the server immediately rather than delaying them.  This
> would also slow down applications that aggressively dirties pages on
> the client.
>
> Meanwhile, you can dial down the dirty_ratio and especially the
> dirty_background_ratio settings to trigger earlier writeback.  We've
> also found increasing min_free_bytes has positive effects.  The exact
> settings depend on how much memory your client has.  Experimenting
> yourself is pretty harmless, so I won't give exact settings here.

Thanks for the reply. Unfortunately, not all vendors provide the source
code, so using O_DIRECT or fsync is not always an option.

Lowering dirty_bytes/dirty_ratio and
dirty_background_bytes/dirty_background_ratio did help as it smoothed
out the data transfer over the wire by pushing data out to the NFS
server sooner. Otherwise, I was seeing the data transfer over the wire
having idle periods while >10GiB of pages were being dirtied by the
processes, then congestion as soon as the dirty_ratio was reached and
the frantic flushing of dirty pages to the NFS server. However,
modifying dirty_* tunables has a system-wide impact, hence it was not
accepted.

The "sync" option, depending on the NFS server, may impact the NFS
server's performance when serving many NFS clients. But still worth a
try.

The other hack that seems to work is periodically triggering an
nfs_getattr(), via ls -l, to force the dirty pages to be flushed to the
NFS server. Not exactly elegant ...

Thanks,
#
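A minimal sketch of the same hack in program form, run on the client
doing the writes: a stat() on the file goes through the same
nfs_getattr() path as 'ls -l'. The path and polling interval below are
placeholders.

    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Poll stat() on one file so its dirty pages are flushed regularly,
     * mirroring what periodically running 'ls -l' does via nfs_getattr(). */
    int main(int argc, char **argv)
    {
        const char *path = argc > 1 ? argv[1] : "/mnt/nfs/output.dat";
        struct stat st;

        for (;;) {
            if (stat(path, &st))
                perror("stat");
            sleep(5);       /* arbitrary interval */
        }
    }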
* Re: NFS Mount Option 'nofsc'
From: Myklebust, Trond @ 2012-02-09  4:12 UTC
To: Harshula; +Cc: Chuck Lever, Derek McEachern, linux-nfs@vger.kernel.org

On Thu, 2012-02-09 at 14:56 +1100, Harshula wrote:
> Hi Chuck,
>
> On Wed, 2012-02-08 at 10:40 -0500, Chuck Lever wrote:
> > On Feb 8, 2012, at 2:43 AM, Harshula wrote:
>
> > > Could you please expand on the subtleties involved that require an
> > > application to be rewritten if forcedirectio mount option was available?
> > >
> > > A scenario where forcedirectio would be useful is when an application
> > > reads nearly a TB of data from local disks, processes that data and then
> > > dumps it to an NFS mount. All that happens while other processes are
> > > reading/writing to the local disks. The application does not have an
> > > O_DIRECT option nor is the source code available.

mount -osync works just as well as forcedirectio for this.

> > > With paged I/O the problem we see is that the NFS client system reaches
> > > dirty_bytes/dirty_ratio threshold and then blocks/forces all the
> > > processes to flush dirty pages. This effectively 'locks' up the NFS
> > > client system while the NFS dirty pages are pushed slowly over the wire
> > > to the NFS server. Some of the processes that have nothing to do with
> > > writing to the NFS mount are badly impacted. A forcedirectio mount
> > > option would be very helpful in this scenario. Do you have any advice on
> > > alleviating such problems on the NFS client by only using existing
> > > tunables?
> >
> > Using direct I/O would be a work-around.  The fundamental problem is
> > the architecture of the VM system, and over time we have been making
> > improvements there.

The argument above doesn't provided any motive for using directio
(uncached i/o) vs synchronous i/o. I see no reason why forced
synchronous i/o would be a problem here.

> > Instead of a mount option, you can fix your application to use direct
> > I/O.  Or you can change it to provide the kernel with (better) hints
> > about the disposition of the data it is generating (madvise and
> > fadvise system calls).  (On Linux we assume you have source code and
> > can make such changes.  I realize this is not true for proprietary
> > applications).
> >
> > You could try using the "sync" mount option to cause the NFS client to
> > push writes to the server immediately rather than delaying them.  This
> > would also slow down applications that aggressively dirties pages on
> > the client.
> >
> > Meanwhile, you can dial down the dirty_ratio and especially the
> > dirty_background_ratio settings to trigger earlier writeback.  We've
> > also found increasing min_free_bytes has positive effects.  The exact
> > settings depend on how much memory your client has.  Experimenting
> > yourself is pretty harmless, so I won't give exact settings here.
>
> Thanks for the reply. Unfortunately, not all vendors provide the source
> code, so using O_DIRECT or fsync is not always an option.

This is what vendor support is for. With closed source software you
generally gets what you pays for.

> Lowering dirty_bytes/dirty_ratio and
> dirty_background_bytes/dirty_background_ratio did help as it smoothed
> out the data transfer over the wire by pushing data out to the NFS
> server sooner. Otherwise, I was seeing the data transfer over the wire
> having idle periods while >10GiB of pages were being dirtied by the
> processes, then congestion as soon as the dirty_ratio was reached and
> the frantic flushing of dirty pages to the NFS server. However,
> modifying dirty_* tunables has a system-wide impact, hence it was not
> accepted.
>
> The "sync" option, depending on the NFS server, may impact the NFS
> server's performance when serving many NFS clients. But still worth a
> try.

What on earth makes you think that directio would be any different? If
your performance requirements can't cope with 'sync', then they sure as
hell won't deal well with 'fsc'.

Directio is _synchronous_ just like 'sync'. The big difference is that
with 'sync' then at least those reads are still cached.

> The other hack that seems to work is periodically triggering an
> nfs_getattr(), via ls -l, to force the dirty pages to be flushed to the
> NFS server. Not exactly elegant ...

????????????????????????????????

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
* Re: NFS Mount Option 'nofsc'
From: Harshula @ 2012-02-09  5:51 UTC
To: Myklebust, Trond; +Cc: Chuck Lever, Derek McEachern, linux-nfs@vger.kernel.org

Hi Trond,

Thanks for the reply. Could you please elaborate on the subtleties
involved that require an application to be rewritten if forcedirectio
mount option was available?

On Thu, 2012-02-09 at 04:12 +0000, Myklebust, Trond wrote:
> On Thu, 2012-02-09 at 14:56 +1100, Harshula wrote:
> >
> > The "sync" option, depending on the NFS server, may impact the NFS
> > server's performance when serving many NFS clients. But still worth a
> > try.
>
> What on earth makes you think that directio would be any different?

Like I said, sync is still worth a try. I will do O_DIRECT Vs sync mount
option runs and see what the numbers look like. A while back the numbers
for cached Vs direct small random writes showed as the number of threads
increased the cached performance fell well below direct performance. In
this case I'll be looking at large streaming writes, so completely
different scenario, but I'd like to verify the numbers first.

Just to be clear, I am not disagreeing with you. "sync" maybe sufficient
for the scenario I described earlier.

> If
> your performance requirements can't cope with 'sync', then they sure as
> hell won't deal well with 'fsc'.

"fsc"?

> Directio is _synchronous_ just like 'sync'. The big difference is that
> with 'sync' then at least those reads are still cached.

There's another scenario, which we talked about a while back, where the
cached async reads of a slowly growing file (tail) was spitting out
non-exist NULLs to user space. The forcedirectio mount option should
prevent that. Furthermore, the "sync" mount option will not help anymore
because you removed nfs_readpage_sync().

> > The other hack that seems to work is periodically triggering an
> > nfs_getattr(), via ls -l, to force the dirty pages to be flushed to the
> > NFS server. Not exactly elegant ...
>
> ????????????????????????????????

int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
{
        struct inode *inode = dentry->d_inode;
        int need_atime = NFS_I(inode)->cache_validity & NFS_INO_INVALID_ATIME;
        int err;

        /* Flush out writes to the server in order to update c/mtime.  */
        if (S_ISREG(inode->i_mode)) {
                err = filemap_write_and_wait(inode->i_mapping);
                if (err)
                        goto out;
        }

Thanks,
#
* Re: NFS Mount Option 'nofsc'
From: Malahal Naineni @ 2012-02-09 14:48 UTC
To: linux-nfs@vger.kernel.org

Harshula [harshula@redhat.com] wrote:
> Hi Trond,
>
> Thanks for the reply. Could you please elaborate on the subtleties
> involved that require an application to be rewritten if forcedirectio
> mount option was available?
>
> On Thu, 2012-02-09 at 04:12 +0000, Myklebust, Trond wrote:
> > On Thu, 2012-02-09 at 14:56 +1100, Harshula wrote:
> > >
> > > The "sync" option, depending on the NFS server, may impact the NFS
> > > server's performance when serving many NFS clients. But still worth a
> > > try.
> >
> > What on earth makes you think that directio would be any different?
>
> Like I said, sync is still worth a try. I will do O_DIRECT Vs sync mount
> option runs and see what the numbers look like. A while back the numbers
> for cached Vs direct small random writes showed as the number of threads
> increased the cached performance fell well below direct performance. In
> this case I'll be looking at large streaming writes, so completely
> different scenario, but I'd like to verify the numbers first.

directio and sync behavior should be same on server side, but it would
be a different story on the client though. The above behavior you
described is expected on the client.

Thanks, Malahal.
* Re: NFS Mount Option 'nofsc'
From: Myklebust, Trond @ 2012-02-09 15:31 UTC
To: Harshula; +Cc: Chuck Lever, Derek McEachern, linux-nfs@vger.kernel.org

On Thu, 2012-02-09 at 16:51 +1100, Harshula wrote:
> Hi Trond,
>
> Thanks for the reply. Could you please elaborate on the subtleties
> involved that require an application to be rewritten if forcedirectio
> mount option was available?

Firstly, we don't support O_DIRECT+O_APPEND (since the NFS protocol
itself doesn't support atomic appends), so that would break a bunch of
applications.

Secondly, uncached I/O means that read() and write() requests need to be
serialised by the application itself, since there are no atomicity or
ordering guarantees at the VFS, NFS or RPC call level. Normally, the
page cache services read() requests if there are outstanding writes, and
so provides the atomicity guarantees that POSIX requires.
IOW: if a write() occurs while you are reading, the application may end
up retrieving part of the old data, and part of the new data instead of
either one or the other.

IOW: your application still needs to be aware of the fact that it is
using O_DIRECT, and you are better of adding explicit support for it
rather than hacky cluges such as a forcedirectio option.

> On Thu, 2012-02-09 at 04:12 +0000, Myklebust, Trond wrote:
> > On Thu, 2012-02-09 at 14:56 +1100, Harshula wrote:
> > >
> > > The "sync" option, depending on the NFS server, may impact the NFS
> > > server's performance when serving many NFS clients. But still worth a
> > > try.
> >
> > What on earth makes you think that directio would be any different?
>
> Like I said, sync is still worth a try. I will do O_DIRECT Vs sync mount
> option runs and see what the numbers look like. A while back the numbers
> for cached Vs direct small random writes showed as the number of threads
> increased the cached performance fell well below direct performance. In
> this case I'll be looking at large streaming writes, so completely
> different scenario, but I'd like to verify the numbers first.
>
> Just to be clear, I am not disagreeing with you. "sync" maybe sufficient
> for the scenario I described earlier.
>
> > If
> > your performance requirements can't cope with 'sync', then they sure as
> > hell won't deal well with 'fsc'.
>
> "fsc"?
>
> > Directio is _synchronous_ just like 'sync'. The big difference is that
> > with 'sync' then at least those reads are still cached.
>
> There's another scenario, which we talked about a while back, where the
> cached async reads of a slowly growing file (tail) was spitting out
> non-exist NULLs to user space. The forcedirectio mount option should
> prevent that. Furthermore, the "sync" mount option will not help anymore
> because you removed nfs_readpage_sync().

No. See the points about O_APPEND and serialisation of read() and
write() above. You may still end up seeing NUL characters (and indeed
worse forms of corruption).

> > > The other hack that seems to work is periodically triggering an
> > > nfs_getattr(), via ls -l, to force the dirty pages to be flushed to the
> > > NFS server. Not exactly elegant ...
> >
> > ????????????????????????????????
>
> int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
> {
>         struct inode *inode = dentry->d_inode;
>         int need_atime = NFS_I(inode)->cache_validity & NFS_INO_INVALID_ATIME;
>         int err;
>
>         /* Flush out writes to the server in order to update c/mtime.  */
>         if (S_ISREG(inode->i_mode)) {
>                 err = filemap_write_and_wait(inode->i_mapping);
>                 if (err)
>                         goto out;
>         }

I'm aware of that code. The point is that '-osync' does that for free.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
* Re: NFS Mount Option 'nofsc'
From: Harshula @ 2012-02-10  8:07 UTC
To: Myklebust, Trond; +Cc: Chuck Lever, Derek McEachern, linux-nfs@vger.kernel.org

Hi Trond,

On Thu, 2012-02-09 at 15:31 +0000, Myklebust, Trond wrote:
> On Thu, 2012-02-09 at 16:51 +1100, Harshula wrote:
> > Hi Trond,
> >
> > Thanks for the reply. Could you please elaborate on the subtleties
> > involved that require an application to be rewritten if forcedirectio
> > mount option was available?
>
> Firstly, we don't support O_DIRECT+O_APPEND (since the NFS protocol
> itself doesn't support atomic appends), so that would break a bunch of
> applications.
>
> Secondly, uncached I/O means that read() and write() requests need to be
> serialised by the application itself, since there are no atomicity or
> ordering guarantees at the VFS, NFS or RPC call level. Normally, the
> page cache services read() requests if there are outstanding writes, and
> so provides the atomicity guarantees that POSIX requires.
> IOW: if a write() occurs while you are reading, the application may end
> up retrieving part of the old data, and part of the new data instead of
> either one or the other.
>
> IOW: your application still needs to be aware of the fact that it is
> using O_DIRECT, and you are better of adding explicit support for it
> rather than hacky cluges such as a forcedirectio option.

Thanks. Would it be accurate to say that if there were only either
streaming writes or (xor) streaming reads to any given file on the NFS
mount, the application would not need to be rewritten?

Do you see forcedirectio as a sharp object that someone could stab
themselves with?

> > There's another scenario, which we talked about a while back, where the
> > cached async reads of a slowly growing file (tail) was spitting out
> > non-exist NULLs to user space. The forcedirectio mount option should
> > prevent that. Furthermore, the "sync" mount option will not help anymore
> > because you removed nfs_readpage_sync().
>
> No. See the points about O_APPEND and serialisation of read() and
> write() above. You may still end up seeing NUL characters (and indeed
> worse forms of corruption).

If the NFS client only does cached async reads of a slowly growing file
(tail), what's the problem? Is nfs_readpage_sync() gone forever, or
could it be revived?

> > > > The other hack that seems to work is periodically triggering an
> > > > nfs_getattr(), via ls -l, to force the dirty pages to be flushed to the
> > > > NFS server. Not exactly elegant ...
> > >
> > > ????????????????????????????????
> >
> > int nfs_getattr(struct vfsmount *mnt, struct dentry *dentry, struct kstat *stat)
> > {
> >         struct inode *inode = dentry->d_inode;
> >         int need_atime = NFS_I(inode)->cache_validity & NFS_INO_INVALID_ATIME;
> >         int err;
> >
> >         /* Flush out writes to the server in order to update c/mtime. */
> >         if (S_ISREG(inode->i_mode)) {
> >                 err = filemap_write_and_wait(inode->i_mapping);
> >                 if (err)
> >                         goto out;
> >         }
>
> I'm aware of that code. The point is that '-osync' does that for free.

-osync also impacts the performance of the entire NFS mount. With
aforementioned hack, you can isolate the specific file(s) that need
their dirty pages to be flushed frequently to avoid hitting global dirty
page limit.

cya,
#
* Re: NFS Mount Option 'nofsc'
From: Myklebust, Trond @ 2012-02-10 16:48 UTC
To: Harshula; +Cc: Chuck Lever, Derek McEachern, linux-nfs@vger.kernel.org

On Fri, 2012-02-10 at 19:07 +1100, Harshula wrote:
> On Thu, 2012-02-09 at 15:31 +0000, Myklebust, Trond wrote:
> Thanks. Would it be accurate to say that if there were only either
> streaming writes or (xor) streaming reads to any given file on the NFS
> mount, the application would not need to be rewritten?

That should normally work.

> Do you see forcedirectio as a sharp object that someone could stab
> themselves with?

Yes. It does lead to some very subtle POSIX violations.

> > > There's another scenario, which we talked about a while back, where the
> > > cached async reads of a slowly growing file (tail) was spitting out
> > > non-exist NULLs to user space. The forcedirectio mount option should
> > > prevent that. Furthermore, the "sync" mount option will not help anymore
> > > because you removed nfs_readpage_sync().
> >
> > No. See the points about O_APPEND and serialisation of read() and
> > write() above. You may still end up seeing NUL characters (and indeed
> > worse forms of corruption).
>
> If the NFS client only does cached async reads of a slowly growing file
> (tail), what's the problem? Is nfs_readpage_sync() gone forever, or
> could it be revived?

It wouldn't help at all. The problem is the VM's handling of pages vs
the NFS handling of file size.

The VM basically uses the file size in order to determine how much data
a page contains. If that file size changed between the instance we
finished the READ RPC call, and the instance the VM gets round to
locking the page again, reading the data and then checking the file
size, then the VM may end up copying data beyond the end of that
retrieved by the RPC call.

> -osync also impacts the performance of the entire NFS mount. With
> aforementioned hack, you can isolate the specific file(s) that need
> their dirty pages to be flushed frequently to avoid hitting global dirty
> page limit.

So does forcedirectio. ...and it also impacts the performance of reads
for the entire NFS mount.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
* Re: NFS Mount Option 'nofsc'
From: Harshula @ 2012-02-20  5:35 UTC
To: Myklebust, Trond; +Cc: Chuck Lever, Derek McEachern, linux-nfs@vger.kernel.org

Hi Trond,

On Fri, 2012-02-10 at 16:48 +0000, Myklebust, Trond wrote:
> On Fri, 2012-02-10 at 19:07 +1100, Harshula wrote:
> > Do you see forcedirectio as a sharp object that someone could stab
> > themselves with?
>
> Yes. It does lead to some very subtle POSIX violations.

I'm trying out the alternatives. Your list of reasons were convincing.
Thanks.

> > If the NFS client only does cached async reads of a slowly growing file
> > (tail), what's the problem? Is nfs_readpage_sync() gone forever, or
> > could it be revived?
>
> It wouldn't help at all. The problem is the VM's handling of pages vs
> the NFS handling of file size.
>
> The VM basically uses the file size in order to determine how much data
> a page contains. If that file size changed between the instance we
> finished the READ RPC call, and the instance the VM gets round to
> locking the page again, reading the data and then checking the file
> size, then the VM may end up copying data beyond the end of that
> retrieved by the RPC call.

nfs_readpage_sync() keeps doing rsize reads (or PAGE SIZE reads if
rsize > PAGE SIZE) till the entire PAGE has been filled or EOF is hit.
Since these are synchronous reads, the subsequent READ RPC call is not
sent until the previous READ RPC reply arrives. Hence, the READ RPC
reply contains the latest metadata about the file, from the NFS server,
before deciding whether or not to do more READ RPC calls.

That is not the case with the asynchronous READ RPC calls which are
queued to be sent before the replies are received. This results in not
READing enough data from the NFS server even when the READ RPC reply
explicitly states that the file has grown. This mismatch of data and
file size is then presented to the VM.

If you look at nfs_readpage_sync() code, it does not worry about
adjusting the number of bytes to read if it is past the *current* EOF.
Only the async code adjusts the number of bytes to read if it is past
the *current* EOF. Furthermore, testing showed that using -osync (while
nfs_readpage_sync() existed) avoided the NULLs being presented to
userspace.

cya,
#
* Re: NFS Mount Option 'nofsc'
From: Derek McEachern @ 2012-02-08 18:13 UTC
To: Myklebust, Trond, linux-nfs@vger.kernel.org

-------- Original Message --------
Subject: Re: NFS Mount Option 'nofsc'
From: Myklebust, Trond <Trond.Myklebust@netapp.com>
To: Derek McEachern <derekm@ti.com>
CC: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Date: Tuesday, February 07, 2012 10:55:04 PM

> On Tue, 2012-02-07 at 20:45 -0600, Derek McEachern wrote:
>> I joined the mailing list shortly after Neil sent out a request for
>> volunteer to update the nfs man page documenting the 'fsc'/'nofsc'
>> options. I suspect this may stem from a ticket we opened with Suse
>> inquiring about these options.
>>
>> Coming from a Solaris background we typically use the 'forcedirectio'
>> option for certain mounts and I was looking for the same thing in Linux.
>> The typically advice seems to be use 'noac' but the description in the
>> man page doesn't seem to match what I would expect from 'forcedirectio',
>> namely no buffering on the client.
>>
>> Poking around the kernel I found the 'fsc'/'nofsc' options and my
>> question is does 'nofsc' provide 'forcedirectio' functionality?
> No. There is no equivalent to the Solaris "forcedirectio" mount option
> in Linux.
> Applications that need to use uncached i/o are required to use the
> O_DIRECT open() mode instead, since pretty much all of them need to be
> rewritten to deal with the subtleties involved anyway.
>
> Trond

So then what exact functionality is provided by the 'nofsc' option? It
would seem to me from a write perspective that between noac and the sync
option it is pretty close to forcedirectio.

From the man page describing sync: "any system call that writes data to
files on that mount point causes that data to be flushed to the server
before the system call returns control to user space."

Maybe I've answered one of my questions, as flushing the data to the
server before returning to user space is really what I'm after. The
userspace app should be blocked until the write has been acknowledged by
the server, and if the server is an NFS appliance then I don't
necessarily care if it has committed the data to disk, as I expect it to
manage its cache properly.

Though I still want to understand what 'nofsc' is doing.

Derek
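A per-file alternative worth noting here, as an illustrative sketch
rather than something proposed in the thread: where only particular
files need the write-through behaviour Derek describes, opening them
with O_SYNC gives roughly the 'sync' mount semantics for just those
files while leaving the rest of the mount cached as usual. The path and
helper name are placeholders.

    #include <fcntl.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* write() on an O_SYNC descriptor returns only after the NFS client
     * has flushed the data to the server, without mounting with '-o sync'. */
    int write_through(const char *path, const void *buf, size_t len)
    {
        int fd = open(path, O_WRONLY | O_CREAT | O_SYNC, 0644);
        ssize_t w;

        if (fd < 0)
            return -1;
        w = write(fd, buf, len);        /* blocks until the server acknowledges */
        close(fd);
        return w == (ssize_t)len ? 0 : -1;
    }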
* Re: NFS Mount Option 'nofsc'
From: Chuck Lever @ 2012-02-08 18:15 UTC
To: Derek McEachern; +Cc: Myklebust, Trond, linux-nfs@vger.kernel.org

On Feb 8, 2012, at 1:13 PM, Derek McEachern wrote:

> -------- Original Message --------
> Subject: Re: NFS Mount Option 'nofsc'
> From: Myklebust, Trond <Trond.Myklebust@netapp.com>
> To: Derek McEachern <derekm@ti.com>
> CC: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
> Date: Tuesday, February 07, 2012 10:55:04 PM
>> On Tue, 2012-02-07 at 20:45 -0600, Derek McEachern wrote:
>>> I joined the mailing list shortly after Neil sent out a request for
>>> volunteer to update the nfs man page documenting the 'fsc'/'nofsc'
>>> options. I suspect this may stem from a ticket we opened with Suse
>>> inquiring about these options.
>>>
>>> Coming from a Solaris background we typically use the 'forcedirectio'
>>> option for certain mounts and I was looking for the same thing in Linux.
>>> The typically advice seems to be use 'noac' but the description in the
>>> man page doesn't seem to match what I would expect from 'forcedirectio',
>>> namely no buffering on the client.
>>>
>>> Poking around the kernel I found the 'fsc'/'nofsc' options and my
>>> question is does 'nofsc' provide 'forcedirectio' functionality?
>> No. There is no equivalent to the Solaris "forcedirectio" mount option
>> in Linux.
>> Applications that need to use uncached i/o are required to use the
>> O_DIRECT open() mode instead, since pretty much all of them need to be
>> rewritten to deal with the subtleties involved anyway.
>>
>> Trond
>
> So then what exact functionality if provided by the 'nofsc' option? It would seem to me from a write perspective that between noac and the sync option it is pretty close to forcedirectio.
>
> From the man page describing sync "any system call that writes data to files on that mount point causes that data to be flushed to the server before the system call returns control to user space."
>
> Maybe I've answered one of my questions as flushing the data to the server before returning to user space is really what I'm after. The userspace app should be blocked until the write has been acknowledged by the server and if the server is an NFS appliance then I don't necessarily care if it has committed the data to disk as I expect it to managed its cache properly.
>
> Though I still want to understand what 'nofsc' is doing.

"nofsc" disables file caching on the client's local disk. It has nothing
to do with direct I/O.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
* Re: NFS Mount Option 'nofsc'
From: Derek McEachern @ 2012-02-08 19:52 UTC
To: Chuck Lever; +Cc: Myklebust, Trond, linux-nfs@vger.kernel.org

-------- Original Message --------
Subject: Re: NFS Mount Option 'nofsc'
From: Chuck Lever <chuck.lever@oracle.com>
To: Derek McEachern <derekm@ti.com>
CC: "Myklebust, Trond" <Trond.Myklebust@netapp.com>, "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Date: Wednesday, February 08, 2012 12:15:37 PM

>> So then what exact functionality if provided by the 'nofsc' option? It would seem to me from a write perspective that between noac and the sync option it is pretty close to forcedirectio.
>>
>> From the man page describing sync "any system call that writes data to files on that mount point causes that data to be flushed to the server before the system call returns control to user space."
>>
>> Maybe I've answered one of my questions as flushing the data to the server before returning to user space is really what I'm after. The userspace app should be blocked until the write has been acknowledged by the server and if the server is an NFS appliance then I don't necessarily care if it has committed the data to disk as I expect it to managed its cache properly.
>>
>> Though I still want to understand what 'nofsc' is doing.
> "nofsc" disables file caching on the client's local disk. It has nothing to do with direct I/O.
>

If 'nofsc' disables file caching on the client's local disk, does that
mean that a write from userspace could go to kernel memory, then
potentially to the client's local disk, before being committed over the
network to the nfs server?

This seems really odd. What would be the use case for this?

Derek
* Re: NFS Mount Option 'nofsc'
From: Chuck Lever @ 2012-02-08 20:00 UTC
To: Derek McEachern; +Cc: Myklebust, Trond, linux-nfs@vger.kernel.org

On Feb 8, 2012, at 2:52 PM, Derek McEachern wrote:

> -------- Original Message --------
> Subject: Re: NFS Mount Option 'nofsc'
> From: Chuck Lever <chuck.lever@oracle.com>
> To: Derek McEachern <derekm@ti.com>
> CC: "Myklebust, Trond" <Trond.Myklebust@netapp.com>, "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
> Date: Wednesday, February 08, 2012 12:15:37 PM
>
>>> So then what exact functionality if provided by the 'nofsc' option? It would seem to me from a write perspective that between noac and the sync option it is pretty close to forcedirectio.
>>>
>>> From the man page describing sync "any system call that writes data to files on that mount point causes that data to be flushed to the server before the system call returns control to user space."
>>>
>>> Maybe I've answered one of my questions as flushing the data to the server before returning to user space is really what I'm after. The userspace app should be blocked until the write has been acknowledged by the server and if the server is an NFS appliance then I don't necessarily care if it has committed the data to disk as I expect it to managed its cache properly.
>>>
>>> Though I still want to understand what 'nofsc' is doing.
>> "nofsc" disables file caching on the client's local disk. It has nothing to do with direct I/O.
>>
>
> If 'nofsc' disables file caching on the client's local disk does that mean that write from userspace could go to kernel memory, then potentially to client's local disk, before being committed over network to the nfs server?
>
> This seems really odd. What would be the use case for this?

With "fsc", writes are indeed slower, but reads of a very large file
that rarely changes are on average much better. If a file is
significantly larger than a client's page cache, a client can cache that
file on its local disk, and get local read speeds instead of going over
the wire.

Additionally if multiple clients have to access the same large file, it
reduces the load on the storage server if they have their own local
copies of that file, since the file is too large for the clients to
cache in their page cache. This also has the benefit of keeping the file
data cached across client reboots.

This feature is an optimization for HPC workloads, where a large number
of clients access very large read-mostly datasets on a handful of
storage servers. The clients' local fsc absorbs much of the aggregate
read workload, allowing storage servers to scale to a larger number of
clients.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
* Re: NFS Mount Option 'nofsc'
From: Derek McEachern @ 2012-02-08 21:16 UTC
To: Chuck Lever; +Cc: Myklebust, Trond, linux-nfs@vger.kernel.org

-------- Original Message --------
Subject: Re: NFS Mount Option 'nofsc'
From: Chuck Lever <chuck.lever@oracle.com>
To: Derek McEachern <derekm@ti.com>
CC: "Myklebust, Trond" <Trond.Myklebust@netapp.com>, "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Date: Wednesday, February 08, 2012 2:00:24 PM

> On Feb 8, 2012, at 2:52 PM, Derek McEachern wrote:
>
>> If 'nofsc' disables file caching on the client's local disk does that mean that write from userspace could go to kernel memory, then potentially to client's local disk, before being committed over network to the nfs server?
>>
>> This seems really odd. What would be the use case for this?
> With "fsc", writes are indeed slower, but reads of a very large file that rarely changes are on average much better. If a file is significantly larger than a client's page cache, a client can cache that file on its local disk, and get local read speeds instead of going over the wire.
>
> Additionally if multiple clients have to access the same large file, it reduces the load on the storage server if they have their own local copies of that file, since the file is too large for the clients to cache in their page cache. This also has the benefit of keeping the file data cached across client reboots.
>
> This feature is an optimization for HPC workloads, where a large number of clients access very large read-mostly datasets on a handful of storage servers. The clients' local fsc absorbs much of the aggregate read workload, allowing storage servers to scale to a larger number of clients.
>

Thank you, this makes sense for 'fsc'. I'm going to assume then that the
default is 'nofsc' if nothing is specified.

Derek