* Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels [not found] <1696396609.119284.1394040541217.JavaMail.zimbra@xes-inc.com> @ 2014-03-05 17:45 ` Andrew Martin 2014-03-05 20:11 ` Jim Rees ` (2 more replies) 0 siblings, 3 replies; 55+ messages in thread From: Andrew Martin @ 2014-03-05 17:45 UTC (permalink / raw) To: linux-nfs Hello, Is it safe to use the "soft" mount option with proto=tcp on newer kernels (e.g. 3.2 and newer)? Currently, using the "defaults" nfs mount options on Ubuntu 12.04 results in processes blocking forever in uninterruptible sleep if they attempt to access a mountpoint while the NFS server is offline. I would prefer that NFS simply return an error to the clients after retrying a few times; however, I also cannot have data loss. From the man page, I think these options will give that effect? soft,proto=tcp,timeo=10,retrans=3 From my understanding, this will cause NFS to retry the connection 3 times (once per second), and then if all 3 are unsuccessful return an error to the application. Is this correct? Is there a risk of data loss or corruption by using "soft" in this way? Or is there a better way to approach this? Thanks, Andrew Martin ^ permalink raw reply [flat|nested] 55+ messages in thread
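For concreteness, a minimal sketch of what these proposed options would look like as an /etc/fstab entry (server:/export and /mnt/share are placeholder names, not values from the thread):

  # timeo is in tenths of a second, so timeo=10 is a one-second timeout,
  # retried retrans=3 times before the client reports an error
  server:/export  /mnt/share  nfs  soft,proto=tcp,timeo=10,retrans=3  0  0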
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-05 17:45 ` Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels Andrew Martin @ 2014-03-05 20:11 ` Jim Rees 2014-03-05 20:41 ` Andrew Martin 2014-03-05 20:15 ` Brian Hawley 2014-03-06 3:50 ` NeilBrown 2 siblings, 1 reply; 55+ messages in thread From: Jim Rees @ 2014-03-05 20:11 UTC (permalink / raw) To: Andrew Martin; +Cc: linux-nfs I prefer hard,intr which lets you interrupt the hung process. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-05 20:11 ` Jim Rees @ 2014-03-05 20:41 ` Andrew Martin 2014-03-05 21:11 ` Jim Rees 0 siblings, 1 reply; 55+ messages in thread From: Andrew Martin @ 2014-03-05 20:41 UTC (permalink / raw) To: Jim Rees; +Cc: linux-nfs ----- Original Message ----- > From: "Jim Rees" <rees@umich.edu> > To: "Andrew Martin" <amartin@xes-inc.com> > Cc: linux-nfs@vger.kernel.org > Sent: Wednesday, March 5, 2014 2:11:49 PM > Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels > > I prefer hard,intr which lets you interrupt the hung process. > Isn't intr/nointr deprecated (since kernel 2.6.25)? ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-05 20:41 ` Andrew Martin @ 2014-03-05 21:11 ` Jim Rees 2014-03-06 3:34 ` NeilBrown 0 siblings, 1 reply; 55+ messages in thread From: Jim Rees @ 2014-03-05 21:11 UTC (permalink / raw) To: Andrew Martin; +Cc: linux-nfs Andrew Martin wrote: Isn't intr/nointr deprecated (since kernel 2.6.25)? It isn't so much that it's deprecated as that it's now the default (except that only SIGKILL will work). ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-05 21:11 ` Jim Rees @ 2014-03-06 3:34 ` NeilBrown 2014-03-06 3:47 ` Jim Rees 0 siblings, 1 reply; 55+ messages in thread From: NeilBrown @ 2014-03-06 3:34 UTC (permalink / raw) To: Jim Rees; +Cc: Andrew Martin, linux-nfs [-- Attachment #1: Type: text/plain, Size: 715 bytes --] On Wed, 5 Mar 2014 16:11:24 -0500 Jim Rees <rees@umich.edu> wrote: > Andrew Martin wrote: > > Isn't intr/nointr deprecated (since kernel 2.6.25)? > > It isn't so much that it's deprecated as that it's now the default (except > that only SIGKILL will work). Not quite correct. Any signal will work provided its behaviour is to kill the process. So SIGKILL will always work, and SIGTERM, SIGINT, SIGQUIT, etc. will work provided they aren't caught or ignored by the process. NeilBrown > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 828 bytes --] ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 3:34 ` NeilBrown @ 2014-03-06 3:47 ` Jim Rees 2014-03-06 4:37 ` NeilBrown 0 siblings, 1 reply; 55+ messages in thread From: Jim Rees @ 2014-03-06 3:47 UTC (permalink / raw) To: NeilBrown; +Cc: Andrew Martin, linux-nfs NeilBrown wrote: On Wed, 5 Mar 2014 16:11:24 -0500 Jim Rees <rees@umich.edu> wrote: > Andrew Martin wrote: > > Isn't intr/nointr deprecated (since kernel 2.6.25)? > > It isn't so much that it's deprecated as that it's now the default (except > that only SIGKILL will work). Not quite correct. Any signal will work provided its behaviour is to kill the process. So SIGKILL will always work, and SIGTERM, SIGINT, SIGQUIT, etc. will work provided they aren't caught or ignored by the process. If that's true, then the man page is wrong and someone should fix it. I'll work up a patch if someone can confirm the behavior. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 3:47 ` Jim Rees @ 2014-03-06 4:37 ` NeilBrown 0 siblings, 0 replies; 55+ messages in thread From: NeilBrown @ 2014-03-06 4:37 UTC (permalink / raw) To: Jim Rees; +Cc: Andrew Martin, linux-nfs [-- Attachment #1: Type: text/plain, Size: 1443 bytes --] On Wed, 5 Mar 2014 22:47:27 -0500 Jim Rees <rees@umich.edu> wrote: > NeilBrown wrote: > > On Wed, 5 Mar 2014 16:11:24 -0500 Jim Rees <rees@umich.edu> wrote: > > > Andrew Martin wrote: > > > > Isn't intr/nointr deprecated (since kernel 2.6.25)? > > > > It isn't so much that it's deprecated as that it's now the default (except > > that only SIGKILL will work). > > Not quite correct. Any signal will work provided its behaviour is to kill > the process. So SIGKILL will always work, and SIGTERM, SIGINT, SIGQUIT, etc. > will work provided they aren't caught or ignored by the process. > > If that's true, then the man page is wrong and someone should fix it. I'll > work up a patch if someone can confirm the behavior. I just mounted a filesystem, turned off my network connection, ran "ls -l" and then tried to kill the "ls".... To my surprise, only SIGKILL worked. I looked more closely and discovered that "ls" catches SIGHUP, SIGINT, SIGQUIT, and SIGTERM, so those signals won't kill it.... So I tried to "cat" a file on the NFS filesystem. 'cat' doesn't catch any signals. SIGHUP, SIGTERM, and SIGINT all worked on 'cat'. 'df' also responds to 'SIGINT'. It would be nice if 'ls' only caught signals while printing (so it can restore the default colour) and didn't during 'stat' and 'readdir'. But maybe no-one cares enough. So the man page is not quite accurate. NeilBrown [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 828 bytes --] ^ permalink raw reply [flat|nested] 55+ messages in thread
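A quick way to check Neil's observation about which signals a program catches: the SigIgn and SigCgt lines in /proc/<pid>/status are bitmasks of ignored and caught signals (a sketch; the pid 10175 is a placeholder):

  # bit 1 = SIGHUP, bit 2 = SIGINT, bit 3 = SIGQUIT, bit 15 = SIGTERM
  grep -E 'Sig(Ign|Cgt)' /proc/10175/status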
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-05 17:45 ` Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels Andrew Martin 2014-03-05 20:11 ` Jim Rees @ 2014-03-05 20:15 ` Brian Hawley 2014-03-05 20:54 ` Chuck Lever 2014-03-06 9:37 ` Ric Wheeler 2 siblings, 2 replies; 55+ messages in thread From: Brian Hawley @ 2014-03-05 20:15 UTC (permalink / raw) To: Andrew Martin, linux-nfs-owner, linux-nfs In my experience, you won't get the i/o errors reported back to the read/write/close operations. I don't know for certain, but I suspect this may be due to caching and chunking of I/O to match the rsize/wsize settings; and possibly the fact that the peer disconnection isn't noticed unless the nfs server resets (ie cable disconnection isn't sufficient). The inability to get the i/o errors back to the application has been a major pain for us. On a lark we did find that repeated umount -f's does get i/o errors back to the application, but isn't our preferred way. -----Original Message----- From: Andrew Martin <amartin@xes-inc.com> Sender: linux-nfs-owner@vger.kernel.org Date: Wed, 5 Mar 2014 11:45:24 To: <linux-nfs@vger.kernel.org> Subject: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels Hello, Is it safe to use the "soft" mount option with proto=tcp on newer kernels (e.g. 3.2 and newer)? Currently, using the "defaults" nfs mount options on Ubuntu 12.04 results in processes blocking forever in uninterruptible sleep if they attempt to access a mountpoint while the NFS server is offline. I would prefer that NFS simply return an error to the clients after retrying a few times; however, I also cannot have data loss. From the man page, I think these options will give that effect? soft,proto=tcp,timeo=10,retrans=3 From my understanding, this will cause NFS to retry the connection 3 times (once per second), and then if all 3 are unsuccessful return an error to the application. Is this correct? Is there a risk of data loss or corruption by using "soft" in this way? Or is there a better way to approach this? Thanks, Andrew Martin -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-05 20:15 ` Brian Hawley @ 2014-03-05 20:54 ` Chuck Lever 2014-03-06 9:37 ` Ric Wheeler 1 sibling, 0 replies; 55+ messages in thread From: Chuck Lever @ 2014-03-05 20:54 UTC (permalink / raw) To: bhawley; +Cc: Andrew Martin, linux-nfs-owner, Linux NFS Mailing List On Mar 5, 2014, at 3:15 PM, Brian Hawley <bhawley@luminex.com> wrote: > > In my experience, you won't get the i/o errors reported back to the read/write/close operations. I don't know for certain, but I suspect this may be due to caching and chunking of I/O to match the rsize/wsize settings; and possibly the fact that the peer disconnection isn't noticed unless the nfs server resets (ie cable disconnection isn't sufficient). > > The inability to get the i/o errors back to the application has been a major pain for us. > > On a lark we did find that repeated umount -f's does get i/o errors back to the application, but isn't our preferred way. > > > -----Original Message----- > From: Andrew Martin <amartin@xes-inc.com> > Sender: linux-nfs-owner@vger.kernel.org > Date: Wed, 5 Mar 2014 11:45:24 > To: <linux-nfs@vger.kernel.org> > Subject: Optimal NFS mount options to safely allow interrupts and timeouts > on newer kernels > > Hello, > > Is it safe to use the "soft" mount option with proto=tcp on newer kernels (e.g. > 3.2 and newer)? Currently, using the "defaults" nfs mount options on Ubuntu > 12.04 results in processes blocking forever in uninterruptible sleep if they > attempt to access a mountpoint while the NFS server is offline. I would prefer > that NFS simply return an error to the clients after retrying a few times; > however, I also cannot have data loss. From the man page, I think these options > will give that effect? > soft,proto=tcp,timeo=10,retrans=3 > > From my understanding, this will cause NFS to retry the connection 3 times (once > per second), and then if all 3 are unsuccessful return an error to the > application. Is this correct? Is there a risk of data loss or corruption by > using "soft" in this way? Or is there a better way to approach this? There is always a silent data corruption risk with “soft.” Using TCP and a long retransmit timeout mitigates the risk, but it is still there. A one second timeout for TCP is very short, and will almost certainly result in trouble, especially if the server or network is slow. You should be able to ^C any waiting NFS process. Blocking forever is usually the sign of a bug. In general, NFS is not especially tolerant of server unavailability. You may want to consider some other distributed file system protocol that is more fault-tolerant, or find ways to ensure your NFS servers are always accessible. -- Chuck Lever chuck[dot]lever[at]oracle[dot]com ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-05 20:15 ` Brian Hawley 2014-03-05 20:54 ` Chuck Lever @ 2014-03-06 9:37 ` Ric Wheeler 1 sibling, 0 replies; 55+ messages in thread From: Ric Wheeler @ 2014-03-06 9:37 UTC (permalink / raw) To: bhawley, Andrew Martin, linux-nfs-owner, linux-nfs On 03/05/2014 10:15 PM, Brian Hawley wrote: > In my experience, you won't get the i/o errors reported back to the read/write/close operations. I don't know for certain, but I suspect this may be due to caching and chunking of I/O to match the rsize/wsize settings; and possibly the fact that the peer disconnection isn't noticed unless the nfs server resets (ie cable disconnection isn't sufficient). > > The inability to get the i/o errors back to the application has been a major pain for us. > > On a lark we did find that repeated umount -f's does get i/o errors back to the application, but isn't our preferred way. The key to getting IO errors promptly is to make sure you use fsync/fdatasync (and so on) at the points in your application where you want to be able to recover if things crash, get disconnected, etc. Those will push out the data from the page cache while your application is still around, which is critical for any potential need to do recovery. Note that this is not just an issue with NFS: any file system (including local file systems) normally completes the write request when the IO hits the page cache. When that page eventually gets sent down to the permanent storage device (NFS server, local disk, etc), your process is potentially no longer around and certainly not waiting for IO errors in the original write call :) To make this even trickier, calls like fsync() that persist data have a substantial performance impact, so you don't want to over-use them. (For the worst case, try writing a 1GB file with an fsync() before close and comparing that to writing a 1GB file opened in O_DIRECT|O_SYNC mode :)) Ric ^ permalink raw reply [flat|nested] 55+ messages in thread
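A rough way to see the performance gap Ric mentions, assuming a throwaway file on the NFS mount (the path is a placeholder):

  # buffered writes, one fsync before close; write errors surface at the fsync
  dd if=/dev/zero of=/mnt/nfs/scratch bs=1M count=1024 conv=fsync
  # worst case: synchronous direct I/O on every write
  dd if=/dev/zero of=/mnt/nfs/scratch bs=1M count=1024 oflag=direct,sync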
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-05 17:45 ` Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels Andrew Martin 2014-03-05 20:11 ` Jim Rees 2014-03-05 20:15 ` Brian Hawley @ 2014-03-06 3:50 ` NeilBrown 2014-03-06 5:03 ` Andrew Martin 2 siblings, 1 reply; 55+ messages in thread From: NeilBrown @ 2014-03-06 3:50 UTC (permalink / raw) To: Andrew Martin; +Cc: linux-nfs [-- Attachment #1: Type: text/plain, Size: 1718 bytes --] On Wed, 5 Mar 2014 11:45:24 -0600 (CST) Andrew Martin <amartin@xes-inc.com> wrote: > Hello, > > Is it safe to use the "soft" mount option with proto=tcp on newer kernels (e.g. > 3.2 and newer)? Currently, using the "defaults" nfs mount options on Ubuntu > 12.04 results in processes blocking forever in uninterruptible sleep if they > attempt to access a mountpoint while the NFS server is offline. I would prefer > that NFS simply return an error to the clients after retrying a few times; > however, I also cannot have data loss. From the man page, I think these options > will give that effect? > soft,proto=tcp,timeo=10,retrans=3 > > From my understanding, this will cause NFS to retry the connection 3 times (once > per second), and then if all 3 are unsuccessful return an error to the > application. Is this correct? Is there a risk of data loss or corruption by > using "soft" in this way? Or is there a better way to approach this? I think your best bet is to use an auto-mounter so that the filesystem gets unmounted if the server isn't available. "soft" always implies the risk of data loss. "Nulls Frequently Substituted" as it was described to me very many years ago. Possibly it would be good to have something between 'hard' and 'soft' for cases like yours (you aren't the first to ask). From http://docstore.mik.ua/orelly/networking/puis/ch20_01.htm BSDI and OSF/1 also have a spongy option that is similar to hard, except that the stat, lookup, fsstat, readlink, and readdir operations behave like a soft MOUNT. Linux doesn't have 'spongy'. Maybe it could. Or maybe it was a failed experiment and there are good reasons not to want it. NeilBrown [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 828 bytes --] ^ permalink raw reply [flat|nested] 55+ messages in thread
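A minimal autofs sketch of the auto-mounter approach Neil suggests (map file, mount root, timeout, and server are assumed values, not from the thread); an idle mount expires after 60 seconds, so an unreachable server is only noticed by processes that touch the path afterwards:

  # /etc/auto.master
  /mnt/auto  /etc/auto.nfs  --timeout=60

  # /etc/auto.nfs
  share  -soft,proto=tcp  server:/export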
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 3:50 ` NeilBrown @ 2014-03-06 5:03 ` Andrew Martin 2014-03-06 5:37 ` NeilBrown 2014-03-06 12:34 ` Jim Rees 0 siblings, 2 replies; 55+ messages in thread From: Andrew Martin @ 2014-03-06 5:03 UTC (permalink / raw) To: NeilBrown; +Cc: linux-nfs ----- Original Message ----- > From: "NeilBrown" <neilb@suse.de> > To: "Andrew Martin" <amartin@xes-inc.com> > Cc: linux-nfs@vger.kernel.org > Sent: Wednesday, March 5, 2014 9:50:42 PM > Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels > > On Wed, 5 Mar 2014 11:45:24 -0600 (CST) Andrew Martin <amartin@xes-inc.com> > wrote: > > > Hello, > > > > Is it safe to use the "soft" mount option with proto=tcp on newer kernels > > (e.g. > > 3.2 and newer)? Currently, using the "defaults" nfs mount options on Ubuntu > > 12.04 results in processes blocking forever in uninterruptible sleep if > > they > > attempt to access a mountpoint while the NFS server is offline. I would > > prefer > > that NFS simply return an error to the clients after retrying a few times; > > however, I also cannot have data loss. From the man page, I think these > > options > > will give that effect? > > soft,proto=tcp,timeo=10,retrans=3 > > > > From my understanding, this will cause NFS to retry the connection 3 times > > (once > > per second), and then if all 3 are unsuccessful return an error to the > > application. Is this correct? Is there a risk of data loss or corruption by > > using "soft" in this way? Or is there a better way to approach this? > > I think your best bet is to use an auto-mounter so that the filesystem gets > unmounted if the server isn't available. Would this still succeed in unmounting the filesystem if there are already processes requesting files from it (and blocking in uninterruptible sleep)? > "soft" always implies the risk of data loss. "Nulls Frequently Substituted" > as it was described to me very many years ago. > > Possibly it would be good to have something between 'hard' and 'soft' for > cases like yours (you aren't the first to ask). > > From http://docstore.mik.ua/orelly/networking/puis/ch20_01.htm > > BSDI and OSF/1 also have a spongy option that is similar to hard, except > that the stat, lookup, fsstat, readlink, and readdir operations behave > like a soft MOUNT. > > Linux doesn't have 'spongy'. Maybe it could. Or maybe it was a failed > experiment and there are good reasons not to want it. The problem that sparked this question is a webserver where apache can serve files from an NFS mount. If the NFS server becomes unavailable, then the apache processes block in uninterruptible sleep and drive the load very high, forcing a server restart. It would be better for this case if the mount would simply return an error to apache, so that it would give up rather than blocking forever and taking down the system. Can such behavior be achieved safely? ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 5:03 ` Andrew Martin @ 2014-03-06 5:37 ` NeilBrown 2014-03-06 5:47 ` Brian Hawley 0 siblings, 1 reply; 55+ messages in thread From: NeilBrown @ 2014-03-06 5:37 UTC (permalink / raw) To: Andrew Martin; +Cc: linux-nfs [-- Attachment #1: Type: text/plain, Size: 3989 bytes --] On Wed, 5 Mar 2014 23:03:43 -0600 (CST) Andrew Martin <amartin@xes-inc.com> wrote: > ----- Original Message ----- > > From: "NeilBrown" <neilb@suse.de> > > To: "Andrew Martin" <amartin@xes-inc.com> > > Cc: linux-nfs@vger.kernel.org > > Sent: Wednesday, March 5, 2014 9:50:42 PM > > Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels > > > > On Wed, 5 Mar 2014 11:45:24 -0600 (CST) Andrew Martin <amartin@xes-inc.com> > > wrote: > > > > > Hello, > > > > > > Is it safe to use the "soft" mount option with proto=tcp on newer kernels > > > (e.g. > > > 3.2 and newer)? Currently, using the "defaults" nfs mount options on Ubuntu > > > 12.04 results in processes blocking forever in uninterruptible sleep if > > > they > > > attempt to access a mountpoint while the NFS server is offline. I would > > > prefer > > > that NFS simply return an error to the clients after retrying a few times; > > > however, I also cannot have data loss. From the man page, I think these > > > options > > > will give that effect? > > > soft,proto=tcp,timeo=10,retrans=3 > > > > > > From my understanding, this will cause NFS to retry the connection 3 times > > > (once > > > per second), and then if all 3 are unsuccessful return an error to the > > > application. Is this correct? Is there a risk of data loss or corruption by > > > using "soft" in this way? Or is there a better way to approach this? > > > > I think your best bet is to use an auto-mounter so that the filesystem gets > > unmounted if the server isn't available. > Would this still succeed in unmounting the filesystem if there are already > processes requesting files from it (and blocking in uninterruptible sleep)? The kernel would allow a 'lazy' unmount in this case. I don't know if any automounter would try a lazy unmount though - I suspect not. A long time ago I used "amd" which would create symlinks to a separate tree where the filesystems were mounted. I'm pretty sure that when a server went away the symlink would disappear even if the unmount failed. So while any processes accessing the filesystem would block, new processes would not be able to find the filesystem and so would not block. > > "soft" always implies the risk of data loss. "Nulls Frequently Substituted" > > as it was described to me very many years ago. > > > > Possibly it would be good to have something between 'hard' and 'soft' for > > cases like yours (you aren't the first to ask). > > > > From http://docstore.mik.ua/orelly/networking/puis/ch20_01.htm > > > > BSDI and OSF/1 also have a spongy option that is similar to hard, except > > that the stat, lookup, fsstat, readlink, and readdir operations behave > > like a soft MOUNT. > > > > Linux doesn't have 'spongy'. Maybe it could. Or maybe it was a failed > > experiment and there are good reasons not to want it. > The problem that sparked this question is a webserver where apache can serve > files from an NFS mount. If the NFS server becomes unavailable, then the apache > processes block in uninterruptible sleep and drive the load very high, forcing > a server restart.
It would be better for this case if the mount would simply > return an error to apache, so that it would give up rather than blocking > forever and taking down the system. Can such behavior be achieved safely? If you have a monitoring program that notices this high load you can try umount -f /mount/point The "-f" should cause outstanding requests to fail. That doesn't stop more requests being made though so it might not be completely successful. Possibly running it several times would help. mount --move /mount/point /somewhere/safe for i in {1..15}; do umount -f /somewhere/safe; done might be even better, if you can get "mount --move" to work. It doesn't work for me, probably the fault of systemd (isn't everything :-)). NeilBrown [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 828 bytes --] ^ permalink raw reply [flat|nested] 55+ messages in thread
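A sketch of the kind of monitoring program Neil alludes to (the server name, mount point, and paths are placeholders; whether umount -f actually unblocks the waiters is exactly what is in question in this thread):

  #!/bin/sh
  # if the NFS service stops answering, force-fail outstanding requests
  if ! rpcinfo -t server nfs >/dev/null 2>&1; then
      mount --move /mount/point /somewhere/safe 2>/dev/null
      for i in 1 2 3 4 5; do umount -f /somewhere/safe; done
  fi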
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 5:37 ` NeilBrown @ 2014-03-06 5:47 ` Brian Hawley 2014-03-06 15:30 ` Andrew Martin 0 siblings, 1 reply; 55+ messages in thread From: Brian Hawley @ 2014-03-06 5:47 UTC (permalink / raw) To: NeilBrown, linux-nfs-owner, Andrew Martin; +Cc: linux-nfs I ended up writing a "manage_mounts" script run by cron that compares /proc/mounts and the fstab, used ping, and "timeout" messages in /var/log/messages to identify filesystems that aren't responding, repeatedly do umount -f to force i/o errors back to the calling applications; and when missing mounts (in fstab but not /proc/mounts) but were now pingable, attempt to remount them. For me, timeo and retrans are necessary, but not sufficient. The chunking to rsize/wsize and caching plays a role in how well i/o errors get relayed back to the applications doing the i/o. You will certainly lose data in these scenarios. It would be fantastic if somehow the timeo and retrans were sufficient (ie when they fail, i/o errors get back to the applications that queued that i/o (or even the i/o that caused the application to pend because the rsize/wsize or cache was full)). You can eliminate some of that behavior with sync/directio, but performance becomes abysmal. I tried "lazy"; it didn't provide the desired effect (they unmounted, which prevented new i/o's; but existing i/o's never got errors). -----Original Message----- From: NeilBrown <neilb@suse.de> Sender: linux-nfs-owner@vger.kernel.org Date: Thu, 6 Mar 2014 16:37:21 To: Andrew Martin<amartin@xes-inc.com> Cc: <linux-nfs@vger.kernel.org> Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels On Wed, 5 Mar 2014 23:03:43 -0600 (CST) Andrew Martin <amartin@xes-inc.com> wrote: > ----- Original Message ----- > > From: "NeilBrown" <neilb@suse.de> > > To: "Andrew Martin" <amartin@xes-inc.com> > > Cc: linux-nfs@vger.kernel.org > > Sent: Wednesday, March 5, 2014 9:50:42 PM > > Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels > > > > On Wed, 5 Mar 2014 11:45:24 -0600 (CST) Andrew Martin <amartin@xes-inc.com> > > wrote: > > > > > Hello, > > > > > > Is it safe to use the "soft" mount option with proto=tcp on newer kernels > > > (e.g. > > > 3.2 and newer)? Currently, using the "defaults" nfs mount options on Ubuntu > > > 12.04 results in processes blocking forever in uninterruptible sleep if > > > they > > > attempt to access a mountpoint while the NFS server is offline. I would > > > prefer > > > that NFS simply return an error to the clients after retrying a few times; > > > however, I also cannot have data loss. From the man page, I think these > > > options > > > will give that effect? > > > soft,proto=tcp,timeo=10,retrans=3 > > > > > > From my understanding, this will cause NFS to retry the connection 3 times > > > (once > > > per second), and then if all 3 are unsuccessful return an error to the > > > application. Is this correct? Is there a risk of data loss or corruption by > > > using "soft" in this way? Or is there a better way to approach this? > > > > I think your best bet is to use an auto-mounter so that the filesystem gets > > unmounted if the server isn't available. > Would this still succeed in unmounting the filesystem if there are already > processes requesting files from it (and blocking in uninterruptible sleep)? The kernel would allow a 'lazy' unmount in this case. I don't know if any automounter would try a lazy unmount though - I suspect not. A long time ago I used "amd" which would create symlinks to a separate tree where the filesystems were mounted. I'm pretty sure that when a server went away the symlink would disappear even if the unmount failed. So while any processes accessing the filesystem would block, new processes would not be able to find the filesystem and so would not block. > > "soft" always implies the risk of data loss. "Nulls Frequently Substituted" > > as it was described to me very many years ago. > > > > Possibly it would be good to have something between 'hard' and 'soft' for > > cases like yours (you aren't the first to ask). > > > > From http://docstore.mik.ua/orelly/networking/puis/ch20_01.htm > > > > BSDI and OSF/1 also have a spongy option that is similar to hard, except > > that the stat, lookup, fsstat, readlink, and readdir operations behave > > like a soft MOUNT. > > > > Linux doesn't have 'spongy'. Maybe it could. Or maybe it was a failed > > experiment and there are good reasons not to want it. > The problem that sparked this question is a webserver where apache can serve > files from an NFS mount. If the NFS server becomes unavailable, then the apache > processes block in uninterruptible sleep and drive the load very high, forcing > a server restart. It would be better for this case if the mount would simply > return an error to apache, so that it would give up rather than blocking > forever and taking down the system. Can such behavior be achieved safely? If you have a monitoring program that notices this high load you can try umount -f /mount/point The "-f" should cause outstanding requests to fail. That doesn't stop more requests being made though so it might not be completely successful. Possibly running it several times would help. mount --move /mount/point /somewhere/safe for i in {1..15}; do umount -f /somewhere/safe; done might be even better, if you can get "mount --move" to work. It doesn't work for me, probably the fault of systemd (isn't everything :-)). NeilBrown ^ permalink raw reply [flat|nested] 55+ messages in thread
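A condensed sketch of what Brian describes (assumptions: the NFS entries live in /etc/fstab and the server answers ping when healthy; his real script also greps /var/log/messages for "timeout" lines):

  #!/bin/sh
  # run from cron: force errors for dead mounts, remount when back
  awk '$3 == "nfs" {print $1, $2}' /etc/fstab | while read src mnt; do
      host=${src%%:*}
      if ! ping -c1 -W2 "$host" >/dev/null 2>&1; then
          umount -f "$mnt"    # repeated runs push i/o errors to callers
      elif ! grep -q " $mnt " /proc/mounts; then
          mount "$mnt"        # server pingable again: remount
      fi
  done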
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 5:47 ` Brian Hawley @ 2014-03-06 15:30 ` Andrew Martin 2014-03-06 16:22 ` Jim Rees 2014-03-06 18:56 ` Brian Hawley 0 siblings, 2 replies; 55+ messages in thread From: Andrew Martin @ 2014-03-06 15:30 UTC (permalink / raw) To: bhawley; +Cc: NeilBrown, linux-nfs-owner, linux-nfs > From: "Brian Hawley" <bhawley@luminex.com> > > I ended up writing a "manage_mounts" script run by cron that compares > /proc/mounts and the fstab, used ping, and "timeout" messages in > /var/log/messages to identify filesystems that aren't responding, repeatedly > do umount -f to force i/o errors back to the calling applications; and when > missing mounts (in fstab but not /proc/mounts) but were now pingable, > attempt to remount them. > > > For me, timeo and retrans are necessary, but not sufficient. The chunking to > rsize/wsize and caching plays a role in how well i/o errors get relayed back > to the applications doing the i/o. > > You will certainly lose data in these scenarios. > > It would be fantastic if somehow the timeo and retrans were sufficient (ie > when they fail, i/o errors get back to the applications that queued that i/o > (or even the i/o that caused the application to pend because the rsize/wsize > or cache was full)). > > You can eliminate some of that behavior with sync/directio, but performance > becomes abysmal. > > I tried "lazy"; it didn't provide the desired effect (they unmounted, which > prevented new i/o's; but existing i/o's never got errors). This is the problem I am having - I can unmount the filesystem with -l, but once it is unmounted the existing apache processes are still stuck forever. Does repeatedly running "umount -f" instead of "umount -l" as you describe return I/O errors to existing processes and allow them to stop? > From: "Jim Rees" <rees@umich.edu> > Given this is apache, I think if I were doing this I'd use ro,soft,intr,tcp > and not try to write anything to nfs. I was using tcp,bg,soft,intr when this problem occurred. I do not know if apache was attempting to do a write or a read, but it seems that tcp,soft,intr was not sufficient to prevent the problem. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 15:30 ` Andrew Martin @ 2014-03-06 16:22 ` Jim Rees 2014-03-06 16:43 ` Andrew Martin 2014-03-06 18:56 ` Brian Hawley 1 sibling, 1 reply; 55+ messages in thread From: Jim Rees @ 2014-03-06 16:22 UTC (permalink / raw) To: Andrew Martin; +Cc: bhawley, NeilBrown, linux-nfs-owner, linux-nfs Andrew Martin wrote: > From: "Jim Rees" <rees@umich.edu> > Given this is apache, I think if I were doing this I'd use ro,soft,intr,tcp > and not try to write anything to nfs. I was using tcp,bg,soft,intr when this problem occurred. I do not know if apache was attempting to do a write or a read, but it seems that tcp,soft,intr was not sufficient to prevent the problem. I had the impression from your original message that you were not using "soft" and were asking if it's safe to use it. Are you saying that even with the "soft" option the apache gets stuck forever? ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 16:22 ` Jim Rees @ 2014-03-06 16:43 ` Andrew Martin 2014-03-06 17:36 ` Jim Rees 2014-03-06 19:00 ` Brian Hawley 0 siblings, 2 replies; 55+ messages in thread From: Andrew Martin @ 2014-03-06 16:43 UTC (permalink / raw) To: Jim Rees; +Cc: bhawley, NeilBrown, linux-nfs-owner, linux-nfs > From: "Jim Rees" <rees@umich.edu> > Andrew Martin wrote: > > > From: "Jim Rees" <rees@umich.edu> > > Given this is apache, I think if I were doing this I'd use > > ro,soft,intr,tcp > > and not try to write anything to nfs. > I was using tcp,bg,soft,intr when this problem occurred. I do not know if > apache was attempting to do a write or a read, but it seems that > tcp,soft,intr > was not sufficient to prevent the problem. > > I had the impression from your original message that you were not using > "soft" and were asking if it's safe to use it. Are you saying that even with > the "soft" option the apache gets stuck forever? Yes, even with soft, it gets stuck forever. I had been using tcp,bg,soft,intr when the problem occurred (on several occasions), so my original question was whether it would be safe to use small timeo and retrans values to hopefully return I/O errors quickly to the application, rather than blocking forever (which causes the high load and inevitable reboot). It sounds like that isn't safe, but perhaps there is another way to resolve this problem? ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 16:43 ` Andrew Martin @ 2014-03-06 17:36 ` Jim Rees 2014-03-06 18:26 ` Trond Myklebust 2014-03-06 18:35 ` Andrew Martin 2014-03-06 19:00 ` Brian Hawley 1 sibling, 2 replies; 55+ messages in thread From: Jim Rees @ 2014-03-06 17:36 UTC (permalink / raw) To: Andrew Martin; +Cc: bhawley, NeilBrown, linux-nfs-owner, linux-nfs Why would a bunch of blocked apaches cause high load and reboot? ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 17:36 ` Jim Rees @ 2014-03-06 18:26 ` Trond Myklebust 2014-03-06 18:35 ` Andrew Martin 1 sibling, 0 replies; 55+ messages in thread From: Trond Myklebust @ 2014-03-06 18:26 UTC (permalink / raw) To: Jim Rees; +Cc: Andrew Martin, bhawley, Brown Neil, linux-nfs-owner, linux-nfs On Mar 6, 2014, at 12:36, Jim Rees <rees@umich.edu> wrote: > Why would a bunch of blocked apaches cause high load and reboot? Good question. Are the TCP reconnect attempts perhaps eating up all the reserved ports and leaving them in the TIME_WAIT state? ‘netstat -tn’ should list all the ports currently in use by TCP connections. _________________________________ Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@primarydata.com ^ permalink raw reply [flat|nested] 55+ messages in thread
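A one-liner to check Trond's hypothesis (assuming the standard NFS port 2049; adjust if the server uses another) by counting client sockets to the server left in TIME_WAIT:

  netstat -tn | awk '$6 == "TIME_WAIT" && $5 ~ /:2049$/' | wc -l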
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 17:36 ` Jim Rees 2014-03-06 18:26 ` Trond Myklebust @ 2014-03-06 18:35 ` Andrew Martin 2014-03-06 18:48 ` Jim Rees 2014-03-06 18:50 ` Trond Myklebust 1 sibling, 2 replies; 55+ messages in thread From: Andrew Martin @ 2014-03-06 18:35 UTC (permalink / raw) To: Jim Rees; +Cc: bhawley, NeilBrown, linux-nfs-owner, linux-nfs > From: "Jim Rees" <rees@umich.edu> > Why would a bunch of blocked apaches cause high load and reboot? What I believe happens is the apache child processes go to serve these requests and then block in uninterruptible sleep. Thus, there are fewer and fewer child processes to handle new incoming requests. Eventually, apache would normally kill said children (e.g. after a child handles a certain number of requests), but it cannot kill them because they are in uninterruptible sleep. As more and more incoming requests are queued (and fewer and fewer child processes are available to serve the requests), the load climbs. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 18:35 ` Andrew Martin @ 2014-03-06 18:48 ` Jim Rees 2014-03-06 19:02 ` Trond Myklebust 2014-03-06 18:50 ` Trond Myklebust 1 sibling, 1 reply; 55+ messages in thread From: Jim Rees @ 2014-03-06 18:48 UTC (permalink / raw) To: Andrew Martin; +Cc: bhawley, NeilBrown, linux-nfs-owner, linux-nfs Andrew Martin wrote: > From: "Jim Rees" <rees@umich.edu> > Why would a bunch of blocked apaches cause high load and reboot? What I believe happens is the apache child processes go to serve these requests and then block in uninterruptable sleep. Thus, there are fewer and fewer child processes to handle new incoming requests. Eventually, apache would normally kill said children (e.g after a child handles a certain number of requests), but it cannot kill them because they are in uninterruptable sleep. As more and more incoming requests are queued (and fewer and fewer child processes are available to serve the requests), the load climbs. But Neil says the sleeps should be interruptible, despite what the man page says. Trond, as far as you know, should a soft mount be interruptible by SIGINT, or should it require a SIGKILL? ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 18:48 ` Jim Rees @ 2014-03-06 19:02 ` Trond Myklebust 0 siblings, 0 replies; 55+ messages in thread From: Trond Myklebust @ 2014-03-06 19:02 UTC (permalink / raw) To: Jim Rees; +Cc: Andrew Martin, bhawley, Brown Neil, linux-nfs-owner, linux-nfs On Mar 6, 2014, at 13:48, Jim Rees <rees@umich.edu> wrote: > Andrew Martin wrote: > >> From: "Jim Rees" <rees@umich.edu> >> Why would a bunch of blocked apaches cause high load and reboot? > What I believe happens is the apache child processes go to serve > these requests and then block in uninterruptable sleep. Thus, there > are fewer and fewer child processes to handle new incoming requests. > Eventually, apache would normally kill said children (e.g after a > child handles a certain number of requests), but it cannot kill them > because they are in uninterruptable sleep. As more and more incoming > requests are queued (and fewer and fewer child processes are available > to serve the requests), the load climbs. > > But Neil says the sleeps should be interruptible, despite what the man page > says. > > Trond, as far as you know, should a soft mount be interruptible by SIGINT, > or should it require a SIGKILL? The ‘TASK_KILLABLE’ state is interruptible by any _fatal_ signal. So if the application uses sigaction() to install a handler for SIGINT, then the RPC call will no longer be interruptible by SIGINT. _________________________________ Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@primarydata.com ^ permalink raw reply [flat|nested] 55+ messages in thread
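A shell illustration of the distinction Trond draws (the NFS path is a placeholder): a signal disposition set before exec is inherited, so the second cat can no longer be interrupted with ^C and only SIGKILL will get it out of the RPC wait:

  # killable: ^C delivers a fatal SIGINT during the RPC wait
  cat /mnt/nfs/somefile
  # not killable by ^C: SIGINT is ignored before cat starts
  sh -c 'trap "" INT; exec cat /mnt/nfs/somefile'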
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 18:35 ` Andrew Martin 2014-03-06 18:48 ` Jim Rees @ 2014-03-06 18:50 ` Trond Myklebust 2014-03-06 19:46 ` Andrew Martin 1 sibling, 1 reply; 55+ messages in thread From: Trond Myklebust @ 2014-03-06 18:50 UTC (permalink / raw) To: Andrew Martin; +Cc: Jim Rees, bhawley, Brown Neil, linux-nfs-owner, linux-nfs On Mar 6, 2014, at 13:35, Andrew Martin <amartin@xes-inc.com> wrote: >> From: "Jim Rees" <rees@umich.edu> >> Why would a bunch of blocked apaches cause high load and reboot? > What I believe happens is the apache child processes go to serve > these requests and then block in uninterruptable sleep. Thus, there > are fewer and fewer child processes to handle new incoming requests. > Eventually, apache would normally kill said children (e.g after a > child handles a certain number of requests), but it cannot kill them > because they are in uninterruptable sleep. As more and more incoming > requests are queued (and fewer and fewer child processes are available > to serve the requests), the load climbs. Does ‘top’ support this theory? Presumably you should see a handful of non-sleeping apache threads dominating the load when it happens. Why is the server becoming ‘unavailable’ in the first place? Are you taking it down? _________________________________ Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@primarydata.com ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 18:50 ` Trond Myklebust @ 2014-03-06 19:46 ` Andrew Martin 2014-03-06 19:52 ` Trond Myklebust 0 siblings, 1 reply; 55+ messages in thread From: Andrew Martin @ 2014-03-06 19:46 UTC (permalink / raw) To: Trond Myklebust; +Cc: Jim Rees, bhawley, Brown Neil, linux-nfs-owner, linux-nfs > From: "Trond Myklebust" <trond.myklebust@primarydata.com> > On Mar 6, 2014, at 13:35, Andrew Martin <amartin@xes-inc.com> wrote: > > >> From: "Jim Rees" <rees@umich.edu> > >> Why would a bunch of blocked apaches cause high load and reboot? > > What I believe happens is the apache child processes go to serve > > these requests and then block in uninterruptible sleep. Thus, there > > are fewer and fewer child processes to handle new incoming requests. > > Eventually, apache would normally kill said children (e.g. after a > > child handles a certain number of requests), but it cannot kill them > > because they are in uninterruptible sleep. As more and more incoming > > requests are queued (and fewer and fewer child processes are available > > to serve the requests), the load climbs. > > Does ‘top’ support this theory? Presumably you should see a handful of > non-sleeping apache threads dominating the load when it happens. Yes, it looks like the root apache process is still running: root 1773 0.0 0.1 244176 16588 ? Ss Feb18 0:42 /usr/sbin/apache2 -k start All of the others, the children (running as the www-data user), are marked as D. > Why is the server becoming ‘unavailable’ in the first place? Are you taking > it down? I do not know the answer to this. A single NFS server has an export that is mounted on multiple servers, including this web server. The web server is running Ubuntu 10.04 LTS 2.6.32-57 with nfs-common 1.2.0. Intermittently, the NFS mountpoint will become inaccessible on this web server; processes that attempt to access it will block in uninterruptible sleep. While this is occurring, the NFS export is still accessible normally from other clients, so it appears to be related to this particular machine (probably since it is the last machine running Ubuntu 10.04 and not 12.04). I do not know if this is a bug in 2.6.32 or another package on the system, but at this time I cannot upgrade it to 12.04, so I need to find a solution on 10.04. I attempted to get a backtrace from one of the uninterruptible apache processes: echo w > /proc/sysrq-trigger Here's one example: [1227348.003904] apache2 D 0000000000000000 0 10175 1773 0x00000004 [1227348.003906] ffff8802813178c8 0000000000000082 0000000000015e00 0000000000015e00 [1227348.003908] ffff8801d88f03d0 ffff880281317fd8 0000000000015e00 ffff8801d88f0000 [1227348.003910] 0000000000015e00 ffff880281317fd8 0000000000015e00 ffff8801d88f03d0 [1227348.003912] Call Trace: [1227348.003918] [<ffffffffa00a5ca0>] ? rpc_wait_bit_killable+0x0/0x40 [sunrpc] [1227348.003923] [<ffffffffa00a5cc4>] rpc_wait_bit_killable+0x24/0x40 [sunrpc] [1227348.003925] [<ffffffff8156a41f>] __wait_on_bit+0x5f/0x90 [1227348.003930] [<ffffffffa00a5ca0>] ? rpc_wait_bit_killable+0x0/0x40 [sunrpc] [1227348.003932] [<ffffffff8156a4c8>] out_of_line_wait_on_bit+0x78/0x90 [1227348.003934] [<ffffffff81086790>] ?
wake_bit_function+0x0/0x40 [1227348.003939] [<ffffffffa00a6611>] __rpc_execute+0x191/0x2a0 [sunrpc] [1227348.003945] [<ffffffffa00a6746>] rpc_execute+0x26/0x30 [sunrpc] [1227348.003949] [<ffffffffa009eb2a>] rpc_run_task+0x3a/0x90 [sunrpc] [1227348.003953] [<ffffffffa009ec82>] rpc_call_sync+0x42/0x70 [sunrpc] [1227348.003959] [<ffffffffa013b33b>] T.976+0x4b/0x70 [nfs] [1227348.003965] [<ffffffffa013bd75>] nfs3_proc_access+0xd5/0x1a0 [nfs] [1227348.003967] [<ffffffff810fea8f>] ? free_hot_page+0x2f/0x60 [1227348.003969] [<ffffffff8156bd6e>] ? _spin_lock+0xe/0x20 [1227348.003971] [<ffffffff8115b626>] ? dput+0xd6/0x1a0 [1227348.003973] [<ffffffff8115254f>] ? __follow_mount+0x6f/0xb0 [1227348.003978] [<ffffffffa00a7fd4>] ? rpcauth_lookup_credcache+0x1a4/0x270 [sunrpc] [1227348.003983] [<ffffffffa0125817>] nfs_do_access+0x97/0xf0 [nfs] [1227348.003989] [<ffffffffa00a87f5>] ? generic_lookup_cred+0x15/0x20 [sunrpc] [1227348.003994] [<ffffffffa00a7910>] ? rpcauth_lookupcred+0x70/0xc0 [sunrpc] [1227348.003996] [<ffffffff8115254f>] ? __follow_mount+0x6f/0xb0 [1227348.004001] [<ffffffffa0125915>] nfs_permission+0xa5/0x1e0 [nfs] [1227348.004003] [<ffffffff81153989>] __link_path_walk+0x99/0xf80 [1227348.004005] [<ffffffff81154aea>] path_walk+0x6a/0xe0 [1227348.004007] [<ffffffff81154cbb>] do_path_lookup+0x5b/0xa0 [1227348.004009] [<ffffffff81148e3a>] ? get_empty_filp+0xaa/0x180 [1227348.004011] [<ffffffff81155c63>] do_filp_open+0x103/0xba0 [1227348.004013] [<ffffffff8156bd6e>] ? _spin_lock+0xe/0x20 [1227348.004015] [<ffffffff812b8055>] ? _atomic_dec_and_lock+0x55/0x80 [1227348.004016] [<ffffffff811618ea>] ? alloc_fd+0x10a/0x150 [1227348.004018] [<ffffffff811454e9>] do_sys_open+0x69/0x170 [1227348.004020] [<ffffffff81145630>] sys_open+0x20/0x30 [1227348.004022] [<ffffffff81013172>] system_call_fastpath+0x16/0x1b ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 19:46 ` Andrew Martin @ 2014-03-06 19:52 ` Trond Myklebust 2014-03-06 20:45 ` Andrew Martin 0 siblings, 1 reply; 55+ messages in thread From: Trond Myklebust @ 2014-03-06 19:52 UTC (permalink / raw) To: Andrew Martin; +Cc: Jim Rees, bhawley, Brown Neil, linux-nfs-owner, linux-nfs On Mar 6, 2014, at 14:46, Andrew Martin <amartin@xes-inc.com> wrote: >> From: "Trond Myklebust" <trond.myklebust@primarydata.com> >> On Mar 6, 2014, at 13:35, Andrew Martin <amartin@xes-inc.com> wrote: >> >>>> From: "Jim Rees" <rees@umich.edu> >>>> Why would a bunch of blocked apaches cause high load and reboot? >>> What I believe happens is the apache child processes go to serve >>> these requests and then block in uninterruptable sleep. Thus, there >>> are fewer and fewer child processes to handle new incoming requests. >>> Eventually, apache would normally kill said children (e.g after a >>> child handles a certain number of requests), but it cannot kill them >>> because they are in uninterruptable sleep. As more and more incoming >>> requests are queued (and fewer and fewer child processes are available >>> to serve the requests), the load climbs. >> >> Does ‘top’ support this theory? Presumably you should see a handful of >> non-sleeping apache threads dominating the load when it happens. > Yes, it looks like the root apache process is still running: > root 1773 0.0 0.1 244176 16588 ? Ss Feb18 0:42 /usr/sbin/apache2 -k start > > All of the others, the children (running as the www-data user), are marked as D. > >> Why is the server becoming ‘unavailable’ in the first place? Are you taking >> it down? > I do not know the answer to this. A single NFS server has an export that is > mounted on multiple servers, including this web server. The web server is > running Ubuntu 10.04 LTS 2.6.32-57 with nfs-common 1.2.0. Intermittently, the > NFS mountpoint will become inaccessible on this web server; processes that > attempt to access it will block in uninterruptable sleep. While this is > occurring, the NFS export is still accessible normally from other clients, > so it appears to be related to this particular machine (probably since it is > the last machine running Ubuntu 10.04 and not 12.04). I do not know if this > is a bug in 2.6.32 or another package on the system, but at this time I > cannot upgrade it to 12.04, so I need to find a solution on 10.04. > > I attempted to get a backtrace from one of the uninterruptable apache processes: > echo w > /proc/sysrq-trigger > > Here's one example: > [1227348.003904] apache2 D 0000000000000000 0 10175 1773 0x00000004 > [1227348.003906] ffff8802813178c8 0000000000000082 0000000000015e00 0000000000015e00 > [1227348.003908] ffff8801d88f03d0 ffff880281317fd8 0000000000015e00 ffff8801d88f0000 > [1227348.003910] 0000000000015e00 ffff880281317fd8 0000000000015e00 ffff8801d88f03d0 > [1227348.003912] Call Trace: > [1227348.003918] [<ffffffffa00a5ca0>] ? rpc_wait_bit_killable+0x0/0x40 [sunrpc] > [1227348.003923] [<ffffffffa00a5cc4>] rpc_wait_bit_killable+0x24/0x40 [sunrpc] > [1227348.003925] [<ffffffff8156a41f>] __wait_on_bit+0x5f/0x90 > [1227348.003930] [<ffffffffa00a5ca0>] ? rpc_wait_bit_killable+0x0/0x40 [sunrpc] > [1227348.003932] [<ffffffff8156a4c8>] out_of_line_wait_on_bit+0x78/0x90 > [1227348.003934] [<ffffffff81086790>] ? 
wake_bit_function+0x0/0x40 > [1227348.003939] [<ffffffffa00a6611>] __rpc_execute+0x191/0x2a0 [sunrpc] > [1227348.003945] [<ffffffffa00a6746>] rpc_execute+0x26/0x30 [sunrpc] That basically means that the process is hanging in the RPC layer, somewhere in the state machine. ‘echo 0 >/proc/sys/sunrpc/rpc_debug’ as the ‘root’ user should give us a dump of which state these RPC calls are in. Can you please try that? _________________________________ Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@primarydata.com ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 19:52 ` Trond Myklebust @ 2014-03-06 20:45 ` Andrew Martin 2014-03-06 21:01 ` Trond Myklebust 0 siblings, 1 reply; 55+ messages in thread From: Andrew Martin @ 2014-03-06 20:45 UTC (permalink / raw) To: Trond Myklebust; +Cc: Jim Rees, bhawley, Brown Neil, linux-nfs-owner, linux-nfs ----- Original Message ----- > From: "Trond Myklebust" <trond.myklebust@primarydata.com> > > I attempted to get a backtrace from one of the uninterruptable apache > > processes: > > echo w > /proc/sysrq-trigger > > > > Here's one example: > > [1227348.003904] apache2 D 0000000000000000 0 10175 1773 > > 0x00000004 > > [1227348.003906] ffff8802813178c8 0000000000000082 0000000000015e00 > > 0000000000015e00 > > [1227348.003908] ffff8801d88f03d0 ffff880281317fd8 0000000000015e00 > > ffff8801d88f0000 > > [1227348.003910] 0000000000015e00 ffff880281317fd8 0000000000015e00 > > ffff8801d88f03d0 > > [1227348.003912] Call Trace: > > [1227348.003918] [<ffffffffa00a5ca0>] ? rpc_wait_bit_killable+0x0/0x40 > > [sunrpc] > > [1227348.003923] [<ffffffffa00a5cc4>] rpc_wait_bit_killable+0x24/0x40 > > [sunrpc] > > [1227348.003925] [<ffffffff8156a41f>] __wait_on_bit+0x5f/0x90 > > [1227348.003930] [<ffffffffa00a5ca0>] ? rpc_wait_bit_killable+0x0/0x40 > > [sunrpc] > > [1227348.003932] [<ffffffff8156a4c8>] out_of_line_wait_on_bit+0x78/0x90 > > [1227348.003934] [<ffffffff81086790>] ? wake_bit_function+0x0/0x40 > > [1227348.003939] [<ffffffffa00a6611>] __rpc_execute+0x191/0x2a0 [sunrpc] > > [1227348.003945] [<ffffffffa00a6746>] rpc_execute+0x26/0x30 [sunrpc] > > That basically means that the process is hanging in the RPC layer, somewhere > in the state machine. ‘echo 0 >/proc/sys/sunrpc/rpc_debug’ as the ‘root’ > user should give us a dump of which state these RPC calls are in. Can you > please try that? Yes I will definitely run that the next time it happens, but since it occurs sporadically (and I have not yet found a way to reproduce it on demand), it could be days before it occurs again. I'll also run "netstat -tn" to check the TCP connections the next time this happens. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 20:45 ` Andrew Martin @ 2014-03-06 21:01 ` Trond Myklebust 2014-03-18 21:50 ` Andrew Martin 0 siblings, 1 reply; 55+ messages in thread From: Trond Myklebust @ 2014-03-06 21:01 UTC (permalink / raw) To: Andrew Martin; +Cc: Jim Rees, bhawley, Brown Neil, linux-nfs-owner, linux-nfs On Mar 6, 2014, at 15:45, Andrew Martin <amartin@xes-inc.com> wrote: > ----- Original Message ----- >> From: "Trond Myklebust" <trond.myklebust@primarydata.com> >>> I attempted to get a backtrace from one of the uninterruptable apache >>> processes: >>> echo w > /proc/sysrq-trigger >>> >>> Here's one example: >>> [1227348.003904] apache2 D 0000000000000000 0 10175 1773 >>> 0x00000004 >>> [1227348.003906] ffff8802813178c8 0000000000000082 0000000000015e00 >>> 0000000000015e00 >>> [1227348.003908] ffff8801d88f03d0 ffff880281317fd8 0000000000015e00 >>> ffff8801d88f0000 >>> [1227348.003910] 0000000000015e00 ffff880281317fd8 0000000000015e00 >>> ffff8801d88f03d0 >>> [1227348.003912] Call Trace: >>> [1227348.003918] [<ffffffffa00a5ca0>] ? rpc_wait_bit_killable+0x0/0x40 >>> [sunrpc] >>> [1227348.003923] [<ffffffffa00a5cc4>] rpc_wait_bit_killable+0x24/0x40 >>> [sunrpc] >>> [1227348.003925] [<ffffffff8156a41f>] __wait_on_bit+0x5f/0x90 >>> [1227348.003930] [<ffffffffa00a5ca0>] ? rpc_wait_bit_killable+0x0/0x40 >>> [sunrpc] >>> [1227348.003932] [<ffffffff8156a4c8>] out_of_line_wait_on_bit+0x78/0x90 >>> [1227348.003934] [<ffffffff81086790>] ? wake_bit_function+0x0/0x40 >>> [1227348.003939] [<ffffffffa00a6611>] __rpc_execute+0x191/0x2a0 [sunrpc] >>> [1227348.003945] [<ffffffffa00a6746>] rpc_execute+0x26/0x30 [sunrpc] >> >> That basically means that the process is hanging in the RPC layer, somewhere >> in the state machine. ‘echo 0 >/proc/sys/sunrpc/rpc_debug’ as the ‘root’ >> user should give us a dump of which state these RPC calls are in. Can you >> please try that? > Yes I will definitely run that the next time it happens, but since it occurs > sporadically (and I have not yet found a way to reproduce it on demand), it > could be days before it occurs again. I'll also run "netstat -tn" to check the > TCP connections the next time this happens. If you are comfortable applying patches and compiling your own kernels, then you might want to try applying the fix for a certain out-of-socket-buffer race that Neil reported, and that I suspect you may be hitting. The patch has been sent to the ‘stable kernel’ series, and so should appear soon in Debian’s own kernels, but if this is bothering you now, then go for it… https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=06ea0bfe6e6043cb56a78935a19f6f8ebc636226 _________________________________ Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@primarydata.com ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 21:01 ` Trond Myklebust @ 2014-03-18 21:50 ` Andrew Martin 2014-03-18 22:27 ` Trond Myklebust 0 siblings, 1 reply; 55+ messages in thread From: Andrew Martin @ 2014-03-18 21:50 UTC (permalink / raw) To: Trond Myklebust; +Cc: Jim Rees, bhawley, Brown Neil, linux-nfs-owner, linux-nfs ----- Original Message ----- > From: "Trond Myklebust" <trond.myklebust@primarydata.com> > To: "Andrew Martin" <amartin@xes-inc.com> > Cc: "Jim Rees" <rees@umich.edu>, bhawley@luminex.com, "Brown Neil" <neilb@suse.de>, linux-nfs-owner@vger.kernel.org, > linux-nfs@vger.kernel.org > Sent: Thursday, March 6, 2014 3:01:03 PM > Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels > > > On Mar 6, 2014, at 15:45, Andrew Martin <amartin@xes-inc.com> wrote: > > > ----- Original Message ----- > >> From: "Trond Myklebust" <trond.myklebust@primarydata.com> > >>> I attempted to get a backtrace from one of the uninterruptable apache > >>> processes: > >>> echo w > /proc/sysrq-trigger > >>> > >>> Here's one example: > >>> [1227348.003904] apache2 D 0000000000000000 0 10175 1773 > >>> 0x00000004 > >>> [1227348.003906] ffff8802813178c8 0000000000000082 0000000000015e00 > >>> 0000000000015e00 > >>> [1227348.003908] ffff8801d88f03d0 ffff880281317fd8 0000000000015e00 > >>> ffff8801d88f0000 > >>> [1227348.003910] 0000000000015e00 ffff880281317fd8 0000000000015e00 > >>> ffff8801d88f03d0 > >>> [1227348.003912] Call Trace: > >>> [1227348.003918] [<ffffffffa00a5ca0>] ? rpc_wait_bit_killable+0x0/0x40 > >>> [sunrpc] > >>> [1227348.003923] [<ffffffffa00a5cc4>] rpc_wait_bit_killable+0x24/0x40 > >>> [sunrpc] > >>> [1227348.003925] [<ffffffff8156a41f>] __wait_on_bit+0x5f/0x90 > >>> [1227348.003930] [<ffffffffa00a5ca0>] ? rpc_wait_bit_killable+0x0/0x40 > >>> [sunrpc] > >>> [1227348.003932] [<ffffffff8156a4c8>] out_of_line_wait_on_bit+0x78/0x90 > >>> [1227348.003934] [<ffffffff81086790>] ? wake_bit_function+0x0/0x40 > >>> [1227348.003939] [<ffffffffa00a6611>] __rpc_execute+0x191/0x2a0 [sunrpc] > >>> [1227348.003945] [<ffffffffa00a6746>] rpc_execute+0x26/0x30 [sunrpc] > >> > >> That basically means that the process is hanging in the RPC layer, > >> somewhere > >> in the state machine. ‘echo 0 >/proc/sys/sunrpc/rpc_debug’ as the ‘root’ > >> user should give us a dump of which state these RPC calls are in. Can you > >> please try that? > > Yes I will definitely run that the next time it happens, but since it > > occurs > > sporadically (and I have not yet found a way to reproduce it on demand), it > > could be days before it occurs again. I'll also run "netstat -tn" to check > > the > > TCP connections the next time this happens. > > If you are comfortable applying patches and compiling your own kernels, then > you might want to try applying the fix for a certain out-of-socket-buffer > race that Neil reported, and that I suspect you may be hitting. 
The patch > has been sent to the ‘stable kernel’ series, and so should appear soon in > Debian’s own kernels, but if this is bothering you now, then go for it… > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=06ea0bfe6e6043cb56a78935a19f6f8ebc636226 > Trond, This problem has reoccurred, and I have captured the debug output that you requested: echo 0 >/proc/sys/sunrpc/rpc_debug: http://pastebin.com/9juDs2TW echo w > /proc/sysrq-trigger ; dmesg: http://pastebin.com/1vDx9bNf netstat -tn: http://pastebin.com/mjxqjmuL One suggestion for debugging was to run "umount -f /path/to/mountpoint" repeatedly to attempt to send SIGKILL back up to the application. This always returned "Device or resource busy" and I was unable to unmount the filesystem until I used "umount -l". I was able to kill -9 all but two of the processes that were blocking in uninterruptable sleep. Note that I was able to get lsof output on these processes this time, and they all appeared to be blocking on access to a single file on the nfs share. If I tried to cat said file from this client, my terminal would block: open("/path/to/file", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=42385, ...}) = 0 mmap(NULL, 1056768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb00f0dc000 read(3, However, I could cat the file just fine from another nfs client. Does this additional information shed any light on the source of this problem? Thanks, Andrew ^ permalink raw reply [flat|nested] 55+ messages in thread
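For readers following along, the recovery sequence described above reduces to something like this (mountpoint path as in the message; on a soft mount, each forced unmount attempt is reported in this thread to fail blocked RPCs back to applications with an I/O error):

    # repeat as needed; may print "Device or resource busy" while processes hold files open
    umount -f /path/to/mountpoint
    # last resort, lazy unmount: detach the mountpoint now and let the kernel
    # clean up the remaining references as they are released
    umount -l /path/to/mountpoint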
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-18 21:50 ` Andrew Martin @ 2014-03-18 22:27 ` Trond Myklebust 2014-03-28 22:00 ` Dr Fields James Bruce 0 siblings, 1 reply; 55+ messages in thread From: Trond Myklebust @ 2014-03-18 22:27 UTC (permalink / raw) To: Andrew Martin, Dr Fields James Bruce Cc: Jim Rees, bhawley, Brown Neil, linux-nfs-owner, linux-nfs On Mar 18, 2014, at 17:50, Andrew Martin <amartin@xes-inc.com> wrote: > ----- Original Message ----- >> From: "Trond Myklebust" <trond.myklebust@primarydata.com> >> To: "Andrew Martin" <amartin@xes-inc.com> >> Cc: "Jim Rees" <rees@umich.edu>, bhawley@luminex.com, "Brown Neil" <neilb@suse.de>, linux-nfs-owner@vger.kernel.org, >> linux-nfs@vger.kernel.org >> Sent: Thursday, March 6, 2014 3:01:03 PM >> Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels >> >> > > Trond, > > This problem has reoccurred, and I have captured the debug output that you requested: > > echo 0 >/proc/sys/sunrpc/rpc_debug: > http://pastebin.com/9juDs2TW > > echo w > /proc/sysrq-trigger ; dmesg: > http://pastebin.com/1vDx9bNf > > netstat -tn: > http://pastebin.com/mjxqjmuL > > One suggestion for debug was to attempt to run "umount -f /path/to/mountpoint" > repeatedly to attempt to send SIGKILL back up to the application. This always > returned "Device or resource busy" and I was unable to unmount the filesystem > until I used "mount -l". > > I was able to kill -9 all but two of the processes that were blocking in > uninterruptable sleep. Note that I was able to get lsof output on these > processes this time, and they all appeared to be blocking on access to a > single file on the nfs share. If I tried to cat said file from this client, > my terminal would block: > open("/path/to/file", O_RDONLY) = 3 > fstat(3, {st_mode=S_IFREG|0644, st_size=42385, ...}) = 0 > mmap(NULL, 1056768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb00f0dc000 > read(3, > > However, I could cat the file just fine from another nfs client. Does this > additional information shed any light on the source of this problem? > Ah… So this machine is acting both as a NFSv3 client and a NFSv4 server? • [1140235.544551] SysRq : Show Blocked State • [1140235.547126] task PC stack pid father • [1140235.547145] rpciod/0 D 0000000000000001 0 833 2 0x00000000 • [1140235.547150] ffff8802812a3c20 0000000000000046 0000000000015e00 0000000000015e00 • [1140235.547155] ffff880297251ad0 ffff8802812a3fd8 0000000000015e00 ffff880297251700 • [1140235.547159] 0000000000015e00 ffff8802812a3fd8 0000000000015e00 ffff880297251ad0 • [1140235.547164] Call Trace: • [1140235.547175] [<ffffffff8156a1a5>] schedule_timeout+0x195/0x300 • [1140235.547182] [<ffffffff81078130>] ? process_timeout+0x0/0x10 • [1140235.547197] [<ffffffffa009ef52>] rpc_shutdown_client+0xc2/0x100 [sunrpc] • [1140235.547203] [<ffffffff81086750>] ? autoremove_wake_function+0x0/0x40 • [1140235.547216] [<ffffffffa01aa62c>] put_nfs4_client+0x4c/0xb0 [nfsd] • [1140235.547227] [<ffffffffa01ae669>] nfsd4_cb_probe_done+0x29/0x60 [nfsd] • [1140235.547238] [<ffffffffa00a5d0c>] rpc_exit_task+0x2c/0x60 [sunrpc] • [1140235.547250] [<ffffffffa00a64e6>] __rpc_execute+0x66/0x2a0 [sunrpc] • [1140235.547261] [<ffffffffa00a6750>] ? 
rpc_async_schedule+0x0/0x20 [sunrpc] • [1140235.547272] [<ffffffffa00a6765>] rpc_async_schedule+0x15/0x20 [sunrpc] • [1140235.547276] [<ffffffff81081ba7>] run_workqueue+0xc7/0x1a0 • [1140235.547279] [<ffffffff81081d23>] worker_thread+0xa3/0x110 • [1140235.547284] [<ffffffff81086750>] ? autoremove_wake_function+0x0/0x40 • [1140235.547287] [<ffffffff81081c80>] ? worker_thread+0x0/0x110 • [1140235.547291] [<ffffffff810863d6>] kthread+0x96/0xa0 • [1140235.547295] [<ffffffff810141aa>] child_rip+0xa/0x20 • [1140235.547299] [<ffffffff81086340>] ? kthread+0x0/0xa0 • [1140235.547302] [<ffffffff810141a0>] ? child_rip+0x0/0x20 the above looks bad. The rpciod thread is sleeping, waiting for the rpc client to terminate, and the only task running on that rpc client, according to your rpc_debug output is the above CB_NULL probe. Deadlock... Bruce, it looks like the above should have been fixed in Linux 2.6.35 with commit 9045b4b9f7f3 (nfsd4: remove probe task's reference on client), is that correct? _________________________________ Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@primarydata.com ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-18 22:27 ` Trond Myklebust @ 2014-03-28 22:00 ` Dr Fields James Bruce 2014-04-04 18:15 ` Andrew Martin 0 siblings, 1 reply; 55+ messages in thread From: Dr Fields James Bruce @ 2014-03-28 22:00 UTC (permalink / raw) To: Trond Myklebust Cc: Andrew Martin, Jim Rees, bhawley, Brown Neil, linux-nfs-owner, linux-nfs On Tue, Mar 18, 2014 at 06:27:57PM -0400, Trond Myklebust wrote: > > On Mar 18, 2014, at 17:50, Andrew Martin <amartin@xes-inc.com> wrote: > > > ----- Original Message ----- > >> From: "Trond Myklebust" <trond.myklebust@primarydata.com> > >> To: "Andrew Martin" <amartin@xes-inc.com> > >> Cc: "Jim Rees" <rees@umich.edu>, bhawley@luminex.com, "Brown Neil" <neilb@suse.de>, linux-nfs-owner@vger.kernel.org, > >> linux-nfs@vger.kernel.org > >> Sent: Thursday, March 6, 2014 3:01:03 PM > >> Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels > >> > >> > > > > Trond, > > > > This problem has reoccurred, and I have captured the debug output that you requested: > > > > echo 0 >/proc/sys/sunrpc/rpc_debug: > > http://pastebin.com/9juDs2TW > > > > echo w > /proc/sysrq-trigger ; dmesg: > > http://pastebin.com/1vDx9bNf > > > > netstat -tn: > > http://pastebin.com/mjxqjmuL > > > > One suggestion for debug was to attempt to run "umount -f /path/to/mountpoint" > > repeatedly to attempt to send SIGKILL back up to the application. This always > > returned "Device or resource busy" and I was unable to unmount the filesystem > > until I used "mount -l". > > > > I was able to kill -9 all but two of the processes that were blocking in > > uninterruptable sleep. Note that I was able to get lsof output on these > > processes this time, and they all appeared to be blocking on access to a > > single file on the nfs share. If I tried to cat said file from this client, > > my terminal would block: > > open("/path/to/file", O_RDONLY) = 3 > > fstat(3, {st_mode=S_IFREG|0644, st_size=42385, ...}) = 0 > > mmap(NULL, 1056768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb00f0dc000 > > read(3, > > > > However, I could cat the file just fine from another nfs client. Does this > > additional information shed any light on the source of this problem? > > > > Ah… So this machine is acting both as a NFSv3 client and a NFSv4 server? > > • [1140235.544551] SysRq : Show Blocked State > • [1140235.547126] task PC stack pid father > • [1140235.547145] rpciod/0 D 0000000000000001 0 833 2 0x00000000 > • [1140235.547150] ffff8802812a3c20 0000000000000046 0000000000015e00 0000000000015e00 > • [1140235.547155] ffff880297251ad0 ffff8802812a3fd8 0000000000015e00 ffff880297251700 > • [1140235.547159] 0000000000015e00 ffff8802812a3fd8 0000000000015e00 ffff880297251ad0 > • [1140235.547164] Call Trace: > • [1140235.547175] [<ffffffff8156a1a5>] schedule_timeout+0x195/0x300 > • [1140235.547182] [<ffffffff81078130>] ? process_timeout+0x0/0x10 > • [1140235.547197] [<ffffffffa009ef52>] rpc_shutdown_client+0xc2/0x100 [sunrpc] > • [1140235.547203] [<ffffffff81086750>] ? autoremove_wake_function+0x0/0x40 > • [1140235.547216] [<ffffffffa01aa62c>] put_nfs4_client+0x4c/0xb0 [nfsd] > • [1140235.547227] [<ffffffffa01ae669>] nfsd4_cb_probe_done+0x29/0x60 [nfsd] > • [1140235.547238] [<ffffffffa00a5d0c>] rpc_exit_task+0x2c/0x60 [sunrpc] > • [1140235.547250] [<ffffffffa00a64e6>] __rpc_execute+0x66/0x2a0 [sunrpc] > • [1140235.547261] [<ffffffffa00a6750>] ? 
rpc_async_schedule+0x0/0x20 [sunrpc] > • [1140235.547272] [<ffffffffa00a6765>] rpc_async_schedule+0x15/0x20 [sunrpc] > • [1140235.547276] [<ffffffff81081ba7>] run_workqueue+0xc7/0x1a0 > • [1140235.547279] [<ffffffff81081d23>] worker_thread+0xa3/0x110 > • [1140235.547284] [<ffffffff81086750>] ? autoremove_wake_function+0x0/0x40 > • [1140235.547287] [<ffffffff81081c80>] ? worker_thread+0x0/0x110 > • [1140235.547291] [<ffffffff810863d6>] kthread+0x96/0xa0 > • [1140235.547295] [<ffffffff810141aa>] child_rip+0xa/0x20 > • [1140235.547299] [<ffffffff81086340>] ? kthread+0x0/0xa0 > • [1140235.547302] [<ffffffff810141a0>] ? child_rip+0x0/0x20 > > the above looks bad. The rpciod thread is sleeping, waiting for the rpc client to terminate, and the only task running on that rpc client, according to your rpc_debug output is the above CB_NULL probe. Deadlock... > > Bruce, it looks like the above should have been fixed in Linux 2.6.35 with commit 9045b4b9f7f3 (nfsd4: remove probe task's reference on client), is that correct? Yes, that definitely looks like it would explain the bug. And the sysrq trace shows 2.6.32-57. Andrew Martin, can you confirm that the problem is no longer reproducible on a kernel with that patch applied? --b. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-28 22:00 ` Dr Fields James Bruce @ 2014-04-04 18:15 ` Andrew Martin 0 siblings, 0 replies; 55+ messages in thread From: Andrew Martin @ 2014-04-04 18:15 UTC (permalink / raw) To: Dr Fields James Bruce Cc: Trond Myklebust, Jim Rees, bhawley, Brown Neil, linux-nfs-owner, linux-nfs Bruce, ----- Original Message ----- > From: "Dr Fields James Bruce" <bfields@fieldses.org> > > Bruce, it looks like the above should have been fixed in Linux 2.6.35 with > > commit 9045b4b9f7f3 (nfsd4: remove probe task's reference on client), is > > that correct? > > Yes, that definitely looks like it would explain the bug. And the sysrq > trace shows 2.6.32-57. > > Andrew Martin, can you confirm that the problem is no longer > reproducible on a kernel with that patch applied? I have upgraded to 3.0.0-32. Since this problem is intermittent, I'm not sure when I will be able to reproduce it (if ever), but I'll reply to this thread if it ever reoccurs. Thanks everyone for the help! Andrew ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 16:43 ` Andrew Martin 2014-03-06 17:36 ` Jim Rees @ 2014-03-06 19:00 ` Brian Hawley 2014-03-06 19:06 ` Trond Myklebust 1 sibling, 1 reply; 55+ messages in thread From: Brian Hawley @ 2014-03-06 19:00 UTC (permalink / raw) To: Andrew Martin, Jim Rees Cc: Brian Hawley, NeilBrown, linux-nfs-owner, linux-nfs Even with small timeo and retrans, you won't get i/o errors back to the reads/writes. That's been our experience anyway. With soft, you may end up with lost data (data that had already been written to the cache but not yet to the storage). You'd have that same issue with 'hard' too if it was your appliance that failed. If the appliance never comes back, those blocks can never be written. In your case though, you're not writing. -----Original Message----- From: Andrew Martin <amartin@xes-inc.com> Date: Thu, 6 Mar 2014 10:43:42 To: Jim Rees<rees@umich.edu> Cc: <bhawley@luminex.com>; NeilBrown<neilb@suse.de>; <linux-nfs-owner@vger.kernel.org>; <linux-nfs@vger.kernel.org> Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels > From: "Jim Rees" <rees@umich.edu> > Andrew Martin wrote: > > > From: "Jim Rees" <rees@umich.edu> > > Given this is apache, I think if I were doing this I'd use > > ro,soft,intr,tcp > > and not try to write anything to nfs. > I was using tcp,bg,soft,intr when this problem occurred. I do not know if > apache was attempting to do a write or a read, but it seems that > tcp,soft,intr > was not sufficient to prevent the problem. > > I had the impression from your original message that you were not using > "soft" and were asking if it's safe to use it. Are you saying that even with > the "soft" option the apache gets stuck forever? Yes, even with soft, it gets stuck forever. I had been using tcp,bg,soft,intr when the problem occurred (on several occasions), so my original question was if it would be safe to use small timeo and retrans values to hopefully return I/O errors quickly to the application, rather than blocking forever (which causes the high load and inevitable reboot). It sounds like that isn't safe, but perhaps there is another way to resolve this problem? ^ permalink raw reply [flat|nested] 55+ messages in thread
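For concreteness, the style of mount options being debated would look like this as an /etc/fstab entry (server name and paths are hypothetical). timeo is expressed in tenths of a second and retrans is a retry count; roughly, a soft mount gives up and returns EIO once the retransmission count is exhausted, with the timeout growing between retries:

    server:/export  /mnt/nfs  nfs  tcp,bg,soft,timeo=20,retrans=2  0  0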
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 19:00 ` Brian Hawley @ 2014-03-06 19:06 ` Trond Myklebust 2014-03-06 19:14 ` Brian Hawley 0 siblings, 1 reply; 55+ messages in thread From: Trond Myklebust @ 2014-03-06 19:06 UTC (permalink / raw) To: bhawley; +Cc: Andrew Martin, Jim Rees, Brown Neil, linux-nfs-owner, linux-nfs On Mar 6, 2014, at 14:00, Brian Hawley <bhawley@luminex.com> wrote: > > Even with small timeo and retrans, you won't get i/o errors back to the reads/writes. That's been our experience anyway. Read caching and buffered writes mean that the I/O errors often do not occur during the read()/write() system call itself. We do try to propagate I/O errors back to the application as soon as they do occur, but if that application isn’t using synchronous I/O, and it isn’t checking the return values of fsync() or close(), then there is little the kernel can do... > > With soft, you may end up with lost data (data that had already been written to the cache but not yet to the storage). You'd have that same issue with 'hard' too if it was your appliance that failed. If the appliance never comes back, those blocks can never be written. > > In your case though, you're not writing. > > > -----Original Message----- > From: Andrew Martin <amartin@xes-inc.com> > Date: Thu, 6 Mar 2014 10:43:42 > To: Jim Rees<rees@umich.edu> > Cc: <bhawley@luminex.com>; NeilBrown<neilb@suse.de>; <linux-nfs-owner@vger.kernel.org>; <linux-nfs@vger.kernel.org> > Subject: Re: Optimal NFS mount options to safely allow interrupts and > timeouts on newer kernels > >> From: "Jim Rees" <rees@umich.edu> >> Andrew Martin wrote: >> >>> From: "Jim Rees" <rees@umich.edu> >>> Given this is apache, I think if I were doing this I'd use >>> ro,soft,intr,tcp >>> and not try to write anything to nfs. >> I was using tcp,bg,soft,intr when this problem occurred. I do not know if >> apache was attempting to do a write or a read, but it seems that >> tcp,soft,intr >> was not sufficient to prevent the problem. >> >> I had the impression from your original message that you were not using >> "soft" and were asking if it's safe to use it. Are you saying that even with >> the "soft" option the apache gets stuck forever? > Yes, even with soft, it gets stuck forever. I had been using tcp,bg,soft,intr > when the problem occurred (on several occasions), so my original question was > if it would be safe to use small timeo and retrans values to hopefully > return I/O errors quickly to the application, rather than blocking forever > (which causes the high load and inevitable reboot). It sounds like that isn't > safe, but perhaps there is another way to resolve this problem? > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > _________________________________ Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@primarydata.com ^ permalink raw reply [flat|nested] 55+ messages in thread
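Trond's point is easy to demonstrate from the shell; a sketch, assuming a soft-mounted test path (hypothetical) and GNU dd. A plain buffered write may appear to succeed because the data only reaches the page cache, while asking for the data to be flushed surfaces the error at the system-call boundary:

    # may "succeed" even if the server is unreachable: data lands in the page cache
    dd if=/dev/zero of=/mnt/nfs/testfile bs=1M count=10
    # conv=fsync makes dd call fsync(2) before exiting, so a timed-out
    # soft mount should now be reported as an I/O error in the exit status
    dd if=/dev/zero of=/mnt/nfs/testfile bs=1M count=10 conv=fsync || echo "write failed: $?"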
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 19:06 ` Trond Myklebust @ 2014-03-06 19:14 ` Brian Hawley 2014-03-06 19:26 ` Trond Myklebust 2014-03-06 19:29 ` Ric Wheeler 0 siblings, 2 replies; 55+ messages in thread From: Brian Hawley @ 2014-03-06 19:14 UTC (permalink / raw) To: Trond Myklebust, Brian Hawley Cc: Andrew Martin, Jim Rees, Brown Neil, linux-nfs-owner, linux-nfs Trond, In this case, it isn't fsync or close that are not getting the i/o error. It is the write(). And we check the return value of every i/o related command. We aren't using synchronous because the performance becomes abysmal. Repeated umount -f does eventually result in the i/o error getting propagated back to the write() call. I suspect the repeated umount -f's are working their way through blocks in the cache/queue and eventually we get back to the blocked write. As I mentioned previously, if we mount with sync or direct i/o type options, we will get the i/o error, but for performance reasons, this isn't an option. -----Original Message----- From: Trond Myklebust <trond.myklebust@primarydata.com> Date: Thu, 6 Mar 2014 14:06:24 To: <bhawley@luminex.com> Cc: Andrew Martin<amartin@xes-inc.com>; Jim Rees<rees@umich.edu>; Brown Neil<neilb@suse.de>; <linux-nfs-owner@vger.kernel.org>; <linux-nfs@vger.kernel.org> Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels On Mar 6, 2014, at 14:00, Brian Hawley <bhawley@luminex.com> wrote: > > Even with small timeo and retrans, you won't get i/o errors back to the reads/writes. That's been our experience anyway. Read caching and buffered writes mean that the I/O errors often do not occur during the read()/write() system call itself. We do try to propagate I/O errors back to the application as soon as they do occur, but if that application isn’t using synchronous I/O, and it isn’t checking the return values of fsync() or close(), then there is little the kernel can do... > > With soft, you may end up with lost data (data that had already been written to the cache but not yet to the storage). You'd have that same issue with 'hard' too if it was your appliance that failed. If the appliance never comes back, those blocks can never be written. > > In your case though, you're not writing. [...] _________________________________ Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@primarydata.com ^ permalink raw reply [flat|nested] 55+ messages in thread
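The "sync or direct i/o type options" Brian mentions presumably map to the sync mount option (or to O_SYNC/O_DIRECT opens in the application). A hedged sketch of the mount variant, with hypothetical server and path; each write() now waits for the server, so errors surface immediately, at a large cost in throughput:

    mount -t nfs -o tcp,soft,timeo=20,retrans=2,sync server:/export /mnt/nfs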
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 19:14 ` Brian Hawley @ 2014-03-06 19:26 ` Trond Myklebust 2014-03-06 19:33 ` Brian Hawley 2014-03-06 19:29 ` Ric Wheeler 1 sibling, 1 reply; 55+ messages in thread From: Trond Myklebust @ 2014-03-06 19:26 UTC (permalink / raw) To: bhawley; +Cc: Andrew Martin, Jim Rees, Brown Neil, linux-nfs-owner, linux-nfs On Mar 6, 2014, at 14:14, Brian Hawley <bhawley@luminex.com> wrote: > > Trond, > > In this case, it isn't fsync or close that are not getting the i/o error. It is the write(). My point is that write() isn’t even required to return an error in the case where your NFS server is unavailable. Unless you use O_SYNC or O_DIRECT writes, then the kernel is entitled and indeed expected to cache the data in its page cache until you explicitly call fsync(). The return value of that fsync() call is what tells you whether or not your data has safely been stored to disk. > And we check the return value of every i/o related command. > We aren't using synchronous because the performance becomes abysmal. > > Repeated umount -f does eventually result in the i/o error getting propagated back to the write() call. I suspect the repeated umount -f's are working their way through blocks in the cache/queue and eventually we get back to the blocked write. > > As I mentioned previously, if we mount with sync or direct i/o type options, we will get the i/o error, but for performance reasons, this isn't an option. Sure, but in that case you do need to call fsync() before the application exits. Nothing else can guarantee data stability, and that’s true for all storage. > -----Original Message----- > From: Trond Myklebust <trond.myklebust@primarydata.com> > Date: Thu, 6 Mar 2014 14:06:24 > To: <bhawley@luminex.com> > Cc: Andrew Martin<amartin@xes-inc.com>; Jim Rees<rees@umich.edu>; Brown Neil<neilb@suse.de>; <linux-nfs-owner@vger.kernel.org>; <linux-nfs@vger.kernel.org> > Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels > > > On Mar 6, 2014, at 14:00, Brian Hawley <bhawley@luminex.com> wrote: > >> >> Even with small timeo and retrans, you won't get i/o errors back to the reads/writes. That's been our experience anyway. > > Read caching, and buffered writes mean that the I/O errors often do not occur during the read()/write() system call itself. > > We do try to propagate I/O errors back to the application as soon as the do occur, but if that application isn’t using synchronous I/O, and it isn’t checking the return values of fsync() or close(), then there is little the kernel can do... > >> >> With soft, you may end up with lost data (data that had already been written to the cache but not yet to the storage). You'd have that same issue with 'hard' too if it was your appliance that failed. If the appliance never comes back, those blocks can never be written. >> >> In your case though, you're not writing. 
>> >> >> -----Original Message----- >> From: Andrew Martin <amartin@xes-inc.com> >> Date: Thu, 6 Mar 2014 10:43:42 >> To: Jim Rees<rees@umich.edu> >> Cc: <bhawley@luminex.com>; NeilBrown<neilb@suse.de>; <linux-nfs-owner@vger.kernel.org>; <linux-nfs@vger.kernel.org> >> Subject: Re: Optimal NFS mount options to safely allow interrupts and >> timeouts on newer kernels >> >>> From: "Jim Rees" <rees@umich.edu> >>> Andrew Martin wrote: >>> >>>> From: "Jim Rees" <rees@umich.edu> >>>> Given this is apache, I think if I were doing this I'd use >>>> ro,soft,intr,tcp >>>> and not try to write anything to nfs. >>> I was using tcp,bg,soft,intr when this problem occurred. I do not know if >>> apache was attempting to do a write or a read, but it seems that >>> tcp,soft,intr >>> was not sufficient to prevent the problem. >>> >>> I had the impression from your original message that you were not using >>> "soft" and were asking if it's safe to use it. Are you saying that even with >>> the "soft" option the apache gets stuck forever? >> Yes, even with soft, it gets stuck forever. I had been using tcp,bg,soft,intr >> when the problem occurred (on several ocassions), so my original question was >> if it would be safe to use a small timeo and retrans values to hopefully >> return I/O errors quickly to the application, rather than blocking forever >> (which causes the high load and inevitable reboot). It sounds like that isn't >> safe, but perhaps there is another way to resolve this problem? >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > _________________________________ > Trond Myklebust > Linux NFS client maintainer, PrimaryData > trond.myklebust@primarydata.com > _________________________________ Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@primarydata.com ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 19:26 ` Trond Myklebust @ 2014-03-06 19:33 ` Brian Hawley 2014-03-06 19:47 ` Trond Myklebust 0 siblings, 1 reply; 55+ messages in thread From: Brian Hawley @ 2014-03-06 19:33 UTC (permalink / raw) To: Trond Myklebust, Brian Hawley Cc: Andrew Martin, Jim Rees, Brown Neil, linux-nfs-owner, linux-nfs We do call fsync at synchronization points. The problem is the write() blocks forever (or for an exceptionally long time, on the order of hours and days), even with timeo set to say 20 and retrans set to 2. We see timeout messages in /var/log/messages, but the write continues to pend. Until we start doing repeated umount -f's. Then it returns and has an i/o error. -----Original Message----- From: Trond Myklebust <trond.myklebust@primarydata.com> Date: Thu, 6 Mar 2014 14:26:24 To: <bhawley@luminex.com> Cc: Andrew Martin<amartin@xes-inc.com>; Jim Rees<rees@umich.edu>; Brown Neil<neilb@suse.de>; <linux-nfs-owner@vger.kernel.org>; <linux-nfs@vger.kernel.org> Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels On Mar 6, 2014, at 14:14, Brian Hawley <bhawley@luminex.com> wrote: > > Trond, > > In this case, it isn't fsync or close that are not getting the i/o error. It is the write(). My point is that write() isn’t even required to return an error in the case where your NFS server is unavailable. Unless you use O_SYNC or O_DIRECT writes, then the kernel is entitled and indeed expected to cache the data in its page cache until you explicitly call fsync(). The return value of that fsync() call is what tells you whether or not your data has safely been stored to disk. > And we check the return value of every i/o related command. > We aren't using synchronous because the performance becomes abysmal. > > Repeated umount -f does eventually result in the i/o error getting propagated back to the write() call. I suspect the repeated umount -f's are working their way through blocks in the cache/queue and eventually we get back to the blocked write. > > As I mentioned previously, if we mount with sync or direct i/o type options, we will get the i/o error, but for performance reasons, this isn't an option. Sure, but in that case you do need to call fsync() before the application exits. Nothing else can guarantee data stability, and that’s true for all storage. [...] _________________________________ Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@primarydata.com ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 19:33 ` Brian Hawley @ 2014-03-06 19:47 ` Trond Myklebust 2014-03-06 19:56 ` Brian Hawley 0 siblings, 1 reply; 55+ messages in thread From: Trond Myklebust @ 2014-03-06 19:47 UTC (permalink / raw) To: bhawley; +Cc: Andrew Martin, Jim Rees, Brown Neil, linux-nfs-owner, linux-nfs On Mar 6, 2014, at 14:33, Brian Hawley <bhawley@luminex.com> wrote: > > We do call fsync at synchronization points. > > The problem is the write() blocks forever (or for an exceptionally long time on the order of hours and days), even with timeo set to say 20 and retrans set to 2. We see timeout messages in /var/log/messages, but the write continues to pend. Until we start doing repeated umount -f's. Then it returns and has an i/o error. How much data are you trying to sync? ‘soft’ won’t time out the entire batch at once. It feeds each write RPC call through, and lets it time out. So if you have cached a huge amount of writes, then that can take a while. The solution is to play with the ‘dirty_background_bytes’ (and/or ‘dirty_bytes’) sysctl so that it starts writeback at an earlier time. Also, what is the cause of these stalls in the first place? Is the TCP connection to the server still up? Are any Oopses present in either the client or the server syslogs? > -----Original Message----- > From: Trond Myklebust <trond.myklebust@primarydata.com> > Date: Thu, 6 Mar 2014 14:26:24 > To: <bhawley@luminex.com> > Cc: Andrew Martin<amartin@xes-inc.com>; Jim Rees<rees@umich.edu>; Brown Neil<neilb@suse.de>; <linux-nfs-owner@vger.kernel.org>; <linux-nfs@vger.kernel.org> > Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels > > > On Mar 6, 2014, at 14:14, Brian Hawley <bhawley@luminex.com> wrote: > >> >> Trond, >> >> In this case, it isn't fsync or close that are not getting the i/o error. It is the write(). > > My point is that write() isn’t even required to return an error in the case where your NFS server is unavailable. Unless you use O_SYNC or O_DIRECT writes, then the kernel is entitled and indeed expected to cache the data in its page cache until you explicitly call fsync(). The return value of that fsync() call is what tells you whether or not your data has safely been stored to disk. > >> And we check the return value of every i/o related command. > >> We aren't using synchronous because the performance becomes abysmal. >> >> Repeated umount -f does eventually result in the i/o error getting propagated back to the write() call. I suspect the repeated umount -f's are working their way through blocks in the cache/queue and eventually we get back to the blocked write. >> >> As I mentioned previously, if we mount with sync or direct i/o type options, we will get the i/o error, but for performance reasons, this isn't an option. > > Sure, but in that case you do need to call fsync() before the application exits. Nothing else can guarantee data stability, and that’s true for all storage. 
> >> -----Original Message----- >> From: Trond Myklebust <trond.myklebust@primarydata.com> >> Date: Thu, 6 Mar 2014 14:06:24 >> To: <bhawley@luminex.com> >> Cc: Andrew Martin<amartin@xes-inc.com>; Jim Rees<rees@umich.edu>; Brown Neil<neilb@suse.de>; <linux-nfs-owner@vger.kernel.org>; <linux-nfs@vger.kernel.org> >> Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels >> >> >> On Mar 6, 2014, at 14:00, Brian Hawley <bhawley@luminex.com> wrote: >> >>> >>> Even with small timeo and retrans, you won't get i/o errors back to the reads/writes. That's been our experience anyway. >> >> Read caching, and buffered writes mean that the I/O errors often do not occur during the read()/write() system call itself. >> >> We do try to propagate I/O errors back to the application as soon as the do occur, but if that application isn’t using synchronous I/O, and it isn’t checking the return values of fsync() or close(), then there is little the kernel can do... >> >>> >>> With soft, you may end up with lost data (data that had already been written to the cache but not yet to the storage). You'd have that same issue with 'hard' too if it was your appliance that failed. If the appliance never comes back, those blocks can never be written. >>> >>> In your case though, you're not writing. >>> >>> >>> -----Original Message----- >>> From: Andrew Martin <amartin@xes-inc.com> >>> Date: Thu, 6 Mar 2014 10:43:42 >>> To: Jim Rees<rees@umich.edu> >>> Cc: <bhawley@luminex.com>; NeilBrown<neilb@suse.de>; <linux-nfs-owner@vger.kernel.org>; <linux-nfs@vger.kernel.org> >>> Subject: Re: Optimal NFS mount options to safely allow interrupts and >>> timeouts on newer kernels >>> >>>> From: "Jim Rees" <rees@umich.edu> >>>> Andrew Martin wrote: >>>> >>>>> From: "Jim Rees" <rees@umich.edu> >>>>> Given this is apache, I think if I were doing this I'd use >>>>> ro,soft,intr,tcp >>>>> and not try to write anything to nfs. >>>> I was using tcp,bg,soft,intr when this problem occurred. I do not know if >>>> apache was attempting to do a write or a read, but it seems that >>>> tcp,soft,intr >>>> was not sufficient to prevent the problem. >>>> >>>> I had the impression from your original message that you were not using >>>> "soft" and were asking if it's safe to use it. Are you saying that even with >>>> the "soft" option the apache gets stuck forever? >>> Yes, even with soft, it gets stuck forever. I had been using tcp,bg,soft,intr >>> when the problem occurred (on several ocassions), so my original question was >>> if it would be safe to use a small timeo and retrans values to hopefully >>> return I/O errors quickly to the application, rather than blocking forever >>> (which causes the high load and inevitable reboot). It sounds like that isn't >>> safe, but perhaps there is another way to resolve this problem? >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >> >> _________________________________ >> Trond Myklebust >> Linux NFS client maintainer, PrimaryData >> trond.myklebust@primarydata.com >> > > _________________________________ > Trond Myklebust > Linux NFS client maintainer, PrimaryData > trond.myklebust@primarydata.com > _________________________________ Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@primarydata.com ^ permalink raw reply [flat|nested] 55+ messages in thread
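A sketch of the tuning Trond suggests (the values are illustrative only; note that setting the bytes-based knobs disables their ratio-based counterparts):

    # start background writeback once ~64MB is dirty, and block writers at ~256MB,
    # so far less cached data is left for 'soft' to time out RPC-by-RPC after a failure
    sysctl -w vm.dirty_background_bytes=67108864
    sysctl -w vm.dirty_bytes=268435456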
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 19:47 ` Trond Myklebust @ 2014-03-06 19:56 ` Brian Hawley 2014-03-06 20:31 ` Trond Myklebust 0 siblings, 1 reply; 55+ messages in thread From: Brian Hawley @ 2014-03-06 19:56 UTC (permalink / raw) To: Trond Myklebust, linux-nfs-owner, Brian Hawley Cc: Andrew Martin, Jim Rees, Brown Neil, linux-nfs Given that the systems typically have 16GB, the memory available for cache is usually around 13GB. Dirty writeback centisecs is set to 100, as is dirty expire centisecs (we are primarily a sequential access application). Dirty ratio is 50 and dirty background ratio is 10. We set these to try to keep data in the cache continuously being pushed out. No oopses. Typically it would be due to an appliance or network connection to it going down. At which point, we want to fail over to an alternative appliance which is serving the same data. It's unfortunate that when the i/o error is detected, the other packets can't just time out right away with the i/o error. After all, it's unlikely to come back, and if it does, you've lost that data that was cached. I'd almost rather have all the i/o's that were cached up to the blocked one fail, so I know there was a failure of some of the writes preceding the one that blocked and got the i/o error. This is the price we pay for using "soft" and it is an expected price. Otherwise, we'd use "hard". -----Original Message----- From: Trond Myklebust <trond.myklebust@primarydata.com> Sender: linux-nfs-owner@vger.kernel.org Date: Thu, 6 Mar 2014 14:47:48 To: <bhawley@luminex.com> Cc: Andrew Martin<amartin@xes-inc.com>; Jim Rees<rees@umich.edu>; Brown Neil<neilb@suse.de>; <linux-nfs-owner@vger.kernel.org>; <linux-nfs@vger.kernel.org> Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels On Mar 6, 2014, at 14:33, Brian Hawley <bhawley@luminex.com> wrote: > > We do call fsync at synchronization points. > > The problem is the write() blocks forever (or for an exceptionally long time, on the order of hours and days), even with timeo set to say 20 and retrans set to 2. We see timeout messages in /var/log/messages, but the write continues to pend. Until we start doing repeated umount -f's. Then it returns and has an i/o error. How much data are you trying to sync? ‘soft’ won’t time out the entire batch at once. It feeds each write RPC call through, and lets it time out. So if you have cached a huge amount of writes, then that can take a while. The solution is to play with the ‘dirty_background_bytes’ (and/or ‘dirty_bytes’) sysctl so that it starts writeback at an earlier time. Also, what is the cause of these stalls in the first place? Is the TCP connection to the server still up? Are any Oopses present in either the client or the server syslogs? [...] _________________________________ Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@primarydata.com -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 55+ messages in thread
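For reference, the configuration Brian describes corresponds to these sysctls (values as stated in his message; on a 16GB machine, dirty_ratio=50 permits roughly 8GB of dirty page cache, which is the figure Trond works from in the reply below):

    sysctl -w vm.dirty_writeback_centisecs=100   # flusher threads wake every second
    sysctl -w vm.dirty_expire_centisecs=100      # dirty data becomes eligible for writeback after a second
    sysctl -w vm.dirty_ratio=50                  # writers are throttled at 50% of memory
    sysctl -w vm.dirty_background_ratio=10       # background writeback starts at 10%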
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 19:56 ` Brian Hawley @ 2014-03-06 20:31 ` Trond Myklebust 2014-03-06 20:34 ` Brian Hawley 0 siblings, 1 reply; 55+ messages in thread From: Trond Myklebust @ 2014-03-06 20:31 UTC (permalink / raw) To: bhawley; +Cc: linux-nfs-owner, Andrew Martin, Jim Rees, Brown Neil, linux-nfs

On Mar 6, 2014, at 14:56, Brian Hawley <bhawley@luminex.com> wrote:

> Given that the systems typically have 16GB's, the memory available for cache is usually around 13GB.
>
> Dirty writeback centisecs is set to 100, as is dirty expire centisecs (we are primarily a sequential access application).
>
> Dirty ratio is 50 and dirty background ratio is 10.

That means you can have up to 8GB to push out in one go. You can hardly blame NFS for being slow in that situation. Why do you need to cache these writes so aggressively? Is the data being edited and rewritten multiple times in the page cache before you want to push it to disk?

> We set these to try to keep the data from cache always being pushed out.
>
> No oopses. Typically it would be due to an appliance or network connection to it going down. At which point, we want to fail over to an alternative appliance which is serving the same data.
>
> It's unfortunate that when the i/o error is detected that the other packets can't just timeout right away with the i/o error. After all, it's unlikely to come back, and if it does, you've lost that data that was cached. I'd almost rather have all the i/o's that were cached up to the blocked one fail, so I know there was a failure of some of the writes preceding the one that blocked and got the i/o error. This is the price we pay for using "soft" and it is an expected price. Otherwise, we'd use "hard".

Right, but the RPC layer does not know that these are all writes to the same file, and it can't be expected to know why the server isn't replying. For instance, I've known a single 'unlink' RPC call to take 17 minutes to complete on a server that had a lot of cleanup to do on that file; during that time, the server was happy to take RPC requests for other files...

> -----Original Message-----
> From: Trond Myklebust <trond.myklebust@primarydata.com>
> Sender: linux-nfs-owner@vger.kernel.org
> Date: Thu, 6 Mar 2014 14:47:48
> To: <bhawley@luminex.com>
> Cc: Andrew Martin<amartin@xes-inc.com>; Jim Rees<rees@umich.edu>; Brown Neil<neilb@suse.de>; <linux-nfs-owner@vger.kernel.org>; <linux-nfs@vger.kernel.org>
> Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels
>
> On Mar 6, 2014, at 14:33, Brian Hawley <bhawley@luminex.com> wrote:
>
>> We do call fsync at synchronization points.
>>
>> The problem is the write() blocks forever (or for an exceptionally long time, on the order of hours and days), even with timeo set to say 20 and retrans set to 2. We see timeout messages in /var/log/messages, but the write continues to pend. Until we start doing repeated umount -f's. Then it returns and has an i/o error.
>
> How much data are you trying to sync? 'soft' won't time out the entire batch at once. It feeds each write RPC call through, and lets it time out. So if you have cached a huge amount of writes, then that can take a while. The solution is to play with the 'dirty_background_bytes' (and/or 'dirty_bytes') sysctl so that it starts writeback at an earlier time.
>
> Also, what is the cause of these stalls in the first place? Is the TCP connection to the server still up? Are any Oopses present in either the client or the server syslogs?
>
>> -----Original Message-----
>> From: Trond Myklebust <trond.myklebust@primarydata.com>
>> Date: Thu, 6 Mar 2014 14:26:24
>> To: <bhawley@luminex.com>
>> Cc: Andrew Martin<amartin@xes-inc.com>; Jim Rees<rees@umich.edu>; Brown Neil<neilb@suse.de>; <linux-nfs-owner@vger.kernel.org>; <linux-nfs@vger.kernel.org>
>> Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels
>>
>> On Mar 6, 2014, at 14:14, Brian Hawley <bhawley@luminex.com> wrote:
>>
>>> Trond,
>>>
>>> In this case, it isn't fsync or close that are not getting the i/o error. It is the write().
>>
>> My point is that write() isn't even required to return an error in the case where your NFS server is unavailable. Unless you use O_SYNC or O_DIRECT writes, then the kernel is entitled and indeed expected to cache the data in its page cache until you explicitly call fsync(). The return value of that fsync() call is what tells you whether or not your data has safely been stored to disk.
>>
>>> And we check the return value of every i/o related command.
>>>
>>> We aren't using synchronous because the performance becomes abysmal.
>>>
>>> Repeated umount -f does eventually result in the i/o error getting propagated back to the write() call. I suspect the repeated umount -f's are working their way through blocks in the cache/queue and eventually we get back to the blocked write.
>>>
>>> As I mentioned previously, if we mount with sync or direct i/o type options, we will get the i/o error, but for performance reasons, this isn't an option.
>>
>> Sure, but in that case you do need to call fsync() before the application exits. Nothing else can guarantee data stability, and that's true for all storage.

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
^ permalink raw reply	[flat|nested] 55+ messages in thread
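[A minimal C sketch of the return-value checking discussed above. The path /mnt/nfs/out.dat is hypothetical; on a buffered soft mount, the EIO is expected to surface at fsync() or close() rather than at write(), which is exactly the behavior Trond describes.]

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char buf[] = "payload\n";
    int fd = open("/mnt/nfs/out.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (write(fd, buf, sizeof buf - 1) < 0)   /* usually succeeds: data only reached the page cache */
        perror("write");
    if (fsync(fd) < 0)                        /* the real verdict on whether the data hit the server */
        perror("fsync");
    if (close(fd) < 0)                        /* close() can also report deferred write errors */
        perror("close");
    return 0;
}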
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 20:31 ` Trond Myklebust @ 2014-03-06 20:34 ` Brian Hawley 2014-03-06 20:41 ` Trond Myklebust 0 siblings, 1 reply; 55+ messages in thread From: Brian Hawley @ 2014-03-06 20:34 UTC (permalink / raw) To: Trond Myklebust, linux-nfs-owner, Brian Hawley Cc: Andrew Martin, Jim Rees, Brown Neil, linux-nfs

We're not intending to aggressively cache. There just happens to be a lot of free memory.

^ permalink raw reply	[flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 20:34 ` Brian Hawley @ 2014-03-06 20:41 ` Trond Myklebust 0 siblings, 0 replies; 55+ messages in thread From: Trond Myklebust @ 2014-03-06 20:41 UTC (permalink / raw) To: bhawley; +Cc: linux-nfs-owner, Andrew Martin, Jim Rees, Brown Neil, linux-nfs

On Mar 6, 2014, at 15:34, Brian Hawley <bhawley@luminex.com> wrote:

> We're not intending to aggressively cache. There just happens to be a lot of free memory.

I'd suggest tuning down the 'dirty_ratio' to a smaller value. Unless you need to rewrite it, you really are better off pushing the data to storage a little sooner.

Then, as I said, try the 'echo 0 >/proc/sys/sunrpc/rpc_debug' during one of these hangs in order to find out where the RPC calls are waiting.

Also, run that 'netstat -tn' to see that the TCP connection to port 2049 on the server is up, and that there are free TCP ports in the range 665-1023.

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
^ permalink raw reply	[flat|nested] 55+ messages in thread
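[A sketch of the writeback tuning Trond suggests, in C. The 64 MiB and 256 MiB values are made-up illustrations, not recommendations; setting the *_bytes sysctls takes precedence over dirty_background_ratio/dirty_ratio. The last call is his hang diagnostic: per his note above, writing 0 to /proc/sys/sunrpc/rpc_debug makes the client report where its pending RPC calls are waiting.]

#include <stdio.h>

static int write_proc(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");
    int ok;
    if (f == NULL) {
        perror(path);
        return -1;
    }
    ok = (fputs(val, f) != EOF);
    if (fclose(f) != 0)
        ok = 0;
    if (!ok) {
        perror(path);
        return -1;
    }
    return 0;
}

int main(void)
{
    int rc = 0;
    /* start background writeback after 64 MiB of dirty data,
       and throttle writers hard at 256 MiB (illustrative values) */
    rc |= write_proc("/proc/sys/vm/dirty_background_bytes", "67108864");
    rc |= write_proc("/proc/sys/vm/dirty_bytes", "268435456");
    /* dump the pending RPC task list to the kernel log during a hang */
    rc |= write_proc("/proc/sys/sunrpc/rpc_debug", "0");
    return rc ? 1 : 0;
}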
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 19:14 ` Brian Hawley 2014-03-06 19:26 ` Trond Myklebust @ 2014-03-06 19:29 ` Ric Wheeler 2014-03-06 19:38 ` Brian Hawley 1 sibling, 1 reply; 55+ messages in thread From: Ric Wheeler @ 2014-03-06 19:29 UTC (permalink / raw) To: bhawley, Trond Myklebust Cc: Andrew Martin, Jim Rees, Brown Neil, linux-nfs-owner, linux-nfs

[-- Attachment #1: Type: text/plain; charset=windows-1252; format=flowed, Size: 4348 bytes --]

On 03/06/2014 09:14 PM, Brian Hawley wrote:
> Trond,
>
> In this case, it isn't fsync or close that are not getting the i/o error. It is the write().
>
> And we check the return value of every i/o related command.

Checking write() return status means we wrote to the page cache - you must also fsync() that file to push it out to the target. Do that when it counts, leaving data in the page cache until you actually need persistence, and your performance should be reasonable.

Doing it the safe way is not free; you will see a performance hit (less so if you can do batching, etc).

ric

^ permalink raw reply	[flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 19:29 ` Ric Wheeler @ 2014-03-06 19:38 ` Brian Hawley 2014-04-04 18:15 ` Andrew Martin 0 siblings, 1 reply; 55+ messages in thread From: Brian Hawley @ 2014-03-06 19:38 UTC (permalink / raw) To: Ric Wheeler, Brian Hawley, Trond Myklebust Cc: Andrew Martin, Jim Rees, Brown Neil, linux-nfs-owner, linux-nfs

I agree completely that the write() returning only means it's in the page cache.

I agree completely that the fsync() result is the only way to know your data is safe.

Neither of those is what I, the original poster, or other posters in the past on this subject are disputing or concerned about.

The issue is, the write() call (in my case - read() in the original poster's case) does NOT return.

We both expect that a soft-mounted NFS filesystem should propagate i/o errors back to the application when the retrans/timeo fails (without the filesystem being mounted sync). But that doesn't happen. And thus the application blocks indefinitely (or certainly longer than useful).

Why repeated umount -f's eventually get the i/o error back to the caller and thus "unblock" the application, I'm not sure. But I'd guess it has something to do with having to get entries pending to be written off the queue until it eventually works its way back to the last write() that blocked because the cache was full (or something like that).

^ permalink raw reply	[flat|nested] 55+ messages in thread
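[For reference, a sketch of what the repeated "umount -f" trick amounts to at the syscall level. The mount point is hypothetical; umount(8) documents MNT_FORCE as aborting outstanding NFS requests, and the observation in this thread is that each attempt may release only one blocked caller with EIO, hence the need to repeat it.]

#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
    /* abort in-flight NFS requests on the mount; callers blocked in
       those requests see EIO even if the unmount itself fails */
    if (umount2("/mnt/nfs", MNT_FORCE) != 0)
        perror("umount2");
    return 0;
}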
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 19:38 ` Brian Hawley @ 2014-04-04 18:15 ` Andrew Martin 0 siblings, 0 replies; 55+ messages in thread From: Andrew Martin @ 2014-04-04 18:15 UTC (permalink / raw) To: bhawley Cc: Ric Wheeler, Trond Myklebust, Jim Rees, Brown Neil, linux-nfs-owner, linux-nfs

Trond,

----- Original Message -----
> From: "Brian Hawley" <bhawley@luminex.com>
> To: "Ric Wheeler" <rwheeler@redhat.com>, "Brian Hawley" <bhawley@luminex.com>, "Trond Myklebust" <trond.myklebust@primarydata.com>
> Cc: "Andrew Martin" <amartin@xes-inc.com>, "Jim Rees" <rees@umich.edu>, "Brown Neil" <neilb@suse.de>, linux-nfs-owner@vger.kernel.org, linux-nfs@vger.kernel.org
> Sent: Thursday, March 6, 2014 1:38:15 PM
> Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels
>
> I agree completely that the write() returning only means it's in the page cache.
>
> I agree completely that the fsync() result is the only way to know your data is safe.
>
> Neither of those is what I, the original poster, or other posters in the past on this subject are disputing or concerned about.
>
> The issue is, the write() call (in my case - read() in the original poster's case) does NOT return.

Is it possible with the "sync" mount option (or via another method) to force all writes to fsync and fail immediately if they do not succeed? In other words, skip the cache? In some applications I'd rather pass the error back up to the application right away for it to handle (even if the error is caused by network turbulence) rather than risk getting into this situation where writes block forever.

Thanks,

Andrew
^ permalink raw reply	[flat|nested] 55+ messages in thread
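[One per-file alternative to the mount-wide "sync" option Andrew asks about is O_SYNC, which Trond mentions earlier in the thread. A sketch with a hypothetical path; with O_SYNC, each write() waits for the server's acknowledgement, so a soft-mount timeout can surface as -1/EIO from write() itself, at the performance cost Brian describes as abysmal.]

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* O_SYNC: write() does not return until the data (and metadata)
       is acknowledged by the server, bypassing deferred writeback */
    int fd = open("/mnt/nfs/out.dat", O_WRONLY | O_CREAT | O_SYNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    if (write(fd, "x", 1) < 0)
        perror("write");   /* EIO here means the write never reached the server */
    if (close(fd) < 0)
        perror("close");
    return 0;
}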
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 15:30 ` Andrew Martin 2014-03-06 16:22 ` Jim Rees @ 2014-03-06 18:56 ` Brian Hawley 1 sibling, 0 replies; 55+ messages in thread From: Brian Hawley @ 2014-03-06 18:56 UTC (permalink / raw) To: Andrew Martin, Brian Hawley; +Cc: NeilBrown, linux-nfs-owner, linux-nfs

Using umount -f repeatedly did eventually get i/o errors back to all the reads/writes.

I understand Ric's comment about using fsync, and we do in fact use fsync at data synchronization points (like close, seeks, changes from write to read, etc -- ours is a sequential i/o application). But it is these writes and reads that end up hung most of the time, not an fsync call. I suspect it is because the writes eventually fill the cache/buffers to the point where a write has to block until the cache gets some block flushed to make room.

-----Original Message-----
From: Andrew Martin <amartin@xes-inc.com>
Date: Thu, 6 Mar 2014 09:30:21
To: <bhawley@luminex.com>
Cc: NeilBrown<neilb@suse.de>; <linux-nfs-owner@vger.kernel.org>; <linux-nfs@vger.kernel.org>
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels

> From: "Brian Hawley" <bhawley@luminex.com>
>
> I ended up writing a "manage_mounts" script run by cron that compares /proc/mounts and the fstab, uses ping and "timeout" messages in /var/log/messages to identify filesystems that aren't responding, repeatedly does umount -f to force i/o errors back to the calling applications, and, when mounts are missing (in fstab but not /proc/mounts) but are now pingable, attempts to remount them.
>
> For me, timeo and retrans are necessary, but not sufficient. The chunking to rsize/wsize and caching plays a role in how well i/o errors get relayed back to the applications doing the i/o.
>
> You will certainly lose data in these scenarios.
>
> It would be fantastic if somehow the timeo and retrans were sufficient (ie when they fail, i/o errors get back to the applications that queued that i/o, or even the i/o that caused the application to pend because the rsize/wsize or cache was full).
>
> You can eliminate some of that behavior with sync/directio, but performance becomes abysmal.
>
> I tried "lazy"; it didn't provide the desired effect (they unmounted, which prevented new i/o's, but existing i/o's never got errors).

This is the problem I am having - I can unmount the filesystem with -l, but once it is unmounted the existing apache processes are still stuck forever. Does repeatedly running "umount -f" instead of "umount -l" as you describe return I/O errors back to existing processes and allow them to stop?

> From: "Jim Rees" <rees@umich.edu>
> Given this is apache, I think if I were doing this I'd use ro,soft,intr,tcp and not try to write anything to nfs.

I was using tcp,bg,soft,intr when this problem occurred. I do not know if apache was attempting to do a write or a read, but it seems that tcp,soft,intr was not sufficient to prevent the problem.
^ permalink raw reply	[flat|nested] 55+ messages in thread
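[Brian's "manage_mounts" script is not shown in the thread. A minimal C sketch of just the fstab-versus-/proc/mounts comparison it describes, using the standard getmntent(3) interface; the ping check and the umount -f/remount policy are left out, and nothing here is Brian's actual code.]

#include <mntent.h>
#include <stdio.h>
#include <string.h>

/* return 1 if dir currently appears in /proc/mounts */
static int mounted(const char *dir)
{
    FILE *m = setmntent("/proc/mounts", "r");
    struct mntent *e;
    int found = 0;
    if (m == NULL)
        return 0;
    while ((e = getmntent(m)) != NULL) {
        if (strcmp(e->mnt_dir, dir) == 0) {
            found = 1;
            break;
        }
    }
    endmntent(m);
    return found;
}

int main(void)
{
    FILE *f = setmntent("/etc/fstab", "r");
    struct mntent *e;
    if (f == NULL) {
        perror("/etc/fstab");
        return 1;
    }
    while ((e = getmntent(f)) != NULL) {
        if (strncmp(e->mnt_type, "nfs", 3) != 0)   /* matches nfs and nfs4 */
            continue;
        if (!mounted(e->mnt_dir))
            printf("missing: %s (candidate for remount)\n", e->mnt_dir);
    }
    endmntent(f);
    return 0;
}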
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 5:03 ` Andrew Martin 2014-03-06 5:37 ` NeilBrown @ 2014-03-06 12:34 ` Jim Rees 2014-03-06 15:26 ` Chuck Lever 1 sibling, 1 reply; 55+ messages in thread From: Jim Rees @ 2014-03-06 12:34 UTC (permalink / raw) To: Andrew Martin; +Cc: NeilBrown, linux-nfs Given this is apache, I think if I were doing this I'd use ro,soft,intr,tcp and not try to write anything to nfs. ^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 12:34 ` Jim Rees @ 2014-03-06 15:26 ` Chuck Lever 2014-03-06 15:33 ` Trond Myklebust 0 siblings, 1 reply; 55+ messages in thread From: Chuck Lever @ 2014-03-06 15:26 UTC (permalink / raw) To: Andrew Martin; +Cc: Jim Rees, Neil Brown, Linux NFS Mailing List

On Mar 6, 2014, at 7:34 AM, Jim Rees <rees@umich.edu> wrote:

> Given this is apache, I think if I were doing this I'd use ro,soft,intr,tcp and not try to write anything to nfs.

I agree. A static web page workload should be read-mostly or read-only. The (small) corruption risk with "ro,soft" is that an interrupted read would cause the client to cache incomplete data.

Skip "intr" though, it really is a no-op after 2.6.25.

If your workload is really ONLY reading files that don't change often, you might consider "ro,soft,vers=3,nocto".

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
^ permalink raw reply	[flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 15:26 ` Chuck Lever @ 2014-03-06 15:33 ` Trond Myklebust 2014-03-06 15:59 ` Chuck Lever 0 siblings, 1 reply; 55+ messages in thread From: Trond Myklebust @ 2014-03-06 15:33 UTC (permalink / raw) To: Lever Charles Edward Cc: Andrew Martin, Jim Rees, Brown Neil, Linux NFS Mailing List

On Mar 6, 2014, at 10:26, Chuck Lever <chuck.lever@oracle.com> wrote:

> On Mar 6, 2014, at 7:34 AM, Jim Rees <rees@umich.edu> wrote:
>
>> Given this is apache, I think if I were doing this I'd use ro,soft,intr,tcp and not try to write anything to nfs.
>
> I agree. A static web page workload should be read-mostly or read-only. The (small) corruption risk with "ro,soft" is that an interrupted read would cause the client to cache incomplete data.

What? How? If that were the case, we would have a blatant read bug. As I read the current code, _any_ error will cause the page to not be marked as up to date.

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
^ permalink raw reply	[flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 15:33 ` Trond Myklebust @ 2014-03-06 15:59 ` Chuck Lever 2014-03-06 16:02 ` Trond Myklebust 0 siblings, 1 reply; 55+ messages in thread From: Chuck Lever @ 2014-03-06 15:59 UTC (permalink / raw) To: Trond Myklebust Cc: Andrew Martin, Jim Rees, Neil Brown, Linux NFS Mailing List

On Mar 6, 2014, at 10:33 AM, Trond Myklebust <trond.myklebust@primarydata.com> wrote:

> On Mar 6, 2014, at 10:26, Chuck Lever <chuck.lever@oracle.com> wrote:
>
>> On Mar 6, 2014, at 7:34 AM, Jim Rees <rees@umich.edu> wrote:
>>
>>> Given this is apache, I think if I were doing this I'd use ro,soft,intr,tcp and not try to write anything to nfs.
>>
>> I agree. A static web page workload should be read-mostly or read-only. The (small) corruption risk with "ro,soft" is that an interrupted read would cause the client to cache incomplete data.
>
> What? How? If that were the case, we would have a blatant read bug. As I read the current code, _any_ error will cause the page to not be marked as up to date.

Agree, the design is sound. But we don't test this use case very much, so I don't have 100% confidence that there are no bugs.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
^ permalink raw reply	[flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 15:59 ` Chuck Lever @ 2014-03-06 16:02 ` Trond Myklebust 2014-03-06 16:13 ` Chuck Lever 0 siblings, 1 reply; 55+ messages in thread
From: Trond Myklebust @ 2014-03-06 16:02 UTC (permalink / raw)
To: Chuck Lever
Cc: Andrew Martin, Jim Rees, Neil Brown, Linux NFS Mailing List

On Mar 6, 2014, at 10:59, Chuck Lever <chuck.lever@oracle.com> wrote:

> [...]
> Agree, the design is sound. But we don’t test this use case very much, so I don’t have 100% confidence that there are no bugs.

Is that the royal ‘we’, or are you talking on behalf of all the QA departments and testers here? I call bullshit...

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 16:02 ` Trond Myklebust @ 2014-03-06 16:13 ` Chuck Lever 2014-03-06 16:16 ` Trond Myklebust 0 siblings, 1 reply; 55+ messages in thread
From: Chuck Lever @ 2014-03-06 16:13 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Linux NFS Mailing List

On Mar 6, 2014, at 11:02 AM, Trond Myklebust <trond.myklebust@primarydata.com> wrote:

> [...]
> Is that the royal ‘we’, or are you talking on behalf of all the QA departments and testers here? I call bullshit…

If you want to differ with my opinion, fine. But your tone is not professional or appropriate for a public forum. You need to start treating all of your colleagues with respect, including me.

If anyone else had claimed a testing gap, you would have said “If that were the case, we would have a blatant read bug” and left it at that. But you had to go one needless and provocative step further.

Stop bullying me, Trond. I’ve had enough of it.

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 16:13 ` Chuck Lever @ 2014-03-06 16:16 ` Trond Myklebust 2014-03-06 16:45 ` Chuck Lever 0 siblings, 1 reply; 55+ messages in thread
From: Trond Myklebust @ 2014-03-06 16:16 UTC (permalink / raw)
To: Chuck Lever; +Cc: Linux NFS Mailing List

On Mar 6, 2014, at 11:13, Chuck Lever <chuck.lever@oracle.com> wrote:

> [...]
> If you want to differ with my opinion, fine. But your tone is not professional or appropriate for a public forum. You need to start treating all of your colleagues with respect, including me.
>
> If anyone else had claimed a testing gap, you would have said “If that were the case, we would have a blatant read bug” and left it at that. But you had to go one needless and provocative step further.
>
> Stop bullying me, Trond. I’ve had enough of it.

Then stop spreading FUD. That is far from professional too...

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 16:16 ` Trond Myklebust @ 2014-03-06 16:45 ` Chuck Lever 2014-03-06 17:47 ` Trond Myklebust 0 siblings, 1 reply; 55+ messages in thread
From: Chuck Lever @ 2014-03-06 16:45 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Linux NFS Mailing List

On Mar 6, 2014, at 11:16 AM, Trond Myklebust <trond.myklebust@primarydata.com> wrote:

> [...]
> Then stop spreading FUD. That is far from professional too.

FUD is a marketing term, and implies I had intent to deceive. Really?

I expressed a technical opinion, with a degree of uncertainty, just like everyone else does. People who ask questions here are free to take our advice or not, based on their own experience. They are adults, they read “IMO” where it is implied.

It is absolutely your right to say that I’m incorrect, or to clarify something I said. If you have test data that shows "ro,soft,tcp" cannot possibly cause any version of the Linux NFS client to cache corrupt data, show it, without invective. That is an appropriate response to my remark.

Face it, you over-reacted. Again. Knock it off.

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply [flat|nested] 55+ messages in thread
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 16:45 ` Chuck Lever @ 2014-03-06 17:47 ` Trond Myklebust 2014-03-06 20:38 ` Chuck Lever 0 siblings, 1 reply; 55+ messages in thread
From: Trond Myklebust @ 2014-03-06 17:47 UTC (permalink / raw)
To: Chuck Lever; +Cc: Linux NFS Mailing List

On Mar 6, 2014, at 11:45, Chuck Lever <chuck.lever@oracle.com> wrote:

> [...]
> FUD is a marketing term, and implies I had intent to deceive. Really?
>
> I expressed a technical opinion, with a degree of uncertainty, just like everyone else does. People who ask questions here are free to take our advice or not, based on their own experience. They are adults, they read “IMO” where it is implied.
>
> It is absolutely your right to say that I’m incorrect, or to clarify something I said. If you have test data that shows "ro,soft,tcp" cannot possibly cause any version of the Linux NFS client to cache corrupt data, show it, without invective. That is an appropriate response to my remark.
>
> Face it, you over-reacted. Again. Knock it off.

You clearly don’t know what other people are testing with, and you clearly didn’t ask anyone before you started telling users that 'soft' is untested.

I happen to know a server vendor for which _all_ internal QA tests are done using the ‘soft’ mount option on the clients. This is done for practical reasons in order to prevent client hangs if the server should panic. I strongly suspect that other QA departments are testing the ‘soft’ case too.

Acting as if you are an authoritative source on the subject of testing, when you are not and you know that you are not, does constitute intentional deception, yes. …and no, I don’t see anything above to indicate that this was an ‘opinion’ on the subject of what is being tested, which is precisely why I called it.

There are good reasons to distrust the ‘soft’ mount option, but lack of testing is not one of them. The general lack of application support for handling the resulting EIO errors is, however...

_________________________________
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

^ permalink raw reply [flat|nested] 55+ messages in thread
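Trond's closing point is worth making concrete: with “soft”, a timed-out RPC surfaces to the application as an EIO return from read() or write(), and an application that never checks error returns will silently process truncated data. A minimal defensive read loop in C might look like this (an illustrative sketch, not code from any particular application):

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /*
     * Drain an fd, treating any read() error -- including the EIO that a
     * soft-mount timeout produces -- as a hard failure rather than EOF.
     */
    int drain(int fd)
    {
        char buf[4096];

        for (;;) {
            ssize_t n = read(fd, buf, sizeof(buf));
            if (n == 0)
                return 0;                /* real end of file */
            if (n < 0) {
                if (errno == EINTR)
                    continue;            /* interrupted; just retry */
                fprintf(stderr, "read: %s\n", strerror(errno));
                return -1;               /* e.g. EIO from a soft mount */
            }
            /* ... consume n bytes of buf here ... */
        }
    }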
* Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels 2014-03-06 17:47 ` Trond Myklebust @ 2014-03-06 20:38 ` Chuck Lever 0 siblings, 0 replies; 55+ messages in thread
From: Chuck Lever @ 2014-03-06 20:38 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Linux NFS Mailing List

On Mar 6, 2014, at 12:47 PM, Trond Myklebust <trond.myklebust@primarydata.com> wrote:

> [...]
> You clearly don’t know what other people are testing with, and you clearly didn’t ask anyone before you started telling users that 'soft' is untested.

I suggested in a reply TO YOU that perhaps this use case was untested...

> I happen to know a server vendor for which _all_ internal QA tests are done using the ‘soft’ mount option on the clients. This is done for practical reasons in order to prevent client hangs if the server should panic.

… and that’s all you needed to say in response. But you have chosen to turn it into a shouting match because you read more into my words than was there.

> I strongly suspect that other QA departments are testing the ‘soft’ case too.

“I strongly suspect” means you don’t know for sure either. Clearly Andrew and Brian are reporting a problem here, whether or not it’s related to data corruption, and vendor testing has not found it yet, apparently. I’m not surprised. Testing is difficult, and too often it finds only exactly what you’re looking for.

(On the technical issue, just using “soft” does not constitute a robust test. Repeatedly exercising the soft timeout is not the same as having “soft” in play “just in case” the server panics.)

> Acting as if you are an authoritative source on the subject of testing, when you are not and you know that you are not, does constitute intentional deception, yes.

No-one is "acting like an authority on testing," except maybe you. What possible reason could I have for deceiving anyone about my authority or anything else?

Do you understand that calling someone a liar in public is deeply offensive? Do you understand how unnecessarily humiliating your words are? Assuming that you do understand, the level of inappropriate heat here is a sign that you have a long-standing personal issue with me. You seem to always read my words as a challenge to your authority, and that is never what I intend. There is nothing I can do about your mistaken impression of me.

> …and no, I don’t see anything above to indicate that this was an ‘opinion’ on the subject of what is being tested, which is precisely why I called it.

LOL. You “called it” because my claim that testing wasn’t sufficient touched a nerve. Are you really suggesting we all need to add “IMO” and a giant .sig disclaimer to everything we post to this list, or else Trond will swat us with a rolled up newspaper if he doesn’t happen to agree?

-- 
Chuck Lever

^ permalink raw reply [flat|nested] 55+ messages in thread
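On the testing point, a mount that merely carries “soft” never exercises the timeout path unless the server actually disappears mid-I/O. One crude way to force it on a throwaway client — hypothetical commands, assuming an NFS-over-TCP mount at /mnt/nfs and the standard port 2049 — is to black-hole the server during a large read:

    dd if=/mnt/nfs/bigfile of=/dev/null bs=1M &
    iptables -A OUTPUT -p tcp --dport 2049 -j DROP
    # dd should eventually fail with an I/O error once timeo/retrans expire
    iptables -D OUTPUT -p tcp --dport 2049 -j DROP

How quickly dd fails depends on the timeo and retrans values in effect on the mount.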
Thread overview: 55+ messages
[not found] <1696396609.119284.1394040541217.JavaMail.zimbra@xes-inc.com>
2014-03-05 17:45 ` Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels Andrew Martin
2014-03-05 20:11 ` Jim Rees
2014-03-05 20:41 ` Andrew Martin
2014-03-05 21:11 ` Jim Rees
2014-03-06 3:34 ` NeilBrown
2014-03-06 3:47 ` Jim Rees
2014-03-06 4:37 ` NeilBrown
2014-03-05 20:15 ` Brian Hawley
2014-03-05 20:54 ` Chuck Lever
2014-03-06 9:37 ` Ric Wheeler
2014-03-06 3:50 ` NeilBrown
2014-03-06 5:03 ` Andrew Martin
2014-03-06 5:37 ` NeilBrown
2014-03-06 5:47 ` Brian Hawley
2014-03-06 15:30 ` Andrew Martin
2014-03-06 16:22 ` Jim Rees
2014-03-06 16:43 ` Andrew Martin
2014-03-06 17:36 ` Jim Rees
2014-03-06 18:26 ` Trond Myklebust
2014-03-06 18:35 ` Andrew Martin
2014-03-06 18:48 ` Jim Rees
2014-03-06 19:02 ` Trond Myklebust
2014-03-06 18:50 ` Trond Myklebust
2014-03-06 19:46 ` Andrew Martin
2014-03-06 19:52 ` Trond Myklebust
2014-03-06 20:45 ` Andrew Martin
2014-03-06 21:01 ` Trond Myklebust
2014-03-18 21:50 ` Andrew Martin
2014-03-18 22:27 ` Trond Myklebust
2014-03-28 22:00 ` J. Bruce Fields
2014-04-04 18:15 ` Andrew Martin
2014-03-06 19:00 ` Brian Hawley
2014-03-06 19:06 ` Trond Myklebust
2014-03-06 19:14 ` Brian Hawley
2014-03-06 19:26 ` Trond Myklebust
2014-03-06 19:33 ` Brian Hawley
2014-03-06 19:47 ` Trond Myklebust
2014-03-06 19:56 ` Brian Hawley
2014-03-06 20:31 ` Trond Myklebust
2014-03-06 20:34 ` Brian Hawley
2014-03-06 20:41 ` Trond Myklebust
2014-03-06 19:29 ` Ric Wheeler
2014-03-06 19:38 ` Brian Hawley
2014-04-04 18:15 ` Andrew Martin
2014-03-06 18:56 ` Brian Hawley
2014-03-06 12:34 ` Jim Rees
2014-03-06 15:26 ` Chuck Lever
2014-03-06 15:33 ` Trond Myklebust
2014-03-06 15:59 ` Chuck Lever
2014-03-06 16:02 ` Trond Myklebust
2014-03-06 16:13 ` Chuck Lever
2014-03-06 16:16 ` Trond Myklebust
2014-03-06 16:45 ` Chuck Lever
2014-03-06 17:47 ` Trond Myklebust
2014-03-06 20:38 ` Chuck Lever