* linux>=4.10: PUTFH|GETATTR|CLOSE, GETATTR fails, CLOSE not re-issued
From: Kjetil Joergensen @ 2017-08-31 17:34 UTC
To: linux-nfs; +Cc: Thorvald Natvig

Hi,

(I do not actually know the specification(s) all that well, so it may be that I've accidentally cherry-picked the bits that partially turn this into a linux-nfs-client bug. I'd be more than happy with responses that would be useful to yell at NetApp with.)

Since d8d849835eb2082ea17655538a83fa467633927f (NFSv4: Place the GETATTR operation before the CLOSE), if GETATTR fails, CLOSE is never processed by the server, and it seems the Linux NFS client never tries to re-issue CLOSE.

We have client A holding file F open. Client B goes ahead and unlinks F. At some point client A does PUTFH,GETATTR, to which the server responds NFS4ERR_STALE.

Client A then tries to clean up its internal state and sends the server the compound PUTFH,GETATTR,CLOSE, to which the server responds with PUTFH(NFS4_OK),GETATTR(NFS4ERR_STALE).

That seems correct in the eyes of RFC7530 section 14.2, which says the server should stop processing the compound when a subop fails.

The server has not processed the CLOSE op, and in the case of NetApp it appears to keep holding on to the stateid, waiting for the client to CLOSE it.

Judging from tcpdump, the client never attempts to re-issue the CLOSE op that wasn't processed.

On the server side, the stateid sticks around until we tear down the client completely (umount or reboot). Over time, this leads the NetApp to bleed stateids.

Compare this to before d8d849835eb2082ea17655538a83fa467633927f, where the client issues PUTFH,CLOSE,GETATTR. Both PUTFH and CLOSE succeed, and GETATTR, as expected, still gets NFS4ERR_STALE. The server did, however, process CLOSE and retire its stateid.

Cheers,
-- 
Kjetil Joergensen <kjetil@medallia.com>
Phone: +1 (650) 739-6580
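To make the ordering issue in the report above concrete, here is a minimal toy model in Python. It is only a sketch: ToyServer, its fields, and the "stateid-1" label are hypothetical names invented for illustration, and the only behaviour it models is the rule described above, that the server stops evaluating a compound at the first op that fails.

```python
from dataclasses import dataclass, field

NFS4_OK = "NFS4_OK"
NFS4ERR_STALE = "NFS4ERR_STALE"


@dataclass
class ToyServer:
    """Hypothetical stand-in for an NFSv4 server (illustration only)."""
    unlinked: bool = True  # file F has already been unlinked by client B
    open_stateids: set = field(default_factory=lambda: {"stateid-1"})

    def run_op(self, op):
        if op == "PUTFH":
            # The reported trace shows PUTFH succeeding.
            return NFS4_OK
        if op == "GETATTR":
            return NFS4ERR_STALE if self.unlinked else NFS4_OK
        if op == "CLOSE":
            # CLOSE retires the open stateid.
            self.open_stateids.discard("stateid-1")
            return NFS4_OK
        raise ValueError(f"unknown op: {op}")

    def compound(self, ops):
        """Evaluate ops in order and stop at the first failure."""
        results = []
        for op in ops:
            status = self.run_op(op)
            results.append((op, status))
            if status != NFS4_OK:
                break
        return results


# Ordering after the commit: CLOSE is never evaluated.
new_order = ToyServer()
new_results = new_order.compound(["PUTFH", "GETATTR", "CLOSE"])

# Ordering before the commit: CLOSE runs before GETATTR fails.
old_order = ToyServer()
old_results = old_order.compound(["PUTFH", "CLOSE", "GETATTR"])

print(new_results, new_order.open_stateids)
print(old_results, old_order.open_stateids)
```

In this model the stateid survives the first compound and is retired by the second, which matches the difference described in the report.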
* Re: linux>=4.10: PUTFH|GETATTR|CLOSE, GETATTR fails, CLOSE not re-issued
From: Weston Andros Adamson @ 2017-09-01 18:44 UTC
To: Kjetil Joergensen; +Cc: linux-nfs list, Thorvald Natvig, Trond Myklebust

Nice analysis! I think post d8d849835eb2082ea17655538a83fa467633927f, we need to retry with a [PUTFH, CLOSE] if the GETATTR fails.

The problem as I see it is the GETATTR is tied to the CURRENT_FH, which is stale for new operations since the file was unlinked, but the CLOSE is tied to the (CURRENT_FH, open stateid) pair and is not stale because the state id is still valid.

Trond is out on PTO, should be back on or before next Tuesday. The recent change was his and he might have a better idea how to handle this.

-dros
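The retry suggested in this message could be sketched roughly as follows. This is not what the Linux client does; close_with_retry, send_compound, and fake_server are hypothetical names for illustration, and the fake server only reproduces the trace from the original report (GETATTR fails with NFS4ERR_STALE, everything else succeeds).

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_STALE = "NFS4ERR_STALE"


def close_with_retry(send_compound):
    """Send PUTFH,GETATTR,CLOSE; fall back to PUTFH,CLOSE if CLOSE was skipped.

    send_compound is a hypothetical callable taking a list of op names and
    returning (op, status) pairs for the ops the server actually evaluated.
    """
    results = send_compound(["PUTFH", "GETATTR", "CLOSE"])
    evaluated = dict(results)
    if "CLOSE" in evaluated:
        return results
    # An earlier op failed, so the server never reached CLOSE.
    # Retry without GETATTR only if the filehandle itself was accepted.
    if evaluated.get("PUTFH") == NFS4_OK:
        return send_compound(["PUTFH", "CLOSE"])
    return results


def fake_server(ops):
    """Toy server: GETATTR is stale, and evaluation stops on first failure."""
    results = []
    for op in ops:
        status = NFS4ERR_STALE if op == "GETATTR" else NFS4_OK
        results.append((op, status))
        if status != NFS4_OK:
            break
    return results


sent = []


def recording_send(ops):
    # Record each compound so the retry is visible.
    sent.append(list(ops))
    return fake_server(ops)


final = close_with_retry(recording_send)
print(sent)
print(final)
```

The sketch retries only when PUTFH succeeded, on the assumption that a CLOSE against a filehandle the server rejects outright would fail as well.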
* Re: linux>=4.10: PUTFH|GETATTR|CLOSE, GETATTR fails, CLOSE not re-issued
From: Weston Andros Adamson @ 2017-09-05 17:51 UTC
To: Kjetil Joergensen; +Cc: linux-nfs list, Thorvald Natvig, Trond Myklebust

I chatted with Trond about this and he says it's a server bug if an unlinked file keeps stateids around - the client doesn't need to issue a close in this case.

What version of ONTAP are you running?

-dros
* Re: linux>=4.10: PUTFH|GETATTR|CLOSE, GETATTR fails, CLOSE not re-issued
From: Kjetil Joergensen @ 2017-09-05 22:31 UTC
To: Weston Andros Adamson; +Cc: linux-nfs list, Thorvald Natvig, Trond Myklebust

Hi,

On Tue, Sep 5, 2017 at 10:51 AM, Weston Andros Adamson <dros@monkey.org> wrote:
> I chatted with Trond about this and he says it's a server bug if an unlinked file
> keeps stateids around - the client doesn't need to issue a close in this case.

We don't disagree that this is a bug with the server; it is, after all, a rather efficient denial-of-service attack against it (especially if you don't dismantle your clients all that often).

However, not calling CLOSE under certain circumstances doesn't seem correct.

Continuing to cherry-pick from RFCs:

RFC5661 - 8.2.4. Stateid Lifetime and Validation

   Stateids must remain valid until either a client restart or a server
   restart or until the client returns all of the locks associated with
   the stateid by means of an operation such as CLOSE or DELEGRETURN.
   If the locks are lost due to revocation, as long as the client ID is
   valid, the stateid remains a valid designation of that revoked state
   until the client frees it by using FREE_STATEID.

> What version of ONTAP are you running?

Version: NetApp Release 8.2.4P6 7-Mode: Wed Jan 11 01:07:08 PST 2017

-- 
Kjetil Joergensen <kjetil@medallia.com>
SRE, Medallia Inc
[parent not found: <8804AFED-986B-4C93-92D4-65899A6F8707@primarydata.com>]
* Re: linux>=4.10: PUTFH|GETATTR|CLOSE, GETATTR fails, CLOSE not re-issued
From: Trond Myklebust @ 2017-09-06 0:05 UTC
To: kjetil@medallia.com
Cc: thorvald@medallia.com, dros@monkey.org, linux-nfs@vger.kernel.org

On Tue, 2017-09-05 at 22:44 +0000, Trond Myklebust wrote:
> We’re not fixing any server bugs on the client, and this is
> definitely a server bug. You can’t have state associated with a
> non-existent or completely inaccessible file.

Concerning your quote there about FREE_STATEID, that has nothing to do with deleted files. It is a mechanism to allow the server to safely cache open state in the particular case where a network partition prevents the client from renewing its lease. There is nothing that says it applies to deleted files, and nor is there any reason why we would want to cache open or lock state in a case where the filehandle is stale.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com