* [PATCH 1/2] NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE @ 2016-12-05 0:40 Trond Myklebust 2016-12-05 0:40 ` [PATCH 2/2] NFSv4.1: Don't schedule lease recovery in nfs4_schedule_session_recovery() Trond Myklebust 2017-02-15 20:16 ` [PATCH 1/2] NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE Olga Kornievskaia 0 siblings, 2 replies; 16+ messages in thread From: Trond Myklebust @ 2016-12-05 0:40 UTC (permalink / raw) To: linux-nfs In the case where SEQUENCE receives a NFS4ERR_BADSESSION or NFS4ERR_DEADSESSION error, we just want to report the session as needing recovery, and then we want to retry the operation. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> --- fs/nfs/nfs4proc.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index f992281c9b29..4fd0e2b7b08e 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -816,6 +816,10 @@ static int nfs41_sequence_process(struct rpc_task *task, case -NFS4ERR_SEQ_FALSE_RETRY: ++slot->seq_nr; goto retry_nowait; + case -NFS4ERR_DEADSESSION: + case -NFS4ERR_BADSESSION: + nfs4_schedule_session_recovery(session, res->sr_status); + goto retry_nowait; default: /* Just update the slot sequence no. */ slot->seq_done = 1; -- 2.9.3 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 2/2] NFSv4.1: Don't schedule lease recovery in nfs4_schedule_session_recovery() 2016-12-05 0:40 [PATCH 1/2] NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE Trond Myklebust @ 2016-12-05 0:40 ` Trond Myklebust 2017-02-15 20:16 ` [PATCH 1/2] NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE Olga Kornievskaia 1 sibling, 0 replies; 16+ messages in thread From: Trond Myklebust @ 2016-12-05 0:40 UTC (permalink / raw) To: linux-nfs If the session has an error, then we want to start by recovering the session, as any SEQUENCE we send is going to fail with a session error. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> --- fs/nfs/nfs4state.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index 26b6b8b0cae3..95baf7d340f0 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -2193,7 +2193,7 @@ void nfs4_schedule_session_recovery(struct nfs4_session *session, int err) case -NFS4ERR_CONN_NOT_BOUND_TO_SESSION: set_bit(NFS4CLNT_BIND_CONN_TO_SESSION, &clp->cl_state); } - nfs4_schedule_lease_recovery(clp); + nfs4_schedule_state_manager(clp); } EXPORT_SYMBOL_GPL(nfs4_schedule_session_recovery); -- 2.9.3 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE 2016-12-05 0:40 [PATCH 1/2] NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE Trond Myklebust 2016-12-05 0:40 ` [PATCH 2/2] NFSv4.1: Don't schedule lease recovery in nfs4_schedule_session_recovery() Trond Myklebust @ 2017-02-15 20:16 ` Olga Kornievskaia 2017-02-15 20:48 ` Trond Myklebust 1 sibling, 1 reply; 16+ messages in thread From: Olga Kornievskaia @ 2017-02-15 20:16 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs On Sun, Dec 4, 2016 at 7:40 PM, Trond Myklebust <trond.myklebust@primarydata.com> wrote: > In the case where SEQUENCE receives a NFS4ERR_BADSESSION or > NFS4ERR_DEADSESSION error, we just want to report the session as needing > recovery, and then we want to retry the operation. > > Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> > --- > fs/nfs/nfs4proc.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c > index f992281c9b29..4fd0e2b7b08e 100644 > --- a/fs/nfs/nfs4proc.c > +++ b/fs/nfs/nfs4proc.c > @@ -816,6 +816,10 @@ static int nfs41_sequence_process(struct rpc_task *task, > case -NFS4ERR_SEQ_FALSE_RETRY: > ++slot->seq_nr; > goto retry_nowait; > + case -NFS4ERR_DEADSESSION: > + case -NFS4ERR_BADSESSION: > + nfs4_schedule_session_recovery(session, res->sr_status); > + goto retry_nowait; > default: > /* Just update the slot sequence no. */ > slot->seq_done = 1; > -- > 2.9.3 Hi Trond, Can you explain the implications of retrying the operation without waiting for recovery to complete? This patch introduces regression in intra COPY failing if the server rebooted during that operation. What happens is that COPY is re-sent with the same stateid from the old open instead of the open that was done from the recovery (leading to BAD_STATEID error and failure). I wonder if it's because COPY is written to just use nfs4_call_sync() instead of defining rpc_callback_ops to handle rpc_prepare() where a new stateid could be gotten? I have re-written the COPY to do that and I no longer see that problem. If my suspicion is correct, it would help for the future to know that any operations that use stateid must have rpc callback ops defined/used so that they avoid this problem. Perhaps as a comment in the code or maybe some other way? Thanks. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE 2017-02-15 20:16 ` [PATCH 1/2] NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE Olga Kornievskaia @ 2017-02-15 20:48 ` Trond Myklebust 2017-02-15 21:56 ` Olga Kornievskaia 0 siblings, 1 reply; 16+ messages in thread From: Trond Myklebust @ 2017-02-15 20:48 UTC (permalink / raw) To: aglo@umich.edu; +Cc: linux-nfs@vger.kernel.org T24gV2VkLCAyMDE3LTAyLTE1IGF0IDE1OjE2IC0wNTAwLCBPbGdhIEtvcm5pZXZza2FpYSB3cm90 ZToNCj4gT24gU3VuLCBEZWMgNCwgMjAxNiBhdCA3OjQwIFBNLCBUcm9uZCBNeWtsZWJ1c3QNCj4g PHRyb25kLm15a2xlYnVzdEBwcmltYXJ5ZGF0YS5jb20+IHdyb3RlOg0KPiA+IEluIHRoZSBjYXNl IHdoZXJlIFNFUVVFTkNFIHJlY2VpdmVzIGEgTkZTNEVSUl9CQURTRVNTSU9OIG9yDQo+ID4gTkZT NEVSUl9ERUFEU0VTU0lPTiBlcnJvciwgd2UganVzdCB3YW50IHRvIHJlcG9ydCB0aGUgc2Vzc2lv biBhcw0KPiA+IG5lZWRpbmcNCj4gPiByZWNvdmVyeSwgYW5kIHRoZW4gd2Ugd2FudCB0byByZXRy eSB0aGUgb3BlcmF0aW9uLg0KPiA+IA0KPiA+IFNpZ25lZC1vZmYtYnk6IFRyb25kIE15a2xlYnVz dCA8dHJvbmQubXlrbGVidXN0QHByaW1hcnlkYXRhLmNvbT4NCj4gPiAtLS0NCj4gPiDCoGZzL25m cy9uZnM0cHJvYy5jIHwgNCArKysrDQo+ID4gwqAxIGZpbGUgY2hhbmdlZCwgNCBpbnNlcnRpb25z KCspDQo+ID4gDQo+ID4gZGlmZiAtLWdpdCBhL2ZzL25mcy9uZnM0cHJvYy5jIGIvZnMvbmZzL25m czRwcm9jLmMNCj4gPiBpbmRleCBmOTkyMjgxYzliMjkuLjRmZDBlMmI3YjA4ZSAxMDA2NDQNCj4g PiAtLS0gYS9mcy9uZnMvbmZzNHByb2MuYw0KPiA+ICsrKyBiL2ZzL25mcy9uZnM0cHJvYy5jDQo+ ID4gQEAgLTgxNiw2ICs4MTYsMTAgQEAgc3RhdGljIGludCBuZnM0MV9zZXF1ZW5jZV9wcm9jZXNz KHN0cnVjdA0KPiA+IHJwY190YXNrICp0YXNrLA0KPiA+IMKgwqDCoMKgwqDCoMKgwqBjYXNlIC1O RlM0RVJSX1NFUV9GQUxTRV9SRVRSWToNCj4gPiDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg wqDCoCsrc2xvdC0+c2VxX25yOw0KPiA+IMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg Z290byByZXRyeV9ub3dhaXQ7DQo+ID4gK8KgwqDCoMKgwqDCoMKgY2FzZSAtTkZTNEVSUl9ERUFE U0VTU0lPTjoNCj4gPiArwqDCoMKgwqDCoMKgwqBjYXNlIC1ORlM0RVJSX0JBRFNFU1NJT046DQo+ ID4gK8KgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoG5mczRfc2NoZWR1bGVfc2Vzc2lvbl9y ZWNvdmVyeShzZXNzaW9uLCByZXMtDQo+ID4gPnNyX3N0YXR1cyk7DQo+ID4gK8KgwqDCoMKgwqDC oMKgwqDCoMKgwqDCoMKgwqDCoGdvdG8gcmV0cnlfbm93YWl0Ow0KPiA+IMKgwqDCoMKgwqDCoMKg wqBkZWZhdWx0Og0KPiA+IMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgLyogSnVzdCB1 cGRhdGUgdGhlIHNsb3Qgc2VxdWVuY2Ugbm8uICovDQo+ID4gwqDCoMKgwqDCoMKgwqDCoMKgwqDC oMKgwqDCoMKgwqBzbG90LT5zZXFfZG9uZSA9IDE7DQo+ID4gLS0NCj4gPiAyLjkuMw0KPiANCj4g SGkgVHJvbmQsDQo+IA0KPiBDYW4geW91IGV4cGxhaW4gdGhlIGltcGxpY2F0aW9ucyBvZiByZXRy eWluZyB0aGUgb3BlcmF0aW9uIHdpdGhvdXQNCj4gd2FpdGluZyBmb3IgcmVjb3ZlcnkgdG8gY29t cGxldGU/DQo+IA0KPiBUaGlzIHBhdGNoIGludHJvZHVjZXMgcmVncmVzc2lvbiBpbiBpbnRyYSBD T1BZIGZhaWxpbmcgaWYgdGhlIHNlcnZlcg0KPiByZWJvb3RlZCBkdXJpbmcgdGhhdCBvcGVyYXRp b24uIFdoYXQgaGFwcGVucyBpcyB0aGF0IENPUFkgaXMgcmUtc2VudA0KPiB3aXRoIHRoZSBzYW1l IHN0YXRlaWQgZnJvbSB0aGUgb2xkIG9wZW4gaW5zdGVhZCBvZiB0aGUgb3BlbiB0aGF0IHdhcw0K PiBkb25lIGZyb20gdGhlIHJlY292ZXJ5IChsZWFkaW5nIHRvIEJBRF9TVEFURUlEIGVycm9yIGFu ZCBmYWlsdXJlKS4NCj4gDQo+IEkgd29uZGVyIGlmIGl0J3MgYmVjYXVzZSBDT1BZIGlzIHdyaXR0 ZW4gdG8ganVzdCB1c2UgbmZzNF9jYWxsX3N5bmMoKQ0KPiBpbnN0ZWFkIG9mIGRlZmluaW5nIHJw Y19jYWxsYmFja19vcHMgdG8gaGFuZGxlIHJwY19wcmVwYXJlKCkgd2hlcmUgYQ0KPiBuZXcgc3Rh dGVpZCBjb3VsZCBiZSBnb3R0ZW4/IEkgaGF2ZSByZS13cml0dGVuIHRoZSBDT1BZIHRvIGRvIHRo YXQNCj4gYW5kDQo+IEkgbm8gbG9uZ2VyIHNlZSB0aGF0IHByb2JsZW0uDQo+IA0KPiBJZiBteSBz dXNwaWNpb24gaXMgY29ycmVjdCwgaXQgd291bGQgaGVscCBmb3IgdGhlIGZ1dHVyZSB0byBrbm93 IHRoYXQNCj4gYW55IG9wZXJhdGlvbnMgdGhhdCB1c2Ugc3RhdGVpZCBtdXN0IGhhdmUgcnBjIGNh bGxiYWNrIG9wcw0KPiBkZWZpbmVkL3VzZWQgc28gdGhhdCB0aGV5IGF2b2lkIHRoaXMgcHJvYmxl bS4gUGVyaGFwcyBhcyBhIGNvbW1lbnQgaW4NCj4gdGhlIGNvZGUgb3IgbWF5YmUgc29tZSBvdGhl ciB3YXk/DQo+IA0KPiBUaGFua3MuDQo+IA0KWWVzLCB0aGUgd2F5IHRoYXQgcnBjX2NhbGxfc3lu YygpIGhhcyBiZWVuIHdyaXR0ZW4sIGl0IGFzc3VtZXMgdGhhdCB0aGUNCmNhbGwgaXMgdW5hZmZl Y3RlZCBieSByZWJvb3QgcmVjb3ZlcnksIG9yIHRoYXQgdGhlIHJlc3VsdGluZyBzdGF0ZQ0KZXJy b3JzIHdpbGwgYmUgaGFuZGxlZCBieSB0aGUgY2FsbGVyLiBUaGUgcGF0Y2ggeW91IHJlZmVyZW5j ZSBkb2VzIG5vdA0KY2hhbmdlIHRoYXQgZXhwZWN0YXRpb24uDQoNClRoZSBzYW1lIHJhY2UgY2Fu IGhhcHBlbiwgZm9yIGluc3RhbmNlLCBpZiB5b3VyIGNhbGwgdG8gcnBjX2NhbGxfc3luYygpDQog Y29pbmNpZGVzIHdpdGggYSByZWJvb3QgcmVjb3ZlcnkgdGhhdCB3YXMgc3RhcnRlZCBieSBhbm90 aGVyIHByb2Nlc3MuDQoNCi0tIA0KVHJvbmQgTXlrbGVidXN0DQpMaW51eCBORlMgY2xpZW50IG1h aW50YWluZXIsIFByaW1hcnlEYXRhDQp0cm9uZC5teWtsZWJ1c3RAcHJpbWFyeWRhdGEuY29tDQo= ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE 2017-02-15 20:48 ` Trond Myklebust @ 2017-02-15 21:56 ` Olga Kornievskaia 2017-02-15 22:23 ` Trond Myklebust 0 siblings, 1 reply; 16+ messages in thread From: Olga Kornievskaia @ 2017-02-15 21:56 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs@vger.kernel.org On Wed, Feb 15, 2017 at 3:48 PM, Trond Myklebust <trondmy@primarydata.com> wrote: > On Wed, 2017-02-15 at 15:16 -0500, Olga Kornievskaia wrote: >> On Sun, Dec 4, 2016 at 7:40 PM, Trond Myklebust >> <trond.myklebust@primarydata.com> wrote: >> > In the case where SEQUENCE receives a NFS4ERR_BADSESSION or >> > NFS4ERR_DEADSESSION error, we just want to report the session as >> > needing >> > recovery, and then we want to retry the operation. >> > >> > Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> >> > --- >> > fs/nfs/nfs4proc.c | 4 ++++ >> > 1 file changed, 4 insertions(+) >> > >> > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c >> > index f992281c9b29..4fd0e2b7b08e 100644 >> > --- a/fs/nfs/nfs4proc.c >> > +++ b/fs/nfs/nfs4proc.c >> > @@ -816,6 +816,10 @@ static int nfs41_sequence_process(struct >> > rpc_task *task, >> > case -NFS4ERR_SEQ_FALSE_RETRY: >> > ++slot->seq_nr; >> > goto retry_nowait; >> > + case -NFS4ERR_DEADSESSION: >> > + case -NFS4ERR_BADSESSION: >> > + nfs4_schedule_session_recovery(session, res- >> > >sr_status); >> > + goto retry_nowait; >> > default: >> > /* Just update the slot sequence no. */ >> > slot->seq_done = 1; >> > -- >> > 2.9.3 >> >> Hi Trond, >> >> Can you explain the implications of retrying the operation without >> waiting for recovery to complete? >> >> This patch introduces regression in intra COPY failing if the server >> rebooted during that operation. What happens is that COPY is re-sent >> with the same stateid from the old open instead of the open that was >> done from the recovery (leading to BAD_STATEID error and failure). >> >> I wonder if it's because COPY is written to just use nfs4_call_sync() >> instead of defining rpc_callback_ops to handle rpc_prepare() where a >> new stateid could be gotten? I have re-written the COPY to do that >> and >> I no longer see that problem. >> >> If my suspicion is correct, it would help for the future to know that >> any operations that use stateid must have rpc callback ops >> defined/used so that they avoid this problem. Perhaps as a comment in >> the code or maybe some other way? >> >> Thanks. >> > Yes, the way that rpc_call_sync() has been written, it assumes that the > call is unaffected by reboot recovery, or that the resulting state > errors will be handled by the caller. The patch you reference does not > change that expectation. The "or" is confusing me. The current COPY code does call into nfs4_handle_exception() and "should handle errors". Yet after this patch, the code fails to in presence of a reboot. I don't see what else can the current code should do in terms of handling errors to fix that problem. I'm almost ready to submit the patch that turns the existing code into async rpc but if this is not the right approach then I'd like to know where I should be looking into instead. Thanks. > > The same race can happen, for instance, if your call to rpc_call_sync() > coincides with a reboot recovery that was started by another process. > > -- > Trond Myklebust > Linux NFS client maintainer, PrimaryData > trond.myklebust@primarydata.com ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE 2017-02-15 21:56 ` Olga Kornievskaia @ 2017-02-15 22:23 ` Trond Myklebust 2017-02-16 14:16 ` Olga Kornievskaia 0 siblings, 1 reply; 16+ messages in thread From: Trond Myklebust @ 2017-02-15 22:23 UTC (permalink / raw) To: aglo@umich.edu; +Cc: linux-nfs@vger.kernel.org T24gV2VkLCAyMDE3LTAyLTE1IGF0IDE2OjU2IC0wNTAwLCBPbGdhIEtvcm5pZXZza2FpYSB3cm90 ZToNCj4gT24gV2VkLCBGZWIgMTUsIDIwMTcgYXQgMzo0OCBQTSwgVHJvbmQgTXlrbGVidXN0DQo+ IDx0cm9uZG15QHByaW1hcnlkYXRhLmNvbT4gd3JvdGU6DQo+ID4gT24gV2VkLCAyMDE3LTAyLTE1 IGF0IDE1OjE2IC0wNTAwLCBPbGdhIEtvcm5pZXZza2FpYSB3cm90ZToNCj4gPiA+IE9uIFN1biwg RGVjIDQsIDIwMTYgYXQgNzo0MCBQTSwgVHJvbmQgTXlrbGVidXN0DQo+ID4gPiA8dHJvbmQubXlr bGVidXN0QHByaW1hcnlkYXRhLmNvbT4gd3JvdGU6DQo+ID4gPiA+IEluIHRoZSBjYXNlIHdoZXJl IFNFUVVFTkNFIHJlY2VpdmVzIGEgTkZTNEVSUl9CQURTRVNTSU9OIG9yDQo+ID4gPiA+IE5GUzRF UlJfREVBRFNFU1NJT04gZXJyb3IsIHdlIGp1c3Qgd2FudCB0byByZXBvcnQgdGhlIHNlc3Npb24N Cj4gPiA+ID4gYXMNCj4gPiA+ID4gbmVlZGluZw0KPiA+ID4gPiByZWNvdmVyeSwgYW5kIHRoZW4g d2Ugd2FudCB0byByZXRyeSB0aGUgb3BlcmF0aW9uLg0KPiA+ID4gPiANCj4gPiA+ID4gU2lnbmVk LW9mZi1ieTogVHJvbmQgTXlrbGVidXN0IDx0cm9uZC5teWtsZWJ1c3RAcHJpbWFyeWRhdGEuY29t DQo+ID4gPiA+ID4NCj4gPiA+ID4gLS0tDQo+ID4gPiA+IMKgZnMvbmZzL25mczRwcm9jLmMgfCA0 ICsrKysNCj4gPiA+ID4gwqAxIGZpbGUgY2hhbmdlZCwgNCBpbnNlcnRpb25zKCspDQo+ID4gPiA+ IA0KPiA+ID4gPiBkaWZmIC0tZ2l0IGEvZnMvbmZzL25mczRwcm9jLmMgYi9mcy9uZnMvbmZzNHBy b2MuYw0KPiA+ID4gPiBpbmRleCBmOTkyMjgxYzliMjkuLjRmZDBlMmI3YjA4ZSAxMDA2NDQNCj4g PiA+ID4gLS0tIGEvZnMvbmZzL25mczRwcm9jLmMNCj4gPiA+ID4gKysrIGIvZnMvbmZzL25mczRw cm9jLmMNCj4gPiA+ID4gQEAgLTgxNiw2ICs4MTYsMTAgQEAgc3RhdGljIGludCBuZnM0MV9zZXF1 ZW5jZV9wcm9jZXNzKHN0cnVjdA0KPiA+ID4gPiBycGNfdGFzayAqdGFzaywNCj4gPiA+ID4gwqDC oMKgwqDCoMKgwqDCoGNhc2UgLU5GUzRFUlJfU0VRX0ZBTFNFX1JFVFJZOg0KPiA+ID4gPiDCoMKg wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoCsrc2xvdC0+c2VxX25yOw0KPiA+ID4gPiDCoMKg wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoGdvdG8gcmV0cnlfbm93YWl0Ow0KPiA+ID4gPiAr wqDCoMKgwqDCoMKgwqBjYXNlIC1ORlM0RVJSX0RFQURTRVNTSU9OOg0KPiA+ID4gPiArwqDCoMKg wqDCoMKgwqBjYXNlIC1ORlM0RVJSX0JBRFNFU1NJT046DQo+ID4gPiA+ICvCoMKgwqDCoMKgwqDC oMKgwqDCoMKgwqDCoMKgwqBuZnM0X3NjaGVkdWxlX3Nlc3Npb25fcmVjb3Zlcnkoc2Vzc2lvbiwg cmVzLQ0KPiA+ID4gPiA+IHNyX3N0YXR1cyk7DQo+ID4gPiA+IA0KPiA+ID4gPiArwqDCoMKgwqDC oMKgwqDCoMKgwqDCoMKgwqDCoMKgZ290byByZXRyeV9ub3dhaXQ7DQo+ID4gPiA+IMKgwqDCoMKg wqDCoMKgwqBkZWZhdWx0Og0KPiA+ID4gPiDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC oC8qIEp1c3QgdXBkYXRlIHRoZSBzbG90IHNlcXVlbmNlIG5vLiAqLw0KPiA+ID4gPiDCoMKgwqDC oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoHNsb3QtPnNlcV9kb25lID0gMTsNCj4gPiA+ID4gLS0N Cj4gPiA+ID4gMi45LjMNCj4gPiA+IA0KPiA+ID4gSGkgVHJvbmQsDQo+ID4gPiANCj4gPiA+IENh biB5b3UgZXhwbGFpbiB0aGUgaW1wbGljYXRpb25zIG9mIHJldHJ5aW5nIHRoZSBvcGVyYXRpb24N Cj4gPiA+IHdpdGhvdXQNCj4gPiA+IHdhaXRpbmcgZm9yIHJlY292ZXJ5IHRvIGNvbXBsZXRlPw0K PiA+ID4gDQo+ID4gPiBUaGlzIHBhdGNoIGludHJvZHVjZXMgcmVncmVzc2lvbiBpbiBpbnRyYSBD T1BZIGZhaWxpbmcgaWYgdGhlDQo+ID4gPiBzZXJ2ZXINCj4gPiA+IHJlYm9vdGVkIGR1cmluZyB0 aGF0IG9wZXJhdGlvbi4gV2hhdCBoYXBwZW5zIGlzIHRoYXQgQ09QWSBpcyByZS0NCj4gPiA+IHNl bnQNCj4gPiA+IHdpdGggdGhlIHNhbWUgc3RhdGVpZCBmcm9tIHRoZSBvbGQgb3BlbiBpbnN0ZWFk IG9mIHRoZSBvcGVuIHRoYXQNCj4gPiA+IHdhcw0KPiA+ID4gZG9uZSBmcm9tIHRoZSByZWNvdmVy eSAobGVhZGluZyB0byBCQURfU1RBVEVJRCBlcnJvciBhbmQNCj4gPiA+IGZhaWx1cmUpLg0KPiA+ ID4gDQo+ID4gPiBJIHdvbmRlciBpZiBpdCdzIGJlY2F1c2UgQ09QWSBpcyB3cml0dGVuIHRvIGp1 c3QgdXNlDQo+ID4gPiBuZnM0X2NhbGxfc3luYygpDQo+ID4gPiBpbnN0ZWFkIG9mIGRlZmluaW5n IHJwY19jYWxsYmFja19vcHMgdG8gaGFuZGxlIHJwY19wcmVwYXJlKCkNCj4gPiA+IHdoZXJlIGEN Cj4gPiA+IG5ldyBzdGF0ZWlkIGNvdWxkIGJlIGdvdHRlbj8gSSBoYXZlIHJlLXdyaXR0ZW4gdGhl IENPUFkgdG8gZG8NCj4gPiA+IHRoYXQNCj4gPiA+IGFuZA0KPiA+ID4gSSBubyBsb25nZXIgc2Vl IHRoYXQgcHJvYmxlbS4NCj4gPiA+IA0KPiA+ID4gSWYgbXkgc3VzcGljaW9uIGlzIGNvcnJlY3Qs IGl0IHdvdWxkIGhlbHAgZm9yIHRoZSBmdXR1cmUgdG8ga25vdw0KPiA+ID4gdGhhdA0KPiA+ID4g YW55IG9wZXJhdGlvbnMgdGhhdCB1c2Ugc3RhdGVpZCBtdXN0IGhhdmUgcnBjIGNhbGxiYWNrIG9w cw0KPiA+ID4gZGVmaW5lZC91c2VkIHNvIHRoYXQgdGhleSBhdm9pZCB0aGlzIHByb2JsZW0uIFBl cmhhcHMgYXMgYQ0KPiA+ID4gY29tbWVudCBpbg0KPiA+ID4gdGhlIGNvZGUgb3IgbWF5YmUgc29t ZSBvdGhlciB3YXk/DQo+ID4gPiANCj4gPiA+IFRoYW5rcy4NCj4gPiA+IA0KPiA+IA0KPiA+IFll cywgdGhlIHdheSB0aGF0IHJwY19jYWxsX3N5bmMoKSBoYXMgYmVlbiB3cml0dGVuLCBpdCBhc3N1 bWVzIHRoYXQNCj4gPiB0aGUNCj4gPiBjYWxsIGlzIHVuYWZmZWN0ZWQgYnkgcmVib290IHJlY292 ZXJ5LCBvciB0aGF0IHRoZSByZXN1bHRpbmcgc3RhdGUNCj4gPiBlcnJvcnMgd2lsbCBiZSBoYW5k bGVkIGJ5IHRoZSBjYWxsZXIuIFRoZSBwYXRjaCB5b3UgcmVmZXJlbmNlIGRvZXMNCj4gPiBub3QN Cj4gPiBjaGFuZ2UgdGhhdCBleHBlY3RhdGlvbi4NCj4gDQo+IFRoZSAib3IiIGlzIGNvbmZ1c2lu ZyBtZS4gVGhlIGN1cnJlbnQgQ09QWSBjb2RlIGRvZXMgY2FsbCBpbnRvDQo+IG5mczRfaGFuZGxl X2V4Y2VwdGlvbigpIGFuZCAic2hvdWxkIGhhbmRsZSBlcnJvcnMiLiBZZXQgYWZ0ZXIgdGhpcw0K PiBwYXRjaCwgdGhlIGNvZGUgZmFpbHMgdG8gaW4gcHJlc2VuY2Ugb2YgYSByZWJvb3QuIEkgZG9u J3Qgc2VlIHdoYXQNCj4gZWxzZSBjYW4gdGhlIGN1cnJlbnQgY29kZSBzaG91bGQgZG8gaW4gdGVy bXMgb2YgaGFuZGxpbmcgZXJyb3JzIHRvDQo+IGZpeA0KPiB0aGF0IHByb2JsZW0uDQo+DQo+IEkn bSBhbG1vc3QgcmVhZHkgdG8gc3VibWl0IHRoZSBwYXRjaCB0aGF0IHR1cm5zIHRoZSBleGlzdGlu ZyBjb2RlDQo+IGludG8NCj4gYXN5bmMgcnBjIGJ1dCBpZiB0aGlzIGlzIG5vdCB0aGUgcmlnaHQg YXBwcm9hY2ggdGhlbiBJJ2QgbGlrZSB0byBrbm93DQo+IHdoZXJlIEkgc2hvdWxkIGJlIGxvb2tp bmcgaW50byBpbnN0ZWFkLg0KPiANCj4gVGhhbmtzLg0KPiANCj4gPiANCj4gPiBUaGUgc2FtZSBy YWNlIGNhbiBoYXBwZW4sIGZvciBpbnN0YW5jZSwgaWYgeW91ciBjYWxsIHRvDQo+ID4gcnBjX2Nh bGxfc3luYygpDQo+ID4gwqBjb2luY2lkZXMgd2l0aCBhIHJlYm9vdCByZWNvdmVyeSB0aGF0IHdh cyBzdGFydGVkIGJ5IGFub3RoZXINCj4gPiBwcm9jZXNzLg0KPiA+IA0KDQoNCkkgZG9uJ3QgdW5k ZXJzdGFuZC4gSWYgaXQgaXMgaGFuZGxpbmcgdGhlIHJlc3VsdGluZyBORlM0RVJSX0JBRF9TVEFU RUlEDQplcnJvciBjb3JyZWN0bHksIHRoZW4gd2hhdCBpcyBmYWlsaW5nPyBBcyBJIHNhaWQgYWJv dmUsIHlvdSBjYW4gZ2V0IHRoZQ0KTkZTNEVSUl9CQURfU1RBVEVJRCBmcm9twqBvdGhlciBpbnN0 YW5jZXMgb2YgdGhlIHNhbWUgcmFjZS4NCg0KLS0gDQpUcm9uZCBNeWtsZWJ1c3QNCkxpbnV4IE5G UyBjbGllbnQgbWFpbnRhaW5lciwgUHJpbWFyeURhdGENCnRyb25kLm15a2xlYnVzdEBwcmltYXJ5 ZGF0YS5jb20NCg== ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE 2017-02-15 22:23 ` Trond Myklebust @ 2017-02-16 14:16 ` Olga Kornievskaia 2017-02-16 16:04 ` Olga Kornievskaia 0 siblings, 1 reply; 16+ messages in thread From: Olga Kornievskaia @ 2017-02-16 14:16 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs@vger.kernel.org On Wed, Feb 15, 2017 at 5:23 PM, Trond Myklebust <trondmy@primarydata.com> wrote: > On Wed, 2017-02-15 at 16:56 -0500, Olga Kornievskaia wrote: >> On Wed, Feb 15, 2017 at 3:48 PM, Trond Myklebust >> <trondmy@primarydata.com> wrote: >> > On Wed, 2017-02-15 at 15:16 -0500, Olga Kornievskaia wrote: >> > > On Sun, Dec 4, 2016 at 7:40 PM, Trond Myklebust >> > > <trond.myklebust@primarydata.com> wrote: >> > > > In the case where SEQUENCE receives a NFS4ERR_BADSESSION or >> > > > NFS4ERR_DEADSESSION error, we just want to report the session >> > > > as >> > > > needing >> > > > recovery, and then we want to retry the operation. >> > > > >> > > > Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com >> > > > > >> > > > --- >> > > > fs/nfs/nfs4proc.c | 4 ++++ >> > > > 1 file changed, 4 insertions(+) >> > > > >> > > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c >> > > > index f992281c9b29..4fd0e2b7b08e 100644 >> > > > --- a/fs/nfs/nfs4proc.c >> > > > +++ b/fs/nfs/nfs4proc.c >> > > > @@ -816,6 +816,10 @@ static int nfs41_sequence_process(struct >> > > > rpc_task *task, >> > > > case -NFS4ERR_SEQ_FALSE_RETRY: >> > > > ++slot->seq_nr; >> > > > goto retry_nowait; >> > > > + case -NFS4ERR_DEADSESSION: >> > > > + case -NFS4ERR_BADSESSION: >> > > > + nfs4_schedule_session_recovery(session, res- >> > > > > sr_status); >> > > > >> > > > + goto retry_nowait; >> > > > default: >> > > > /* Just update the slot sequence no. */ >> > > > slot->seq_done = 1; >> > > > -- >> > > > 2.9.3 >> > > >> > > Hi Trond, >> > > >> > > Can you explain the implications of retrying the operation >> > > without >> > > waiting for recovery to complete? >> > > >> > > This patch introduces regression in intra COPY failing if the >> > > server >> > > rebooted during that operation. What happens is that COPY is re- >> > > sent >> > > with the same stateid from the old open instead of the open that >> > > was >> > > done from the recovery (leading to BAD_STATEID error and >> > > failure). >> > > >> > > I wonder if it's because COPY is written to just use >> > > nfs4_call_sync() >> > > instead of defining rpc_callback_ops to handle rpc_prepare() >> > > where a >> > > new stateid could be gotten? I have re-written the COPY to do >> > > that >> > > and >> > > I no longer see that problem. >> > > >> > > If my suspicion is correct, it would help for the future to know >> > > that >> > > any operations that use stateid must have rpc callback ops >> > > defined/used so that they avoid this problem. Perhaps as a >> > > comment in >> > > the code or maybe some other way? >> > > >> > > Thanks. >> > > >> > >> > Yes, the way that rpc_call_sync() has been written, it assumes that >> > the >> > call is unaffected by reboot recovery, or that the resulting state >> > errors will be handled by the caller. The patch you reference does >> > not >> > change that expectation. >> >> The "or" is confusing me. The current COPY code does call into >> nfs4_handle_exception() and "should handle errors". Yet after this >> patch, the code fails to in presence of a reboot. I don't see what >> else can the current code should do in terms of handling errors to >> fix >> that problem. >> >> I'm almost ready to submit the patch that turns the existing code >> into >> async rpc but if this is not the right approach then I'd like to know >> where I should be looking into instead. >> >> Thanks. >> >> > >> > The same race can happen, for instance, if your call to >> > rpc_call_sync() >> > coincides with a reboot recovery that was started by another >> > process. >> > > > > I don't understand. If it is handling the resulting NFS4ERR_BAD_STATEID > error correctly, then what is failing? As I said above, you can get the > NFS4ERR_BAD_STATEID from other instances of the same race. COPY (just like READ or WRITE) can not handle BAD_STATEID error because it holds a lock and that lock is marked lost. COPY should have never been sent with the old stated after the new open stateid and lock stateid was gotten. But with this patch the behavior changed and thus I was trying to understand what has changed. I think that previously BAD_SESSION error was not handled by the SEQUENCE and was handled by each of the operations calling the general nfs4_handle_exception() routine that actually waits on recovery to complete before retrying the operation. Then the retry actually would allow COPY to get the new stateid. However, with the new code SEQUENCE handles the error and calls to retry the operation without the wait which actually again gets the BAD_SESSION then recovery happens but 2nd SEQUENCE again retries the operation without going all the out to the operation so no new stateid is gotten. > > -- > Trond Myklebust > Linux NFS client maintainer, PrimaryData > trond.myklebust@primarydata.com ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE 2017-02-16 14:16 ` Olga Kornievskaia @ 2017-02-16 16:04 ` Olga Kornievskaia 2017-02-16 16:12 ` Olga Kornievskaia 0 siblings, 1 reply; 16+ messages in thread From: Olga Kornievskaia @ 2017-02-16 16:04 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs@vger.kernel.org On Thu, Feb 16, 2017 at 9:16 AM, Olga Kornievskaia <aglo@umich.edu> wrote: > On Wed, Feb 15, 2017 at 5:23 PM, Trond Myklebust > <trondmy@primarydata.com> wrote: >> On Wed, 2017-02-15 at 16:56 -0500, Olga Kornievskaia wrote: >>> On Wed, Feb 15, 2017 at 3:48 PM, Trond Myklebust >>> <trondmy@primarydata.com> wrote: >>> > On Wed, 2017-02-15 at 15:16 -0500, Olga Kornievskaia wrote: >>> > > On Sun, Dec 4, 2016 at 7:40 PM, Trond Myklebust >>> > > <trond.myklebust@primarydata.com> wrote: >>> > > > In the case where SEQUENCE receives a NFS4ERR_BADSESSION or >>> > > > NFS4ERR_DEADSESSION error, we just want to report the session >>> > > > as >>> > > > needing >>> > > > recovery, and then we want to retry the operation. >>> > > > >>> > > > Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com >>> > > > > >>> > > > --- >>> > > > fs/nfs/nfs4proc.c | 4 ++++ >>> > > > 1 file changed, 4 insertions(+) >>> > > > >>> > > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c >>> > > > index f992281c9b29..4fd0e2b7b08e 100644 >>> > > > --- a/fs/nfs/nfs4proc.c >>> > > > +++ b/fs/nfs/nfs4proc.c >>> > > > @@ -816,6 +816,10 @@ static int nfs41_sequence_process(struct >>> > > > rpc_task *task, >>> > > > case -NFS4ERR_SEQ_FALSE_RETRY: >>> > > > ++slot->seq_nr; >>> > > > goto retry_nowait; >>> > > > + case -NFS4ERR_DEADSESSION: >>> > > > + case -NFS4ERR_BADSESSION: >>> > > > + nfs4_schedule_session_recovery(session, res- >>> > > > > sr_status); >>> > > > >>> > > > + goto retry_nowait; >>> > > > default: >>> > > > /* Just update the slot sequence no. */ >>> > > > slot->seq_done = 1; >>> > > > -- >>> > > > 2.9.3 >>> > > >>> > > Hi Trond, >>> > > >>> > > Can you explain the implications of retrying the operation >>> > > without >>> > > waiting for recovery to complete? >>> > > >>> > > This patch introduces regression in intra COPY failing if the >>> > > server >>> > > rebooted during that operation. What happens is that COPY is re- >>> > > sent >>> > > with the same stateid from the old open instead of the open that >>> > > was >>> > > done from the recovery (leading to BAD_STATEID error and >>> > > failure). >>> > > >>> > > I wonder if it's because COPY is written to just use >>> > > nfs4_call_sync() >>> > > instead of defining rpc_callback_ops to handle rpc_prepare() >>> > > where a >>> > > new stateid could be gotten? I have re-written the COPY to do >>> > > that >>> > > and >>> > > I no longer see that problem. >>> > > >>> > > If my suspicion is correct, it would help for the future to know >>> > > that >>> > > any operations that use stateid must have rpc callback ops >>> > > defined/used so that they avoid this problem. Perhaps as a >>> > > comment in >>> > > the code or maybe some other way? >>> > > >>> > > Thanks. >>> > > >>> > >>> > Yes, the way that rpc_call_sync() has been written, it assumes that >>> > the >>> > call is unaffected by reboot recovery, or that the resulting state >>> > errors will be handled by the caller. The patch you reference does >>> > not >>> > change that expectation. >>> >>> The "or" is confusing me. The current COPY code does call into >>> nfs4_handle_exception() and "should handle errors". Yet after this >>> patch, the code fails to in presence of a reboot. I don't see what >>> else can the current code should do in terms of handling errors to >>> fix >>> that problem. >>> >>> I'm almost ready to submit the patch that turns the existing code >>> into >>> async rpc but if this is not the right approach then I'd like to know >>> where I should be looking into instead. >>> >>> Thanks. >>> >>> > >>> > The same race can happen, for instance, if your call to >>> > rpc_call_sync() >>> > coincides with a reboot recovery that was started by another >>> > process. >>> > >> >> >> I don't understand. If it is handling the resulting NFS4ERR_BAD_STATEID >> error correctly, then what is failing? As I said above, you can get the >> NFS4ERR_BAD_STATEID from other instances of the same race. > > COPY (just like READ or WRITE) can not handle BAD_STATEID error > because it holds a lock and that lock is marked lost. COPY should have > never been sent with the old stated after the new open stateid and > lock stateid was gotten. But with this patch the behavior changed and > thus I was trying to understand what has changed. > > I think that previously BAD_SESSION error was not handled by the > SEQUENCE and was handled by each of the operations calling the general > nfs4_handle_exception() routine that actually waits on recovery to > complete before retrying the operation. Then the retry actually would > allow COPY to get the new stateid. However, with the new code SEQUENCE > handles the error and calls to retry the operation without the wait > which actually again gets the BAD_SESSION then recovery happens but > 2nd SEQUENCE again retries the operation without going all the out to > the operation so no new stateid is gotten. > I also would like to point out that not only does this patch introduces a regression with the COPY. It also introduces the BAD_SESSION loop where the SEQUENCE that gets this error just keeps getting resent over and over again. This happens to the heartbeat SEQUENCE when server reboots. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE 2017-02-16 16:04 ` Olga Kornievskaia @ 2017-02-16 16:12 ` Olga Kornievskaia 2017-02-16 21:45 ` Trond Myklebust 0 siblings, 1 reply; 16+ messages in thread From: Olga Kornievskaia @ 2017-02-16 16:12 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs@vger.kernel.org On Thu, Feb 16, 2017 at 11:04 AM, Olga Kornievskaia <aglo@umich.edu> wrote: > On Thu, Feb 16, 2017 at 9:16 AM, Olga Kornievskaia <aglo@umich.edu> wrote: >> On Wed, Feb 15, 2017 at 5:23 PM, Trond Myklebust >> <trondmy@primarydata.com> wrote: >>> On Wed, 2017-02-15 at 16:56 -0500, Olga Kornievskaia wrote: >>>> On Wed, Feb 15, 2017 at 3:48 PM, Trond Myklebust >>>> <trondmy@primarydata.com> wrote: >>>> > On Wed, 2017-02-15 at 15:16 -0500, Olga Kornievskaia wrote: >>>> > > On Sun, Dec 4, 2016 at 7:40 PM, Trond Myklebust >>>> > > <trond.myklebust@primarydata.com> wrote: >>>> > > > In the case where SEQUENCE receives a NFS4ERR_BADSESSION or >>>> > > > NFS4ERR_DEADSESSION error, we just want to report the session >>>> > > > as >>>> > > > needing >>>> > > > recovery, and then we want to retry the operation. >>>> > > > >>>> > > > Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com >>>> > > > > >>>> > > > --- >>>> > > > fs/nfs/nfs4proc.c | 4 ++++ >>>> > > > 1 file changed, 4 insertions(+) >>>> > > > >>>> > > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c >>>> > > > index f992281c9b29..4fd0e2b7b08e 100644 >>>> > > > --- a/fs/nfs/nfs4proc.c >>>> > > > +++ b/fs/nfs/nfs4proc.c >>>> > > > @@ -816,6 +816,10 @@ static int nfs41_sequence_process(struct >>>> > > > rpc_task *task, >>>> > > > case -NFS4ERR_SEQ_FALSE_RETRY: >>>> > > > ++slot->seq_nr; >>>> > > > goto retry_nowait; >>>> > > > + case -NFS4ERR_DEADSESSION: >>>> > > > + case -NFS4ERR_BADSESSION: >>>> > > > + nfs4_schedule_session_recovery(session, res- >>>> > > > > sr_status); >>>> > > > >>>> > > > + goto retry_nowait; >>>> > > > default: >>>> > > > /* Just update the slot sequence no. */ >>>> > > > slot->seq_done = 1; >>>> > > > -- >>>> > > > 2.9.3 >>>> > > >>>> > > Hi Trond, >>>> > > >>>> > > Can you explain the implications of retrying the operation >>>> > > without >>>> > > waiting for recovery to complete? >>>> > > >>>> > > This patch introduces regression in intra COPY failing if the >>>> > > server >>>> > > rebooted during that operation. What happens is that COPY is re- >>>> > > sent >>>> > > with the same stateid from the old open instead of the open that >>>> > > was >>>> > > done from the recovery (leading to BAD_STATEID error and >>>> > > failure). >>>> > > >>>> > > I wonder if it's because COPY is written to just use >>>> > > nfs4_call_sync() >>>> > > instead of defining rpc_callback_ops to handle rpc_prepare() >>>> > > where a >>>> > > new stateid could be gotten? I have re-written the COPY to do >>>> > > that >>>> > > and >>>> > > I no longer see that problem. >>>> > > >>>> > > If my suspicion is correct, it would help for the future to know >>>> > > that >>>> > > any operations that use stateid must have rpc callback ops >>>> > > defined/used so that they avoid this problem. Perhaps as a >>>> > > comment in >>>> > > the code or maybe some other way? >>>> > > >>>> > > Thanks. >>>> > > >>>> > >>>> > Yes, the way that rpc_call_sync() has been written, it assumes that >>>> > the >>>> > call is unaffected by reboot recovery, or that the resulting state >>>> > errors will be handled by the caller. The patch you reference does >>>> > not >>>> > change that expectation. >>>> >>>> The "or" is confusing me. The current COPY code does call into >>>> nfs4_handle_exception() and "should handle errors". Yet after this >>>> patch, the code fails to in presence of a reboot. I don't see what >>>> else can the current code should do in terms of handling errors to >>>> fix >>>> that problem. >>>> >>>> I'm almost ready to submit the patch that turns the existing code >>>> into >>>> async rpc but if this is not the right approach then I'd like to know >>>> where I should be looking into instead. >>>> >>>> Thanks. >>>> >>>> > >>>> > The same race can happen, for instance, if your call to >>>> > rpc_call_sync() >>>> > coincides with a reboot recovery that was started by another >>>> > process. >>>> > >>> >>> >>> I don't understand. If it is handling the resulting NFS4ERR_BAD_STATEID >>> error correctly, then what is failing? As I said above, you can get the >>> NFS4ERR_BAD_STATEID from other instances of the same race. >> >> COPY (just like READ or WRITE) can not handle BAD_STATEID error >> because it holds a lock and that lock is marked lost. COPY should have >> never been sent with the old stated after the new open stateid and >> lock stateid was gotten. But with this patch the behavior changed and >> thus I was trying to understand what has changed. >> >> I think that previously BAD_SESSION error was not handled by the >> SEQUENCE and was handled by each of the operations calling the general >> nfs4_handle_exception() routine that actually waits on recovery to >> complete before retrying the operation. Then the retry actually would >> allow COPY to get the new stateid. However, with the new code SEQUENCE >> handles the error and calls to retry the operation without the wait >> which actually again gets the BAD_SESSION then recovery happens but >> 2nd SEQUENCE again retries the operation without going all the out to >> the operation so no new stateid is gotten. >> > > I also would like to point out that not only does this patch > introduces a regression with the COPY. It also introduces the > BAD_SESSION loop where the SEQUENCE that gets this error just keeps > getting resent over and over again. This happens to the heartbeat > SEQUENCE when server reboots. I have written a test case for the SETATTR since it also uses a stateid and calls nfs4_sync_call() and it also after the reboot will send the operation with the old stateid. Thankfully, SETATTR doesn't really care about the 'validity' of the stateid and will retry the operation. It will produce no errors but I can't say the client is working properly. So I'd save that the patch has introduced some significant change to the philosophy that wasn't expected. Do we really need this patch?? Does it solve an existing problem? All the operations are able to handle BAD_SESSION errors in their corresponding nfs4_handle_exception code that they do. Why change it. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE 2017-02-16 16:12 ` Olga Kornievskaia @ 2017-02-16 21:45 ` Trond Myklebust 2017-02-16 22:14 ` Olga Kornievskaia 0 siblings, 1 reply; 16+ messages in thread From: Trond Myklebust @ 2017-02-16 21:45 UTC (permalink / raw) To: Olga Kornievskaia; +Cc: Linux NFS Mailing List DQo+IE9uIEZlYiAxNiwgMjAxNywgYXQgMTE6MTIsIE9sZ2EgS29ybmlldnNrYWlhIDxhZ2xvQHVt aWNoLmVkdT4gd3JvdGU6DQo+IA0KPiBPbiBUaHUsIEZlYiAxNiwgMjAxNyBhdCAxMTowNCBBTSwg T2xnYSBLb3JuaWV2c2thaWEgPGFnbG9AdW1pY2guZWR1PiB3cm90ZToNCj4+IE9uIFRodSwgRmVi IDE2LCAyMDE3IGF0IDk6MTYgQU0sIE9sZ2EgS29ybmlldnNrYWlhIDxhZ2xvQHVtaWNoLmVkdT4g d3JvdGU6DQo+Pj4gT24gV2VkLCBGZWIgMTUsIDIwMTcgYXQgNToyMyBQTSwgVHJvbmQgTXlrbGVi dXN0DQo+Pj4gPHRyb25kbXlAcHJpbWFyeWRhdGEuY29tPiB3cm90ZToNCj4+Pj4gT24gV2VkLCAy MDE3LTAyLTE1IGF0IDE2OjU2IC0wNTAwLCBPbGdhIEtvcm5pZXZza2FpYSB3cm90ZToNCj4+Pj4+ IE9uIFdlZCwgRmViIDE1LCAyMDE3IGF0IDM6NDggUE0sIFRyb25kIE15a2xlYnVzdA0KPj4+Pj4g PHRyb25kbXlAcHJpbWFyeWRhdGEuY29tPiB3cm90ZToNCj4+Pj4+PiBPbiBXZWQsIDIwMTctMDIt MTUgYXQgMTU6MTYgLTA1MDAsIE9sZ2EgS29ybmlldnNrYWlhIHdyb3RlOg0KPj4+Pj4+PiBPbiBT dW4sIERlYyA0LCAyMDE2IGF0IDc6NDAgUE0sIFRyb25kIE15a2xlYnVzdA0KPj4+Pj4+PiA8dHJv bmQubXlrbGVidXN0QHByaW1hcnlkYXRhLmNvbT4gd3JvdGU6DQo+Pj4+Pj4+PiBJbiB0aGUgY2Fz ZSB3aGVyZSBTRVFVRU5DRSByZWNlaXZlcyBhIE5GUzRFUlJfQkFEU0VTU0lPTiBvcg0KPj4+Pj4+ Pj4gTkZTNEVSUl9ERUFEU0VTU0lPTiBlcnJvciwgd2UganVzdCB3YW50IHRvIHJlcG9ydCB0aGUg c2Vzc2lvbg0KPj4+Pj4+Pj4gYXMNCj4+Pj4+Pj4+IG5lZWRpbmcNCj4+Pj4+Pj4+IHJlY292ZXJ5 LCBhbmQgdGhlbiB3ZSB3YW50IHRvIHJldHJ5IHRoZSBvcGVyYXRpb24uDQo+Pj4+Pj4+PiANCj4+ Pj4+Pj4+IFNpZ25lZC1vZmYtYnk6IFRyb25kIE15a2xlYnVzdCA8dHJvbmQubXlrbGVidXN0QHBy aW1hcnlkYXRhLmNvbQ0KPj4+Pj4+Pj4+IA0KPj4+Pj4+Pj4gLS0tDQo+Pj4+Pj4+PiBmcy9uZnMv bmZzNHByb2MuYyB8IDQgKysrKw0KPj4+Pj4+Pj4gMSBmaWxlIGNoYW5nZWQsIDQgaW5zZXJ0aW9u cygrKQ0KPj4+Pj4+Pj4gDQo+Pj4+Pj4+PiBkaWZmIC0tZ2l0IGEvZnMvbmZzL25mczRwcm9jLmMg Yi9mcy9uZnMvbmZzNHByb2MuYw0KPj4+Pj4+Pj4gaW5kZXggZjk5MjI4MWM5YjI5Li40ZmQwZTJi N2IwOGUgMTAwNjQ0DQo+Pj4+Pj4+PiAtLS0gYS9mcy9uZnMvbmZzNHByb2MuYw0KPj4+Pj4+Pj4g KysrIGIvZnMvbmZzL25mczRwcm9jLmMNCj4+Pj4+Pj4+IEBAIC04MTYsNiArODE2LDEwIEBAIHN0 YXRpYyBpbnQgbmZzNDFfc2VxdWVuY2VfcHJvY2VzcyhzdHJ1Y3QNCj4+Pj4+Pj4+IHJwY190YXNr ICp0YXNrLA0KPj4+Pj4+Pj4gICAgICAgIGNhc2UgLU5GUzRFUlJfU0VRX0ZBTFNFX1JFVFJZOg0K Pj4+Pj4+Pj4gICAgICAgICAgICAgICAgKytzbG90LT5zZXFfbnI7DQo+Pj4+Pj4+PiAgICAgICAg ICAgICAgICBnb3RvIHJldHJ5X25vd2FpdDsNCj4+Pj4+Pj4+ICsgICAgICAgY2FzZSAtTkZTNEVS Ul9ERUFEU0VTU0lPTjoNCj4+Pj4+Pj4+ICsgICAgICAgY2FzZSAtTkZTNEVSUl9CQURTRVNTSU9O Og0KPj4+Pj4+Pj4gKyAgICAgICAgICAgICAgIG5mczRfc2NoZWR1bGVfc2Vzc2lvbl9yZWNvdmVy eShzZXNzaW9uLCByZXMtDQo+Pj4+Pj4+Pj4gc3Jfc3RhdHVzKTsNCj4+Pj4+Pj4+IA0KPj4+Pj4+ Pj4gKyAgICAgICAgICAgICAgIGdvdG8gcmV0cnlfbm93YWl0Ow0KPj4+Pj4+Pj4gICAgICAgIGRl ZmF1bHQ6DQo+Pj4+Pj4+PiAgICAgICAgICAgICAgICAvKiBKdXN0IHVwZGF0ZSB0aGUgc2xvdCBz ZXF1ZW5jZSBuby4gKi8NCj4+Pj4+Pj4+ICAgICAgICAgICAgICAgIHNsb3QtPnNlcV9kb25lID0g MTsNCj4+Pj4+Pj4+IC0tDQo+Pj4+Pj4+PiAyLjkuMw0KPj4+Pj4+PiANCj4+Pj4+Pj4gSGkgVHJv bmQsDQo+Pj4+Pj4+IA0KPj4+Pj4+PiBDYW4geW91IGV4cGxhaW4gdGhlIGltcGxpY2F0aW9ucyBv ZiByZXRyeWluZyB0aGUgb3BlcmF0aW9uDQo+Pj4+Pj4+IHdpdGhvdXQNCj4+Pj4+Pj4gd2FpdGlu ZyBmb3IgcmVjb3ZlcnkgdG8gY29tcGxldGU/DQo+Pj4+Pj4+IA0KPj4+Pj4+PiBUaGlzIHBhdGNo IGludHJvZHVjZXMgcmVncmVzc2lvbiBpbiBpbnRyYSBDT1BZIGZhaWxpbmcgaWYgdGhlDQo+Pj4+ Pj4+IHNlcnZlcg0KPj4+Pj4+PiByZWJvb3RlZCBkdXJpbmcgdGhhdCBvcGVyYXRpb24uIFdoYXQg aGFwcGVucyBpcyB0aGF0IENPUFkgaXMgcmUtDQo+Pj4+Pj4+IHNlbnQNCj4+Pj4+Pj4gd2l0aCB0 aGUgc2FtZSBzdGF0ZWlkIGZyb20gdGhlIG9sZCBvcGVuIGluc3RlYWQgb2YgdGhlIG9wZW4gdGhh dA0KPj4+Pj4+PiB3YXMNCj4+Pj4+Pj4gZG9uZSBmcm9tIHRoZSByZWNvdmVyeSAobGVhZGluZyB0 byBCQURfU1RBVEVJRCBlcnJvciBhbmQNCj4+Pj4+Pj4gZmFpbHVyZSkuDQo+Pj4+Pj4+IA0KPj4+ Pj4+PiBJIHdvbmRlciBpZiBpdCdzIGJlY2F1c2UgQ09QWSBpcyB3cml0dGVuIHRvIGp1c3QgdXNl DQo+Pj4+Pj4+IG5mczRfY2FsbF9zeW5jKCkNCj4+Pj4+Pj4gaW5zdGVhZCBvZiBkZWZpbmluZyBy cGNfY2FsbGJhY2tfb3BzIHRvIGhhbmRsZSBycGNfcHJlcGFyZSgpDQo+Pj4+Pj4+IHdoZXJlIGEN Cj4+Pj4+Pj4gbmV3IHN0YXRlaWQgY291bGQgYmUgZ290dGVuPyBJIGhhdmUgcmUtd3JpdHRlbiB0 aGUgQ09QWSB0byBkbw0KPj4+Pj4+PiB0aGF0DQo+Pj4+Pj4+IGFuZA0KPj4+Pj4+PiBJIG5vIGxv bmdlciBzZWUgdGhhdCBwcm9ibGVtLg0KPj4+Pj4+PiANCj4+Pj4+Pj4gSWYgbXkgc3VzcGljaW9u IGlzIGNvcnJlY3QsIGl0IHdvdWxkIGhlbHAgZm9yIHRoZSBmdXR1cmUgdG8ga25vdw0KPj4+Pj4+ PiB0aGF0DQo+Pj4+Pj4+IGFueSBvcGVyYXRpb25zIHRoYXQgdXNlIHN0YXRlaWQgbXVzdCBoYXZl IHJwYyBjYWxsYmFjayBvcHMNCj4+Pj4+Pj4gZGVmaW5lZC91c2VkIHNvIHRoYXQgdGhleSBhdm9p ZCB0aGlzIHByb2JsZW0uIFBlcmhhcHMgYXMgYQ0KPj4+Pj4+PiBjb21tZW50IGluDQo+Pj4+Pj4+ IHRoZSBjb2RlIG9yIG1heWJlIHNvbWUgb3RoZXIgd2F5Pw0KPj4+Pj4+PiANCj4+Pj4+Pj4gVGhh bmtzLg0KPj4+Pj4+PiANCj4+Pj4+PiANCj4+Pj4+PiBZZXMsIHRoZSB3YXkgdGhhdCBycGNfY2Fs bF9zeW5jKCkgaGFzIGJlZW4gd3JpdHRlbiwgaXQgYXNzdW1lcyB0aGF0DQo+Pj4+Pj4gdGhlDQo+ Pj4+Pj4gY2FsbCBpcyB1bmFmZmVjdGVkIGJ5IHJlYm9vdCByZWNvdmVyeSwgb3IgdGhhdCB0aGUg cmVzdWx0aW5nIHN0YXRlDQo+Pj4+Pj4gZXJyb3JzIHdpbGwgYmUgaGFuZGxlZCBieSB0aGUgY2Fs bGVyLiBUaGUgcGF0Y2ggeW91IHJlZmVyZW5jZSBkb2VzDQo+Pj4+Pj4gbm90DQo+Pj4+Pj4gY2hh bmdlIHRoYXQgZXhwZWN0YXRpb24uDQo+Pj4+PiANCj4+Pj4+IFRoZSAib3IiIGlzIGNvbmZ1c2lu ZyBtZS4gVGhlIGN1cnJlbnQgQ09QWSBjb2RlIGRvZXMgY2FsbCBpbnRvDQo+Pj4+PiBuZnM0X2hh bmRsZV9leGNlcHRpb24oKSBhbmQgInNob3VsZCBoYW5kbGUgZXJyb3JzIi4gWWV0IGFmdGVyIHRo aXMNCj4+Pj4+IHBhdGNoLCB0aGUgY29kZSBmYWlscyB0byBpbiBwcmVzZW5jZSBvZiBhIHJlYm9v dC4gSSBkb24ndCBzZWUgd2hhdA0KPj4+Pj4gZWxzZSBjYW4gdGhlIGN1cnJlbnQgY29kZSBzaG91 bGQgZG8gaW4gdGVybXMgb2YgaGFuZGxpbmcgZXJyb3JzIHRvDQo+Pj4+PiBmaXgNCj4+Pj4+IHRo YXQgcHJvYmxlbS4NCj4+Pj4+IA0KPj4+Pj4gSSdtIGFsbW9zdCByZWFkeSB0byBzdWJtaXQgdGhl IHBhdGNoIHRoYXQgdHVybnMgdGhlIGV4aXN0aW5nIGNvZGUNCj4+Pj4+IGludG8NCj4+Pj4+IGFz eW5jIHJwYyBidXQgaWYgdGhpcyBpcyBub3QgdGhlIHJpZ2h0IGFwcHJvYWNoIHRoZW4gSSdkIGxp a2UgdG8ga25vdw0KPj4+Pj4gd2hlcmUgSSBzaG91bGQgYmUgbG9va2luZyBpbnRvIGluc3RlYWQu DQo+Pj4+PiANCj4+Pj4+IFRoYW5rcy4NCj4+Pj4+IA0KPj4+Pj4+IA0KPj4+Pj4+IFRoZSBzYW1l IHJhY2UgY2FuIGhhcHBlbiwgZm9yIGluc3RhbmNlLCBpZiB5b3VyIGNhbGwgdG8NCj4+Pj4+PiBy cGNfY2FsbF9zeW5jKCkNCj4+Pj4+PiBjb2luY2lkZXMgd2l0aCBhIHJlYm9vdCByZWNvdmVyeSB0 aGF0IHdhcyBzdGFydGVkIGJ5IGFub3RoZXINCj4+Pj4+PiBwcm9jZXNzLg0KPj4+Pj4+IA0KPj4+ PiANCj4+Pj4gDQo+Pj4+IEkgZG9uJ3QgdW5kZXJzdGFuZC4gSWYgaXQgaXMgaGFuZGxpbmcgdGhl IHJlc3VsdGluZyBORlM0RVJSX0JBRF9TVEFURUlEDQo+Pj4+IGVycm9yIGNvcnJlY3RseSwgdGhl biB3aGF0IGlzIGZhaWxpbmc/IEFzIEkgc2FpZCBhYm92ZSwgeW91IGNhbiBnZXQgdGhlDQo+Pj4+ IE5GUzRFUlJfQkFEX1NUQVRFSUQgZnJvbSBvdGhlciBpbnN0YW5jZXMgb2YgdGhlIHNhbWUgcmFj ZS4NCj4+PiANCj4+PiBDT1BZIChqdXN0IGxpa2UgUkVBRCBvciBXUklURSkgY2FuIG5vdCBoYW5k bGUgQkFEX1NUQVRFSUQgZXJyb3INCj4+PiBiZWNhdXNlIGl0IGhvbGRzIGEgbG9jayBhbmQgdGhh dCBsb2NrIGlzIG1hcmtlZCBsb3N0LiBDT1BZIHNob3VsZCBoYXZlDQo+Pj4gbmV2ZXIgYmVlbiBz ZW50IHdpdGggdGhlIG9sZCBzdGF0ZWQgYWZ0ZXIgdGhlIG5ldyBvcGVuIHN0YXRlaWQgYW5kDQo+ Pj4gbG9jayBzdGF0ZWlkIHdhcyBnb3R0ZW4uIEJ1dCB3aXRoIHRoaXMgcGF0Y2ggdGhlIGJlaGF2 aW9yIGNoYW5nZWQgYW5kDQo+Pj4gdGh1cyBJIHdhcyB0cnlpbmcgdG8gdW5kZXJzdGFuZCB3aGF0 IGhhcyBjaGFuZ2VkLg0KPj4+IA0KPj4+IEkgdGhpbmsgdGhhdCBwcmV2aW91c2x5IEJBRF9TRVNT SU9OIGVycm9yIHdhcyBub3QgaGFuZGxlZCBieSB0aGUNCj4+PiBTRVFVRU5DRSBhbmQgd2FzIGhh bmRsZWQgYnkgZWFjaCBvZiB0aGUgb3BlcmF0aW9ucyBjYWxsaW5nIHRoZSBnZW5lcmFsDQo+Pj4g bmZzNF9oYW5kbGVfZXhjZXB0aW9uKCkgcm91dGluZSB0aGF0IGFjdHVhbGx5IHdhaXRzIG9uIHJl Y292ZXJ5IHRvDQo+Pj4gY29tcGxldGUgYmVmb3JlIHJldHJ5aW5nIHRoZSBvcGVyYXRpb24uIFRo ZW4gdGhlIHJldHJ5IGFjdHVhbGx5IHdvdWxkDQo+Pj4gYWxsb3cgQ09QWSB0byBnZXQgdGhlIG5l dyBzdGF0ZWlkLiBIb3dldmVyLCB3aXRoIHRoZSBuZXcgY29kZSBTRVFVRU5DRQ0KPj4+IGhhbmRs ZXMgdGhlIGVycm9yIGFuZCBjYWxscyB0byByZXRyeSB0aGUgb3BlcmF0aW9uIHdpdGhvdXQgdGhl IHdhaXQNCj4+PiB3aGljaCBhY3R1YWxseSBhZ2FpbiBnZXRzIHRoZSBCQURfU0VTU0lPTiB0aGVu IHJlY292ZXJ5IGhhcHBlbnMgYnV0DQo+Pj4gMm5kIFNFUVVFTkNFIGFnYWluIHJldHJpZXMgdGhl IG9wZXJhdGlvbiB3aXRob3V0IGdvaW5nIGFsbCB0aGUgb3V0IHRvDQo+Pj4gdGhlIG9wZXJhdGlv biBzbyBubyBuZXcgc3RhdGVpZCBpcyBnb3R0ZW4uDQo+Pj4gDQo+PiANCj4+IEkgYWxzbyB3b3Vs ZCBsaWtlIHRvIHBvaW50IG91dCB0aGF0IG5vdCBvbmx5IGRvZXMgdGhpcyBwYXRjaA0KPj4gaW50 cm9kdWNlcyBhIHJlZ3Jlc3Npb24gd2l0aCB0aGUgQ09QWS4gSXQgYWxzbyBpbnRyb2R1Y2VzIHRo ZQ0KPj4gQkFEX1NFU1NJT04gbG9vcCB3aGVyZSB0aGUgU0VRVUVOQ0UgdGhhdCBnZXRzIHRoaXMg ZXJyb3IganVzdCBrZWVwcw0KPj4gZ2V0dGluZyByZXNlbnQgb3ZlciBhbmQgb3ZlciBhZ2Fpbi4g VGhpcyBoYXBwZW5zIHRvIHRoZSBoZWFydGJlYXQNCj4+IFNFUVVFTkNFIHdoZW4gc2VydmVyIHJl Ym9vdHMuDQo+IA0KPiBJIGhhdmUgd3JpdHRlbiBhIHRlc3QgY2FzZSBmb3IgdGhlIFNFVEFUVFIg c2luY2UgaXQgYWxzbyB1c2VzIGENCj4gc3RhdGVpZCBhbmQgY2FsbHMgbmZzNF9zeW5jX2NhbGwo KSBhbmQgaXQgYWxzbyBhZnRlciB0aGUgcmVib290IHdpbGwNCj4gc2VuZCB0aGUgb3BlcmF0aW9u IHdpdGggdGhlIG9sZCBzdGF0ZWlkLiBUaGFua2Z1bGx5LCBTRVRBVFRSIGRvZXNuJ3QNCj4gcmVh bGx5IGNhcmUgYWJvdXQgdGhlICd2YWxpZGl0eScgb2YgdGhlIHN0YXRlaWQgYW5kIHdpbGwgcmV0 cnkgdGhlDQo+IG9wZXJhdGlvbi4gSXQgd2lsbCBwcm9kdWNlIG5vIGVycm9ycyBidXQgSSBjYW4n dCBzYXkgdGhlIGNsaWVudCBpcw0KPiB3b3JraW5nIHByb3Blcmx5Lg0KDQpUaGF04oCZcyBiZWNh dXNlIFNFVEFUVFIgd2FzIGRlc2lnbmVkIGNvcnJlY3RseS4gSnVzdCBsaWtlIGNvcHkgb2ZmbG9h ZCwgaXQgaGFzIHRvIGRlYWwgd2l0aCB0aGUgZmFjdCB0aGF0IGl0IGNhbiByYWNlIHdpdGggYSBz ZXJ2ZXIgcmVib290Lg0KDQpPbGdhLCBhbGwgeW91ciB0ZXN0IGlzIGRvaW5nIGlzIHNob3dpbmcg dGhhdCB3ZSBoaXQgdGhlIHJhY2UgbW9yZSBvZnRlbi4gRmFpciBlbm91Z2gsIEnigJlsbCBhc2sg QW5uYSB0byByZXZlcnQgdGhlIHBhdGNoLiBIb3dldmVyIHJldmVydGluZyB0aGUgcGF0Y2ggd29u 4oCZdCBwcmV2ZW50IHRoZSBzZXJ2ZXIgZnJvbSByZXR1cm5pbmcgTkZTNEVSUl9CQURfU1RBVEVJ RCBpbiBhbnkgY2FzZXMgd2hlcmUgdGhlIGNhbGxzIHRvIG5mczRfc2V0X3J3X3N0YXRlaWQoKSBo YXBwZW4gYmVmb3JlIHN0YXRlIHJlY292ZXJ5LiBUaGlzIGlzIHdoeSB3ZSBoYXZlIHRoZSBsb29w IGluIG5mczQyX3Byb2NfY29weSgpLg0KDQo+IA0KPiBTbyBJJ2Qgc2F2ZSB0aGF0IHRoZSBwYXRj aCBoYXMgaW50cm9kdWNlZCBzb21lIHNpZ25pZmljYW50IGNoYW5nZSB0bw0KPiB0aGUgcGhpbG9z b3BoeSB0aGF0IHdhc24ndCBleHBlY3RlZC4NCg0KTm8uIFRoZSBwaGlsb3NvcGh5IGlzIHRoZSBz YW1lLiBCb3RoIHdpdGggb3Igd2l0aG91dCB0aGlzIHBhdGNoLCBhbiBSUEMgY2FsbCB0aGF0IGNh cnJpZXMgYSBzdGF0ZWQgTVVTVCBiZSBwcmVwYXJlZCB0byBoYW5kbGUgTkZTNEVSUl9CQURfU1RB VEVJRC4NCg0KX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fDQpUcm9uZCBNeWtsZWJ1 c3QNCkxpbnV4IE5GUyBjbGllbnQgbWFpbnRhaW5lciwgUHJpbWFyeURhdGENCnRyb25kLm15a2xl YnVzdEBwcmltYXJ5ZGF0YS5jb20NCg0K ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE 2017-02-16 21:45 ` Trond Myklebust @ 2017-02-16 22:14 ` Olga Kornievskaia 2017-02-16 23:28 ` Trond Myklebust 0 siblings, 1 reply; 16+ messages in thread From: Olga Kornievskaia @ 2017-02-16 22:14 UTC (permalink / raw) To: Trond Myklebust; +Cc: Linux NFS Mailing List On Thu, Feb 16, 2017 at 4:45 PM, Trond Myklebust <trondmy@primarydata.com> wrote: > >> On Feb 16, 2017, at 11:12, Olga Kornievskaia <aglo@umich.edu> wrote: >> >> On Thu, Feb 16, 2017 at 11:04 AM, Olga Kornievskaia <aglo@umich.edu> wro= te: >>> On Thu, Feb 16, 2017 at 9:16 AM, Olga Kornievskaia <aglo@umich.edu> wro= te: >>>> On Wed, Feb 15, 2017 at 5:23 PM, Trond Myklebust >>>> <trondmy@primarydata.com> wrote: >>>>> On Wed, 2017-02-15 at 16:56 -0500, Olga Kornievskaia wrote: >>>>>> On Wed, Feb 15, 2017 at 3:48 PM, Trond Myklebust >>>>>> <trondmy@primarydata.com> wrote: >>>>>>> On Wed, 2017-02-15 at 15:16 -0500, Olga Kornievskaia wrote: >>>>>>>> On Sun, Dec 4, 2016 at 7:40 PM, Trond Myklebust >>>>>>>> <trond.myklebust@primarydata.com> wrote: >>>>>>>>> In the case where SEQUENCE receives a NFS4ERR_BADSESSION or >>>>>>>>> NFS4ERR_DEADSESSION error, we just want to report the session >>>>>>>>> as >>>>>>>>> needing >>>>>>>>> recovery, and then we want to retry the operation. >>>>>>>>> >>>>>>>>> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com >>>>>>>>>> >>>>>>>>> --- >>>>>>>>> fs/nfs/nfs4proc.c | 4 ++++ >>>>>>>>> 1 file changed, 4 insertions(+) >>>>>>>>> >>>>>>>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c >>>>>>>>> index f992281c9b29..4fd0e2b7b08e 100644 >>>>>>>>> --- a/fs/nfs/nfs4proc.c >>>>>>>>> +++ b/fs/nfs/nfs4proc.c >>>>>>>>> @@ -816,6 +816,10 @@ static int nfs41_sequence_process(struct >>>>>>>>> rpc_task *task, >>>>>>>>> case -NFS4ERR_SEQ_FALSE_RETRY: >>>>>>>>> ++slot->seq_nr; >>>>>>>>> goto retry_nowait; >>>>>>>>> + case -NFS4ERR_DEADSESSION: >>>>>>>>> + case -NFS4ERR_BADSESSION: >>>>>>>>> + nfs4_schedule_session_recovery(session, res- >>>>>>>>>> sr_status); >>>>>>>>> >>>>>>>>> + goto retry_nowait; >>>>>>>>> default: >>>>>>>>> /* Just update the slot sequence no. */ >>>>>>>>> slot->seq_done =3D 1; >>>>>>>>> -- >>>>>>>>> 2.9.3 >>>>>>>> >>>>>>>> Hi Trond, >>>>>>>> >>>>>>>> Can you explain the implications of retrying the operation >>>>>>>> without >>>>>>>> waiting for recovery to complete? >>>>>>>> >>>>>>>> This patch introduces regression in intra COPY failing if the >>>>>>>> server >>>>>>>> rebooted during that operation. What happens is that COPY is re- >>>>>>>> sent >>>>>>>> with the same stateid from the old open instead of the open that >>>>>>>> was >>>>>>>> done from the recovery (leading to BAD_STATEID error and >>>>>>>> failure). >>>>>>>> >>>>>>>> I wonder if it's because COPY is written to just use >>>>>>>> nfs4_call_sync() >>>>>>>> instead of defining rpc_callback_ops to handle rpc_prepare() >>>>>>>> where a >>>>>>>> new stateid could be gotten? I have re-written the COPY to do >>>>>>>> that >>>>>>>> and >>>>>>>> I no longer see that problem. >>>>>>>> >>>>>>>> If my suspicion is correct, it would help for the future to know >>>>>>>> that >>>>>>>> any operations that use stateid must have rpc callback ops >>>>>>>> defined/used so that they avoid this problem. Perhaps as a >>>>>>>> comment in >>>>>>>> the code or maybe some other way? >>>>>>>> >>>>>>>> Thanks. >>>>>>>> >>>>>>> >>>>>>> Yes, the way that rpc_call_sync() has been written, it assumes that >>>>>>> the >>>>>>> call is unaffected by reboot recovery, or that the resulting state >>>>>>> errors will be handled by the caller. The patch you reference does >>>>>>> not >>>>>>> change that expectation. >>>>>> >>>>>> The "or" is confusing me. The current COPY code does call into >>>>>> nfs4_handle_exception() and "should handle errors". Yet after this >>>>>> patch, the code fails to in presence of a reboot. I don't see what >>>>>> else can the current code should do in terms of handling errors to >>>>>> fix >>>>>> that problem. >>>>>> >>>>>> I'm almost ready to submit the patch that turns the existing code >>>>>> into >>>>>> async rpc but if this is not the right approach then I'd like to kno= w >>>>>> where I should be looking into instead. >>>>>> >>>>>> Thanks. >>>>>> >>>>>>> >>>>>>> The same race can happen, for instance, if your call to >>>>>>> rpc_call_sync() >>>>>>> coincides with a reboot recovery that was started by another >>>>>>> process. >>>>>>> >>>>> >>>>> >>>>> I don't understand. If it is handling the resulting NFS4ERR_BAD_STATE= ID >>>>> error correctly, then what is failing? As I said above, you can get t= he >>>>> NFS4ERR_BAD_STATEID from other instances of the same race. >>>> >>>> COPY (just like READ or WRITE) can not handle BAD_STATEID error >>>> because it holds a lock and that lock is marked lost. COPY should have >>>> never been sent with the old stated after the new open stateid and >>>> lock stateid was gotten. But with this patch the behavior changed and >>>> thus I was trying to understand what has changed. >>>> >>>> I think that previously BAD_SESSION error was not handled by the >>>> SEQUENCE and was handled by each of the operations calling the general >>>> nfs4_handle_exception() routine that actually waits on recovery to >>>> complete before retrying the operation. Then the retry actually would >>>> allow COPY to get the new stateid. However, with the new code SEQUENCE >>>> handles the error and calls to retry the operation without the wait >>>> which actually again gets the BAD_SESSION then recovery happens but >>>> 2nd SEQUENCE again retries the operation without going all the out to >>>> the operation so no new stateid is gotten. >>>> >>> >>> I also would like to point out that not only does this patch >>> introduces a regression with the COPY. It also introduces the >>> BAD_SESSION loop where the SEQUENCE that gets this error just keeps >>> getting resent over and over again. This happens to the heartbeat >>> SEQUENCE when server reboots. >> >> I have written a test case for the SETATTR since it also uses a >> stateid and calls nfs4_sync_call() and it also after the reboot will >> send the operation with the old stateid. Thankfully, SETATTR doesn't >> really care about the 'validity' of the stateid and will retry the >> operation. It will produce no errors but I can't say the client is >> working properly. > > That=E2=80=99s because SETATTR was designed correctly. Just like copy off= load, it has to deal with the fact that it can race with a server reboot. Let's agree to disagree that design is correct. Copy offload does everything that current design asks it to do but there is still a failure so let's figure out how to fix it. > Olga, all your test is doing is showing that we hit the race more often. = Fair enough, I=E2=80=99ll ask Anna to revert the patch. However reverting t= he patch won=E2=80=99t prevent the server from returning NFS4ERR_BAD_STATEI= D in any cases where the calls to nfs4_set_rw_stateid() happen before state= recovery. This is why we have the loop in nfs42_proc_copy(). I thought that if retry of the operation waits for the recovery to complete then nfs4_set_rw_stateid() will choose the correct stateid, is that not correct? If that's not correct, then we somehow need to separate the case when BAD_STATEID should indeed mark the locks lost vs this case where the code erroneously used the bad stateid (oops) and it should really ignore this error. This really doesn't sound very plausible to accomplish. >> So I'd save that the patch has introduced some significant change to >> the philosophy that wasn't expected. > > No. The philosophy is the same. Both with or without this patch, an RPC c= all that carries a stated MUST be prepared to handle NFS4ERR_BAD_STATEID. > > _________________________________ > Trond Myklebust > Linux NFS client maintainer, PrimaryData > trond.myklebust@primarydata.com > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE 2017-02-16 22:14 ` Olga Kornievskaia @ 2017-02-16 23:28 ` Trond Myklebust 2017-02-17 14:46 ` Olga Kornievskaia 0 siblings, 1 reply; 16+ messages in thread From: Trond Myklebust @ 2017-02-16 23:28 UTC (permalink / raw) To: aglo@umich.edu; +Cc: linux-nfs@vger.kernel.org T24gVGh1LCAyMDE3LTAyLTE2IGF0IDE3OjE0IC0wNTAwLCBPbGdhIEtvcm5pZXZza2FpYSB3cm90 ZToNCj4gT24gVGh1LCBGZWIgMTYsIDIwMTcgYXQgNDo0NSBQTSwgVHJvbmQgTXlrbGVidXN0DQo+ IDx0cm9uZG15QHByaW1hcnlkYXRhLmNvbT4gd3JvdGU6DQo+ID4gDQo+ID4gT2xnYSwgYWxsIHlv dXIgdGVzdCBpcyBkb2luZyBpcyBzaG93aW5nIHRoYXQgd2UgaGl0IHRoZSByYWNlIG1vcmUNCj4g PiBvZnRlbi4gRmFpciBlbm91Z2gsIEnigJlsbCBhc2sgQW5uYSB0byByZXZlcnQgdGhlIHBhdGNo LiBIb3dldmVyDQo+ID4gcmV2ZXJ0aW5nIHRoZSBwYXRjaCB3b27igJl0IHByZXZlbnQgdGhlIHNl cnZlciBmcm9tIHJldHVybmluZw0KPiA+IE5GUzRFUlJfQkFEX1NUQVRFSUQgaW4gYW55IGNhc2Vz IHdoZXJlIHRoZSBjYWxscyB0bw0KPiA+IG5mczRfc2V0X3J3X3N0YXRlaWQoKSBoYXBwZW4gYmVm b3JlIHN0YXRlIHJlY292ZXJ5LiBUaGlzIGlzIHdoeSB3ZQ0KPiA+IGhhdmUgdGhlIGxvb3AgaW4g bmZzNDJfcHJvY19jb3B5KCkuDQo+IA0KPiBJIHRob3VnaHQgdGhhdCBpZiByZXRyeSBvZiB0aGUg b3BlcmF0aW9uIHdhaXRzIGZvciB0aGUgcmVjb3ZlcnkgdG8NCj4gY29tcGxldGUgdGhlbiBuZnM0 X3NldF9yd19zdGF0ZWlkKCkgd2lsbCBjaG9vc2UgdGhlIGNvcnJlY3Qgc3RhdGVpZCwNCj4gaXMg dGhhdCBub3QgY29ycmVjdD8NCj4gDQo+IElmIHRoYXQncyBub3QgY29ycmVjdCwgdGhlbiB3ZSBz b21laG93IG5lZWQgdG8gc2VwYXJhdGUgdGhlIGNhc2Ugd2hlbg0KPiBCQURfU1RBVEVJRCBzaG91 bGQgaW5kZWVkIG1hcmsgdGhlIGxvY2tzIGxvc3QgdnMgdGhpcyBjYXNlIHdoZXJlIHRoZQ0KPiBj b2RlIGVycm9uZW91c2x5IHVzZWQgdGhlIGJhZCBzdGF0ZWlkIChvb3BzKSBhbmQgaXQgc2hvdWxk IHJlYWxseQ0KPiBpZ25vcmUgdGhpcyBlcnJvci4gVGhpcyByZWFsbHkgZG9lc24ndCBzb3VuZCB2 ZXJ5IHBsYXVzaWJsZSB0bw0KPiBhY2NvbXBsaXNoLg0KDQpEb2VzIHRoaXMgcGF0Y2ggZml4IHRo ZSBwcm9ibGVtPw0KDQo4PC0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t LS0tLS0tLS0tLS0tLS0tLS0tLS0NCkZyb20gYmJhZTk1ZThmOTdjYWQ2YTkxY2E4Y2FhNTBiNjFj YWU3ODk2MzJmOSBNb24gU2VwIDE3IDAwOjAwOjAwIDIwMDENCkZyb206IFRyb25kIE15a2xlYnVz dCA8dHJvbmQubXlrbGVidXN0QHByaW1hcnlkYXRhLmNvbT4NCkRhdGU6IFRodSwgMTYgRmViIDIw MTcgMTg6MTQ6MzggLTA1MDANClN1YmplY3Q6IFtQQVRDSF0gTkZTdjQ6IEZpeCByZWJvb3QgcmVj b3ZlcnkgaW4gY29weSBvZmZsb2FkDQoNCkNvcHkgb2ZmbG9hZCBjb2RlIG5lZWRzIHRvIGJlIGhv b2tlZCBpbnRvIHRoZSBjb2RlIGZvciBoYW5kbGluZw0KTkZTNEVSUl9CQURfU1RBVEVJRCBieSBl bnN1cmluZyB0aGF0IHdlIHNldCB0aGUgInN0YXRlaWQiIGZpZWxkDQppbiBzdHJ1Y3QgbmZzNF9l eGNlcHRpb24uDQoNClJlcG9ydGVkLWJ5OiBPbGdhIEtvcm5pZXZza2FpYSA8YWdsb0B1bWljaC5l ZHU+DQpGaXhlczogMmU3MjQ0OGIwN2RjMyAoIk5GUzogQWRkIENPUFkgbmZzIG9wZXJhdGlvbiIp DQpTaWduZWQtb2ZmLWJ5OiBUcm9uZCBNeWtsZWJ1c3QgPHRyb25kLm15a2xlYnVzdEBwcmltYXJ5 ZGF0YS5jb20+DQotLS0NCiBmcy9uZnMvbmZzNDJwcm9jLmMgfCA1NyArKysrKysrKysrKysrKysr KysrKysrKysrKysrKysrLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0NCiAxIGZpbGUgY2hhbmdlZCwg MzMgaW5zZXJ0aW9ucygrKSwgMjQgZGVsZXRpb25zKC0pDQoNCmRpZmYgLS1naXQgYS9mcy9uZnMv bmZzNDJwcm9jLmMgYi9mcy9uZnMvbmZzNDJwcm9jLmMNCmluZGV4IGQxMmZmOTM4NWY0OS4uYmFm MWZlMmRjMjk2IDEwMDY0NA0KLS0tIGEvZnMvbmZzL25mczQycHJvYy5jDQorKysgYi9mcy9uZnMv bmZzNDJwcm9jLmMNCkBAIC0xMjgsMjAgKzEyOCwxMyBAQCBpbnQgbmZzNDJfcHJvY19kZWFsbG9j YXRlKHN0cnVjdCBmaWxlICpmaWxlcCwgbG9mZl90IG9mZnNldCwgbG9mZl90IGxlbikNCiAJcmV0 dXJuIGVycjsNCiB9DQogDQotc3RhdGljIHNzaXplX3QgX25mczQyX3Byb2NfY29weShzdHJ1Y3Qg ZmlsZSAqc3JjLCBsb2ZmX3QgcG9zX3NyYywNCitzdGF0aWMgc3NpemVfdCBfbmZzNDJfcHJvY19j b3B5KHN0cnVjdCBmaWxlICpzcmMsDQogCQkJCXN0cnVjdCBuZnNfbG9ja19jb250ZXh0ICpzcmNf bG9jaywNCi0JCQkJc3RydWN0IGZpbGUgKmRzdCwgbG9mZl90IHBvc19kc3QsDQorCQkJCXN0cnVj dCBmaWxlICpkc3QsDQogCQkJCXN0cnVjdCBuZnNfbG9ja19jb250ZXh0ICpkc3RfbG9jaywNCi0J CQkJc2l6ZV90IGNvdW50KQ0KKwkJCQlzdHJ1Y3QgbmZzNDJfY29weV9hcmdzICphcmdzLA0KKwkJ CQlzdHJ1Y3QgbmZzNDJfY29weV9yZXMgKnJlcykNCiB7DQotCXN0cnVjdCBuZnM0Ml9jb3B5X2Fy Z3MgYXJncyA9IHsNCi0JCS5zcmNfZmgJCT0gTkZTX0ZIKGZpbGVfaW5vZGUoc3JjKSksDQotCQku c3JjX3Bvcwk9IHBvc19zcmMsDQotCQkuZHN0X2ZoCQk9IE5GU19GSChmaWxlX2lub2RlKGRzdCkp LA0KLQkJLmRzdF9wb3MJPSBwb3NfZHN0LA0KLQkJLmNvdW50CQk9IGNvdW50LA0KLQl9Ow0KLQlz dHJ1Y3QgbmZzNDJfY29weV9yZXMgcmVzOw0KIAlzdHJ1Y3QgcnBjX21lc3NhZ2UgbXNnID0gew0K IAkJLnJwY19wcm9jID0gJm5mczRfcHJvY2VkdXJlc1tORlNQUk9DNF9DTE5UX0NPUFldLA0KIAkJ LnJwY19hcmdwID0gJmFyZ3MsDQpAQCAtMTQ5LDkgKzE0MiwxMiBAQCBzdGF0aWMgc3NpemVfdCBf bmZzNDJfcHJvY19jb3B5KHN0cnVjdCBmaWxlICpzcmMsIGxvZmZfdCBwb3Nfc3JjLA0KIAl9Ow0K IAlzdHJ1Y3QgaW5vZGUgKmRzdF9pbm9kZSA9IGZpbGVfaW5vZGUoZHN0KTsNCiAJc3RydWN0IG5m c19zZXJ2ZXIgKnNlcnZlciA9IE5GU19TRVJWRVIoZHN0X2lub2RlKTsNCisJbG9mZl90IHBvc19z cmMgPSBhcmdzLT5zcmNfcG9zOw0KKwlsb2ZmX3QgcG9zX2RzdCA9IGFyZ3MtPmRzdF9wb3M7DQor CXNpemVfdCBjb3VudCA9IGFyZ3MtPmNvdW50Ow0KIAlpbnQgc3RhdHVzOw0KIA0KLQlzdGF0dXMg PSBuZnM0X3NldF9yd19zdGF0ZWlkKCZhcmdzLnNyY19zdGF0ZWlkLCBzcmNfbG9jay0+b3Blbl9j b250ZXh0LA0KKwlzdGF0dXMgPSBuZnM0X3NldF9yd19zdGF0ZWlkKCZhcmdzLT5zcmNfc3RhdGVp ZCwgc3JjX2xvY2stPm9wZW5fY29udGV4dCwNCiAJCQkJICAgICBzcmNfbG9jaywgRk1PREVfUkVB RCk7DQogCWlmIChzdGF0dXMpDQogCQlyZXR1cm4gc3RhdHVzOw0KQEAgLTE2MSw3ICsxNTcsNyBA QCBzdGF0aWMgc3NpemVfdCBfbmZzNDJfcHJvY19jb3B5KHN0cnVjdCBmaWxlICpzcmMsIGxvZmZf dCBwb3Nfc3JjLA0KIAlpZiAoc3RhdHVzKQ0KIAkJcmV0dXJuIHN0YXR1czsNCiANCi0Jc3RhdHVz ID0gbmZzNF9zZXRfcndfc3RhdGVpZCgmYXJncy5kc3Rfc3RhdGVpZCwgZHN0X2xvY2stPm9wZW5f Y29udGV4dCwNCisJc3RhdHVzID0gbmZzNF9zZXRfcndfc3RhdGVpZCgmYXJncy0+ZHN0X3N0YXRl aWQsIGRzdF9sb2NrLT5vcGVuX2NvbnRleHQsDQogCQkJCSAgICAgZHN0X2xvY2ssIEZNT0RFX1dS SVRFKTsNCiAJaWYgKHN0YXR1cykNCiAJCXJldHVybiBzdGF0dXM7DQpAQCAtMTcxLDIyICsxNjcs MjIgQEAgc3RhdGljIHNzaXplX3QgX25mczQyX3Byb2NfY29weShzdHJ1Y3QgZmlsZSAqc3JjLCBs b2ZmX3QgcG9zX3NyYywNCiAJCXJldHVybiBzdGF0dXM7DQogDQogCXN0YXR1cyA9IG5mczRfY2Fs bF9zeW5jKHNlcnZlci0+Y2xpZW50LCBzZXJ2ZXIsICZtc2csDQotCQkJCSZhcmdzLnNlcV9hcmdz LCAmcmVzLnNlcV9yZXMsIDApOw0KKwkJCQkmYXJncy0+c2VxX2FyZ3MsICZyZXMtPnNlcV9yZXMs IDApOw0KIAlpZiAoc3RhdHVzID09IC1FTk9UU1VQUCkNCiAJCXNlcnZlci0+Y2FwcyAmPSB+TkZT X0NBUF9DT1BZOw0KIAlpZiAoc3RhdHVzKQ0KIAkJcmV0dXJuIHN0YXR1czsNCiANCi0JaWYgKHJl cy53cml0ZV9yZXMudmVyaWZpZXIuY29tbWl0dGVkICE9IE5GU19GSUxFX1NZTkMpIHsNCi0JCXN0 YXR1cyA9IG5mc19jb21taXRfZmlsZShkc3QsICZyZXMud3JpdGVfcmVzLnZlcmlmaWVyLnZlcmlm aWVyKTsNCisJaWYgKHJlcy0+d3JpdGVfcmVzLnZlcmlmaWVyLmNvbW1pdHRlZCAhPSBORlNfRklM RV9TWU5DKSB7DQorCQlzdGF0dXMgPSBuZnNfY29tbWl0X2ZpbGUoZHN0LCAmcmVzLT53cml0ZV9y ZXMudmVyaWZpZXIudmVyaWZpZXIpOw0KIAkJaWYgKHN0YXR1cykNCiAJCQlyZXR1cm4gc3RhdHVz Ow0KIAl9DQogDQogCXRydW5jYXRlX3BhZ2VjYWNoZV9yYW5nZShkc3RfaW5vZGUsIHBvc19kc3Qs DQotCQkJCSBwb3NfZHN0ICsgcmVzLndyaXRlX3Jlcy5jb3VudCk7DQorCQkJCSBwb3NfZHN0ICsg cmVzLT53cml0ZV9yZXMuY291bnQpOw0KIA0KLQlyZXR1cm4gcmVzLndyaXRlX3Jlcy5jb3VudDsN CisJcmV0dXJuIHJlcy0+d3JpdGVfcmVzLmNvdW50Ow0KIH0NCiANCiBzc2l6ZV90IG5mczQyX3By b2NfY29weShzdHJ1Y3QgZmlsZSAqc3JjLCBsb2ZmX3QgcG9zX3NyYywNCkBAIC0xOTYsOCArMTky LDIyIEBAIHNzaXplX3QgbmZzNDJfcHJvY19jb3B5KHN0cnVjdCBmaWxlICpzcmMsIGxvZmZfdCBw b3Nfc3JjLA0KIAlzdHJ1Y3QgbmZzX3NlcnZlciAqc2VydmVyID0gTkZTX1NFUlZFUihmaWxlX2lu b2RlKGRzdCkpOw0KIAlzdHJ1Y3QgbmZzX2xvY2tfY29udGV4dCAqc3JjX2xvY2s7DQogCXN0cnVj dCBuZnNfbG9ja19jb250ZXh0ICpkc3RfbG9jazsNCi0Jc3RydWN0IG5mczRfZXhjZXB0aW9uIHNy Y19leGNlcHRpb24gPSB7IH07DQotCXN0cnVjdCBuZnM0X2V4Y2VwdGlvbiBkc3RfZXhjZXB0aW9u ID0geyB9Ow0KKwlzdHJ1Y3QgbmZzNDJfY29weV9hcmdzIGFyZ3MgPSB7DQorCQkuc3JjX2ZoCQk9 IE5GU19GSChmaWxlX2lub2RlKHNyYykpLA0KKwkJLnNyY19wb3MJPSBwb3Nfc3JjLA0KKwkJLmRz dF9maAkJPSBORlNfRkgoZmlsZV9pbm9kZShkc3QpKSwNCisJCS5kc3RfcG9zCT0gcG9zX2RzdCwN CisJCS5jb3VudAkJPSBjb3VudCwNCisJfTsNCisJc3RydWN0IG5mczQyX2NvcHlfcmVzIHJlczsN CisJc3RydWN0IG5mczRfZXhjZXB0aW9uIHNyY19leGNlcHRpb24gPSB7DQorCQkuaW5vZGUJCT0g ZmlsZV9pbm9kZShzcmMpLA0KKwkJLnN0YXRlaWQJPSAmYXJncy5zcmNfc3RhdGVpZCwNCisJfTsN CisJc3RydWN0IG5mczRfZXhjZXB0aW9uIGRzdF9leGNlcHRpb24gPSB7DQorCQkuaW5vZGUJCT0g ZmlsZV9pbm9kZShkc3QpLA0KKwkJLnN0YXRlaWQJPSAmYXJncy5kc3Rfc3RhdGVpZCwNCisJfTsN CiAJc3NpemVfdCBlcnIsIGVycjI7DQogDQogCWlmICghbmZzX3NlcnZlcl9jYXBhYmxlKGZpbGVf aW5vZGUoZHN0KSwgTkZTX0NBUF9DT1BZKSkNCkBAIC0yMDcsNyArMjE3LDYgQEAgc3NpemVfdCBu ZnM0Ml9wcm9jX2NvcHkoc3RydWN0IGZpbGUgKnNyYywgbG9mZl90IHBvc19zcmMsDQogCWlmIChJ U19FUlIoc3JjX2xvY2spKQ0KIAkJcmV0dXJuIFBUUl9FUlIoc3JjX2xvY2spOw0KIA0KLQlzcmNf ZXhjZXB0aW9uLmlub2RlID0gZmlsZV9pbm9kZShzcmMpOw0KIAlzcmNfZXhjZXB0aW9uLnN0YXRl ID0gc3JjX2xvY2stPm9wZW5fY29udGV4dC0+c3RhdGU7DQogDQogCWRzdF9sb2NrID0gbmZzX2dl dF9sb2NrX2NvbnRleHQobmZzX2ZpbGVfb3Blbl9jb250ZXh0KGRzdCkpOw0KQEAgLTIxNiwxMyAr MjI1LDEzIEBAIHNzaXplX3QgbmZzNDJfcHJvY19jb3B5KHN0cnVjdCBmaWxlICpzcmMsIGxvZmZf dCBwb3Nfc3JjLA0KIAkJZ290byBvdXRfcHV0X3NyY19sb2NrOw0KIAl9DQogDQotCWRzdF9leGNl cHRpb24uaW5vZGUgPSBmaWxlX2lub2RlKGRzdCk7DQogCWRzdF9leGNlcHRpb24uc3RhdGUgPSBk c3RfbG9jay0+b3Blbl9jb250ZXh0LT5zdGF0ZTsNCiANCiAJZG8gew0KIAkJaW5vZGVfbG9jayhm aWxlX2lub2RlKGRzdCkpOw0KLQkJZXJyID0gX25mczQyX3Byb2NfY29weShzcmMsIHBvc19zcmMs IHNyY19sb2NrLA0KLQkJCQkgICAgICAgZHN0LCBwb3NfZHN0LCBkc3RfbG9jaywgY291bnQpOw0K KwkJZXJyID0gX25mczQyX3Byb2NfY29weShzcmMsIHNyY19sb2NrLA0KKwkJCQlkc3QsIGRzdF9s b2NrLA0KKwkJCQkmYXJncywgJnJlcyk7DQogCQlpbm9kZV91bmxvY2soZmlsZV9pbm9kZShkc3Qp KTsNCiANCiAJCWlmIChlcnIgPT0gLUVOT1RTVVBQKSB7DQotLSANCjIuOS4zDQoNCi0tIA0KVHJv bmQgTXlrbGVidXN0DQpMaW51eCBORlMgY2xpZW50IG1haW50YWluZXIsIFByaW1hcnlEYXRhDQp0 cm9uZC5teWtsZWJ1c3RAcHJpbWFyeWRhdGEuY29tDQo= ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE 2017-02-16 23:28 ` Trond Myklebust @ 2017-02-17 14:46 ` Olga Kornievskaia 2017-02-17 14:58 ` Trond Myklebust 0 siblings, 1 reply; 16+ messages in thread From: Olga Kornievskaia @ 2017-02-17 14:46 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs@vger.kernel.org On Thu, Feb 16, 2017 at 6:28 PM, Trond Myklebust <trondmy@primarydata.com> wrote: > On Thu, 2017-02-16 at 17:14 -0500, Olga Kornievskaia wrote: >> On Thu, Feb 16, 2017 at 4:45 PM, Trond Myklebust >> <trondmy@primarydata.com> wrote: >> > >> > Olga, all your test is doing is showing that we hit the race more >> > often. Fair enough, I=E2=80=99ll ask Anna to revert the patch. However >> > reverting the patch won=E2=80=99t prevent the server from returning >> > NFS4ERR_BAD_STATEID in any cases where the calls to >> > nfs4_set_rw_stateid() happen before state recovery. This is why we >> > have the loop in nfs42_proc_copy(). >> >> I thought that if retry of the operation waits for the recovery to >> complete then nfs4_set_rw_stateid() will choose the correct stateid, >> is that not correct? >> >> If that's not correct, then we somehow need to separate the case when >> BAD_STATEID should indeed mark the locks lost vs this case where the >> code erroneously used the bad stateid (oops) and it should really >> ignore this error. This really doesn't sound very plausible to >> accomplish. > > Does this patch fix the problem? > > 8<------------------------------------------------------------- > From bbae95e8f97cad6a91ca8caa50b61cae789632f9 Mon Sep 17 00:00:00 2001 > From: Trond Myklebust <trond.myklebust@primarydata.com> > Date: Thu, 16 Feb 2017 18:14:38 -0500 > Subject: [PATCH] NFSv4: Fix reboot recovery in copy offload > > Copy offload code needs to be hooked into the code for handling > NFS4ERR_BAD_STATEID by ensuring that we set the "stateid" field > in struct nfs4_exception. > > Reported-by: Olga Kornievskaia <aglo@umich.edu> > Fixes: 2e72448b07dc3 ("NFS: Add COPY nfs operation") > Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> > --- > fs/nfs/nfs42proc.c | 57 +++++++++++++++++++++++++++++++-----------------= ------ > 1 file changed, 33 insertions(+), 24 deletions(-) > > diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c > index d12ff9385f49..baf1fe2dc296 100644 > --- a/fs/nfs/nfs42proc.c > +++ b/fs/nfs/nfs42proc.c > @@ -128,20 +128,13 @@ int nfs42_proc_deallocate(struct file *filep, loff_= t offset, loff_t len) > return err; > } > > -static ssize_t _nfs42_proc_copy(struct file *src, loff_t pos_src, > +static ssize_t _nfs42_proc_copy(struct file *src, > struct nfs_lock_context *src_lock, > - struct file *dst, loff_t pos_dst, > + struct file *dst, > struct nfs_lock_context *dst_lock, > - size_t count) > + struct nfs42_copy_args *args, > + struct nfs42_copy_res *res) > { > - struct nfs42_copy_args args =3D { > - .src_fh =3D NFS_FH(file_inode(src)), > - .src_pos =3D pos_src, > - .dst_fh =3D NFS_FH(file_inode(dst)), > - .dst_pos =3D pos_dst, > - .count =3D count, > - }; > - struct nfs42_copy_res res; > struct rpc_message msg =3D { > .rpc_proc =3D &nfs4_procedures[NFSPROC4_CLNT_COPY], > .rpc_argp =3D &args, > @@ -149,9 +142,12 @@ static ssize_t _nfs42_proc_copy(struct file *src, lo= ff_t pos_src, > }; > struct inode *dst_inode =3D file_inode(dst); > struct nfs_server *server =3D NFS_SERVER(dst_inode); > + loff_t pos_src =3D args->src_pos; > + loff_t pos_dst =3D args->dst_pos; > + size_t count =3D args->count; > int status; > > - status =3D nfs4_set_rw_stateid(&args.src_stateid, src_lock->open_= context, > + status =3D nfs4_set_rw_stateid(&args->src_stateid, src_lock->open= _context, > src_lock, FMODE_READ); > if (status) > return status; > @@ -161,7 +157,7 @@ static ssize_t _nfs42_proc_copy(struct file *src, lof= f_t pos_src, > if (status) > return status; > > - status =3D nfs4_set_rw_stateid(&args.dst_stateid, dst_lock->open_= context, > + status =3D nfs4_set_rw_stateid(&args->dst_stateid, dst_lock->open= _context, > dst_lock, FMODE_WRITE); > if (status) > return status; > @@ -171,22 +167,22 @@ static ssize_t _nfs42_proc_copy(struct file *src, l= off_t pos_src, > return status; > > status =3D nfs4_call_sync(server->client, server, &msg, > - &args.seq_args, &res.seq_res, 0); > + &args->seq_args, &res->seq_res, 0); > if (status =3D=3D -ENOTSUPP) > server->caps &=3D ~NFS_CAP_COPY; > if (status) > return status; > > - if (res.write_res.verifier.committed !=3D NFS_FILE_SYNC) { > - status =3D nfs_commit_file(dst, &res.write_res.verifier.v= erifier); > + if (res->write_res.verifier.committed !=3D NFS_FILE_SYNC) { > + status =3D nfs_commit_file(dst, &res->write_res.verifier.= verifier); > if (status) > return status; > } > > truncate_pagecache_range(dst_inode, pos_dst, > - pos_dst + res.write_res.count); > + pos_dst + res->write_res.count); > > - return res.write_res.count; > + return res->write_res.count; > } > > ssize_t nfs42_proc_copy(struct file *src, loff_t pos_src, > @@ -196,8 +192,22 @@ ssize_t nfs42_proc_copy(struct file *src, loff_t pos= _src, > struct nfs_server *server =3D NFS_SERVER(file_inode(dst)); > struct nfs_lock_context *src_lock; > struct nfs_lock_context *dst_lock; > - struct nfs4_exception src_exception =3D { }; > - struct nfs4_exception dst_exception =3D { }; > + struct nfs42_copy_args args =3D { > + .src_fh =3D NFS_FH(file_inode(src)), > + .src_pos =3D pos_src, > + .dst_fh =3D NFS_FH(file_inode(dst)), > + .dst_pos =3D pos_dst, > + .count =3D count, > + }; > + struct nfs42_copy_res res; > + struct nfs4_exception src_exception =3D { > + .inode =3D file_inode(src), > + .stateid =3D &args.src_stateid, > + }; > + struct nfs4_exception dst_exception =3D { > + .inode =3D file_inode(dst), > + .stateid =3D &args.dst_stateid, > + }; > ssize_t err, err2; > > if (!nfs_server_capable(file_inode(dst), NFS_CAP_COPY)) > @@ -207,7 +217,6 @@ ssize_t nfs42_proc_copy(struct file *src, loff_t pos_= src, > if (IS_ERR(src_lock)) > return PTR_ERR(src_lock); > > - src_exception.inode =3D file_inode(src); > src_exception.state =3D src_lock->open_context->state; > > dst_lock =3D nfs_get_lock_context(nfs_file_open_context(dst)); > @@ -216,13 +225,13 @@ ssize_t nfs42_proc_copy(struct file *src, loff_t po= s_src, > goto out_put_src_lock; > } > > - dst_exception.inode =3D file_inode(dst); > dst_exception.state =3D dst_lock->open_context->state; > > do { > inode_lock(file_inode(dst)); > - err =3D _nfs42_proc_copy(src, pos_src, src_lock, > - dst, pos_dst, dst_lock, count); > + err =3D _nfs42_proc_copy(src, src_lock, > + dst, dst_lock, > + &args, &res); > inode_unlock(file_inode(dst)); > > if (err =3D=3D -ENOTSUPP) { > -- > 2.9.3 I wish it did but no it does not. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE 2017-02-17 14:46 ` Olga Kornievskaia @ 2017-02-17 14:58 ` Trond Myklebust 2017-02-17 15:07 ` Olga Kornievskaia 0 siblings, 1 reply; 16+ messages in thread From: Trond Myklebust @ 2017-02-17 14:58 UTC (permalink / raw) To: aglo@umich.edu; +Cc: linux-nfs@vger.kernel.org T24gRnJpLCAyMDE3LTAyLTE3IGF0IDA5OjQ2IC0wNTAwLCBPbGdhIEtvcm5pZXZza2FpYSB3cm90 ZToNCj4gT24gVGh1LCBGZWIgMTYsIDIwMTcgYXQgNjoyOCBQTSwgVHJvbmQgTXlrbGVidXN0DQo+ IDx0cm9uZG15QHByaW1hcnlkYXRhLmNvbT4gd3JvdGU6DQo+ID4gT24gVGh1LCAyMDE3LTAyLTE2 IGF0IDE3OjE0IC0wNTAwLCBPbGdhIEtvcm5pZXZza2FpYSB3cm90ZToNCj4gPiA+IE9uIFRodSwg RmViIDE2LCAyMDE3IGF0IDQ6NDUgUE0sIFRyb25kIE15a2xlYnVzdA0KPiA+ID4gPHRyb25kbXlA cHJpbWFyeWRhdGEuY29tPiB3cm90ZToNCj4gPiA+ID4gDQo+ID4gPiA+IE9sZ2EsIGFsbCB5b3Vy IHRlc3QgaXMgZG9pbmcgaXMgc2hvd2luZyB0aGF0IHdlIGhpdCB0aGUgcmFjZQ0KPiA+ID4gPiBt b3JlDQo+ID4gPiA+IG9mdGVuLiBGYWlyIGVub3VnaCwgSeKAmWxsIGFzayBBbm5hIHRvIHJldmVy dCB0aGUgcGF0Y2guIEhvd2V2ZXINCj4gPiA+ID4gcmV2ZXJ0aW5nIHRoZSBwYXRjaCB3b27igJl0 IHByZXZlbnQgdGhlIHNlcnZlciBmcm9tIHJldHVybmluZw0KPiA+ID4gPiBORlM0RVJSX0JBRF9T VEFURUlEIGluIGFueSBjYXNlcyB3aGVyZSB0aGUgY2FsbHMgdG8NCj4gPiA+ID4gbmZzNF9zZXRf cndfc3RhdGVpZCgpIGhhcHBlbiBiZWZvcmUgc3RhdGUgcmVjb3ZlcnkuIFRoaXMgaXMgd2h5DQo+ ID4gPiA+IHdlDQo+ID4gPiA+IGhhdmUgdGhlIGxvb3AgaW4gbmZzNDJfcHJvY19jb3B5KCkuDQo+ ID4gPiANCj4gPiA+IEkgdGhvdWdodCB0aGF0IGlmIHJldHJ5IG9mIHRoZSBvcGVyYXRpb24gd2Fp dHMgZm9yIHRoZSByZWNvdmVyeQ0KPiA+ID4gdG8NCj4gPiA+IGNvbXBsZXRlIHRoZW4gbmZzNF9z ZXRfcndfc3RhdGVpZCgpIHdpbGwgY2hvb3NlIHRoZSBjb3JyZWN0DQo+ID4gPiBzdGF0ZWlkLA0K PiA+ID4gaXMgdGhhdCBub3QgY29ycmVjdD8NCj4gPiA+IA0KPiA+ID4gSWYgdGhhdCdzIG5vdCBj b3JyZWN0LCB0aGVuIHdlIHNvbWVob3cgbmVlZCB0byBzZXBhcmF0ZSB0aGUgY2FzZQ0KPiA+ID4g d2hlbg0KPiA+ID4gQkFEX1NUQVRFSUQgc2hvdWxkIGluZGVlZCBtYXJrIHRoZSBsb2NrcyBsb3N0 IHZzIHRoaXMgY2FzZSB3aGVyZQ0KPiA+ID4gdGhlDQo+ID4gPiBjb2RlIGVycm9uZW91c2x5IHVz ZWQgdGhlIGJhZCBzdGF0ZWlkIChvb3BzKSBhbmQgaXQgc2hvdWxkIHJlYWxseQ0KPiA+ID4gaWdu b3JlIHRoaXMgZXJyb3IuIFRoaXMgcmVhbGx5IGRvZXNuJ3Qgc291bmQgdmVyeSBwbGF1c2libGUg dG8NCj4gPiA+IGFjY29tcGxpc2guDQo+ID4gDQo+ID4gRG9lcyB0aGlzIHBhdGNoIGZpeCB0aGUg cHJvYmxlbT8NCj4gPiANCj4gPiA4PC0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0NCj4gPiBGcm9tIGJiYWU5NWU4Zjk3Y2FkNmE5MWNh OGNhYTUwYjYxY2FlNzg5NjMyZjkgTW9uIFNlcCAxNyAwMDowMDowMA0KPiA+IDIwMDENCj4gPiBG cm9tOiBUcm9uZCBNeWtsZWJ1c3QgPHRyb25kLm15a2xlYnVzdEBwcmltYXJ5ZGF0YS5jb20+DQo+ ID4gRGF0ZTogVGh1LCAxNiBGZWIgMjAxNyAxODoxNDozOCAtMDUwMA0KPiA+IFN1YmplY3Q6IFtQ QVRDSF0gTkZTdjQ6IEZpeCByZWJvb3QgcmVjb3ZlcnkgaW4gY29weSBvZmZsb2FkDQo+ID4gDQo+ ID4gQ29weSBvZmZsb2FkIGNvZGUgbmVlZHMgdG8gYmUgaG9va2VkIGludG8gdGhlIGNvZGUgZm9y IGhhbmRsaW5nDQo+ID4gTkZTNEVSUl9CQURfU1RBVEVJRCBieSBlbnN1cmluZyB0aGF0IHdlIHNl dCB0aGUgInN0YXRlaWQiIGZpZWxkDQo+ID4gaW4gc3RydWN0IG5mczRfZXhjZXB0aW9uLg0KPiA+ IA0KPiA+IFJlcG9ydGVkLWJ5OiBPbGdhIEtvcm5pZXZza2FpYSA8YWdsb0B1bWljaC5lZHU+DQo+ ID4gRml4ZXM6IDJlNzI0NDhiMDdkYzMgKCJORlM6IEFkZCBDT1BZIG5mcyBvcGVyYXRpb24iKQ0K PiA+IFNpZ25lZC1vZmYtYnk6IFRyb25kIE15a2xlYnVzdCA8dHJvbmQubXlrbGVidXN0QHByaW1h cnlkYXRhLmNvbT4NCj4gPiAtLS0NCj4gPiDCoGZzL25mcy9uZnM0MnByb2MuYyB8IDU3ICsrKysr KysrKysrKysrKysrKysrKysrKysrKysrKystLS0tLS0tLS0tLQ0KPiA+IC0tLS0tLS0tLS0tLQ0K PiA+IMKgMSBmaWxlIGNoYW5nZWQsIDMzIGluc2VydGlvbnMoKyksIDI0IGRlbGV0aW9ucygtKQ0K PiA+IA0KPiA+IGRpZmYgLS1naXQgYS9mcy9uZnMvbmZzNDJwcm9jLmMgYi9mcy9uZnMvbmZzNDJw cm9jLmMNCj4gPiBpbmRleCBkMTJmZjkzODVmNDkuLmJhZjFmZTJkYzI5NiAxMDA2NDQNCj4gPiAt LS0gYS9mcy9uZnMvbmZzNDJwcm9jLmMNCj4gPiArKysgYi9mcy9uZnMvbmZzNDJwcm9jLmMNCj4g PiBAQCAtMTI4LDIwICsxMjgsMTMgQEAgaW50IG5mczQyX3Byb2NfZGVhbGxvY2F0ZShzdHJ1Y3Qg ZmlsZSAqZmlsZXAsDQo+ID4gbG9mZl90IG9mZnNldCwgbG9mZl90IGxlbikNCj4gPiDCoMKgwqDC oMKgwqDCoMKgcmV0dXJuIGVycjsNCj4gPiDCoH0NCj4gPiANCj4gPiAtc3RhdGljIHNzaXplX3Qg X25mczQyX3Byb2NfY29weShzdHJ1Y3QgZmlsZSAqc3JjLCBsb2ZmX3QgcG9zX3NyYywNCj4gPiAr c3RhdGljIHNzaXplX3QgX25mczQyX3Byb2NfY29weShzdHJ1Y3QgZmlsZSAqc3JjLA0KPiA+IMKg wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC oMKgwqBzdHJ1Y3QgbmZzX2xvY2tfY29udGV4dCAqc3JjX2xvY2ssDQo+ID4gLcKgwqDCoMKgwqDC oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgc3RydWN0 IGZpbGUgKmRzdCwgbG9mZl90IHBvc19kc3QsDQo+ID4gK8KgwqDCoMKgwqDCoMKgwqDCoMKgwqDC oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgc3RydWN0IGZpbGUgKmRzdCwN Cj4gPiDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg wqDCoMKgwqDCoMKgc3RydWN0IG5mc19sb2NrX2NvbnRleHQgKmRzdF9sb2NrLA0KPiA+IC3CoMKg wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC oHNpemVfdCBjb3VudCkNCj4gPiArwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqBzdHJ1Y3QgbmZzNDJfY29weV9hcmdzICphcmdzLA0K PiA+ICvCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg wqDCoMKgwqDCoHN0cnVjdCBuZnM0Ml9jb3B5X3JlcyAqcmVzKQ0KPiA+IMKgew0KPiA+IC3CoMKg wqDCoMKgwqDCoHN0cnVjdCBuZnM0Ml9jb3B5X2FyZ3MgYXJncyA9IHsNCj4gPiAtwqDCoMKgwqDC oMKgwqDCoMKgwqDCoMKgwqDCoMKgLnNyY19maMKgwqDCoMKgwqDCoMKgwqDCoD0gTkZTX0ZIKGZp bGVfaW5vZGUoc3JjKSksDQo+ID4gLcKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoC5zcmNf cG9zwqDCoMKgwqDCoMKgwqDCoD0gcG9zX3NyYywNCj4gPiAtwqDCoMKgwqDCoMKgwqDCoMKgwqDC oMKgwqDCoMKgLmRzdF9maMKgwqDCoMKgwqDCoMKgwqDCoD0gTkZTX0ZIKGZpbGVfaW5vZGUoZHN0 KSksDQo+ID4gLcKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoC5kc3RfcG9zwqDCoMKgwqDC oMKgwqDCoD0gcG9zX2RzdCwNCj4gPiAtwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgLmNv dW50wqDCoMKgwqDCoMKgwqDCoMKgwqA9IGNvdW50LA0KPiA+IC3CoMKgwqDCoMKgwqDCoH07DQo+ ID4gLcKgwqDCoMKgwqDCoMKgc3RydWN0IG5mczQyX2NvcHlfcmVzIHJlczsNCj4gPiDCoMKgwqDC oMKgwqDCoMKgc3RydWN0IHJwY19tZXNzYWdlIG1zZyA9IHsNCj4gPiDCoMKgwqDCoMKgwqDCoMKg wqDCoMKgwqDCoMKgwqDCoC5ycGNfcHJvYyA9ICZuZnM0X3Byb2NlZHVyZXNbTkZTUFJPQzRfQ0xO VF9DT1BZXSwNCj4gPiDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoC5ycGNfYXJncCA9 ICZhcmdzLA0KPiA+IEBAIC0xNDksOSArMTQyLDEyIEBAIHN0YXRpYyBzc2l6ZV90IF9uZnM0Ml9w cm9jX2NvcHkoc3RydWN0IGZpbGUNCj4gPiAqc3JjLCBsb2ZmX3QgcG9zX3NyYywNCj4gPiDCoMKg wqDCoMKgwqDCoMKgfTsNCj4gPiDCoMKgwqDCoMKgwqDCoMKgc3RydWN0IGlub2RlICpkc3RfaW5v ZGUgPSBmaWxlX2lub2RlKGRzdCk7DQo+ID4gwqDCoMKgwqDCoMKgwqDCoHN0cnVjdCBuZnNfc2Vy dmVyICpzZXJ2ZXIgPSBORlNfU0VSVkVSKGRzdF9pbm9kZSk7DQo+ID4gK8KgwqDCoMKgwqDCoMKg bG9mZl90IHBvc19zcmMgPSBhcmdzLT5zcmNfcG9zOw0KPiA+ICvCoMKgwqDCoMKgwqDCoGxvZmZf dCBwb3NfZHN0ID0gYXJncy0+ZHN0X3BvczsNCj4gPiArwqDCoMKgwqDCoMKgwqBzaXplX3QgY291 bnQgPSBhcmdzLT5jb3VudDsNCj4gPiDCoMKgwqDCoMKgwqDCoMKgaW50IHN0YXR1czsNCj4gPiAN Cj4gPiAtwqDCoMKgwqDCoMKgwqBzdGF0dXMgPSBuZnM0X3NldF9yd19zdGF0ZWlkKCZhcmdzLnNy Y19zdGF0ZWlkLCBzcmNfbG9jay0NCj4gPiA+b3Blbl9jb250ZXh0LA0KPiA+ICvCoMKgwqDCoMKg wqDCoHN0YXR1cyA9IG5mczRfc2V0X3J3X3N0YXRlaWQoJmFyZ3MtPnNyY19zdGF0ZWlkLCBzcmNf bG9jay0NCj4gPiA+b3Blbl9jb250ZXh0LA0KPiA+IMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgc3JjX2xvY2ss IEZNT0RFX1JFQUQpOw0KPiA+IMKgwqDCoMKgwqDCoMKgwqBpZiAoc3RhdHVzKQ0KPiA+IMKgwqDC oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgcmV0dXJuIHN0YXR1czsNCj4gPiBAQCAtMTYxLDcg KzE1Nyw3IEBAIHN0YXRpYyBzc2l6ZV90IF9uZnM0Ml9wcm9jX2NvcHkoc3RydWN0IGZpbGUNCj4g PiAqc3JjLCBsb2ZmX3QgcG9zX3NyYywNCj4gPiDCoMKgwqDCoMKgwqDCoMKgaWYgKHN0YXR1cykN Cj4gPiDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoHJldHVybiBzdGF0dXM7DQo+ID4g DQo+ID4gLcKgwqDCoMKgwqDCoMKgc3RhdHVzID0gbmZzNF9zZXRfcndfc3RhdGVpZCgmYXJncy5k c3Rfc3RhdGVpZCwgZHN0X2xvY2stDQo+ID4gPm9wZW5fY29udGV4dCwNCj4gPiArwqDCoMKgwqDC oMKgwqBzdGF0dXMgPSBuZnM0X3NldF9yd19zdGF0ZWlkKCZhcmdzLT5kc3Rfc3RhdGVpZCwgZHN0 X2xvY2stDQo+ID4gPm9wZW5fY29udGV4dCwNCj4gPiDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoGRzdF9sb2Nr LCBGTU9ERV9XUklURSk7DQo+ID4gwqDCoMKgwqDCoMKgwqDCoGlmIChzdGF0dXMpDQo+ID4gwqDC oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqByZXR1cm4gc3RhdHVzOw0KPiA+IEBAIC0xNzEs MjIgKzE2NywyMiBAQCBzdGF0aWMgc3NpemVfdCBfbmZzNDJfcHJvY19jb3B5KHN0cnVjdCBmaWxl DQo+ID4gKnNyYywgbG9mZl90IHBvc19zcmMsDQo+ID4gwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg wqDCoMKgwqByZXR1cm4gc3RhdHVzOw0KPiA+IA0KPiA+IMKgwqDCoMKgwqDCoMKgwqBzdGF0dXMg PSBuZnM0X2NhbGxfc3luYyhzZXJ2ZXItPmNsaWVudCwgc2VydmVyLCAmbXNnLA0KPiA+IC3CoMKg wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC oCZhcmdzLnNlcV9hcmdzLCAmcmVzLnNlcV9yZXMsIDApOw0KPiA+ICvCoMKgwqDCoMKgwqDCoMKg wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoCZhcmdzLT5zZXFf YXJncywgJnJlcy0+c2VxX3JlcywgMCk7DQo+ID4gwqDCoMKgwqDCoMKgwqDCoGlmIChzdGF0dXMg PT0gLUVOT1RTVVBQKQ0KPiA+IMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgc2VydmVy LT5jYXBzICY9IH5ORlNfQ0FQX0NPUFk7DQo+ID4gwqDCoMKgwqDCoMKgwqDCoGlmIChzdGF0dXMp DQo+ID4gwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqByZXR1cm4gc3RhdHVzOw0KPiA+ IA0KPiA+IC3CoMKgwqDCoMKgwqDCoGlmIChyZXMud3JpdGVfcmVzLnZlcmlmaWVyLmNvbW1pdHRl ZCAhPSBORlNfRklMRV9TWU5DKSB7DQo+ID4gLcKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC oHN0YXR1cyA9IG5mc19jb21taXRfZmlsZShkc3QsDQo+ID4gJnJlcy53cml0ZV9yZXMudmVyaWZp ZXIudmVyaWZpZXIpOw0KPiA+ICvCoMKgwqDCoMKgwqDCoGlmIChyZXMtPndyaXRlX3Jlcy52ZXJp Zmllci5jb21taXR0ZWQgIT0gTkZTX0ZJTEVfU1lOQykgew0KPiA+ICvCoMKgwqDCoMKgwqDCoMKg wqDCoMKgwqDCoMKgwqBzdGF0dXMgPSBuZnNfY29tbWl0X2ZpbGUoZHN0LCAmcmVzLQ0KPiA+ID53 cml0ZV9yZXMudmVyaWZpZXIudmVyaWZpZXIpOw0KPiA+IMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC oMKgwqDCoMKgaWYgKHN0YXR1cykNCj4gPiDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC oMKgwqDCoMKgwqDCoMKgwqByZXR1cm4gc3RhdHVzOw0KPiA+IMKgwqDCoMKgwqDCoMKgwqB9DQo+ ID4gDQo+ID4gwqDCoMKgwqDCoMKgwqDCoHRydW5jYXRlX3BhZ2VjYWNoZV9yYW5nZShkc3RfaW5v ZGUsIHBvc19kc3QsDQo+ID4gLcKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqBwb3NfZHN0ICsgcmVzLndyaXRlX3Jlcy5jb3VudCk7 DQo+ID4gK8KgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg wqDCoMKgwqDCoMKgwqBwb3NfZHN0ICsgcmVzLT53cml0ZV9yZXMuY291bnQpOw0KPiA+IA0KPiA+ IC3CoMKgwqDCoMKgwqDCoHJldHVybiByZXMud3JpdGVfcmVzLmNvdW50Ow0KPiA+ICvCoMKgwqDC oMKgwqDCoHJldHVybiByZXMtPndyaXRlX3Jlcy5jb3VudDsNCj4gPiDCoH0NCj4gPiANCj4gPiDC oHNzaXplX3QgbmZzNDJfcHJvY19jb3B5KHN0cnVjdCBmaWxlICpzcmMsIGxvZmZfdCBwb3Nfc3Jj LA0KPiA+IEBAIC0xOTYsOCArMTkyLDIyIEBAIHNzaXplX3QgbmZzNDJfcHJvY19jb3B5KHN0cnVj dCBmaWxlICpzcmMsDQo+ID4gbG9mZl90IHBvc19zcmMsDQo+ID4gwqDCoMKgwqDCoMKgwqDCoHN0 cnVjdCBuZnNfc2VydmVyICpzZXJ2ZXIgPSBORlNfU0VSVkVSKGZpbGVfaW5vZGUoZHN0KSk7DQo+ ID4gwqDCoMKgwqDCoMKgwqDCoHN0cnVjdCBuZnNfbG9ja19jb250ZXh0ICpzcmNfbG9jazsNCj4g PiDCoMKgwqDCoMKgwqDCoMKgc3RydWN0IG5mc19sb2NrX2NvbnRleHQgKmRzdF9sb2NrOw0KPiA+ IC3CoMKgwqDCoMKgwqDCoHN0cnVjdCBuZnM0X2V4Y2VwdGlvbiBzcmNfZXhjZXB0aW9uID0geyB9 Ow0KPiA+IC3CoMKgwqDCoMKgwqDCoHN0cnVjdCBuZnM0X2V4Y2VwdGlvbiBkc3RfZXhjZXB0aW9u ID0geyB9Ow0KPiA+ICvCoMKgwqDCoMKgwqDCoHN0cnVjdCBuZnM0Ml9jb3B5X2FyZ3MgYXJncyA9 IHsNCj4gPiArwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgLnNyY19maMKgwqDCoMKgwqDC oMKgwqDCoD0gTkZTX0ZIKGZpbGVfaW5vZGUoc3JjKSksDQo+ID4gK8KgwqDCoMKgwqDCoMKgwqDC oMKgwqDCoMKgwqDCoC5zcmNfcG9zwqDCoMKgwqDCoMKgwqDCoD0gcG9zX3NyYywNCj4gPiArwqDC oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgLmRzdF9maMKgwqDCoMKgwqDCoMKgwqDCoD0gTkZT X0ZIKGZpbGVfaW5vZGUoZHN0KSksDQo+ID4gK8KgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC oC5kc3RfcG9zwqDCoMKgwqDCoMKgwqDCoD0gcG9zX2RzdCwNCj4gPiArwqDCoMKgwqDCoMKgwqDC oMKgwqDCoMKgwqDCoMKgLmNvdW50wqDCoMKgwqDCoMKgwqDCoMKgwqA9IGNvdW50LA0KPiA+ICvC oMKgwqDCoMKgwqDCoH07DQo+ID4gK8KgwqDCoMKgwqDCoMKgc3RydWN0IG5mczQyX2NvcHlfcmVz IHJlczsNCj4gPiArwqDCoMKgwqDCoMKgwqBzdHJ1Y3QgbmZzNF9leGNlcHRpb24gc3JjX2V4Y2Vw dGlvbiA9IHsNCj4gPiArwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgLmlub2RlwqDCoMKg wqDCoMKgwqDCoMKgwqA9IGZpbGVfaW5vZGUoc3JjKSwNCj4gPiArwqDCoMKgwqDCoMKgwqDCoMKg wqDCoMKgwqDCoMKgLnN0YXRlaWTCoMKgwqDCoMKgwqDCoMKgPSAmYXJncy5zcmNfc3RhdGVpZCwN Cj4gPiArwqDCoMKgwqDCoMKgwqB9Ow0KPiA+ICvCoMKgwqDCoMKgwqDCoHN0cnVjdCBuZnM0X2V4 Y2VwdGlvbiBkc3RfZXhjZXB0aW9uID0gew0KPiA+ICvCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC oMKgwqAuaW5vZGXCoMKgwqDCoMKgwqDCoMKgwqDCoD0gZmlsZV9pbm9kZShkc3QpLA0KPiA+ICvC oMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqAuc3RhdGVpZMKgwqDCoMKgwqDCoMKgwqA9ICZh cmdzLmRzdF9zdGF0ZWlkLA0KPiA+ICvCoMKgwqDCoMKgwqDCoH07DQo+ID4gwqDCoMKgwqDCoMKg wqDCoHNzaXplX3QgZXJyLCBlcnIyOw0KPiA+IA0KPiA+IMKgwqDCoMKgwqDCoMKgwqBpZiAoIW5m c19zZXJ2ZXJfY2FwYWJsZShmaWxlX2lub2RlKGRzdCksIE5GU19DQVBfQ09QWSkpDQo+ID4gQEAg LTIwNyw3ICsyMTcsNiBAQCBzc2l6ZV90IG5mczQyX3Byb2NfY29weShzdHJ1Y3QgZmlsZSAqc3Jj LA0KPiA+IGxvZmZfdCBwb3Nfc3JjLA0KPiA+IMKgwqDCoMKgwqDCoMKgwqBpZiAoSVNfRVJSKHNy Y19sb2NrKSkNCj4gPiDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoHJldHVybiBQVFJf RVJSKHNyY19sb2NrKTsNCj4gPiANCj4gPiAtwqDCoMKgwqDCoMKgwqBzcmNfZXhjZXB0aW9uLmlu b2RlID0gZmlsZV9pbm9kZShzcmMpOw0KPiA+IMKgwqDCoMKgwqDCoMKgwqBzcmNfZXhjZXB0aW9u LnN0YXRlID0gc3JjX2xvY2stPm9wZW5fY29udGV4dC0+c3RhdGU7DQo+ID4gDQo+ID4gwqDCoMKg wqDCoMKgwqDCoGRzdF9sb2NrID0NCj4gPiBuZnNfZ2V0X2xvY2tfY29udGV4dChuZnNfZmlsZV9v cGVuX2NvbnRleHQoZHN0KSk7DQo+ID4gQEAgLTIxNiwxMyArMjI1LDEzIEBAIHNzaXplX3QgbmZz NDJfcHJvY19jb3B5KHN0cnVjdCBmaWxlICpzcmMsDQo+ID4gbG9mZl90IHBvc19zcmMsDQo+ID4g wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqBnb3RvIG91dF9wdXRfc3JjX2xvY2s7DQo+ ID4gwqDCoMKgwqDCoMKgwqDCoH0NCj4gPiANCj4gPiAtwqDCoMKgwqDCoMKgwqBkc3RfZXhjZXB0 aW9uLmlub2RlID0gZmlsZV9pbm9kZShkc3QpOw0KPiA+IMKgwqDCoMKgwqDCoMKgwqBkc3RfZXhj ZXB0aW9uLnN0YXRlID0gZHN0X2xvY2stPm9wZW5fY29udGV4dC0+c3RhdGU7DQo+ID4gDQo+ID4g wqDCoMKgwqDCoMKgwqDCoGRvIHsNCj4gPiDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC oGlub2RlX2xvY2soZmlsZV9pbm9kZShkc3QpKTsNCj4gPiAtwqDCoMKgwqDCoMKgwqDCoMKgwqDC oMKgwqDCoMKgZXJyID0gX25mczQyX3Byb2NfY29weShzcmMsIHBvc19zcmMsIHNyY19sb2NrLA0K PiA+IC3CoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgZHN0LCBwb3NfZHN0LCBkc3RfbG9jaywNCj4gPiBjb3Vu dCk7DQo+ID4gK8KgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoGVyciA9IF9uZnM0Ml9wcm9j X2NvcHkoc3JjLCBzcmNfbG9jaywNCj4gPiArwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKg wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqBkc3QsIGRzdF9sb2NrLA0KPiA+ICvCoMKg wqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDC oCZhcmdzLCAmcmVzKTsNCj4gPiDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoMKgwqDCoGlub2Rl X3VubG9jayhmaWxlX2lub2RlKGRzdCkpOw0KPiA+IA0KPiA+IMKgwqDCoMKgwqDCoMKgwqDCoMKg wqDCoMKgwqDCoMKgaWYgKGVyciA9PSAtRU5PVFNVUFApIHsNCj4gPiAtLQ0KPiA+IDIuOS4zDQo+ IA0KPiBJIHdpc2ggaXQgZGlkIGJ1dCBubyBpdCBkb2VzIG5vdC4NCj4gDQoNClNvIHdoYXQgaXMg c3RpbGwgaGFwcGVuaW5nPyBXaXRoIHRoaXMgcGF0Y2gsIHRoZSBlcnJvciBoYW5kbGVyIHNob3Vs ZA0KYmUgYWJsZSB0byBkaXN0aW5ndWlzaCBiZXR3ZWVuIGEgc3RhdGVpZCB0aGF0IGlzIHVwIHRv IGRhdGUgYW5kIG9uZQ0KdGhhdCBpc24ndC4NCg0KVGhlcmUgbWlnaHQsIGhvd2V2ZXIsIHN0aWxs IGJlIGEgcHJvYmxlbSBiZWNhdXNlIHdlIGhhdmUgMiBzdGF0ZWlkcywNCm1lYW5pbmcgdGhhdCBv bmUgY291bGQgYmUgc3RhbGUgKGFuZCBnZW5lcmF0aW5nIE5GUzRFUlJfQkFEX1NUQVRFSUQpDQph bmQgdGhlIG90aGVyIG9uZSBub3QuIFdlIG1pZ2h0IGhhdmUgdG8gc3BlY2lhbCBjYXNlIHRoYXQs IGFuZCBkbyB0aGUNCmNvbXBhcmlzb25zIGluc2lkZSBuZnM0Ml9wcm9jX2NvcHkgaW5zdGVhZCBv ZiB1c2luZyB0aGUgZ2VuZXJpYyBjb2RlIGluDQpuZnM0X2hhbmRsZV9leGNlcHRpb24oKS4NCg0K LS0gDQpUcm9uZCBNeWtsZWJ1c3QNCkxpbnV4IE5GUyBjbGllbnQgbWFpbnRhaW5lciwgUHJpbWFy eURhdGENCnRyb25kLm15a2xlYnVzdEBwcmltYXJ5ZGF0YS5jb20NCg== ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE 2017-02-17 14:58 ` Trond Myklebust @ 2017-02-17 15:07 ` Olga Kornievskaia 2017-02-17 15:22 ` Olga Kornievskaia 0 siblings, 1 reply; 16+ messages in thread From: Olga Kornievskaia @ 2017-02-17 15:07 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs@vger.kernel.org On Fri, Feb 17, 2017 at 9:58 AM, Trond Myklebust <trondmy@primarydata.com> wrote: > On Fri, 2017-02-17 at 09:46 -0500, Olga Kornievskaia wrote: >> On Thu, Feb 16, 2017 at 6:28 PM, Trond Myklebust >> <trondmy@primarydata.com> wrote: >> > On Thu, 2017-02-16 at 17:14 -0500, Olga Kornievskaia wrote: >> > > On Thu, Feb 16, 2017 at 4:45 PM, Trond Myklebust >> > > <trondmy@primarydata.com> wrote: >> > > > >> > > > Olga, all your test is doing is showing that we hit the race >> > > > more >> > > > often. Fair enough, I=E2=80=99ll ask Anna to revert the patch. How= ever >> > > > reverting the patch won=E2=80=99t prevent the server from returnin= g >> > > > NFS4ERR_BAD_STATEID in any cases where the calls to >> > > > nfs4_set_rw_stateid() happen before state recovery. This is why >> > > > we >> > > > have the loop in nfs42_proc_copy(). >> > > >> > > I thought that if retry of the operation waits for the recovery >> > > to >> > > complete then nfs4_set_rw_stateid() will choose the correct >> > > stateid, >> > > is that not correct? >> > > >> > > If that's not correct, then we somehow need to separate the case >> > > when >> > > BAD_STATEID should indeed mark the locks lost vs this case where >> > > the >> > > code erroneously used the bad stateid (oops) and it should really >> > > ignore this error. This really doesn't sound very plausible to >> > > accomplish. >> > >> > Does this patch fix the problem? >> > >> > 8<------------------------------------------------------------- >> > From bbae95e8f97cad6a91ca8caa50b61cae789632f9 Mon Sep 17 00:00:00 >> > 2001 >> > From: Trond Myklebust <trond.myklebust@primarydata.com> >> > Date: Thu, 16 Feb 2017 18:14:38 -0500 >> > Subject: [PATCH] NFSv4: Fix reboot recovery in copy offload >> > >> > Copy offload code needs to be hooked into the code for handling >> > NFS4ERR_BAD_STATEID by ensuring that we set the "stateid" field >> > in struct nfs4_exception. >> > >> > Reported-by: Olga Kornievskaia <aglo@umich.edu> >> > Fixes: 2e72448b07dc3 ("NFS: Add COPY nfs operation") >> > Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> >> > --- >> > fs/nfs/nfs42proc.c | 57 +++++++++++++++++++++++++++++++----------- >> > ------------ >> > 1 file changed, 33 insertions(+), 24 deletions(-) >> > >> > diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c >> > index d12ff9385f49..baf1fe2dc296 100644 >> > --- a/fs/nfs/nfs42proc.c >> > +++ b/fs/nfs/nfs42proc.c >> > @@ -128,20 +128,13 @@ int nfs42_proc_deallocate(struct file *filep, >> > loff_t offset, loff_t len) >> > return err; >> > } >> > >> > -static ssize_t _nfs42_proc_copy(struct file *src, loff_t pos_src, >> > +static ssize_t _nfs42_proc_copy(struct file *src, >> > struct nfs_lock_context *src_lock, >> > - struct file *dst, loff_t pos_dst, >> > + struct file *dst, >> > struct nfs_lock_context *dst_lock, >> > - size_t count) >> > + struct nfs42_copy_args *args, >> > + struct nfs42_copy_res *res) >> > { >> > - struct nfs42_copy_args args =3D { >> > - .src_fh =3D NFS_FH(file_inode(src)), >> > - .src_pos =3D pos_src, >> > - .dst_fh =3D NFS_FH(file_inode(dst)), >> > - .dst_pos =3D pos_dst, >> > - .count =3D count, >> > - }; >> > - struct nfs42_copy_res res; >> > struct rpc_message msg =3D { >> > .rpc_proc =3D &nfs4_procedures[NFSPROC4_CLNT_COPY], >> > .rpc_argp =3D &args, >> > @@ -149,9 +142,12 @@ static ssize_t _nfs42_proc_copy(struct file >> > *src, loff_t pos_src, >> > }; >> > struct inode *dst_inode =3D file_inode(dst); >> > struct nfs_server *server =3D NFS_SERVER(dst_inode); >> > + loff_t pos_src =3D args->src_pos; >> > + loff_t pos_dst =3D args->dst_pos; >> > + size_t count =3D args->count; >> > int status; >> > >> > - status =3D nfs4_set_rw_stateid(&args.src_stateid, src_lock- >> > >open_context, >> > + status =3D nfs4_set_rw_stateid(&args->src_stateid, src_lock- >> > >open_context, >> > src_lock, FMODE_READ); >> > if (status) >> > return status; >> > @@ -161,7 +157,7 @@ static ssize_t _nfs42_proc_copy(struct file >> > *src, loff_t pos_src, >> > if (status) >> > return status; >> > >> > - status =3D nfs4_set_rw_stateid(&args.dst_stateid, dst_lock- >> > >open_context, >> > + status =3D nfs4_set_rw_stateid(&args->dst_stateid, dst_lock- >> > >open_context, >> > dst_lock, FMODE_WRITE); >> > if (status) >> > return status; >> > @@ -171,22 +167,22 @@ static ssize_t _nfs42_proc_copy(struct file >> > *src, loff_t pos_src, >> > return status; >> > >> > status =3D nfs4_call_sync(server->client, server, &msg, >> > - &args.seq_args, &res.seq_res, 0); >> > + &args->seq_args, &res->seq_res, 0); >> > if (status =3D=3D -ENOTSUPP) >> > server->caps &=3D ~NFS_CAP_COPY; >> > if (status) >> > return status; >> > >> > - if (res.write_res.verifier.committed !=3D NFS_FILE_SYNC) { >> > - status =3D nfs_commit_file(dst, >> > &res.write_res.verifier.verifier); >> > + if (res->write_res.verifier.committed !=3D NFS_FILE_SYNC) { >> > + status =3D nfs_commit_file(dst, &res- >> > >write_res.verifier.verifier); >> > if (status) >> > return status; >> > } >> > >> > truncate_pagecache_range(dst_inode, pos_dst, >> > - pos_dst + res.write_res.count); >> > + pos_dst + res->write_res.count); >> > >> > - return res.write_res.count; >> > + return res->write_res.count; >> > } >> > >> > ssize_t nfs42_proc_copy(struct file *src, loff_t pos_src, >> > @@ -196,8 +192,22 @@ ssize_t nfs42_proc_copy(struct file *src, >> > loff_t pos_src, >> > struct nfs_server *server =3D NFS_SERVER(file_inode(dst)); >> > struct nfs_lock_context *src_lock; >> > struct nfs_lock_context *dst_lock; >> > - struct nfs4_exception src_exception =3D { }; >> > - struct nfs4_exception dst_exception =3D { }; >> > + struct nfs42_copy_args args =3D { >> > + .src_fh =3D NFS_FH(file_inode(src)), >> > + .src_pos =3D pos_src, >> > + .dst_fh =3D NFS_FH(file_inode(dst)), >> > + .dst_pos =3D pos_dst, >> > + .count =3D count, >> > + }; >> > + struct nfs42_copy_res res; >> > + struct nfs4_exception src_exception =3D { >> > + .inode =3D file_inode(src), >> > + .stateid =3D &args.src_stateid, >> > + }; >> > + struct nfs4_exception dst_exception =3D { >> > + .inode =3D file_inode(dst), >> > + .stateid =3D &args.dst_stateid, >> > + }; >> > ssize_t err, err2; >> > >> > if (!nfs_server_capable(file_inode(dst), NFS_CAP_COPY)) >> > @@ -207,7 +217,6 @@ ssize_t nfs42_proc_copy(struct file *src, >> > loff_t pos_src, >> > if (IS_ERR(src_lock)) >> > return PTR_ERR(src_lock); >> > >> > - src_exception.inode =3D file_inode(src); >> > src_exception.state =3D src_lock->open_context->state; >> > >> > dst_lock =3D >> > nfs_get_lock_context(nfs_file_open_context(dst)); >> > @@ -216,13 +225,13 @@ ssize_t nfs42_proc_copy(struct file *src, >> > loff_t pos_src, >> > goto out_put_src_lock; >> > } >> > >> > - dst_exception.inode =3D file_inode(dst); >> > dst_exception.state =3D dst_lock->open_context->state; >> > >> > do { >> > inode_lock(file_inode(dst)); >> > - err =3D _nfs42_proc_copy(src, pos_src, src_lock, >> > - dst, pos_dst, dst_lock, >> > count); >> > + err =3D _nfs42_proc_copy(src, src_lock, >> > + dst, dst_lock, >> > + &args, &res); >> > inode_unlock(file_inode(dst)); >> > >> > if (err =3D=3D -ENOTSUPP) { >> > -- >> > 2.9.3 >> >> I wish it did but no it does not. >> > > So what is still happening? With this patch, the error handler should > be able to distinguish between a stateid that is up to date and one > that isn't. > > There might, however, still be a problem because we have 2 stateids, > meaning that one could be stale (and generating NFS4ERR_BAD_STATEID) > and the other one not. We might have to special case that, and do the > comparisons inside nfs42_proc_copy instead of using the generic code in > nfs4_handle_exception(). After COPY gets the BAD_STATEID (on the old stateids), I see 4 TEST_STATEIDs sent with 3 distinct stateids which are the new open stateid for the src file, the locking stateid for the source file and the new old stateid for the destination file. All of which are ok. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 1/2] NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE 2017-02-17 15:07 ` Olga Kornievskaia @ 2017-02-17 15:22 ` Olga Kornievskaia 0 siblings, 0 replies; 16+ messages in thread From: Olga Kornievskaia @ 2017-02-17 15:22 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs@vger.kernel.org On Fri, Feb 17, 2017 at 10:07 AM, Olga Kornievskaia <aglo@umich.edu> wrote: > On Fri, Feb 17, 2017 at 9:58 AM, Trond Myklebust > <trondmy@primarydata.com> wrote: >> On Fri, 2017-02-17 at 09:46 -0500, Olga Kornievskaia wrote: >>> On Thu, Feb 16, 2017 at 6:28 PM, Trond Myklebust >>> <trondmy@primarydata.com> wrote: >>> > On Thu, 2017-02-16 at 17:14 -0500, Olga Kornievskaia wrote: >>> > > On Thu, Feb 16, 2017 at 4:45 PM, Trond Myklebust >>> > > <trondmy@primarydata.com> wrote: >>> > > > >>> > > > Olga, all your test is doing is showing that we hit the race >>> > > > more >>> > > > often. Fair enough, I=E2=80=99ll ask Anna to revert the patch. Ho= wever >>> > > > reverting the patch won=E2=80=99t prevent the server from returni= ng >>> > > > NFS4ERR_BAD_STATEID in any cases where the calls to >>> > > > nfs4_set_rw_stateid() happen before state recovery. This is why >>> > > > we >>> > > > have the loop in nfs42_proc_copy(). >>> > > >>> > > I thought that if retry of the operation waits for the recovery >>> > > to >>> > > complete then nfs4_set_rw_stateid() will choose the correct >>> > > stateid, >>> > > is that not correct? >>> > > >>> > > If that's not correct, then we somehow need to separate the case >>> > > when >>> > > BAD_STATEID should indeed mark the locks lost vs this case where >>> > > the >>> > > code erroneously used the bad stateid (oops) and it should really >>> > > ignore this error. This really doesn't sound very plausible to >>> > > accomplish. >>> > >>> > Does this patch fix the problem? >>> > >>> > 8<------------------------------------------------------------- >>> > From bbae95e8f97cad6a91ca8caa50b61cae789632f9 Mon Sep 17 00:00:00 >>> > 2001 >>> > From: Trond Myklebust <trond.myklebust@primarydata.com> >>> > Date: Thu, 16 Feb 2017 18:14:38 -0500 >>> > Subject: [PATCH] NFSv4: Fix reboot recovery in copy offload >>> > >>> > Copy offload code needs to be hooked into the code for handling >>> > NFS4ERR_BAD_STATEID by ensuring that we set the "stateid" field >>> > in struct nfs4_exception. >>> > >>> > Reported-by: Olga Kornievskaia <aglo@umich.edu> >>> > Fixes: 2e72448b07dc3 ("NFS: Add COPY nfs operation") >>> > Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> >>> > --- >>> > fs/nfs/nfs42proc.c | 57 +++++++++++++++++++++++++++++++----------- >>> > ------------ >>> > 1 file changed, 33 insertions(+), 24 deletions(-) >>> > >>> > diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c >>> > index d12ff9385f49..baf1fe2dc296 100644 >>> > --- a/fs/nfs/nfs42proc.c >>> > +++ b/fs/nfs/nfs42proc.c >>> > @@ -128,20 +128,13 @@ int nfs42_proc_deallocate(struct file *filep, >>> > loff_t offset, loff_t len) >>> > return err; >>> > } >>> > >>> > -static ssize_t _nfs42_proc_copy(struct file *src, loff_t pos_src, >>> > +static ssize_t _nfs42_proc_copy(struct file *src, >>> > struct nfs_lock_context *src_lock, >>> > - struct file *dst, loff_t pos_dst, >>> > + struct file *dst, >>> > struct nfs_lock_context *dst_lock, >>> > - size_t count) >>> > + struct nfs42_copy_args *args, >>> > + struct nfs42_copy_res *res) >>> > { >>> > - struct nfs42_copy_args args =3D { >>> > - .src_fh =3D NFS_FH(file_inode(src)), >>> > - .src_pos =3D pos_src, >>> > - .dst_fh =3D NFS_FH(file_inode(dst)), >>> > - .dst_pos =3D pos_dst, >>> > - .count =3D count, >>> > - }; >>> > - struct nfs42_copy_res res; >>> > struct rpc_message msg =3D { >>> > .rpc_proc =3D &nfs4_procedures[NFSPROC4_CLNT_COPY], >>> > .rpc_argp =3D &args, >>> > @@ -149,9 +142,12 @@ static ssize_t _nfs42_proc_copy(struct file >>> > *src, loff_t pos_src, >>> > }; >>> > struct inode *dst_inode =3D file_inode(dst); >>> > struct nfs_server *server =3D NFS_SERVER(dst_inode); >>> > + loff_t pos_src =3D args->src_pos; >>> > + loff_t pos_dst =3D args->dst_pos; >>> > + size_t count =3D args->count; >>> > int status; >>> > >>> > - status =3D nfs4_set_rw_stateid(&args.src_stateid, src_lock- >>> > >open_context, >>> > + status =3D nfs4_set_rw_stateid(&args->src_stateid, src_lock- >>> > >open_context, >>> > src_lock, FMODE_READ); >>> > if (status) >>> > return status; >>> > @@ -161,7 +157,7 @@ static ssize_t _nfs42_proc_copy(struct file >>> > *src, loff_t pos_src, >>> > if (status) >>> > return status; >>> > >>> > - status =3D nfs4_set_rw_stateid(&args.dst_stateid, dst_lock- >>> > >open_context, >>> > + status =3D nfs4_set_rw_stateid(&args->dst_stateid, dst_lock- >>> > >open_context, >>> > dst_lock, FMODE_WRITE); >>> > if (status) >>> > return status; >>> > @@ -171,22 +167,22 @@ static ssize_t _nfs42_proc_copy(struct file >>> > *src, loff_t pos_src, >>> > return status; >>> > >>> > status =3D nfs4_call_sync(server->client, server, &msg, >>> > - &args.seq_args, &res.seq_res, 0); >>> > + &args->seq_args, &res->seq_res, 0); >>> > if (status =3D=3D -ENOTSUPP) >>> > server->caps &=3D ~NFS_CAP_COPY; >>> > if (status) >>> > return status; >>> > >>> > - if (res.write_res.verifier.committed !=3D NFS_FILE_SYNC) { >>> > - status =3D nfs_commit_file(dst, >>> > &res.write_res.verifier.verifier); >>> > + if (res->write_res.verifier.committed !=3D NFS_FILE_SYNC) { >>> > + status =3D nfs_commit_file(dst, &res- >>> > >write_res.verifier.verifier); >>> > if (status) >>> > return status; >>> > } >>> > >>> > truncate_pagecache_range(dst_inode, pos_dst, >>> > - pos_dst + res.write_res.count); >>> > + pos_dst + res->write_res.count); >>> > >>> > - return res.write_res.count; >>> > + return res->write_res.count; >>> > } >>> > >>> > ssize_t nfs42_proc_copy(struct file *src, loff_t pos_src, >>> > @@ -196,8 +192,22 @@ ssize_t nfs42_proc_copy(struct file *src, >>> > loff_t pos_src, >>> > struct nfs_server *server =3D NFS_SERVER(file_inode(dst)); >>> > struct nfs_lock_context *src_lock; >>> > struct nfs_lock_context *dst_lock; >>> > - struct nfs4_exception src_exception =3D { }; >>> > - struct nfs4_exception dst_exception =3D { }; >>> > + struct nfs42_copy_args args =3D { >>> > + .src_fh =3D NFS_FH(file_inode(src)), >>> > + .src_pos =3D pos_src, >>> > + .dst_fh =3D NFS_FH(file_inode(dst)), >>> > + .dst_pos =3D pos_dst, >>> > + .count =3D count, >>> > + }; >>> > + struct nfs42_copy_res res; >>> > + struct nfs4_exception src_exception =3D { >>> > + .inode =3D file_inode(src), >>> > + .stateid =3D &args.src_stateid, >>> > + }; >>> > + struct nfs4_exception dst_exception =3D { >>> > + .inode =3D file_inode(dst), >>> > + .stateid =3D &args.dst_stateid, >>> > + }; >>> > ssize_t err, err2; >>> > >>> > if (!nfs_server_capable(file_inode(dst), NFS_CAP_COPY)) >>> > @@ -207,7 +217,6 @@ ssize_t nfs42_proc_copy(struct file *src, >>> > loff_t pos_src, >>> > if (IS_ERR(src_lock)) >>> > return PTR_ERR(src_lock); >>> > >>> > - src_exception.inode =3D file_inode(src); >>> > src_exception.state =3D src_lock->open_context->state; >>> > >>> > dst_lock =3D >>> > nfs_get_lock_context(nfs_file_open_context(dst)); >>> > @@ -216,13 +225,13 @@ ssize_t nfs42_proc_copy(struct file *src, >>> > loff_t pos_src, >>> > goto out_put_src_lock; >>> > } >>> > >>> > - dst_exception.inode =3D file_inode(dst); >>> > dst_exception.state =3D dst_lock->open_context->state; >>> > >>> > do { >>> > inode_lock(file_inode(dst)); >>> > - err =3D _nfs42_proc_copy(src, pos_src, src_lock, >>> > - dst, pos_dst, dst_lock, >>> > count); >>> > + err =3D _nfs42_proc_copy(src, src_lock, >>> > + dst, dst_lock, >>> > + &args, &res); >>> > inode_unlock(file_inode(dst)); >>> > >>> > if (err =3D=3D -ENOTSUPP) { >>> > -- >>> > 2.9.3 >>> >>> I wish it did but no it does not. >>> >> >> So what is still happening? With this patch, the error handler should >> be able to distinguish between a stateid that is up to date and one >> that isn't. >> >> There might, however, still be a problem because we have 2 stateids, >> meaning that one could be stale (and generating NFS4ERR_BAD_STATEID) >> and the other one not. We might have to special case that, and do the >> comparisons inside nfs42_proc_copy instead of using the generic code in >> nfs4_handle_exception(). > > After COPY gets the BAD_STATEID (on the old stateids), I see 4 > TEST_STATEIDs sent with 3 distinct stateids which are the new open > stateid for the src file, the locking stateid for the source file and > the new old stateid for the destination file. All of which are ok. Urgh. sorry please wait. Last time I booted into the wrong kernel. You patch currently oops. I need to fix that first. There is HOPE now it works. ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2017-02-17 15:22 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-12-05 0:40 [PATCH 1/2] NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE Trond Myklebust 2016-12-05 0:40 ` [PATCH 2/2] NFSv4.1: Don't schedule lease recovery in nfs4_schedule_session_recovery() Trond Myklebust 2017-02-15 20:16 ` [PATCH 1/2] NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE Olga Kornievskaia 2017-02-15 20:48 ` Trond Myklebust 2017-02-15 21:56 ` Olga Kornievskaia 2017-02-15 22:23 ` Trond Myklebust 2017-02-16 14:16 ` Olga Kornievskaia 2017-02-16 16:04 ` Olga Kornievskaia 2017-02-16 16:12 ` Olga Kornievskaia 2017-02-16 21:45 ` Trond Myklebust 2017-02-16 22:14 ` Olga Kornievskaia 2017-02-16 23:28 ` Trond Myklebust 2017-02-17 14:46 ` Olga Kornievskaia 2017-02-17 14:58 ` Trond Myklebust 2017-02-17 15:07 ` Olga Kornievskaia 2017-02-17 15:22 ` Olga Kornievskaia
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).