[PATCH] NFS: Don't disconnect open-owner on NFS4ERR_BAD

linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] NFS: Don't disconnect open-owner on NFS4ERR_BAD_SEQID
@ 2016-06-03  5:24 NeilBrown
  2016-10-07  1:27 ` NeilBrown
  0 siblings, 1 reply; 9+ messages in thread
From: NeilBrown @ 2016-06-03  5:24 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: NFS List

[-- Attachment #1: Type: text/plain, Size: 4006 bytes --]

Hi Trond/Anna,
 I'd like your thoughts on this patch.
 I have a customer whose NFS client occasionally gets into a state
 where it spins indefinitely sending WRITE (or OPEN) and getting an
 NFS4ERR_BAD_STATEID (or NFS4ERR_BAD_SEQID) error.  The  state manager
 doesn't try any recovery as the open owner is no longer in the rbtree -
 presumably due to an earlier NFS4ERR_BAD_SEQID.  I've seen at least one
   NFS: v4 server XXX returned a bad sequence-id error!
 in the logs.
 I don't know if this is a server problem or a client problem - client
 didn't have
  Commit: c7848f69ec4a ("NFSv4: OPEN must handle the NFS4ERR_IO return code correctly")
 which cause cause BAD_SEQID errors.
 However wherever the bug is we don't want the NFS client to spin like
 that.
 This patch removes the spinning when tested against a hacked NFS
 server which can be convinced to return BAD_SEQID and BAD_STATED for a
 given open owner.

 My main concern is: is there some other reason to remove the open owner
 from the rbtree other than to make sure it isn't used for new opens?

Thanks,
NeilBrown

When an NFS4ERR_BAD_SEQID is received the open-owner is removed from
the ->state_owners rbtree so that it will no longer be used.

If any stateids attached to this open-owner are still in use, and if a
request using one get an NFS4ERR_BAD_STATEID reply, this can for bad.

The state is marked as needing recovery and the nfs4_state_manager()
is scheduled to clean up.  nfs4_state_manager() finds states to be
recovered by walking the state_owners rbtree.  As the open-owner is
not in the rbtree, the bad state is not found so nfs4_state_manager()
completes having done nothing.  The request is then retried, with a
predicatable result (indefinite retries).

This patch changes NFS4ERR_BAD_SEQID handling to leave the open-owner
in the rbtree but mark it a 'stale'.  With this the indefinite retries
no longer happen.  Errors get to user-space instead if recovery
doesn't work.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/nfs/nfs4_fs.h   |  3 ++-
 fs/nfs/nfs4state.c | 18 +++++-------------
 2 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 768456fa1b17..42244d1123f6 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -113,7 +113,8 @@ struct nfs4_state_owner {

 enum {
 	NFS_OWNER_RECLAIM_REBOOT,
-	NFS_OWNER_RECLAIM_NOGRACE
+	NFS_OWNER_RECLAIM_NOGRACE,
+	NFS_OWNER_STALE,
 };

 #define NFS_LOCK_NEW		0
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 9679f4749364..abacaf521d29 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -400,6 +400,8 @@ nfs4_find_state_owner_locked(struct nfs_server *server, struct rpc_cred *cred)
 			p = &parent->rb_left;
 		else if (cred > sp->so_cred)
 			p = &parent->rb_right;
+		else if (test_bit(NFS_OWNER_STALE, &sp->so_flags))
+			p = &parent->rb_left;
 		else {
 			if (!list_empty(&sp->so_lru))
 				list_del_init(&sp->so_lru);
@@ -427,6 +429,8 @@ nfs4_insert_state_owner_locked(struct nfs4_state_owner *new)
 			p = &parent->rb_left;
 		else if (new->so_cred > sp->so_cred)
 			p = &parent->rb_right;
+		else if (test_bit(NFS_OWNER_STALE, &sp->so_flags))
+			p = &parent->rb_left;
 		else {
 			if (!list_empty(&sp->so_lru))
 				list_del_init(&sp->so_lru);
@@ -499,19 +503,7 @@ nfs4_alloc_state_owner(struct nfs_server *server,
 static void
 nfs4_drop_state_owner(struct nfs4_state_owner *sp)
 {
-	struct rb_node *rb_node = &sp->so_server_node;
-
-	if (!RB_EMPTY_NODE(rb_node)) {
-		struct nfs_server *server = sp->so_server;
-		struct nfs_client *clp = server->nfs_client;
-
-		spin_lock(&clp->cl_lock);
-		if (!RB_EMPTY_NODE(rb_node)) {
-			rb_erase(rb_node, &server->state_owners);
-			RB_CLEAR_NODE(rb_node);
-		}
-		spin_unlock(&clp->cl_lock);
-	}
+	set_bit(NFS_OWNER_STALE, &sp->so_flags);
 }

 static void nfs4_free_state_owner(struct nfs4_state_owner *sp)
-- 
2.8.3

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 818 bytes --]

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] NFS: Don't disconnect open-owner on NFS4ERR_BAD_SEQID
  2016-06-03  5:24 [PATCH] NFS: Don't disconnect open-owner on NFS4ERR_BAD_SEQID NeilBrown
@ 2016-10-07  1:27 ` NeilBrown
  2016-10-07 16:21   ` Trond Myklebust
  0 siblings, 1 reply; 9+ messages in thread
From: NeilBrown @ 2016-10-07  1:27 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: NFS List

[-- Attachment #1: Type: text/plain, Size: 4158 bytes --]

Hi again,
 I posted a version of this patch 4 months and got no reply, so
 thought it might be time to try again.
 This version includes a small change to handle the case when a
 delegation stateid gets a BAD_STATEID, in the context of the open-owner
 getting a BAD_SEQID.
 Obviously this whole issue can only happen if the server is buggy (or
 if the client is buggy, but I don't think it is), but it would be best
 to handle that case gracefully.  Currently it spins indefinitely.

Thanks,
NeilBrown

From: NeilBrown <neilb@suse.com>
Subject: [PATCH] NFS: Don't disconnect open-owner on NFS4ERR_BAD_SEQID

When an NFS4ERR_BAD_SEQID is received the open-owner is removed from
the ->state_owners rbtree so that it will no longer be used.

If any stateids attached to this open-owner are still in use, and if a
request using one get an NFS4ERR_BAD_STATEID reply, this can for bad.

The state is marked as needing recovery and the nfs4_state_manager()
is scheduled to clean up.  nfs4_state_manager() finds states to be
recovered by walking the state_owners rbtree.  As the open-owner is
not in the rbtree, the bad state is not found so nfs4_state_manager()
completes having done nothing.  The request is then retried, with a
predicatable result (indefinite retries).

This patch changes NFS4ERR_BAD_SEQID handling to leave the open-owner
in the rbtree but mark it a 'stale'.  With this the indefinite retries
no longer happen.  Errors get to user-space instead if recovery
doesn't work.

If the stateid is for a delegation, the result is more complex.
nfs4_state_manager() tries to return the delegation but uses the
open-owner with the bad seqid to open files on the server, and this
fails with more BAD_SEQID errors.  To avoid this we update the
so_seqid.create_time of the bad open-owner so that it looks to the server
like a new open-owner and an OPEN_CONFIRM is requested.  This allows
the return of the delagation to complete.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/nfs/nfs4_fs.h   |  3 ++-
 fs/nfs/nfs4state.c | 22 +++++++++-------------
 2 files changed, 11 insertions(+), 14 deletions(-)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 3f0e459f2499..6be19814553f 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -113,7 +113,8 @@ struct nfs4_state_owner {

 enum {
 	NFS_OWNER_RECLAIM_REBOOT,
-	NFS_OWNER_RECLAIM_NOGRACE
+	NFS_OWNER_RECLAIM_NOGRACE,
+	NFS_OWNER_STALE,
 };

 #define NFS_LOCK_NEW		0
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 74cc32490c7a..8ed2285fc527 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -397,6 +397,8 @@ nfs4_find_state_owner_locked(struct nfs_server *server, struct rpc_cred *cred)
 			p = &parent->rb_left;
 		else if (cred > sp->so_cred)
 			p = &parent->rb_right;
+		else if (test_bit(NFS_OWNER_STALE, &sp->so_flags))
+			p = &parent->rb_left;
 		else {
 			if (!list_empty(&sp->so_lru))
 				list_del_init(&sp->so_lru);
@@ -424,6 +426,8 @@ nfs4_insert_state_owner_locked(struct nfs4_state_owner *new)
 			p = &parent->rb_left;
 		else if (new->so_cred > sp->so_cred)
 			p = &parent->rb_right;
+		else if (test_bit(NFS_OWNER_STALE, &sp->so_flags))
+			p = &parent->rb_left;
 		else {
 			if (!list_empty(&sp->so_lru))
 				list_del_init(&sp->so_lru);
@@ -496,19 +500,11 @@ nfs4_alloc_state_owner(struct nfs_server *server,
 static void
 nfs4_drop_state_owner(struct nfs4_state_owner *sp)
 {
-	struct rb_node *rb_node = &sp->so_server_node;
-
-	if (!RB_EMPTY_NODE(rb_node)) {
-		struct nfs_server *server = sp->so_server;
-		struct nfs_client *clp = server->nfs_client;
-
-		spin_lock(&clp->cl_lock);
-		if (!RB_EMPTY_NODE(rb_node)) {
-			rb_erase(rb_node, &server->state_owners);
-			RB_CLEAR_NODE(rb_node);
-		}
-		spin_unlock(&clp->cl_lock);
-	}
+	set_bit(NFS_OWNER_STALE, &sp->so_flags);
+	/* Delegation recall might insist on using this open_owner
+	 * so reset it to force a new 'confirm' stage to be initiated.
+	 */
+	sp->so_seqid.create_time = ktime_get();
 }

 static void nfs4_free_state_owner(struct nfs4_state_owner *sp)
-- 
2.10.0

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 800 bytes --]

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] NFS: Don't disconnect open-owner on NFS4ERR_BAD_SEQID
  2016-10-07  1:27 ` NeilBrown
@ 2016-10-07 16:21   ` Trond Myklebust
  2016-10-11 22:14     ` NeilBrown
  0 siblings, 1 reply; 9+ messages in thread
From: Trond Myklebust @ 2016-10-07 16:21 UTC (permalink / raw)
  To: NeilBrown; +Cc: Schumaker Anna, List Linux NFS Mailing

DQo+IE9uIE9jdCA2LCAyMDE2LCBhdCAyMToyNywgTmVpbEJyb3duIDxuZWlsYkBzdXNlLmNvbT4g
d3JvdGU6DQo+IA0KPiANCj4gSGkgYWdhaW4sDQo+IEkgcG9zdGVkIGEgdmVyc2lvbiBvZiB0aGlz
IHBhdGNoIDQgbW9udGhzIGFuZCBnb3Qgbm8gcmVwbHksIHNvDQo+IHRob3VnaHQgaXQgbWlnaHQg
YmUgdGltZSB0byB0cnkgYWdhaW4uDQo+IFRoaXMgdmVyc2lvbiBpbmNsdWRlcyBhIHNtYWxsIGNo
YW5nZSB0byBoYW5kbGUgdGhlIGNhc2Ugd2hlbiBhDQo+IGRlbGVnYXRpb24gc3RhdGVpZCBnZXRz
IGEgQkFEX1NUQVRFSUQsIGluIHRoZSBjb250ZXh0IG9mIHRoZSBvcGVuLW93bmVyDQo+IGdldHRp
bmcgYSBCQURfU0VRSUQuDQo+IE9idmlvdXNseSB0aGlzIHdob2xlIGlzc3VlIGNhbiBvbmx5IGhh
cHBlbiBpZiB0aGUgc2VydmVyIGlzIGJ1Z2d5IChvcg0KPiBpZiB0aGUgY2xpZW50IGlzIGJ1Z2d5
LCBidXQgSSBkb24ndCB0aGluayBpdCBpcyksIGJ1dCBpdCB3b3VsZCBiZSBiZXN0DQo+IHRvIGhh
bmRsZSB0aGF0IGNhc2UgZ3JhY2VmdWxseS4gIEN1cnJlbnRseSBpdCBzcGlucyBpbmRlZmluaXRl
bHkuDQo+IA0KPiBUaGFua3MsDQo+IE5laWxCcm93bg0KPiANCj4gRnJvbTogTmVpbEJyb3duIDxu
ZWlsYkBzdXNlLmNvbT4NCj4gU3ViamVjdDogW1BBVENIXSBORlM6IERvbid0IGRpc2Nvbm5lY3Qg
b3Blbi1vd25lciBvbiBORlM0RVJSX0JBRF9TRVFJRA0KPiANCj4gV2hlbiBhbiBORlM0RVJSX0JB
RF9TRVFJRCBpcyByZWNlaXZlZCB0aGUgb3Blbi1vd25lciBpcyByZW1vdmVkIGZyb20NCj4gdGhl
IC0+c3RhdGVfb3duZXJzIHJidHJlZSBzbyB0aGF0IGl0IHdpbGwgbm8gbG9uZ2VyIGJlIHVzZWQu
DQo+IA0KPiBJZiBhbnkgc3RhdGVpZHMgYXR0YWNoZWQgdG8gdGhpcyBvcGVuLW93bmVyIGFyZSBz
dGlsbCBpbiB1c2UsIGFuZCBpZiBhDQo+IHJlcXVlc3QgdXNpbmcgb25lIGdldCBhbiBORlM0RVJS
X0JBRF9TVEFURUlEIHJlcGx5LCB0aGlzIGNhbiBmb3IgYmFkLg0KPiANCj4gVGhlIHN0YXRlIGlz
IG1hcmtlZCBhcyBuZWVkaW5nIHJlY292ZXJ5IGFuZCB0aGUgbmZzNF9zdGF0ZV9tYW5hZ2VyKCkN
Cj4gaXMgc2NoZWR1bGVkIHRvIGNsZWFuIHVwLiAgbmZzNF9zdGF0ZV9tYW5hZ2VyKCkgZmluZHMg
c3RhdGVzIHRvIGJlDQo+IHJlY292ZXJlZCBieSB3YWxraW5nIHRoZSBzdGF0ZV9vd25lcnMgcmJ0
cmVlLiAgQXMgdGhlIG9wZW4tb3duZXIgaXMNCj4gbm90IGluIHRoZSByYnRyZWUsIHRoZSBiYWQg
c3RhdGUgaXMgbm90IGZvdW5kIHNvIG5mczRfc3RhdGVfbWFuYWdlcigpDQo+IGNvbXBsZXRlcyBo
YXZpbmcgZG9uZSBub3RoaW5nLiAgVGhlIHJlcXVlc3QgaXMgdGhlbiByZXRyaWVkLCB3aXRoIGEN
Cj4gcHJlZGljYXRhYmxlIHJlc3VsdCAoaW5kZWZpbml0ZSByZXRyaWVzKS4NCj4gDQo+IFRoaXMg
cGF0Y2ggY2hhbmdlcyBORlM0RVJSX0JBRF9TRVFJRCBoYW5kbGluZyB0byBsZWF2ZSB0aGUgb3Bl
bi1vd25lcg0KPiBpbiB0aGUgcmJ0cmVlIGJ1dCBtYXJrIGl0IGEgJ3N0YWxlJy4gIFdpdGggdGhp
cyB0aGUgaW5kZWZpbml0ZSByZXRyaWVzDQo+IG5vIGxvbmdlciBoYXBwZW4uICBFcnJvcnMgZ2V0
IHRvIHVzZXItc3BhY2UgaW5zdGVhZCBpZiByZWNvdmVyeQ0KPiBkb2Vzbid0IHdvcmsuDQo+IA0K
PiBJZiB0aGUgc3RhdGVpZCBpcyBmb3IgYSBkZWxlZ2F0aW9uLCB0aGUgcmVzdWx0IGlzIG1vcmUg
Y29tcGxleC4NCj4gbmZzNF9zdGF0ZV9tYW5hZ2VyKCkgdHJpZXMgdG8gcmV0dXJuIHRoZSBkZWxl
Z2F0aW9uIGJ1dCB1c2VzIHRoZQ0KPiBvcGVuLW93bmVyIHdpdGggdGhlIGJhZCBzZXFpZCB0byBv
cGVuIGZpbGVzIG9uIHRoZSBzZXJ2ZXIsIGFuZCB0aGlzDQo+IGZhaWxzIHdpdGggbW9yZSBCQURf
U0VRSUQgZXJyb3JzLiAgVG8gYXZvaWQgdGhpcyB3ZSB1cGRhdGUgdGhlDQo+IHNvX3NlcWlkLmNy
ZWF0ZV90aW1lIG9mIHRoZSBiYWQgb3Blbi1vd25lciBzbyB0aGF0IGl0IGxvb2tzIHRvIHRoZSBz
ZXJ2ZXINCj4gbGlrZSBhIG5ldyBvcGVuLW93bmVyIGFuZCBhbiBPUEVOX0NPTkZJUk0gaXMgcmVx
dWVzdGVkLiAgVGhpcyBhbGxvd3MNCj4gdGhlIHJldHVybiBvZiB0aGUgZGVsYWdhdGlvbiB0byBj
b21wbGV0ZS4NCj4gDQo+IFNpZ25lZC1vZmYtYnk6IE5laWxCcm93biA8bmVpbGJAc3VzZS5jb20+
DQo+IC0tLQ0KPiBmcy9uZnMvbmZzNF9mcy5oICAgfCAgMyArKy0NCj4gZnMvbmZzL25mczRzdGF0
ZS5jIHwgMjIgKysrKysrKysrLS0tLS0tLS0tLS0tLQ0KPiAyIGZpbGVzIGNoYW5nZWQsIDExIGlu
c2VydGlvbnMoKyksIDE0IGRlbGV0aW9ucygtKQ0KPiANCj4gZGlmZiAtLWdpdCBhL2ZzL25mcy9u
ZnM0X2ZzLmggYi9mcy9uZnMvbmZzNF9mcy5oDQo+IGluZGV4IDNmMGU0NTlmMjQ5OS4uNmJlMTk4
MTQ1NTNmIDEwMDY0NA0KPiAtLS0gYS9mcy9uZnMvbmZzNF9mcy5oDQo+ICsrKyBiL2ZzL25mcy9u
ZnM0X2ZzLmgNCj4gQEAgLTExMyw3ICsxMTMsOCBAQCBzdHJ1Y3QgbmZzNF9zdGF0ZV9vd25lciB7
DQo+IA0KPiBlbnVtIHsNCj4gCU5GU19PV05FUl9SRUNMQUlNX1JFQk9PVCwNCj4gLQlORlNfT1dO
RVJfUkVDTEFJTV9OT0dSQUNFDQo+ICsJTkZTX09XTkVSX1JFQ0xBSU1fTk9HUkFDRSwNCj4gKwlO
RlNfT1dORVJfU1RBTEUsDQo+IH07DQo+IA0KPiAjZGVmaW5lIE5GU19MT0NLX05FVwkJMA0KPiBk
aWZmIC0tZ2l0IGEvZnMvbmZzL25mczRzdGF0ZS5jIGIvZnMvbmZzL25mczRzdGF0ZS5jDQo+IGlu
ZGV4IDc0Y2MzMjQ5MGM3YS4uOGVkMjI4NWZjNTI3IDEwMDY0NA0KPiAtLS0gYS9mcy9uZnMvbmZz
NHN0YXRlLmMNCj4gKysrIGIvZnMvbmZzL25mczRzdGF0ZS5jDQo+IEBAIC0zOTcsNiArMzk3LDgg
QEAgbmZzNF9maW5kX3N0YXRlX293bmVyX2xvY2tlZChzdHJ1Y3QgbmZzX3NlcnZlciAqc2VydmVy
LCBzdHJ1Y3QgcnBjX2NyZWQgKmNyZWQpDQo+IAkJCXAgPSAmcGFyZW50LT5yYl9sZWZ0Ow0KPiAJ
CWVsc2UgaWYgKGNyZWQgPiBzcC0+c29fY3JlZCkNCj4gCQkJcCA9ICZwYXJlbnQtPnJiX3JpZ2h0
Ow0KPiArCQllbHNlIGlmICh0ZXN0X2JpdChORlNfT1dORVJfU1RBTEUsICZzcC0+c29fZmxhZ3Mp
KQ0KPiArCQkJcCA9ICZwYXJlbnQtPnJiX2xlZnQ7DQo+IAkJZWxzZSB7DQo+IAkJCWlmICghbGlz
dF9lbXB0eSgmc3AtPnNvX2xydSkpDQo+IAkJCQlsaXN0X2RlbF9pbml0KCZzcC0+c29fbHJ1KTsN
Cj4gQEAgLTQyNCw2ICs0MjYsOCBAQCBuZnM0X2luc2VydF9zdGF0ZV9vd25lcl9sb2NrZWQoc3Ry
dWN0IG5mczRfc3RhdGVfb3duZXIgKm5ldykNCj4gCQkJcCA9ICZwYXJlbnQtPnJiX2xlZnQ7DQo+
IAkJZWxzZSBpZiAobmV3LT5zb19jcmVkID4gc3AtPnNvX2NyZWQpDQo+IAkJCXAgPSAmcGFyZW50
LT5yYl9yaWdodDsNCj4gKwkJZWxzZSBpZiAodGVzdF9iaXQoTkZTX09XTkVSX1NUQUxFLCAmc3At
PnNvX2ZsYWdzKSkNCj4gKwkJCXAgPSAmcGFyZW50LT5yYl9sZWZ0Ow0KPiAJCWVsc2Ugew0KPiAJ
CQlpZiAoIWxpc3RfZW1wdHkoJnNwLT5zb19scnUpKQ0KPiAJCQkJbGlzdF9kZWxfaW5pdCgmc3At
PnNvX2xydSk7DQo+IEBAIC00OTYsMTkgKzUwMCwxMSBAQCBuZnM0X2FsbG9jX3N0YXRlX293bmVy
KHN0cnVjdCBuZnNfc2VydmVyICpzZXJ2ZXIsDQo+IHN0YXRpYyB2b2lkDQo+IG5mczRfZHJvcF9z
dGF0ZV9vd25lcihzdHJ1Y3QgbmZzNF9zdGF0ZV9vd25lciAqc3ApDQo+IHsNCj4gLQlzdHJ1Y3Qg
cmJfbm9kZSAqcmJfbm9kZSA9ICZzcC0+c29fc2VydmVyX25vZGU7DQo+IC0NCj4gLQlpZiAoIVJC
X0VNUFRZX05PREUocmJfbm9kZSkpIHsNCj4gLQkJc3RydWN0IG5mc19zZXJ2ZXIgKnNlcnZlciA9
IHNwLT5zb19zZXJ2ZXI7DQo+IC0JCXN0cnVjdCBuZnNfY2xpZW50ICpjbHAgPSBzZXJ2ZXItPm5m
c19jbGllbnQ7DQo+IC0NCj4gLQkJc3Bpbl9sb2NrKCZjbHAtPmNsX2xvY2spOw0KPiAtCQlpZiAo
IVJCX0VNUFRZX05PREUocmJfbm9kZSkpIHsNCj4gLQkJCXJiX2VyYXNlKHJiX25vZGUsICZzZXJ2
ZXItPnN0YXRlX293bmVycyk7DQo+IC0JCQlSQl9DTEVBUl9OT0RFKHJiX25vZGUpOw0KPiAtCQl9
DQo+IC0JCXNwaW5fdW5sb2NrKCZjbHAtPmNsX2xvY2spOw0KPiAtCX0NCj4gKwlzZXRfYml0KE5G
U19PV05FUl9TVEFMRSwgJnNwLT5zb19mbGFncyk7DQo+ICsJLyogRGVsZWdhdGlvbiByZWNhbGwg
bWlnaHQgaW5zaXN0IG9uIHVzaW5nIHRoaXMgb3Blbl9vd25lcg0KPiArCSAqIHNvIHJlc2V0IGl0
IHRvIGZvcmNlIGEgbmV3ICdjb25maXJtJyBzdGFnZSB0byBiZSBpbml0aWF0ZWQuDQo+ICsJICov
DQo+ICsJc3AtPnNvX3NlcWlkLmNyZWF0ZV90aW1lID0ga3RpbWVfZ2V0KCk7DQoNCkhtbeKApi4g
SWYgeW914oCZcmUgZ29pbmcgdG8gZG8gdGhpcywgdGhlbiB3aHkgbWFyayB0aGUgc3RhdGVfb3du
ZXIgYXMgc3RhbGUgYXQgYWxsPyBXaHkgbm90IGp1c3QgcmVzZXQgc3AtPnNvX3NlcWlkLmZsYWdz
IGFuZCBjYWxsIG5mczRfY2xlYXJfb3Blbl9zdGF0ZSgpIHRvIGxldCB0aGUgc3RhdGUgcmVjb3Zl
cnkgdHJ5IHRvIGRvIHdoYXQgaXQgY2FuLg0KDQoNCg==


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] NFS: Don't disconnect open-owner on NFS4ERR_BAD_SEQID
  2016-10-07 16:21   ` Trond Myklebust
@ 2016-10-11 22:14     ` NeilBrown
  2016-12-19  0:48       ` [PATCH resend] " NeilBrown
  0 siblings, 1 reply; 9+ messages in thread
From: NeilBrown @ 2016-10-11 22:14 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Schumaker Anna, List Linux NFS Mailing

[-- Attachment #1: Type: text/plain, Size: 3089 bytes --]

On Sat, Oct 08 2016, Trond Myklebust wrote:

>> +	set_bit(NFS_OWNER_STALE, &sp->so_flags);
>> +	/* Delegation recall might insist on using this open_owner
>> +	 * so reset it to force a new 'confirm' stage to be initiated.
>> +	 */
>> +	sp->so_seqid.create_time = ktime_get();
>
> Hmm…. If you’re going to do this, then why mark the state_owner as stale at all? Why not just reset sp->so_seqid.flags and call nfs4_clear_open_state() to let the state recovery try to do what it can.

Good point ... I was too focused on how I got there and what I had
tested...

I cannot test this properly as I don't have a server that returned
BAD_SEQID, and I doubt I can convince the customer to test some new as
they have something that works, but I think this is correct.

Thanks,
NeilBrown

From: NeilBrown <neilb@suse.com>
Date: Fri, 3 Jun 2016 14:58:02 +1000
Subject: [PATCH] NFS: Don't disconnect open-owner on NFS4ERR_BAD_SEQID

When an NFS4ERR_BAD_SEQID is received the open-owner is removed from
the ->state_owners rbtree so that it will no longer be used.

If any stateids attached to this open-owner are still in use, and if a
request using one gets an NFS4ERR_BAD_STATEID reply, this can for bad.

The state is marked as needing recovery and the nfs4_state_manager()
is scheduled to clean up.  nfs4_state_manager() finds states to be
recovered by walking the state_owners rbtree.  As the open-owner is
not in the rbtree, the bad state is not found so nfs4_state_manager()
completes having done nothing.  The request is then retried, with a
predicatable result (indefinite retries).

If the stateid is for a delegation, this open_owner will be used
to open files when the delegation is returned.  For that to work,
a new open-owner needs to be presented to the server.

This patch changes NFS4ERR_BAD_SEQID handling to leave the open-owner
in the rbtree but updates the 'create_time' so it looks like a new
open-owner.  With this the indefinite retries no longer happen.

Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/nfs/nfs4state.c | 17 ++++-------------
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 74cc32490c7a..c4d721ed493d 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -496,19 +496,10 @@ nfs4_alloc_state_owner(struct nfs_server *server,
 static void
 nfs4_drop_state_owner(struct nfs4_state_owner *sp)
 {
-	struct rb_node *rb_node = &sp->so_server_node;
-
-	if (!RB_EMPTY_NODE(rb_node)) {
-		struct nfs_server *server = sp->so_server;
-		struct nfs_client *clp = server->nfs_client;
-
-		spin_lock(&clp->cl_lock);
-		if (!RB_EMPTY_NODE(rb_node)) {
-			rb_erase(rb_node, &server->state_owners);
-			RB_CLEAR_NODE(rb_node);
-		}
-		spin_unlock(&clp->cl_lock);
-	}
+	/* Delegation recall might insist on using this open_owner
+	 * so reset it to force a new 'confirm' stage to be initiated.
+	 */
+	sp->so_seqid.create_time = ktime_get();
 }

 static void nfs4_free_state_owner(struct nfs4_state_owner *sp)
-- 
2.10.0

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 800 bytes --]

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH resend] NFS: Don't disconnect open-owner on NFS4ERR_BAD_SEQID
  2016-10-11 22:14     ` NeilBrown
@ 2016-12-19  0:48       ` NeilBrown
  2017-03-20 21:00         ` Olga Kornievskaia
  0 siblings, 1 reply; 9+ messages in thread
From: NeilBrown @ 2016-12-19  0:48 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Schumaker Anna, List Linux NFS Mailing

[-- Attachment #1: Type: text/plain, Size: 3052 bytes --]

When an NFS4ERR_BAD_SEQID is received the open-owner is removed from
the ->state_owners rbtree so that it will no longer be used.

If any stateids attached to this open-owner are still in use, and if a
request using one gets an NFS4ERR_BAD_STATEID reply, this can for bad.

The state is marked as needing recovery and the nfs4_state_manager()
is scheduled to clean up.  nfs4_state_manager() finds states to be
recovered by walking the state_owners rbtree.  As the open-owner is
not in the rbtree, the bad state is not found so nfs4_state_manager()
completes having done nothing.  The request is then retried, with a
predicatable result (indefinite retries).

If the stateid is for a delegation, this open_owner will be used
to open files when the delegation is returned.  For that to work,
a new open-owner needs to be presented to the server.

This patch changes NFS4ERR_BAD_SEQID handling to leave the open-owner
in the rbtree but updates the 'create_time' so it looks like a new
open-owner.  With this the indefinite retries no longer happen.

Signed-off-by: NeilBrown <neilb@suse.com>
---

Hi Trond,
 It appears this one got lost too.
 I've added a comment as I thought an explanation was needed, and
 renamed the function from "drop" to "reset".

Thanks,
NeilBrown

 fs/nfs/nfs4state.c | 29 +++++++++++++----------------
 1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index cf869802ff23..1d152f4470cd 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -494,21 +494,18 @@ nfs4_alloc_state_owner(struct nfs_server *server,
 }

 static void
-nfs4_drop_state_owner(struct nfs4_state_owner *sp)
-{
-	struct rb_node *rb_node = &sp->so_server_node;
-
-	if (!RB_EMPTY_NODE(rb_node)) {
-		struct nfs_server *server = sp->so_server;
-		struct nfs_client *clp = server->nfs_client;
-
-		spin_lock(&clp->cl_lock);
-		if (!RB_EMPTY_NODE(rb_node)) {
-			rb_erase(rb_node, &server->state_owners);
-			RB_CLEAR_NODE(rb_node);
-		}
-		spin_unlock(&clp->cl_lock);
-	}
+nfs4_reset_state_owner(struct nfs4_state_owner *sp)
+{
+	/* This state_owner is no longer usable, but must
+	 * remain in place so that state recovery can find it
+	 * and the opens associated with it.
+	 * It may also be used for new 'open' request to
+	 * return a delegation to the server.
+	 * So update the 'create_time' so that it looks like
+	 * a new state_owner.  This will cause the server to
+	 * request an OPEN_CONFIRM to start a new sequence.
+	 */
+	sp->so_seqid.create_time = ktime_get();
 }

 static void nfs4_free_state_owner(struct nfs4_state_owner *sp)
@@ -1113,7 +1110,7 @@ void nfs_increment_open_seqid(int status, struct nfs_seqid *seqid)

 	sp = container_of(seqid->sequence, struct nfs4_state_owner, so_seqid);
 	if (status == -NFS4ERR_BAD_SEQID)
-		nfs4_drop_state_owner(sp);
+		nfs4_reset_state_owner(sp);
 	if (!nfs4_has_session(sp->so_server->nfs_client))
 		nfs_increment_seqid(status, seqid);
 }
-- 
2.11.0

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH resend] NFS: Don't disconnect open-owner on NFS4ERR_BAD_SEQID
  2016-12-19  0:48       ` [PATCH resend] " NeilBrown
@ 2017-03-20 21:00         ` Olga Kornievskaia
  2017-03-20 23:26           ` NeilBrown
  0 siblings, 1 reply; 9+ messages in thread
From: Olga Kornievskaia @ 2017-03-20 21:00 UTC (permalink / raw)
  To: NeilBrown; +Cc: Trond Myklebust, Schumaker Anna, List Linux NFS Mailing

Since open_owner is no longer removed then the resources are not
freed. They will not be freed until unmount  (or reboot recovery).

There are (buggy) servers out there that might be forcing the client
to keep creating new open_owners. So if they are no longer released,
can't the client get into trouble here?

On Sun, Dec 18, 2016 at 7:48 PM, NeilBrown <neilb@suse.com> wrote:
>
> When an NFS4ERR_BAD_SEQID is received the open-owner is removed from
> the ->state_owners rbtree so that it will no longer be used.
>
> If any stateids attached to this open-owner are still in use, and if a
> request using one gets an NFS4ERR_BAD_STATEID reply, this can for bad.
>
> The state is marked as needing recovery and the nfs4_state_manager()
> is scheduled to clean up.  nfs4_state_manager() finds states to be
> recovered by walking the state_owners rbtree.  As the open-owner is
> not in the rbtree, the bad state is not found so nfs4_state_manager()
> completes having done nothing.  The request is then retried, with a
> predicatable result (indefinite retries).
>
> If the stateid is for a delegation, this open_owner will be used
> to open files when the delegation is returned.  For that to work,
> a new open-owner needs to be presented to the server.
>
> This patch changes NFS4ERR_BAD_SEQID handling to leave the open-owner
> in the rbtree but updates the 'create_time' so it looks like a new
> open-owner.  With this the indefinite retries no longer happen.
>
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>
> Hi Trond,
>  It appears this one got lost too.
>  I've added a comment as I thought an explanation was needed, and
>  renamed the function from "drop" to "reset".
>
> Thanks,
> NeilBrown
>
>  fs/nfs/nfs4state.c | 29 +++++++++++++----------------
>  1 file changed, 13 insertions(+), 16 deletions(-)
>
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index cf869802ff23..1d152f4470cd 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -494,21 +494,18 @@ nfs4_alloc_state_owner(struct nfs_server *server,
>  }
>
>  static void
> -nfs4_drop_state_owner(struct nfs4_state_owner *sp)
> -{
> -       struct rb_node *rb_node = &sp->so_server_node;
> -
> -       if (!RB_EMPTY_NODE(rb_node)) {
> -               struct nfs_server *server = sp->so_server;
> -               struct nfs_client *clp = server->nfs_client;
> -
> -               spin_lock(&clp->cl_lock);
> -               if (!RB_EMPTY_NODE(rb_node)) {
> -                       rb_erase(rb_node, &server->state_owners);
> -                       RB_CLEAR_NODE(rb_node);
> -               }
> -               spin_unlock(&clp->cl_lock);
> -       }
> +nfs4_reset_state_owner(struct nfs4_state_owner *sp)
> +{
> +       /* This state_owner is no longer usable, but must
> +        * remain in place so that state recovery can find it
> +        * and the opens associated with it.
> +        * It may also be used for new 'open' request to
> +        * return a delegation to the server.
> +        * So update the 'create_time' so that it looks like
> +        * a new state_owner.  This will cause the server to
> +        * request an OPEN_CONFIRM to start a new sequence.
> +        */
> +       sp->so_seqid.create_time = ktime_get();
>  }
>
>  static void nfs4_free_state_owner(struct nfs4_state_owner *sp)
> @@ -1113,7 +1110,7 @@ void nfs_increment_open_seqid(int status, struct nfs_seqid *seqid)
>
>         sp = container_of(seqid->sequence, struct nfs4_state_owner, so_seqid);
>         if (status == -NFS4ERR_BAD_SEQID)
> -               nfs4_drop_state_owner(sp);
> +               nfs4_reset_state_owner(sp);
>         if (!nfs4_has_session(sp->so_server->nfs_client))
>                 nfs_increment_seqid(status, seqid);
>  }
> --
> 2.11.0
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH resend] NFS: Don't disconnect open-owner on NFS4ERR_BAD_SEQID
  2017-03-20 21:00         ` Olga Kornievskaia
@ 2017-03-20 23:26           ` NeilBrown
  2017-03-21 16:43             ` Olga Kornievskaia
  0 siblings, 1 reply; 9+ messages in thread
From: NeilBrown @ 2017-03-20 23:26 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: Trond Myklebust, Schumaker Anna, List Linux NFS Mailing

[-- Attachment #1: Type: text/plain, Size: 4501 bytes --]

On Mon, Mar 20 2017, Olga Kornievskaia wrote:

> Since open_owner is no longer removed then the resources are not
> freed. They will not be freed until unmount  (or reboot recovery).

I don't follow your reasoning.
When the last uses for a state owner is dropped by
nfs4_put_state_owner() (e.g. when the last open file using it is
closed), nfs4_put_state_owner() will add the state to
server->state_owners_lru.

nfs4_gc_state_owners() is called periodically (every time any file is
opened I think) and it will call nfs4_remove_state_owner_locked()
on any state which is sufficiently old enough (hasn't been used in
'lease-time'), and will then call nfs4_free_state_owner().
These two will release all resources.

Does that explanation fit with your understanding?

Thanks,
NeilBrown


>
> There are (buggy) servers out there that might be forcing the client
> to keep creating new open_owners. So if they are no longer released,
> can't the client get into trouble here?
>
> On Sun, Dec 18, 2016 at 7:48 PM, NeilBrown <neilb@suse.com> wrote:
>>
>> When an NFS4ERR_BAD_SEQID is received the open-owner is removed from
>> the ->state_owners rbtree so that it will no longer be used.
>>
>> If any stateids attached to this open-owner are still in use, and if a
>> request using one gets an NFS4ERR_BAD_STATEID reply, this can for bad.
>>
>> The state is marked as needing recovery and the nfs4_state_manager()
>> is scheduled to clean up.  nfs4_state_manager() finds states to be
>> recovered by walking the state_owners rbtree.  As the open-owner is
>> not in the rbtree, the bad state is not found so nfs4_state_manager()
>> completes having done nothing.  The request is then retried, with a
>> predicatable result (indefinite retries).
>>
>> If the stateid is for a delegation, this open_owner will be used
>> to open files when the delegation is returned.  For that to work,
>> a new open-owner needs to be presented to the server.
>>
>> This patch changes NFS4ERR_BAD_SEQID handling to leave the open-owner
>> in the rbtree but updates the 'create_time' so it looks like a new
>> open-owner.  With this the indefinite retries no longer happen.
>>
>> Signed-off-by: NeilBrown <neilb@suse.com>
>> ---
>>
>> Hi Trond,
>>  It appears this one got lost too.
>>  I've added a comment as I thought an explanation was needed, and
>>  renamed the function from "drop" to "reset".
>>
>> Thanks,
>> NeilBrown
>>
>>  fs/nfs/nfs4state.c | 29 +++++++++++++----------------
>>  1 file changed, 13 insertions(+), 16 deletions(-)
>>
>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>> index cf869802ff23..1d152f4470cd 100644
>> --- a/fs/nfs/nfs4state.c
>> +++ b/fs/nfs/nfs4state.c
>> @@ -494,21 +494,18 @@ nfs4_alloc_state_owner(struct nfs_server *server,
>>  }
>>
>>  static void
>> -nfs4_drop_state_owner(struct nfs4_state_owner *sp)
>> -{
>> -       struct rb_node *rb_node = &sp->so_server_node;
>> -
>> -       if (!RB_EMPTY_NODE(rb_node)) {
>> -               struct nfs_server *server = sp->so_server;
>> -               struct nfs_client *clp = server->nfs_client;
>> -
>> -               spin_lock(&clp->cl_lock);
>> -               if (!RB_EMPTY_NODE(rb_node)) {
>> -                       rb_erase(rb_node, &server->state_owners);
>> -                       RB_CLEAR_NODE(rb_node);
>> -               }
>> -               spin_unlock(&clp->cl_lock);
>> -       }
>> +nfs4_reset_state_owner(struct nfs4_state_owner *sp)
>> +{
>> +       /* This state_owner is no longer usable, but must
>> +        * remain in place so that state recovery can find it
>> +        * and the opens associated with it.
>> +        * It may also be used for new 'open' request to
>> +        * return a delegation to the server.
>> +        * So update the 'create_time' so that it looks like
>> +        * a new state_owner.  This will cause the server to
>> +        * request an OPEN_CONFIRM to start a new sequence.
>> +        */
>> +       sp->so_seqid.create_time = ktime_get();
>>  }
>>
>>  static void nfs4_free_state_owner(struct nfs4_state_owner *sp)
>> @@ -1113,7 +1110,7 @@ void nfs_increment_open_seqid(int status, struct nfs_seqid *seqid)
>>
>>         sp = container_of(seqid->sequence, struct nfs4_state_owner, so_seqid);
>>         if (status == -NFS4ERR_BAD_SEQID)
>> -               nfs4_drop_state_owner(sp);
>> +               nfs4_reset_state_owner(sp);
>>         if (!nfs4_has_session(sp->so_server->nfs_client))
>>                 nfs_increment_seqid(status, seqid);
>>  }
>> --
>> 2.11.0
>>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH resend] NFS: Don't disconnect open-owner on NFS4ERR_BAD_SEQID
  2017-03-20 23:26           ` NeilBrown
@ 2017-03-21 16:43             ` Olga Kornievskaia
  2018-03-08 18:58               ` Olga Kornievskaia
  0 siblings, 1 reply; 9+ messages in thread
From: Olga Kornievskaia @ 2017-03-21 16:43 UTC (permalink / raw)
  To: NeilBrown; +Cc: Trond Myklebust, Schumaker Anna, List Linux NFS Mailing

On Mon, Mar 20, 2017 at 7:26 PM, NeilBrown <neilb@suse.com> wrote:
> On Mon, Mar 20 2017, Olga Kornievskaia wrote:
>
>> Since open_owner is no longer removed then the resources are not
>> freed. They will not be freed until unmount  (or reboot recovery).
>
> I don't follow your reasoning.
> When the last uses for a state owner is dropped by
> nfs4_put_state_owner() (e.g. when the last open file using it is
> closed), nfs4_put_state_owner() will add the state to
> server->state_owners_lru.
>
> nfs4_gc_state_owners() is called periodically (every time any file is
> opened I think) and it will call nfs4_remove_state_owner_locked()
> on any state which is sufficiently old enough (hasn't been used in
> 'lease-time'), and will then call nfs4_free_state_owner().
> These two will release all resources.
>
> Does that explanation fit with your understanding?

Thanks for the explanation. I was thinking about the rb-tree nodes.
The new open finds the same rb-tree node. I incorrectly thought that
it'd be generating a new node each time for the new open owner.

Thanks.

>
> Thanks,
> NeilBrown
>
>
>>
>> There are (buggy) servers out there that might be forcing the client
>> to keep creating new open_owners. So if they are no longer released,
>> can't the client get into trouble here?
>>
>> On Sun, Dec 18, 2016 at 7:48 PM, NeilBrown <neilb@suse.com> wrote:
>>>
>>> When an NFS4ERR_BAD_SEQID is received the open-owner is removed from
>>> the ->state_owners rbtree so that it will no longer be used.
>>>
>>> If any stateids attached to this open-owner are still in use, and if a
>>> request using one gets an NFS4ERR_BAD_STATEID reply, this can for bad.
>>>
>>> The state is marked as needing recovery and the nfs4_state_manager()
>>> is scheduled to clean up.  nfs4_state_manager() finds states to be
>>> recovered by walking the state_owners rbtree.  As the open-owner is
>>> not in the rbtree, the bad state is not found so nfs4_state_manager()
>>> completes having done nothing.  The request is then retried, with a
>>> predicatable result (indefinite retries).
>>>
>>> If the stateid is for a delegation, this open_owner will be used
>>> to open files when the delegation is returned.  For that to work,
>>> a new open-owner needs to be presented to the server.
>>>
>>> This patch changes NFS4ERR_BAD_SEQID handling to leave the open-owner
>>> in the rbtree but updates the 'create_time' so it looks like a new
>>> open-owner.  With this the indefinite retries no longer happen.
>>>
>>> Signed-off-by: NeilBrown <neilb@suse.com>
>>> ---
>>>
>>> Hi Trond,
>>>  It appears this one got lost too.
>>>  I've added a comment as I thought an explanation was needed, and
>>>  renamed the function from "drop" to "reset".
>>>
>>> Thanks,
>>> NeilBrown
>>>
>>>  fs/nfs/nfs4state.c | 29 +++++++++++++----------------
>>>  1 file changed, 13 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>>> index cf869802ff23..1d152f4470cd 100644
>>> --- a/fs/nfs/nfs4state.c
>>> +++ b/fs/nfs/nfs4state.c
>>> @@ -494,21 +494,18 @@ nfs4_alloc_state_owner(struct nfs_server *server,
>>>  }
>>>
>>>  static void
>>> -nfs4_drop_state_owner(struct nfs4_state_owner *sp)
>>> -{
>>> -       struct rb_node *rb_node = &sp->so_server_node;
>>> -
>>> -       if (!RB_EMPTY_NODE(rb_node)) {
>>> -               struct nfs_server *server = sp->so_server;
>>> -               struct nfs_client *clp = server->nfs_client;
>>> -
>>> -               spin_lock(&clp->cl_lock);
>>> -               if (!RB_EMPTY_NODE(rb_node)) {
>>> -                       rb_erase(rb_node, &server->state_owners);
>>> -                       RB_CLEAR_NODE(rb_node);
>>> -               }
>>> -               spin_unlock(&clp->cl_lock);
>>> -       }
>>> +nfs4_reset_state_owner(struct nfs4_state_owner *sp)
>>> +{
>>> +       /* This state_owner is no longer usable, but must
>>> +        * remain in place so that state recovery can find it
>>> +        * and the opens associated with it.
>>> +        * It may also be used for new 'open' request to
>>> +        * return a delegation to the server.
>>> +        * So update the 'create_time' so that it looks like
>>> +        * a new state_owner.  This will cause the server to
>>> +        * request an OPEN_CONFIRM to start a new sequence.
>>> +        */
>>> +       sp->so_seqid.create_time = ktime_get();
>>>  }
>>>
>>>  static void nfs4_free_state_owner(struct nfs4_state_owner *sp)
>>> @@ -1113,7 +1110,7 @@ void nfs_increment_open_seqid(int status, struct nfs_seqid *seqid)
>>>
>>>         sp = container_of(seqid->sequence, struct nfs4_state_owner, so_seqid);
>>>         if (status == -NFS4ERR_BAD_SEQID)
>>> -               nfs4_drop_state_owner(sp);
>>> +               nfs4_reset_state_owner(sp);
>>>         if (!nfs4_has_session(sp->so_server->nfs_client))
>>>                 nfs_increment_seqid(status, seqid);
>>>  }
>>> --
>>> 2.11.0
>>>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH resend] NFS: Don't disconnect open-owner on NFS4ERR_BAD_SEQID
  2017-03-21 16:43             ` Olga Kornievskaia
@ 2018-03-08 18:58               ` Olga Kornievskaia
  0 siblings, 0 replies; 9+ messages in thread
From: Olga Kornievskaia @ 2018-03-08 18:58 UTC (permalink / raw)
  To: NeilBrown; +Cc: Trond Myklebust, Schumaker Anna, List Linux NFS Mailing

On Tue, Mar 21, 2017 at 12:43 PM, Olga Kornievskaia <aglo@umich.edu> wrote:
> On Mon, Mar 20, 2017 at 7:26 PM, NeilBrown <neilb@suse.com> wrote:
>> On Mon, Mar 20 2017, Olga Kornievskaia wrote:
>>
>>> Since open_owner is no longer removed then the resources are not
>>> freed. They will not be freed until unmount  (or reboot recovery).
>>
>> I don't follow your reasoning.
>> When the last uses for a state owner is dropped by
>> nfs4_put_state_owner() (e.g. when the last open file using it is
>> closed), nfs4_put_state_owner() will add the state to
>> server->state_owners_lru.
>>
>> nfs4_gc_state_owners() is called periodically (every time any file is
>> opened I think) and it will call nfs4_remove_state_owner_locked()
>> on any state which is sufficiently old enough (hasn't been used in
>> 'lease-time'), and will then call nfs4_free_state_owner().
>> These two will release all resources.
>>
>> Does that explanation fit with your understanding?
>
> Thanks for the explanation. I was thinking about the rb-tree nodes.
> The new open finds the same rb-tree node. I incorrectly thought that
> it'd be generating a new node each time for the new open owner.

I have another question about this patch. With this patch, when the
new open owner is sent, it is sent with a seqid of what the old open
owner had. Is this intentional or accidental?

>
> Thanks.
>
>>
>> Thanks,
>> NeilBrown
>>
>>
>>>
>>> There are (buggy) servers out there that might be forcing the client
>>> to keep creating new open_owners. So if they are no longer released,
>>> can't the client get into trouble here?
>>>
>>> On Sun, Dec 18, 2016 at 7:48 PM, NeilBrown <neilb@suse.com> wrote:
>>>>
>>>> When an NFS4ERR_BAD_SEQID is received the open-owner is removed from
>>>> the ->state_owners rbtree so that it will no longer be used.
>>>>
>>>> If any stateids attached to this open-owner are still in use, and if a
>>>> request using one gets an NFS4ERR_BAD_STATEID reply, this can for bad.
>>>>
>>>> The state is marked as needing recovery and the nfs4_state_manager()
>>>> is scheduled to clean up.  nfs4_state_manager() finds states to be
>>>> recovered by walking the state_owners rbtree.  As the open-owner is
>>>> not in the rbtree, the bad state is not found so nfs4_state_manager()
>>>> completes having done nothing.  The request is then retried, with a
>>>> predicatable result (indefinite retries).
>>>>
>>>> If the stateid is for a delegation, this open_owner will be used
>>>> to open files when the delegation is returned.  For that to work,
>>>> a new open-owner needs to be presented to the server.
>>>>
>>>> This patch changes NFS4ERR_BAD_SEQID handling to leave the open-owner
>>>> in the rbtree but updates the 'create_time' so it looks like a new
>>>> open-owner.  With this the indefinite retries no longer happen.
>>>>
>>>> Signed-off-by: NeilBrown <neilb@suse.com>
>>>> ---
>>>>
>>>> Hi Trond,
>>>>  It appears this one got lost too.
>>>>  I've added a comment as I thought an explanation was needed, and
>>>>  renamed the function from "drop" to "reset".
>>>>
>>>> Thanks,
>>>> NeilBrown
>>>>
>>>>  fs/nfs/nfs4state.c | 29 +++++++++++++----------------
>>>>  1 file changed, 13 insertions(+), 16 deletions(-)
>>>>
>>>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>>>> index cf869802ff23..1d152f4470cd 100644
>>>> --- a/fs/nfs/nfs4state.c
>>>> +++ b/fs/nfs/nfs4state.c
>>>> @@ -494,21 +494,18 @@ nfs4_alloc_state_owner(struct nfs_server *server,
>>>>  }
>>>>
>>>>  static void
>>>> -nfs4_drop_state_owner(struct nfs4_state_owner *sp)
>>>> -{
>>>> -       struct rb_node *rb_node = &sp->so_server_node;
>>>> -
>>>> -       if (!RB_EMPTY_NODE(rb_node)) {
>>>> -               struct nfs_server *server = sp->so_server;
>>>> -               struct nfs_client *clp = server->nfs_client;
>>>> -
>>>> -               spin_lock(&clp->cl_lock);
>>>> -               if (!RB_EMPTY_NODE(rb_node)) {
>>>> -                       rb_erase(rb_node, &server->state_owners);
>>>> -                       RB_CLEAR_NODE(rb_node);
>>>> -               }
>>>> -               spin_unlock(&clp->cl_lock);
>>>> -       }
>>>> +nfs4_reset_state_owner(struct nfs4_state_owner *sp)
>>>> +{
>>>> +       /* This state_owner is no longer usable, but must
>>>> +        * remain in place so that state recovery can find it
>>>> +        * and the opens associated with it.
>>>> +        * It may also be used for new 'open' request to
>>>> +        * return a delegation to the server.
>>>> +        * So update the 'create_time' so that it looks like
>>>> +        * a new state_owner.  This will cause the server to
>>>> +        * request an OPEN_CONFIRM to start a new sequence.
>>>> +        */
>>>> +       sp->so_seqid.create_time = ktime_get();
>>>>  }
>>>>
>>>>  static void nfs4_free_state_owner(struct nfs4_state_owner *sp)
>>>> @@ -1113,7 +1110,7 @@ void nfs_increment_open_seqid(int status, struct nfs_seqid *seqid)
>>>>
>>>>         sp = container_of(seqid->sequence, struct nfs4_state_owner, so_seqid);
>>>>         if (status == -NFS4ERR_BAD_SEQID)
>>>> -               nfs4_drop_state_owner(sp);
>>>> +               nfs4_reset_state_owner(sp);
>>>>         if (!nfs4_has_session(sp->so_server->nfs_client))
>>>>                 nfs_increment_seqid(status, seqid);
>>>>  }
>>>> --
>>>> 2.11.0
>>>>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2018-03-08 18:58 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-06-03  5:24 [PATCH] NFS: Don't disconnect open-owner on NFS4ERR_BAD_SEQID NeilBrown
2016-10-07  1:27 ` NeilBrown
2016-10-07 16:21   ` Trond Myklebust
2016-10-11 22:14     ` NeilBrown
2016-12-19  0:48       ` [PATCH resend] " NeilBrown
2017-03-20 21:00         ` Olga Kornievskaia
2017-03-20 23:26           ` NeilBrown
2017-03-21 16:43             ` Olga Kornievskaia
2018-03-08 18:58               ` Olga Kornievskaia

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).