* hang on server boot
@ 2010-09-20 3:36 J. Bruce Fields
2010-09-20 15:03 ` J. Bruce Fields
0 siblings, 1 reply; 6+ messages in thread
From: J. Bruce Fields @ 2010-09-20 3:36 UTC (permalink / raw)
To: linux-nfs
Trying to do some reboot testing, I hit this bug. I'm planning to queue
up this fix for 2.6.37 absent any objections.
(Arguably it could go to 2.6.36, but we're getting closer to the next
release, the consequences of this bug aren't too horrible, and it's not
a new regression.)
--b.
commit fae1561d4f5ccd315741cf4cb9ca2fb7c3fbe377
Author: J. Bruce Fields <bfields@redhat.com>
Date: Sun Sep 19 22:55:06 2010 -0400
nfsd4: fix hang on fast-booting nfs servers
The last_close field of a cache_detail is initialized to zero, so the
condition
detail->last_close < seconds_since_boot() - 30
may be false even for a cache that was never opened.
However, we want to immediately fail upcalls to caches that were never
opened: in the case of the auth_unix_gid cache, especially, which may
never be opened by mountd (if the --manage-gids option is not set), we
want to fail the upcall immediately. Otherwise client requests will be
dropped unnecessarily on reboot.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index da872f9..ca7c621 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -1091,6 +1091,23 @@ static void warn_no_listener(struct cache_detail *detail)
}
}
+static bool cache_listeners_exist(struct cache_detail *detail)
+{
+ if (atomic_read(&detail->readers))
+ return true;
+ if (detail->last_close == 0)
+ /* This cache was never opened */
+ return false;
+ if (detail->last_close < seconds_since_boot() - 30)
+ /*
+ * We allow for the possibility that someone might
+ * restart a userspace daemon without restarting the
+ * server; but after 30 seconds, we give up.
+ */
+ return false;
+ return true;
+}
+
/*
* register an upcall request to user-space and queue it up for read() by the
* upcall daemon.
@@ -1109,10 +1126,9 @@ int sunrpc_cache_pipe_upcall(struct cache_detail *detail, struct cache_head *h,
char *bp;
int len;
- if (atomic_read(&detail->readers) == 0 &&
- detail->last_close < seconds_since_boot() - 30) {
- warn_no_listener(detail);
- return -EINVAL;
+ if (!cache_listeners_exist(detail)) {
+ warn_no_listener(detail);
+ return -EINVAL;
}
buf = kmalloc(PAGE_SIZE, GFP_KERNEL);
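The boundary case described in the commit message can be reproduced outside the kernel. What follows is a minimal Python model, not kernel code: `readers`, `last_close`, and `now` are plain integers standing in for `atomic_read(&detail->readers)`, `detail->last_close`, and `seconds_since_boot()`, and the name `old_listeners_exist` is made up here for the comparison.

```python
GIVE_UP_AFTER = 30  # seconds, the same constant the patch uses


def old_listeners_exist(readers, last_close, now):
    """Pre-patch behaviour: inverse of the condition the patch removes."""
    return not (readers == 0 and last_close < now - GIVE_UP_AFTER)


def cache_listeners_exist(readers, last_close, now):
    """Post-patch behaviour, mirroring the new C helper step by step."""
    if readers:
        return True
    if last_close == 0:
        # This cache was never opened.
        return False
    if last_close < now - GIVE_UP_AFTER:
        # A daemon may be restarting, but after 30 seconds we give up.
        return False
    return True


if __name__ == "__main__":
    # Ten seconds after boot, no readers, cache never opened.
    print(old_listeners_exist(0, 0, 10))    # True: the upcall is queued anyway
    print(cache_listeners_exist(0, 0, 10))  # False: the upcall fails at once
```

In the model, the two checks only disagree during the first 30 seconds after boot, which is why the problem shows up on fast-booting servers.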
* Re: hang on server boot
2010-09-20 3:36 hang on server boot J. Bruce Fields
@ 2010-09-20 15:03 ` J. Bruce Fields
2010-09-20 15:04 ` [PATCH 1/2] TESTS: fix error when rebootscript defined but not rebootargs J. Bruce Fields
2010-09-20 15:05 ` hang on server boot J. Bruce Fields
0 siblings, 2 replies; 6+ messages in thread
From: J. Bruce Fields @ 2010-09-20 15:03 UTC (permalink / raw)
To: iisaman; +Cc: linux-nfs
On Sun, Sep 19, 2010 at 11:36:21PM -0400, bfields wrote:
> Trying to do some reboot testing, I hit this bug. I'm planning to queue
> up this fix for 2.6.37 absent any objections.
>
> (Arguably it could go to 2.6.36, but we're getting closer to the next
> release, the consequences of this bug aren't too horrible, and it's not
> a new regression.)
With that kernel fix and the following two pynfs fixes, I pass the
reboot tests except for REBT4, 5, 6, 7, and 11 (which I haven't tried to
triage yet).
--b.
* [PATCH 1/2] TESTS: fix error when rebootscript defined but not rebootargs
2010-09-20 15:03 ` J. Bruce Fields
@ 2010-09-20 15:04 ` J. Bruce Fields
2010-09-20 15:04 ` [PATCH 2/2] TESTS: make reboot tests run non-interactively J. Bruce Fields
2010-09-20 15:05 ` hang on server boot J. Bruce Fields
1 sibling, 1 reply; 6+ messages in thread
From: J. Bruce Fields @ 2010-09-20 15:04 UTC (permalink / raw)
To: iisaman; +Cc: linux-nfs
From: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
lib/nfs4/servertests/st_reboot.py | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/lib/nfs4/servertests/st_reboot.py b/lib/nfs4/servertests/st_reboot.py
index f924716..4828cb8 100644
--- a/lib/nfs4/servertests/st_reboot.py
+++ b/lib/nfs4/servertests/st_reboot.py
@@ -39,8 +39,10 @@ def _waitForReboot(c):
sys.stdin.readline()
print "Continuing with test"
else:
- # Invoke the reboot script, passing it rebootargs as an argument.
- os.system(c.opts.rebootscript + ' ' + c.opts.rebootargs)
+ args = c.opts.rebootscript
+ if c.opts.rebootargs:
+ args += ' ' + c.opts.rebootargs
+ os.system(args)
# Wait until the server is back up.
# c.null() blocks until it gets a response,
--
1.7.0.4
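The intent of this patch, building the command from the script name and appending the arguments only when they are set, can be sketched as a small standalone helper. The function name `build_reboot_command` is hypothetical and does not exist in pynfs; it only illustrates the logic.

```python
def build_reboot_command(rebootscript, rebootargs=None):
    """Return the shell command string for the reboot script.

    rebootargs may be None or empty, in which case the script is
    invoked without arguments instead of raising a TypeError on
    string concatenation.
    """
    command = rebootscript
    if rebootargs:
        command += ' ' + rebootargs
    return command


if __name__ == "__main__":
    print(build_reboot_command("./reboot.sh"))
    print(build_reboot_command("./reboot.sh", "server1"))
```

A caller would then pass the result to `os.system()`, as the test already does.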
* [PATCH 2/2] TESTS: make reboot tests run non-interactively
2010-09-20 15:04 ` [PATCH 1/2] TESTS: fix error when rebootscript defined but not rebootargs J. Bruce Fields
@ 2010-09-20 15:04 ` J. Bruce Fields
0 siblings, 0 replies; 6+ messages in thread
From: J. Bruce Fields @ 2010-09-20 15:04 UTC (permalink / raw)
To: iisaman; +Cc: linux-nfs
From: J. Bruce Fields <bfields@redhat.com>
If I asked for reboot tests on the commandline, then I shouldn't need to
be asked again.
And if I want something different from the default I'll modify the
script, or add a commandline option. It's more important that the tests
be easy to run unattended.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
lib/nfs4/servertests/st_reboot.py | 49 +++++++++----------------------------
1 files changed, 12 insertions(+), 37 deletions(-)
diff --git a/lib/nfs4/servertests/st_reboot.py b/lib/nfs4/servertests/st_reboot.py
index 4828cb8..758c54c 100644
--- a/lib/nfs4/servertests/st_reboot.py
+++ b/lib/nfs4/servertests/st_reboot.py
@@ -5,29 +5,6 @@ import os
# NOTE - reboot tests are NOT part of the standard test suite
-__asked = None
-
-def _ask(t, env):
- global __asked
- if __asked is None:
- print "Reboot tests are not part of the standard test suite."
- if not env.opts.verbose:
- print "Also, it is probably better to use the -v option"
- print "Are you *sure* you want to run them?"
- answer = sys.stdin.readline()
- c = answer.lower()[0]
- __asked = (c=='y')
- return __asked
-
-def _getcount(t, env):
- print "For test %s, how many clientids to use?" % t.code
- answer = sys.stdin.readline()
- try:
- t.__clientcount = int(answer)
- return True
- except:
- return False
-
def _waitForReboot(c):
"""Wait for server to reboot.
@@ -57,7 +34,7 @@ def testRebootValid(t, env):
"""REBOOT with valid CLAIM_PREVIOUS
FLAGS: reboot
- DEPEND: _ask MKFILE
+ DEPEND: MKFILE
CODE: REBT1
"""
c = env.c1
@@ -79,21 +56,19 @@ def testManyClaims(t, env):
"""REBOOT test
FLAGS: reboot
- DEPEND: _ask _getcount MKDIR MKFILE
+ DEPEND: MKDIR MKFILE
CODE: REBT2
"""
c = env.c1
- if not hasattr(t, '__clientcount'):
- # default if use --force
- t.__clientcount = 5
+ clientcount = 5
pid = str(os.getpid())
basedir = c.homedir + [t.code]
res = c.create_obj(basedir)
check(res, msg="Creating test directory %s" % t.code)
# Make lots of client ids
fhdict = {}
- idlist = ['pynfs%s%06i' % (pid, x) for x in range(t.__clientcount)]
- badids = ['badpynfs%s%06i' % (pid, x) for x in range(t.__clientcount)]
+ idlist = ['pynfs%s%06i' % (pid, x) for x in range(clientcount)]
+ badids = ['badpynfs%s%06i' % (pid, x) for x in range(clientcount)]
for id in idlist:
c.init_connection(id)
fh, stateid = c.create_confirm(t.code, basedir + [id])
@@ -120,7 +95,7 @@ def testRebootWait(t, env):
"""REBOOT with late CLAIM_PREVIOUS should return NFS4ERR_NO_GRACE
FLAGS: reboot
- DEPEND: _ask MKFILE
+ DEPEND: MKFILE
CODE: REBT3
"""
c = env.c1
@@ -142,7 +117,7 @@ def testRebootInvalid(t, env):
"""REBOOT with invalid CLAIM_PREVIOUS
FLAGS: reboot
- DEPEND: _ask MKFILE
+ DEPEND: MKFILE
CODE: REBT4
"""
c = env.c1
@@ -164,7 +139,7 @@ def testEdge1(t, env):
"""REBOOT with first edge condition from RFC 3530
FLAGS: reboot
- DEPEND: _ask MKFILE
+ DEPEND: MKFILE
CODE: REBT5
"""
c1 = env.c1
@@ -210,7 +185,7 @@ def testEdge2(t, env):
"""REBOOT with second edge condition from RFC 3530
FLAGS: reboot
- DEPEND: _ask MKFILE
+ DEPEND: MKFILE
CODE: REBT6
"""
c1 = env.c1
@@ -257,7 +232,7 @@ def testRootSquash(t, env):
"""REBOOT root squash does not work after grace ends?
FLAGS: reboot
- DEPEND: _ask MKFILE MKDIR
+ DEPEND: MKFILE MKDIR
CODE: REBT7
"""
# Note this assumes we can legally use uid 0...either we are using
@@ -295,7 +270,7 @@ def testValidDeleg(t, env):
"""REBOOT with read delegation and reclaim it
FLAGS: reboot delegations
- DEPEND: _ask MKFILE
+ DEPEND: MKFILE
CODE: REBT8
"""
from st_delegation import _get_deleg
@@ -325,7 +300,7 @@ def testRebootMultiple(t, env):
"""REBOOT multiple times with valid CLAIM_PREVIOUS
FLAGS: reboot
- DEPEND: _ask MKFILE
+ DEPEND: MKFILE
CODE: REBT10
"""
c = env.c1
--
1.7.0.4
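If the hard-coded client count ever needs to change, the command-line route mentioned in the commit message might look roughly like the sketch below. It uses `optparse`, which the pynfs option code is built on, but the `--clientcount` option and both function names are assumptions made for illustration; nothing in this thread adds them.

```python
from optparse import OptionParser


def parse_options(argv):
    """Parse a hypothetical --clientcount option, defaulting to 5."""
    p = OptionParser()
    # Hypothetical option: not defined by pynfs in this patch series.
    p.add_option("--clientcount", type="int", default=5, metavar="N",
                 help="Use N clientids in the many-claims reboot test [5]")
    opts, _args = p.parse_args(argv)
    return opts


def make_client_ids(pid, clientcount):
    """Build the clientid strings the same way testManyClaims does."""
    return ['pynfs%s%06i' % (pid, x) for x in range(clientcount)]


if __name__ == "__main__":
    opts = parse_options(["--clientcount", "2"])
    print(make_client_ids("42", opts.clientcount))
```

Keeping the default at 5 preserves the unattended behaviour this patch is after.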
* Re: hang on server boot
2010-09-20 15:03 ` J. Bruce Fields
2010-09-20 15:04 ` [PATCH 1/2] TESTS: fix error when rebootscript defined but not rebootargs J. Bruce Fields
@ 2010-09-20 15:05 ` J. Bruce Fields
2010-09-21 16:11 ` [PATCH] CLNT: preliminary reboot test J. Bruce Fields
1 sibling, 1 reply; 6+ messages in thread
From: J. Bruce Fields @ 2010-09-20 15:05 UTC (permalink / raw)
To: iisaman; +Cc: linux-nfs
On Mon, Sep 20, 2010 at 11:03:17AM -0400, J. Bruce Fields wrote:
> On Sun, Sep 19, 2010 at 11:36:21PM -0400, bfields wrote:
> > Trying to do some reboot testing, I hit this bug. I'm planning to queue
> > up this fix for 2.6.37 absent any objections.
> >
> > (Arguably it could go to 2.6.36, but we're getting closer to the next
> > release, the consequences of this bug aren't too horrible, and it's not
> > a new regression.)
>
> With that kernel fix and the following two pynfs fixes, I pass the
> reboot tests except for REBT4, 5, 6, 7, and 11 (which I haven't tried to
> triage yet).
(pynfs changes also available from:
git://linux-nfs.org/~bfields/pynfs.git
)
--b.
* [PATCH] CLNT: preliminary reboot test
2010-09-20 15:05 ` hang on server boot J. Bruce Fields
@ 2010-09-21 16:11 ` J. Bruce Fields
0 siblings, 0 replies; 6+ messages in thread
From: J. Bruce Fields @ 2010-09-21 16:11 UTC (permalink / raw)
To: iisaman; +Cc: linux-nfs, groshans
From: Mike Groshans <mike@Moscow.citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
nfs4.1/server41tests/__init__.py | 1 +
nfs4.1/server41tests/environment.py | 13 +++++-
nfs4.1/server41tests/st_reboot.py | 76 +++++++++++++++++++++++++++++++++++
nfs4.1/testserver.py | 7 +++
4 files changed, 95 insertions(+), 2 deletions(-)
create mode 100644 nfs4.1/server41tests/st_reboot.py
Also, the Linux NFSv4.1 server now passes the test added by this patch, which is
also available from
git://linux-nfs.org/~bfields/pynfs41.git
I've updated my test scripts at
git://linux-nfs.org/~bfields/testd.git
to turn on this test, and also to turn on some of the 4.0 "timed" tests.
Note there's a bug in the 4.1 pynfs tests: they should be routinely
doing a reclaim_complete after each exchange_id/create_session.
--b.
diff --git a/nfs4.1/server41tests/__init__.py b/nfs4.1/server41tests/__init__.py
index 8d8212b..d631b74 100644
--- a/nfs4.1/server41tests/__init__.py
+++ b/nfs4.1/server41tests/__init__.py
@@ -8,6 +8,7 @@ __all__ = ["st_exchange_id.py", # draft 21
"st_lookupp.py",
"st_rename.py",
"st_putfh.py",
+ "st_reboot.py",
## "st_lookup.py",
##################
"st_block.py",
diff --git a/nfs4.1/server41tests/environment.py b/nfs4.1/server41tests/environment.py
index 983ab92..e961082 100644
--- a/nfs4.1/server41tests/environment.py
+++ b/nfs4.1/server41tests/environment.py
@@ -471,7 +471,9 @@ def create_file(sess, owner, path=None, attrs={FATTR4_MODE: 0644},
def open_file(sess, owner, path=None,
access=OPEN4_SHARE_ACCESS_READ,
deny=OPEN4_SHARE_DENY_NONE,
+ claim_type=CLAIM_NULL,
want_deleg=False,
+ deleg_type=None,
# Setting the following should induce server errors
seqid=0, clientid=0):
# Set defaults
@@ -484,10 +486,17 @@ def open_file(sess, owner, path=None,
if not want_deleg and access & OPEN4_SHARE_ACCESS_WANT_DELEG_MASK == 0:
access |= OPEN4_SHARE_ACCESS_WANT_NO_DELEG
# Open the file
+ if claim_type==CLAIM_NULL:
+ fh_op = use_obj(dir)
+ elif claim_type==CLAIM_PREVIOUS:
+ fh_op = [op.putfh(path)]
+ name = None
+ if not want_deleg and access & OPEN4_SHARE_ACCESS_WANT_DELEG_MASK == 0:
+ access |= OPEN4_SHARE_ACCESS_WANT_NO_DELEG
open_op = op.open(seqid, access, deny, open_owner4(clientid, owner),
openflag4(OPEN4_NOCREATE),
- open_claim4(CLAIM_NULL, name))
- return sess.compound(use_obj(dir) + [open_op, op.getfh()])
+ open_claim4(claim_type, name, deleg_type))
+ return sess.compound(fh_op + [open_op, op.getfh()])
def create_confirm(sess, owner, path=None, attrs={FATTR4_MODE: 0644},
access=OPEN4_SHARE_ACCESS_BOTH,
diff --git a/nfs4.1/server41tests/st_reboot.py b/nfs4.1/server41tests/st_reboot.py
new file mode 100644
index 0000000..9eee19d
--- /dev/null
+++ b/nfs4.1/server41tests/st_reboot.py
@@ -0,0 +1,76 @@
+from nfs4_const import *
+from nfs4_type import channel_attrs4
+from environment import check, checklist, fail, create_file, open_file, create_confirm
+import sys
+import os
+import nfs4lib
+import nfs4_ops as op
+from rpc import RPCTimeout
+
+# NOTE - reboot tests are NOT part of the standard test suite
+
+def _getleasetime(sess):
+ res = sess.compound([op.putrootfh(), op.getattr(1 << FATTR4_LEASE_TIME)])
+ return res.resarray[-1].obj_attributes[FATTR4_LEASE_TIME]
+
+def _waitForReboot(c, sess, env):
+ """Wait for server to reboot.
+
+ Returns an estimate of how long grace period will last.
+ """
+ oldleasetime = _getleasetime(sess)
+ if env.opts.rebootscript is None:
+ print "Hit ENTER to continue after server is reset"
+ sys.stdin.readline()
+ print "Continuing with test"
+ else:
+ if env.opts.rebootargs is not None:
+ # Invoke the reboot script, passing it rebootargs as an argument.
+ os.system(env.opts.rebootscript + ' ' + env.opts.rebootargs)
+ else:
+ os.system(env.opts.rebootscript)
+ env.c1.c1 = env.c1.connect(env.c1.server_address)
+ return 5 + oldleasetime
+
+def create_session(c, cred=None, flags=0):
+ """Send a simple CREATE_SESSION"""
+ chan_attrs = channel_attrs4(0,8192,8192,8192,128,8,[])
+ res = c.c.compound([op.create_session(c.clientid, c.seqid, flags,
+ chan_attrs, chan_attrs,
+ 123, [])], cred)
+ return res
+
+def reclaim_complete(sess):
+ rc_op = op.reclaim_complete(rca_one_fs=False)
+ res = sess.compound([rc_op])
+ check(res, msg="reclaim_complete")
+
+#####################################################
+
+def testRebootValid(t, env):
+ """REBOOT with valid CLAIM_PREVIOUS
+
+ FLAGS: reboot
+ DEPEND:
+ CODE: REBT1
+ """
+ name = env.testname(t)
+ owner = "owner_%s" % name
+ c = env.c1.new_client(env.testname(t))
+ sess = c.create_session()
+ reclaim_complete(sess)
+ fh, stateid = create_confirm(sess, owner)
+ sleeptime = _waitForReboot(c, sess, env)
+ try:
+ res = create_session(c)
+ check(res, NFS4ERR_STALE_CLIENTID, "Reclaim using old clientid")
+ c = env.c1.new_client(env.testname(t))
+ sess = c.create_session()
+ res = open_file(sess, owner, path=fh, claim_type=CLAIM_PREVIOUS,
+ access=OPEN4_SHARE_ACCESS_BOTH,
+ deny=OPEN4_SHARE_DENY_NONE,
+ deleg_type=OPEN_DELEGATE_NONE)
+ check(res, msg="Reclaim using newly created clientid")
+ reclaim_complete(sess)
+ finally:
+ env.sleep(sleeptime, "Waiting for grace period to end")
diff --git a/nfs4.1/testserver.py b/nfs4.1/testserver.py
index 073291f..450d6c3 100755
--- a/nfs4.1/testserver.py
+++ b/nfs4.1/testserver.py
@@ -193,6 +193,13 @@ def scan_options(p):
## g.add_option("--secure", action="store_true", default=False,
## help="Try to use 'secure' port number <1024 for client [False]")
## p.add_option_group(g)
+
+ g.add_option("--rebootscript", default=None, metavar="FILE",
+ help="Use FILE as the script to reboot SERVER.")
+
+ g.add_option("--rebootargs", default=None, metavar="ARGS",
+ help="Pass ARGS as a string to the reboot script.")
+
return p.parse_args()
class Argtype(object):
--
1.7.0.4