* NFS clients hang
@ 2003-07-11 14:49 Jean-Christophe Ducom
2003-07-11 18:26 ` Trond Myklebust
0 siblings, 1 reply; 2+ messages in thread
From: Jean-Christophe Ducom @ 2003-07-11 14:49 UTC
To: nfs
Hardware:
Dell Precision 530, dual Xeon 1.7GHz, 1GB RDRAM
Red Hat 7.2 is used with kernel 2.4.21, patched with Ingo Molnar's IRQ
balancing because of a 'bug' with Xeon processors (only one CPU would handle
all interrupts) and configured with the CONFIG_HIGHMEM option. nfs-utils-1.0.3-1 is used.
The mount options for the client are:
rw,nosuid,nodev,hard,intr,bg,rsize=8192,wsize=8192,nfsvers=3,lock,udp
Sometimes the NFS client hung for no apparent reason (usually when a medium-sized
file (30MB+) was handled over NFS or a lot of files were moved around).
The machine was then completely locked up (no access to it at all, even through
the serial console, and nothing displayed on the monitor). The tcpdump captures
didn't show anything wrong; they simply stop reporting traffic at the point of
the lockup.
I reconfigured the kernel with CONFIG_DEBUG_SPINLOCK and CONFIG_SOFT_WATCHDOG to
get some debug info. The client still locked up completely and never recovered
from the locked state.
I finally read this in Chuck Lever's technical report: "Customers that use 2.4 kernels
on hardware with more than 896MB should know that a special kernel option,
known as CONFIG_HIGHMEM, is required to access and use memory above 896M. The
linux NFS client has a known problem in these configurations where an
application or the whole client system can hang at random. This issue has been
addressed in the 2.4.20 kernel, but still haunts kernels contained in
distributions from RedHat and SUSE that are based on earlier kernels".
So I recompiled 2.4.21 without CONFIG_HIGHMEM, and since then the NFS clients
have been rock solid. Not a single hang since, but of course 100+MB of memory is
'lost' now.
So is this problem still around or is it specific to Xeons?
Thanks
JC
-------------------------------------------------------
This SF.Net email sponsored by: Parasoft
Error proof Web apps, automate testing & more.
Download & eval WebKing and get a free book.
www.parasoft.com/bulletproofapps1
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
* Re: NFS clients hang
2003-07-11 14:49 NFS clients hang Jean-Christophe Ducom
@ 2003-07-11 18:26 ` Trond Myklebust
0 siblings, 0 replies; 2+ messages in thread
From: Trond Myklebust @ 2003-07-11 18:26 UTC
To: Jean-Christophe Ducom; +Cc: nfs
>>>>> " " == Jean-Christophe Ducom <jducom@nd.edu> writes:
> So is this problem still around or is it specific to Xeons?
It was believed to be fixed in the final version of 2.4.21. Andrea
Arcangeli made a patch that adds a non-blocking version of the kmap()
call.
Hmm... It all looks very wrong though: We certainly don't want to be
sending partial requests...
Does the following patch help?
Cheers,
Trond
diff -u --recursive --new-file linux-2.4.22-pre4/net/sunrpc/xdr.c linux-2.4.22-fix_highmem/net/sunrpc/xdr.c
--- linux-2.4.22-pre4/net/sunrpc/xdr.c 2003-06-27 12:29:06.000000000 +0200
+++ linux-2.4.22-fix_highmem/net/sunrpc/xdr.c 2003-07-11 20:23:05.000000000 +0200
@@ -180,7 +180,8 @@
{
struct iovec *iov = iov_base;
struct page **ppage = xdr->pages;
- unsigned int len, pglen = xdr->page_len, first_kmap;
+ struct page **first_kmap = NULL;
+ unsigned int len, pglen = xdr->page_len;
len = xdr->head[0].iov_len;
if (base < len) {
@@ -203,16 +204,15 @@
ppage += base >> PAGE_CACHE_SHIFT;
base &= ~PAGE_CACHE_MASK;
}
- first_kmap = 1;
do {
len = PAGE_CACHE_SIZE;
- if (first_kmap) {
- first_kmap = 0;
+ if (!first_kmap) {
+ first_kmap = ppage;
iov->iov_base = kmap(*ppage);
} else {
iov->iov_base = kmap_nonblock(*ppage);
if (!iov->iov_base)
- goto out;
+ goto out_err;
}
if (base) {
iov->iov_base += base;
@@ -233,6 +233,10 @@
}
out:
return (iov - iov_base);
+out_err:
+ for (; first_kmap != ppage; first_kmap++)
+ kunmap(*first_kmap);
+ return 0;
}
void xdr_kunmap(struct xdr_buf *xdr, unsigned int base, int niov)
diff -u --recursive --new-file linux-2.4.22-pre4/net/sunrpc/xprt.c linux-2.4.22-fix_highmem/net/sunrpc/xprt.c
--- linux-2.4.22-pre4/net/sunrpc/xprt.c 2003-07-08 11:59:01.000000000 +0200
+++ linux-2.4.22-fix_highmem/net/sunrpc/xprt.c 2003-07-11 20:23:50.000000000 +0200
@@ -237,6 +237,10 @@
unsigned int slen_part, n;
niov = xdr_kmap(niv, xdr, skip);
+ if (!niov) {
+ result = -EAGAIN;
+ break;
+ }
msg.msg_flags = MSG_DONTWAIT|MSG_NOSIGNAL;
msg.msg_iov = niv;