* [NFS] How to set-up a Linux NFS server to handle massive number of requests
@ 2008-04-10 12:12 Carsten Aulbert
[not found] ` <47FE044A.7020008-l1a6w7hxd2yELgA04lAiVw@public.gmane.org>
0 siblings, 1 reply; 16+ messages in thread
From: Carsten Aulbert @ 2008-04-10 12:12 UTC (permalink / raw)
To: nfs
Hi all,
we have a pretty extreme problem here and I am trying to figure out how to
get it done right.
We have a large cluster consisting of 1340 compute nodes that have an
automount directory which will subsequently trigger an NFS mount (read-only):
$ ypcat auto.data
-fstype=nfs,nfsvers=3,hard,intr,rsize=8192,wsize=8192,tcp &:/data
$ grep auto.data /etc/auto.master
/atlas/data yp:auto.data --timeout=5
So far so good.
When submitting 1000 jobs that each just run md5sum on the very same file
from one single data server, I see very weird effects.
In the standard set-up many connections get into the box (tcp connection
status SYN_RECV) but those fall over after some time and stay in
CLOSE_WAIT state until I restart the nfs-kernel-server. Typically that
looks like (netstat -an):
tcp 0 0 10.20.10.14:687 10.10.2.87:799 SYN_RECV
tcp 0 0 10.20.10.14:687 10.10.4.1:823 SYN_RECV
tcp 0 0 10.20.10.14:687 10.10.1.65:656 SYN_RECV
tcp 0 0 10.20.10.14:687 10.10.1.30:650 SYN_RECV
tcp 0 0 10.20.10.14:687 10.10.0.71:789 SYN_RECV
tcp 0 0 10.20.10.14:687 10.10.1.4:602 SYN_RECV
tcp 0 0 10.20.10.14:687 10.10.1.1:967 SYN_RECV
tcp 0 0 10.20.10.14:687 10.10.3.66:915 SYN_RECV
tcp 0 0 10.20.10.14:687 10.10.0.55:620 SYN_RECV
tcp 0 0 10.20.10.14:687 10.10.1.41:835 SYN_RECV
tcp 0 0 10.20.10.14:687 10.10.2.29:958 SYN_RECV
tcp 0 0 10.20.10.14:687 10.10.1.12:998 SYN_RECV
tcp 0 0 10.20.10.14:687 10.10.1.30:651 SYN_RECV
tcp 0 0 10.20.10.14:687 10.10.1.4:601 SYN_RECV
tcp 0 0 10.20.10.14:2049 10.10.1.19:846 ESTABLISHED
tcp 45 0 10.20.10.14:687 10.10.0.68:979 CLOSE_WAIT
tcp 45 0 10.20.10.14:687 10.10.3.83:680 CLOSE_WAIT
tcp 89 0 10.20.10.14:687 10.10.0.79:604 CLOSE_WAIT
tcp 0 0 10.20.10.14:2049 10.10.2.6:676 ESTABLISHED
tcp 45 0 10.20.10.14:687 10.10.2.56:913 CLOSE_WAIT
tcp 45 0 10.20.10.14:687 10.10.0.60:827 CLOSE_WAIT
tcp 0 0 10.20.10.14:2049 10.10.3.55:778 ESTABLISHED
tcp 45 0 10.20.10.14:687 10.10.2.86:981 CLOSE_WAIT
tcp 45 0 10.20.10.14:687 10.10.9.13:792 CLOSE_WAIT
tcp 89 0 10.20.10.14:687 10.10.2.93:728 CLOSE_WAIT
tcp 45 0 10.20.10.14:687 10.10.0.20:742 CLOSE_WAIT
tcp 45 0 10.20.10.14:687 10.10.3.44:982 CLOSE_WAIT
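With hundreds of connections, a listing like the one above is easier to read when tallied per local port and TCP state. The following is only a minimal sketch: it assumes the six-column `netstat -an` layout shown in this message, with the state on the same line as the addresses. The function name `tally_states` and the shortened `SAMPLE` text are made up for illustration.

```python
from collections import Counter

# A shortened, illustrative excerpt in the same layout as the listing above.
SAMPLE = """\
tcp 0 0 10.20.10.14:687 10.10.2.87:799 SYN_RECV
tcp 0 0 10.20.10.14:2049 10.10.1.19:846 ESTABLISHED
tcp 45 0 10.20.10.14:687 10.10.0.68:979 CLOSE_WAIT
tcp 89 0 10.20.10.14:687 10.10.0.79:604 CLOSE_WAIT
"""


def tally_states(netstat_text):
    """Count connections per (local port, TCP state) in `netstat -an` output."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto recv-q send-q local-address remote-address state
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue  # skip headers, UDP/unix sockets, and wrapped lines
        local_port = int(fields[3].rsplit(":", 1)[1])
        counts[(local_port, fields[5])] += 1
    return counts


if __name__ == "__main__":
    for (port, state), n in sorted(tally_states(SAMPLE).items()):
        print(f"port {port:5d} {state:12s} {n}")
```

Feeding it the real output (for example via `netstat -an > netstat.txt` and reading that file) would show at a glance which service port accumulates the SYN_RECV and CLOSE_WAIT entries.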
I played with different numbers of nfsd threads (ranging from 8 to 1024)
and increased the number of threads for rpc.mountd from 1 to 64, in quite a
few combinations, but so far I have not found a consistent set of
parameters where 1000 nodes are able to read this file at the same time.
Any ideas from anyone or do you need more input from me?
TIA
Carsten
PS: Please Cc me, I'm not yet subscribed.
-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference
Don't miss this year's exciting event. There's still time to save $100.
Use priority code J8TL2D2.
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
_______________________________________________
Please note that nfs@lists.sourceforge.net is being discontinued.
Please subscribe to linux-nfs@vger.kernel.org instead.
http://vger.kernel.org/vger-lists.html#linux-nfs
* Re: [NFS] How to set-up a Linux NFS server to handle massive number of requests
@ 2008-04-11 23:07 J. Bruce Fields
From: J. Bruce Fields @ 2008-04-11 23:07 UTC
To: Carsten Aulbert; +Cc: nfs

On Thu, Apr 10, 2008 at 02:12:58PM +0200, Carsten Aulbert wrote:
> In the standard set-up many connections get into the box (tcp connection
> status SYN_RECV) but those fall over after some time and stay in
> CLOSE_WAIT state until I restart the nfs-kernel-server. Typically that
> looks like (netstat -an):

That's interesting! But I'm not sure how to figure this out.

Is it possible to get a network trace that shows what's going on?

What happens on the clients?

What kernel version are you using?

--b.
* Re: [NFS] How to set-up a Linux NFS server to handle massive number of requests
@ 2008-04-12 6:45 Carsten Aulbert
From: Carsten Aulbert @ 2008-04-12 6:45 UTC
To: J. Bruce Fields; +Cc: nfs

Hi,

J. Bruce Fields wrote:
>> In the standard set-up many connections get into the box (tcp connection
>> status SYN_RECV) but those fall over after some time and stay in
>> CLOSE_WAIT state until I restart the nfs-kernel-server. Typically that
>> looks like (netstat -an):
>
> That's interesting! But I'm not sure how to figure this out.
>
> Is it possible to get a network trace that shows what's going on?

In principle yes, but
(1) it's huge. I only get this when doing this with 500-1000 clients
starting at about the same time
(2) It seems that I don't get a full trace, i.e. the sessions seem to be
incomplete - sometimes I only see a single packet with FIN set. I tried
doing this both with wireshark running locally and with ntap's capturing
device.

> What happens on the clients?

In the logs (/var/log/daemon.log) I only see that the mount request
fails in different ways.

Apr 9 12:07:55 n0078 automount[26838]: >> mount: RPC: Timed out
Apr 9 12:07:55 n0078 automount[26838]: mount(nfs): nfs: mount failure d14:/data on /atlas/data/d14
Apr 9 12:07:55 n0078 automount[26838]: failed to mount /atlas/data/d14
Apr 9 12:18:56 n0078 automount[27977]: >> mount: RPC: Remote system error - Connection timed out
Apr 9 12:18:56 n0078 automount[27977]: mount(nfs): nfs: mount failure d14:/data on /atlas/data/d14

I have not yet run tshark in the background on many nodes to see if I
can capture the client's view. Would that be beneficial?

> What kernel version are you using?--b.

2.6.24.4 on Debian Etch

Right now, it seems that running 196 nfsd plus 64 threads for mountd
solves the problem for the time being. Although it would be nice to
understand these "magic" numbers ;)

Thanks!

Carsten
* Re: [NFS] How to set-up a Linux NFS server to handle massive number of requests
@ 2008-04-14 17:06 J. Bruce Fields
From: J. Bruce Fields @ 2008-04-14 17:06 UTC
To: Carsten Aulbert; +Cc: nfs

On Sat, Apr 12, 2008 at 08:45:12AM +0200, Carsten Aulbert wrote:
> J. Bruce Fields wrote:
>> Is it possible to get a network trace that shows what's going on?
>
> In principle yes, but
> (1) it's huge. I only get this when doing this with 500-1000 clients
> starting at about the same time
> (2) It seems that I don't get a full trace, i.e. the sessions seem to be
> incomplete - sometimes I only see a single packet with FIN set. I tried
> doing this both with wireshark running locally and with ntap's capturing
> device.

Yeah, that's not surprising. You'd probably want to dedicate a machine
to doing the capture, and then I'm not sure what kind of hardware you'd
need for a given network to get everything. Probably it's not worth it.

>> What happens on the clients?
>
> In the logs (/var/log/daemon.log) I only see that the mount request
> fails in different ways.
>
> Apr 9 12:07:55 n0078 automount[26838]: >> mount: RPC: Timed out
> Apr 9 12:07:55 n0078 automount[26838]: mount(nfs): nfs: mount failure d14:/data on /atlas/data/d14
> Apr 9 12:07:55 n0078 automount[26838]: failed to mount /atlas/data/d14
> Apr 9 12:18:56 n0078 automount[27977]: >> mount: RPC: Remote system error - Connection timed out
> Apr 9 12:18:56 n0078 automount[27977]: mount(nfs): nfs: mount failure d14:/data on /atlas/data/d14
>
> I have not yet run tshark in the background on many nodes to see if I
> can capture the client's view. Would that be beneficial?

Couldn't hurt.

Hauling out TCP/IP Illustrated and refreshing my memory of the tcp state
transition diagram....

So if the server has a lot of connections stuck in CLOSE_WAIT, that
means it got FINs from the clients (perhaps after they timed out), but
never shut down its side of the connection. Sounds like a bug in some
server-side rpc code.

(Hm. But all those SYN_RECV's are somebody waiting for a client to ACK a
SYN. Why are there so many of those?)

Those connections are actually to port 687, which I assume is mountd
(what does rpcinfo -p say?). (And probably if you just killed and
restarted mountd, instead of doing a complete
"/etc/init.d/nfs-kernel-server restart", that'd also clear those out.)

In fact, in the example you gave only three out of about 27 connections
(the only ESTABLISHED connections) were to port 2049 (nfsd itself).

So it looks like it's mountd that's not keeping up (and that's leaving
connections sitting around too long), and the mountd processes are
probably what we should be debugging.

>> What kernel version are you using?--b.
>
> 2.6.24.4 on Debian Etch
>
> Right now, it seems that running 196 nfsd plus 64 threads for mountd
> solves the problem for the time being. Although it would be nice to
> understand these "magic" numbers ;)

Yes, definitely. I'm surprised the number of nfsd threads matters much
at all, actually, if mountd is the bottleneck.

--b.
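The CLOSE_WAIT reasoning in J. Bruce Fields's reply above can be made concrete with a tiny model of the server-side (passive close) slice of the standard TCP state diagram. This is only an illustrative sketch, not kernel or mountd code; the event names (`recv_ack`, `recv_fin`, `app_close`) are invented labels for "ACK received", "FIN received", and "the application calls close() on the socket".

```python
# Server-side slice of the TCP state machine, passive close path only.
TRANSITIONS = {
    ("SYN_RECV", "recv_ack"): "ESTABLISHED",    # client ACKs our SYN+ACK
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",  # client closes its side
    ("CLOSE_WAIT", "app_close"): "LAST_ACK",    # server app finally closes
    ("LAST_ACK", "recv_ack"): "CLOSED",         # client ACKs our FIN
}


def run(state, events):
    """Apply events in order; events not valid in a state are ignored."""
    for event in events:
        state = TRANSITIONS.get((state, event), state)
    return state


if __name__ == "__main__":
    # The client gives up and sends FIN, but the server never calls close():
    print(run("SYN_RECV", ["recv_ack", "recv_fin"]))
    # The same connection once the server application closes the socket:
    print(run("SYN_RECV", ["recv_ack", "recv_fin", "app_close", "recv_ack"]))
```

In this model the only way out of CLOSE_WAIT is the `app_close` event, which is why a pile of sockets stuck in that state points at the server-side process holding them rather than at the network.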
* Re: [NFS] How to set-up a Linux NFS server to handle massive number of requests
@ 2008-04-15 4:48 Tom Tucker
From: Tom Tucker @ 2008-04-15 4:48 UTC
To: J. Bruce Fields; +Cc: nfs, Carsten Aulbert

Maybe this is a TCP_BACKLOG issue?

BTW, with that many mounts won't you run out of "secure" ports (< 1024),
so you'll need to use 'insecure' as a mount option.
* Re: [NFS] How to set-up a Linux NFS server to handle massive number of requests
@ 2008-04-15 5:42 Carsten Aulbert
From: Carsten Aulbert @ 2008-04-15 5:42 UTC
To: Tom Tucker; +Cc: J. Bruce Fields, nfs

Tom Tucker wrote:
> Maybe this is a TCP_BACKLOG issue?

Hmm, Google does not yield much information about this. I think I know
what that would be; is there a cure or are there some kernel switches
for tuning that?

> BTW, with that many mounts won't you run out of "secure" ports (< 1024),
> so you'll need to use 'insecure' as a mount option.

Not to my knowledge. All connections go to a single port on the server
box (well, one port per service). Only the clients may run out of
privileged ports if they do too much mounting, but mostly this option is
just for "security" reasons. At least that's my understanding.

By the way, discussing this issue with my colleague cluster admins, the
question popped up whether there is a guideline/rule of thumb for how
many nfsd one should run - or asking the other way round, how to arrive
at a good compromise.

Our server boxes are pretty big (8 cores, 16 GB memory, 16 disk
Areca1261 RAID6), so the resources used by the nfsd are not much of an
issue - I even tested with 1024 nfsd idling around. At some point
increasing the number does not make much sense because I cannot get the
data out fast enough or the seeks will likely "kill" the box^Wperformance.

Any thoughts on that?

Cheers

Carsten
* Re: [NFS] How to set-up a Linux NFS server to handle massive number of requests
@ 2008-04-15 13:58 J. Bruce Fields
From: J. Bruce Fields @ 2008-04-15 13:58 UTC
To: Carsten Aulbert; +Cc: nfs

On Tue, Apr 15, 2008 at 07:42:45AM +0200, Carsten Aulbert wrote:
> By the way, discussing this issue with my colleague cluster admins, the
> question popped up whether there is a guideline/rule of thumb for how
> many nfsd one should run - or asking the other way round, how to arrive
> at a good compromise.
>
> Our server boxes are pretty big (8 cores, 16 GB memory, 16 disk
> Areca1261 RAID6), so the resources used by the nfsd are not much of an
> issue - I even tested with 1024 nfsd idling around. At some point
> increasing the number does not make much sense because I cannot get the
> data out fast enough or the seeks will likely "kill" the box^Wperformance.
>
> Any thoughts on that?

The only advice I know of is to check the "th" line in
/proc/net/rpc/nfsd and adjust the number of threads until you can verify
that they're rarely all in use; see

http://nfs.sourceforge.net/nfs-howto/ar01s05.html#nfsd_daemon_instances

--b.
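The "th" line check suggested above can be scripted. The sketch below assumes the layout described in the NFS HOWTO for 2.6-era kernels: the thread count, then the number of times all threads were busy, then a ten-bucket histogram of seconds spent at each utilisation level. That layout is an assumption to verify against your own kernel, and later kernels may report only zeros in these fields. The sample values and the function names are invented for illustration.

```python
# Hypothetical sample in the assumed "th" layout; not taken from a real server.
SAMPLE_TH = "th 8 594 3733.140 83.850 96.660 0.000 0.000 0.000 0.000 0.000 0.000 0.000"


def parse_th_line(line):
    """Split a 'th' line into (threads, all_busy_count, histogram)."""
    fields = line.split()
    if len(fields) < 3 or fields[0] != "th":
        raise ValueError("not a 'th' line: %r" % line)
    threads = int(fields[1])
    all_busy = int(fields[2])  # times every thread was in use at once
    histogram = [float(x) for x in fields[3:]]
    return threads, all_busy, histogram


def read_th(path="/proc/net/rpc/nfsd"):
    """Return the parsed 'th' line from the nfsd stats file, or None."""
    with open(path) as stats:
        for line in stats:
            if line.startswith("th "):
                return parse_th_line(line)
    return None


if __name__ == "__main__":
    threads, all_busy, histogram = parse_th_line(SAMPLE_TH)
    print(f"{threads} threads, all busy {all_busy} times")
    print("seconds spent in the busiest bucket:", histogram[-1])
```

If the all-busy counter keeps growing between two readings taken under load, that would suggest trying more nfsd threads; if it stays flat, adding threads is unlikely to help.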
* Re: [NFS] How to set-up a Linux NFS server to handle massive number of requests
@ 2008-04-16 2:49 Tom Tucker
From: Tom Tucker @ 2008-04-16 2:49 UTC
To: Carsten Aulbert; +Cc: J. Bruce Fields, nfs

On Tue, 2008-04-15 at 07:42 +0200, Carsten Aulbert wrote:
> Tom Tucker wrote:
> > BTW, with that many mounts won't you run out of "secure" ports (< 1024),
> > so you'll need to use 'insecure' as a mount option.
>
> Not to my knowledge. All connections go to a single port on the server
> box (well, one port per service). Only the clients may run out of
> privileged ports if they do too much mounting, but mostly this option is
> just for "security" reasons. At least that's my understanding.

Yes, you're right...I was being dumb here. Sorry.
* Re: [NFS] How to set-up a Linux NFS server to handle massive number of requests
@ 2008-04-15 15:12 J. Bruce Fields
From: J. Bruce Fields @ 2008-04-15 15:12 UTC
To: Tom Tucker; +Cc: nfs, Carsten Aulbert

On Mon, Apr 14, 2008 at 11:48:33PM -0500, Tom Tucker wrote:
> Maybe this this is a TCP_BACKLOG issue?

So, looking around....

There seems to be a global limit in
/proc/sys/net/ipv4/tcp_max_syn_backlog (default 1024?); might be worth
seeing what happens if that's increased, e.g., with

echo 2048 >/proc/sys/net/ipv4/tcp_max_syn_backlog

Though each client does have to make more than one tcp connection, I
wouldn't expect it to be making more than one at a time, so with 1340
clients, and assuming the requests are spread out at least a tiny bit, I
would have thought 1024 would be enough.

Oh, but: Grepping the glibc rpc code, it looks like it calls listen with
second argument SOMAXCONN == 128. You can confirm that by strace'ing
rpc.mountd -F and looking for the listen call.

And that socket's shared between all the mountd processes, so I guess
that's the real limit. I don't see an easy way to adjust that. You'd
also need to increase /proc/sys/net/core/somaxconn first.

But none of this explains why we'd see connections stuck in CLOSE_WAIT
indefinitely?

--b.
There's still time to save $100. > Use priority code J8TL2D2. > http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone > _______________________________________________ > NFS maillist - NFS@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/nfs > _______________________________________________ > Please note that nfs@lists.sourceforge.net is being discontinued. > Please subscribe to linux-nfs@vger.kernel.org instead. > http://vger.kernel.org/vger-lists.html#linux-nfs > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ------------------------------------------------------------------------- This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone _______________________________________________ NFS maillist - NFS@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nfs _______________________________________________ Please note that nfs@lists.sourceforge.net is being discontinued. Please subscribe to linux-nfs@vger.kernel.org instead. http://vger.kernel.org/vger-lists.html#linux-nfs ^ permalink raw reply [flat|nested] 16+ messages in thread
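The two limits discussed in this message can be inspected with a short script. This is only a sketch: the helper names `read_sysctl` and `effective_backlog` are made up for illustration, and the value 128 is taken from the message above rather than checked against any particular glibc build. It relies on the fact that Linux caps the backlog passed to listen() at net.core.somaxconn, so raising the sysctl alone does not help if the application itself only asks for 128.

```python
from pathlib import Path

# Backlog the glibc RPC code is reported to request (SOMAXCONN == 128 per the
# thread); treat this as an assumption, not a verified constant.
GLIBC_RPC_BACKLOG = 128


def read_sysctl(path):
    """Return the integer stored in a /proc/sys file, or None if unreadable."""
    try:
        return int(Path(path).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def effective_backlog(requested, somaxconn):
    """Linux caps the backlog passed to listen() at net.core.somaxconn."""
    return min(requested, somaxconn)


somaxconn = read_sysctl("/proc/sys/net/core/somaxconn")
syn_backlog = read_sysctl("/proc/sys/net/ipv4/tcp_max_syn_backlog")
print("net.core.somaxconn           =", somaxconn)
print("net.ipv4.tcp_max_syn_backlog =", syn_backlog)
if somaxconn is not None:
    print("backlog a 128-request gets   =",
          effective_backlog(GLIBC_RPC_BACKLOG, somaxconn))
```

Running it on the server before and after changing the sysctls shows whether the kernel-side cap or the application-side request is the smaller of the two.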
* Re: [NFS] How to set-up a Linux NFS server to handle massive number of requests
2008-04-15 15:12 ` J. Bruce Fields
@ 2008-04-16 2:43 ` Tom Tucker
[not found] ` <1208313790.3521.32.camel-SMNkleLxa3ZimH42XvhXlA@public.gmane.org>
0 siblings, 1 reply; 16+ messages in thread
From: Tom Tucker @ 2008-04-16 2:43 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: nfs, Carsten Aulbert

On Tue, 2008-04-15 at 11:12 -0400, J. Bruce Fields wrote:
> On Mon, Apr 14, 2008 at 11:48:33PM -0500, Tom Tucker wrote:
> >
> > Maybe this is a TCP_BACKLOG issue?
>
> So, looking around.... There seems to be a global limit in
> /proc/sys/net/ipv4/tcp_max_syn_backlog (default 1024?); might be worth
> seeing what happens if that's increased, e.g., with
>
>     echo 2048 >/proc/sys/net/ipv4/tcp_max_syn_backlog

I think this represents the collective total for all listening
endpoints. I think we're only talking about mountd.

Shooting from the hip...

My gray haired recollection is that the single connection default is a
backlog of 10 (SYN received, not accepted connections). Additional SYNs
received to this endpoint will be dropped...clients will retry the SYN
as part of normal TCP retransmit...

It might be that the CLOSE_WAITs in the log are _normal_. That is, they
reflect completed mount requests that are in the normal close path. If
they never go away, then that's not normal. Is this the case?

Suppose the 10 is roughly correct. The remaining "jilted" clients will
retransmit their SYN after a randomized exponential backoff. I think you
can imagine that trying 1300+ connections of which only 10 succeed and
then retrying 1300-10 based on a randomized exponential backoff might
get you some pretty bad performance.

Just a thought --

^ permalink raw reply [flat|nested] 16+ messages in thread
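The retry scenario described in this message can be put into numbers with a toy simulation. Everything in it is an assumption made for illustration: the function name `simulate_mount_storm`, a server that accepts at most `backlog` new connections per one-second tick and drops the rest, an initial retransmit timeout of 3 seconds that doubles on each retry, and a limit of 5 SYN retries. It is not a model of the real TCP stack, only of the arithmetic in the paragraph above.

```python
import random
from collections import defaultdict


def simulate_mount_storm(clients, backlog, spread=0, initial_rto=3,
                         max_retries=5, seed=0):
    """Return (connected, failed, second_of_last_connect).

    Each client sends its first SYN at a random second in [0, spread].
    Per second, at most `backlog` SYNs are accepted; every other SYN is
    dropped and retried after initial_rto * 2**retries seconds, up to
    max_retries retries.
    """
    rng = random.Random(seed)
    # attempts[t] holds (client_id, retries_used) for SYNs arriving at second t
    attempts = defaultdict(list)
    for cid in range(clients):
        attempts[rng.randint(0, spread)].append((cid, 0))

    connected = failed = 0
    last_connect = None
    t = 0
    while attempts:
        arrivals = attempts.pop(t, [])
        rng.shuffle(arrivals)  # no client is favoured within a second
        for idx, (cid, retries) in enumerate(arrivals):
            if idx < backlog:
                connected += 1
                last_connect = t
            elif retries < max_retries:
                delay = initial_rto * (2 ** retries)
                attempts[t + delay].append((cid, retries + 1))
            else:
                failed += 1
        t += 1
    return connected, failed, last_connect


# Worst case from the message: 1300 clients at once, backlog of 10.
print(simulate_mount_storm(1300, 10))
# Same load against a backlog of 128, with starts spread over 10 seconds.
print(simulate_mount_storm(1300, 128, spread=10))
```

With all clients starting in the same second, only `backlog` clients get through per retry round, which is the pathological behaviour the message anticipates; spreading the start times or raising the backlog changes the outcome considerably.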
[parent not found: <1208313790.3521.32.camel-SMNkleLxa3ZimH42XvhXlA@public.gmane.org>]
* Re: [NFS] How to set-up a Linux NFS server to handle massive number of requests
[not found] ` <1208313790.3521.32.camel-SMNkleLxa3ZimH42XvhXlA@public.gmane.org>
@ 2008-04-16 2:58 ` J. Bruce Fields
2008-04-16 3:22 ` Tom Tucker
0 siblings, 1 reply; 16+ messages in thread
From: J. Bruce Fields @ 2008-04-16 2:58 UTC (permalink / raw)
To: Tom Tucker; +Cc: nfs, Carsten Aulbert

On Tue, Apr 15, 2008 at 09:43:10PM -0500, Tom Tucker wrote:
> I think this represents the collective total for all listening
> endpoints. I think we're only talking about mountd.

Yes.

> Shooting from the hip...
>
> My gray haired recollection is that the single connection default is a
> backlog of 10 (SYN received, not accepted connections). Additional SYNs
> received to this endpoint will be dropped...clients will retry the SYN
> as part of normal TCP retransmit...
>
> It might be that the CLOSE_WAITs in the log are _normal_. That is, they
> reflect completed mount requests that are in the normal close path. If
> they never go away, then that's not normal. Is this the case?

What he said was:

    "those fall over after some time and stay in CLOSE_WAIT state
    until I restart the nfs-kernel-server."

Carsten, are you positive that the same sockets were in CLOSE_WAIT the
whole time you were watching? And how long was it before you gave up
and restarted?

> Suppose the 10 is roughly correct. The remaining "jilted" clients will
> retransmit their SYN after a randomized exponential backoff. I think you
> can imagine that trying 1300+ connections of which only 10 succeed and
> then retrying 1300-10 based on a randomized exponential backoff might
> get you some pretty bad performance.

Right, could be, but:

...
> > Oh, but: Grepping the glibc rpc code, it looks like it calls listen with
> > second argument SOMAXCONN == 128. You can confirm that by strace'ing
> > rpc.mountd -F and looking for the listen call.
> >
> > And that socket's shared between all the mountd processes, so I guess
> > that's the real limit. I don't see an easy way to adjust that. You'd
> > also need to increase /proc/sys/net/core/somaxconn first.
> >
> > But none of this explains why we'd see connections stuck in CLOSE_WAIT
> > indefinitely?

So the limit appears to be more like 128, and (based on my quick look at
the code) that appears to be baked in to the glibc rpc code.

Maybe you could code around that in mountd. Looks like the relevant
code is in nfs-utils/support/nfs/rpcmisc.c:rpc_init().

--b.

^ permalink raw reply [flat|nested] 16+ messages in thread
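One way to answer the question of whether the same sockets stay in CLOSE_WAIT is to compare two `netstat -an` snapshots taken some time apart. The sketch below assumes the six-column layout shown in the original report (protocol, Recv-Q, Send-Q, local address, remote address, state) with each entry on a single line; the function names are hypothetical, and the second snapshot is invented purely to show the comparison.

```python
def close_wait_sockets(netstat_output):
    """Return the set of (local, remote) address pairs in CLOSE_WAIT."""
    sockets = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        if (len(fields) >= 6 and fields[0].startswith("tcp")
                and fields[5] == "CLOSE_WAIT"):
            sockets.add((fields[3], fields[4]))
    return sockets


def stuck_sockets(earlier, later):
    """Sockets that are in CLOSE_WAIT in both snapshots."""
    return close_wait_sockets(earlier) & close_wait_sockets(later)


# First snapshot: lines taken from the report, rejoined onto one line each.
snapshot_1 = """\
tcp 0 0 10.20.10.14:2049 10.10.1.19:846 ESTABLISHED
tcp 45 0 10.20.10.14:687 10.10.0.68:979 CLOSE_WAIT
tcp 45 0 10.20.10.14:687 10.10.3.83:680 CLOSE_WAIT
tcp 89 0 10.20.10.14:687 10.10.0.79:604 CLOSE_WAIT
"""

# Second snapshot: hypothetical, for illustration only.
snapshot_2 = """\
tcp 0 0 10.20.10.14:2049 10.10.1.19:846 ESTABLISHED
tcp 45 0 10.20.10.14:687 10.10.0.68:979 CLOSE_WAIT
tcp 89 0 10.20.10.14:687 10.10.0.79:604 CLOSE_WAIT
"""

for local, remote in sorted(stuck_sockets(snapshot_1, snapshot_2)):
    print(local, remote)
```

If the intersection stays non-empty over several minutes, the sockets are genuinely stuck rather than passing through the normal close path.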
* Re: [NFS] How to set-up a Linux NFS server to handle massive number of requests
2008-04-16 2:58 ` J. Bruce Fields
@ 2008-04-16 3:22 ` Tom Tucker
[not found] ` <1208316166.3521.42.camel-SMNkleLxa3ZimH42XvhXlA@public.gmane.org>
0 siblings, 1 reply; 16+ messages in thread
From: Tom Tucker @ 2008-04-16 3:22 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: nfs, Carsten Aulbert

On Tue, 2008-04-15 at 22:58 -0400, J. Bruce Fields wrote:
> So the limit appears to be more like 128, and (based on my quick look at
> the code) that appears to be baked in to the glibc rpc code.
>
> Maybe you could code around that in mountd. Looks like the relevant
> code is in nfs-utils/support/nfs/rpcmisc.c:rpc_init().

If you really need to start 1300 mounts all at once then something needs
to change. BTW even after you get past mountd, the server is going to
get pounded with SYN and RPC_NOP.

It might be interesting to look at httpd (Apache) to see what it does. I
would think it faces similar traffic flows.

^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <1208316166.3521.42.camel-SMNkleLxa3ZimH42XvhXlA@public.gmane.org>]
* Re: [NFS] How to set-up a Linux NFS server to handle massive number of requests
[not found] ` <1208316166.3521.42.camel-SMNkleLxa3ZimH42XvhXlA@public.gmane.org>
@ 2008-04-16 13:45 ` Chuck Lever
2008-04-16 14:35 ` Carsten Aulbert
2008-05-01 19:47 ` Dean Hildebrand
1 sibling, 1 reply; 16+ messages in thread
From: Chuck Lever @ 2008-04-16 13:45 UTC (permalink / raw)
To: J. Bruce Fields, Tom Tucker, Carsten Aulbert; +Cc: nfs

On Apr 15, 2008, at 11:22 PM, Tom Tucker wrote:
> If you really need to start 1300 mounts all at once then something
> needs to change. BTW even after you get past mountd, the server is
> going to get pounded with SYN and RPC_NOP.

Would it be worth trying UDP, just as an experiment?

Force UDP for the mountd protocol by specifying the "mountproto=udp"
option.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

^ permalink raw reply [flat|nested] 16+ messages in thread
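Applied to the automount map from the original report, the suggestion might look like the entry below. This is a guess at how the option would be combined with the existing ones, not something posted in the thread: only `mountproto=udp` is new, and NFS traffic itself would still use TCP because the `tcp` option is left in place.

```text
-fstype=nfs,nfsvers=3,hard,intr,rsize=8192,wsize=8192,tcp,mountproto=udp &:/data
```

Whether the installed mount.nfs accepts `mountproto` would need checking on one client before pushing the map out through NIS.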
* Re: [NFS] How to set-up a Linux NFS server to handle massive number of requests
2008-04-16 13:45 ` Chuck Lever
@ 2008-04-16 14:35 ` Carsten Aulbert
0 siblings, 0 replies; 16+ messages in thread
From: Carsten Aulbert @ 2008-04-16 14:35 UTC (permalink / raw)
To: Chuck Lever; +Cc: J. Bruce Fields, nfs

Chuck Lever wrote:
>
> Force UDP for the mountd protocol by specifying the "mountproto=udp"
> option.

I'll give that a try as well. I'm currently busy running other benchmarks.
I'll try to get some results by the weekend. If nothing comes from my
side by then, I'm probably buried alive in work, but please send me a
(friendly) reminder then.

Thanks already for all the input!

Cheers

Carsten

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [NFS] How to set-up a Linux NFS server to handle massive number of requests
[not found] ` <1208316166.3521.42.camel-SMNkleLxa3ZimH42XvhXlA@public.gmane.org>
2008-04-16 13:45 ` Chuck Lever
@ 2008-05-01 19:47 ` Dean Hildebrand
2008-05-01 19:51 ` J. Bruce Fields
1 sibling, 1 reply; 16+ messages in thread
From: Dean Hildebrand @ 2008-05-01 19:47 UTC (permalink / raw)
To: Tom Tucker; +Cc: J. Bruce Fields, nfs, Carsten Aulbert

> If you really need to start 1300 mounts all at once then something needs
> to change. BTW even after you get past mountd, the server is going to
> get pounded with SYN and RPC_NOP.
>
Just to give my 2 cents after the fact: a new approach is definitely
needed. For example, a small 10 line MPI program that has a single
client mount the server, calculate the md5sum, and distribute the result
to the other 999 clients would be a much better approach....

Dean

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [NFS] How to set-up a Linux NFS server to handle massive number of requests
2008-05-01 19:47 ` Dean Hildebrand
@ 2008-05-01 19:51 ` J. Bruce Fields
0 siblings, 0 replies; 16+ messages in thread
From: J. Bruce Fields @ 2008-05-01 19:51 UTC (permalink / raw)
To: Dean Hildebrand; +Cc: nfs, Carsten Aulbert

On Thu, May 01, 2008 at 12:47:06PM -0700, Dean Hildebrand wrote:
>
>> If you really need to start 1300 mounts all at once then something needs
>> to change. BTW even after you get past mountd, the server is going to
>> get pounded with SYN and RPC_NOP.
> Just to give my 2 cents after the fact,.. a new approach is definitely
> needed. For example, a small 10 line MPI program that has a single
> client mount the server, calculate the md5sum, and distribute the result
> to the other 999 clients would be a much better approach....

For that toy example, yes, but we still need to fix whatever's
preventing us from handling 1000 simultaneous mounts.

--b.

^ permalink raw reply [flat|nested] 16+ messages in thread
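The single-reader idea Dean describes in this exchange might look roughly like the sketch below. It is only an illustration under stated assumptions: it does not use MPI at all, and instead stands in Python threads for the cluster ranks and in-memory queues for the broadcast. The names `md5_of_file` and `run_single_reader` are hypothetical and do not come from the thread. The point it shows is that only rank 0 ever opens the file, so the server would see one reader instead of a thousand.

```python
import hashlib
import queue
import threading


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_single_reader(path, n_clients):
    """Simulate n_clients ranks; only rank 0 touches the file.

    Threads stand in for MPI ranks and queues for the broadcast
    (an assumption made so the sketch runs with the standard library).
    """
    inboxes = [queue.Queue(maxsize=1) for _ in range(n_clients)]
    results = [None] * n_clients

    def client(rank):
        if rank == 0:
            # The one client that actually reads from the data server.
            value = md5_of_file(path)
            for inbox in inboxes[1:]:
                inbox.put(value)
        else:
            # Every other client waits for the digest instead of reading.
            value = inboxes[rank].get()
        results[rank] = value

    threads = [threading.Thread(target=client, args=(rank,))
               for rank in range(n_clients)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Calling `run_single_reader("/path/to/file", 1000)` would return a list of 1000 identical digests after a single read of the file. As Bruce notes, this sidesteps the load for the toy md5sum case only; it does nothing about the server's handling of many simultaneous mounts.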
Thread overview: 16+ messages
2008-04-10 12:12 [NFS] How to set-up a Linux NFS server to handle massive number of requests Carsten Aulbert
[not found] ` <47FE044A.7020008-l1a6w7hxd2yELgA04lAiVw@public.gmane.org>
2008-04-11 23:07 ` J. Bruce Fields
2008-04-12 6:45 ` Carsten Aulbert
[not found] ` <48005A78.9090609-l1a6w7hxd2yELgA04lAiVw@public.gmane.org>
2008-04-14 17:06 ` J. Bruce Fields
2008-04-15 4:48 ` Tom Tucker
[not found] ` <1208234913.17169.50.camel-SMNkleLxa3ZimH42XvhXlA@public.gmane.org>
2008-04-15 5:42 ` Carsten Aulbert
[not found] ` <48044055.2060500-l1a6w7hxd2yELgA04lAiVw@public.gmane.org>
2008-04-15 13:58 ` J. Bruce Fields
2008-04-16 2:49 ` Tom Tucker
2008-04-15 15:12 ` J. Bruce Fields
2008-04-16 2:43 ` Tom Tucker
[not found] ` <1208313790.3521.32.camel-SMNkleLxa3ZimH42XvhXlA@public.gmane.org>
2008-04-16 2:58 ` J. Bruce Fields
2008-04-16 3:22 ` Tom Tucker
[not found] ` <1208316166.3521.42.camel-SMNkleLxa3ZimH42XvhXlA@public.gmane.org>
2008-04-16 13:45 ` Chuck Lever
2008-04-16 14:35 ` Carsten Aulbert
2008-05-01 19:47 ` Dean Hildebrand
2008-05-01 19:51 ` J. Bruce Fields