* setsockopt()
@ 2008-07-07 18:18 Olga Kornievskaia
2008-07-07 21:24 ` setsockopt() Stephen Hemminger
2008-07-08 1:17 ` setsockopt() John Heffner
0 siblings, 2 replies; 43+ messages in thread
From: Olga Kornievskaia @ 2008-07-07 18:18 UTC (permalink / raw)
To: netdev; +Cc: Jim Rees, J. Bruce Fields
Hi,
I'd like to ask a question regarding socket options, more
specifically the send and receive buffer sizes.
One simple question: on the server side, is it true that
setsockopt() can only be called before listen() to set the send/receive
buffer sizes? From what I can tell, if I set socket options on the
listening socket, they are inherited by the socket created during
accept(). However, when I try to change the send/receive buffer sizes
on the new socket, the change has no effect.
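For concreteness, here is a minimal userspace sketch of the pattern in
question (this is not NFSD's in-kernel code, which adjusts the socket's
buffer fields directly inside the kernel; the port, the 8MB figure, and
the missing error handling are only there to keep the sketch short):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
    int lsock = socket(AF_INET, SOCK_STREAM, 0);
    int bufsz = 8 * 1024 * 1024;   /* 8MB, the size used in the experiment below */
    struct sockaddr_in addr;
    int csock, cur;
    socklen_t len = sizeof(cur);

    /* Set the buffer sizes on the *listening* socket, before listen().
     * The effective value is capped by net.core.rmem_max/wmem_max. */
    setsockopt(lsock, SOL_SOCKET, SO_RCVBUF, &bufsz, sizeof(bufsz));
    setsockopt(lsock, SOL_SOCKET, SO_SNDBUF, &bufsz, sizeof(bufsz));

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(12049);  /* arbitrary port for the sketch */
    bind(lsock, (struct sockaddr *)&addr, sizeof(addr));
    listen(lsock, 16);

    /* The accepted socket inherits the listener's buffer sizes; calling
     * setsockopt() again on it is the step that appears to have no
     * effect on the advertised window. */
    csock = accept(lsock, NULL, NULL);
    getsockopt(csock, SOL_SOCKET, SO_RCVBUF, &cur, &len);
    printf("accepted socket SO_RCVBUF = %d\n", cur);

    close(csock);
    close(lsock);
    return 0;
}

The accepted socket starts out with the listener's (kernel-doubled)
values; it is the later setsockopt() on the accepted socket whose effect
never shows up in the advertised window.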
The server in question is the in-kernel NFSD server. NFSD's code
tries to adjust the buffer sizes (so that TCP will increase the
window size appropriately), but it does so after the new socket is
created. As a result, the TCP window never opens beyond TCP's
"default" sysctl value (the 2nd value in the net.ipv4.tcp_rmem
triple, which on our system is set to 64KB). We changed the code so
that setsockopt() is called when the listening socket is created,
setting the buffer sizes to something bigger, like 8MB. We then try
to increase the buffer sizes for each socket created by accept(), but
the network trace shows that the window size doesn't open beyond the
values used for the listening socket.
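For reference, the tcp_rmem triple mentioned above can also be checked
programmatically; a small sketch with no assumptions beyond the standard
/proc path:

#include <stdio.h>

/* Print the net.ipv4.tcp_rmem triple: the minimum, the initial/default
 * size, and the ceiling that the receive buffer may grow to. */
int main(void)
{
    long min, def, max;
    FILE *f = fopen("/proc/sys/net/ipv4/tcp_rmem", "r");

    if (!f) {
        perror("tcp_rmem");
        return 1;
    }
    if (fscanf(f, "%ld %ld %ld", &min, &def, &max) == 3)
        printf("tcp_rmem: min=%ld default=%ld max=%ld\n", min, def, max);
    fclose(f);
    return 0;
}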
I looked around in the code. There is a variable called
"window_clamp" that seems to specify the largest possible window
advertisement. window_clamp gets set during the creation of the
accept socket. At that time, its value is based on the sk_rcvbuf of
the listening socket. That would explain why the window doesn't grow
beyond the values used in setsockopt() for the listening socket, even
though the new socket has larger sk_sndbuf and sk_rcvbuf values than
the listening socket.
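As an aside, Linux exposes this clamp to userspace as the
TCP_WINDOW_CLAMP socket option, so it can be read (or raised) on the
accepted socket directly. A diagnostic sketch only, with the 4MB value
purely illustrative; raising the clamp by itself may not be enough if
the receive buffer stays small:

#include <stdio.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Inspect (and optionally raise) the window clamp on an accepted TCP
 * socket 'fd'. */
void show_window_clamp(int fd)
{
    int clamp;
    socklen_t len = sizeof(clamp);

    if (getsockopt(fd, IPPROTO_TCP, TCP_WINDOW_CLAMP, &clamp, &len) == 0)
        printf("current window_clamp = %d bytes\n", clamp);

    clamp = 4 * 1024 * 1024;    /* illustrative new ceiling */
    if (setsockopt(fd, IPPROTO_TCP, TCP_WINDOW_CLAMP, &clamp, sizeof(clamp)) != 0)
        perror("TCP_WINDOW_CLAMP");
}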
I realize that the send/receive buffer sizes and the window
advertisement are different things, but they are related: by telling
TCP that we have a certain amount of memory available for the socket,
we expect it to open a big enough window (provided there is no
congestion).
Can somebody advise us on how to properly set the send/receive
buffer sizes for the in-kernel NFSD such that (1) the window is not
bound by TCP's default sysctl value, and (2) if possible, the sizes
are set on the accepted sockets rather than on the listening socket?
I would appreciate it if we could be CC'd on any replies, as we are
not subscribed to the netdev mailing list.
Thank you.
-Olga
^ permalink raw reply [flat|nested] 43+ messages in thread* Re: setsockopt() 2008-07-07 18:18 setsockopt() Olga Kornievskaia @ 2008-07-07 21:24 ` Stephen Hemminger 2008-07-07 21:30 ` setsockopt() Olga Kornievskaia 2008-07-07 21:32 ` setsockopt() J. Bruce Fields 2008-07-08 1:17 ` setsockopt() John Heffner 1 sibling, 2 replies; 43+ messages in thread From: Stephen Hemminger @ 2008-07-07 21:24 UTC (permalink / raw) To: Olga Kornievskaia; +Cc: netdev, Jim Rees, J. Bruce Fields On Mon, 07 Jul 2008 14:18:38 -0400 Olga Kornievskaia <aglo@citi.umich.edu> wrote: > Hi, > > I'd like to ask a question regarding socket options, more > specifically send and receive buffer sizes. > > One simple question: (on the server-side) is it true that, to set > send/receive buffer size, setsockopt() can only be called before > listen()? From what I can tell, if I were to set socket options for the > listening socket, they get inherited by the socket created during the > accept(). However, when I try to change send/receive buffer size for the > new socket, they take no affect. > > The server in question is the NFSD server in the kernel. NFSD's code > tries to adjust the buffer size (in order to have TCP increase the > window size appropriately) but it does so after the new socket is > created. It leads to the fact that the TCP window doesn't open beyond > the TCP's "default" sysctl value (that would be the 2nd value in the > triple net.ipv4.tcp_rmem, which on our system is set to 64KB). We > changed the code so that setsockopt() is called for the listening socket > is created and we set the buffer sizes to something bigger, like 8MB. > Then we try to increase the buffer size for each socket created by the > accept() but what is seen on the network trace is that window size > doesn't open beyond the values used for the listening socket. It would be better if NFSD stayed out of doign setsockopt and just let the sender/receiver autotuning work? > I looked around in the code. There is a variable called > "window_clamp" that seems to specifies the largest possible window > advertisement. window_clamp gets set during the creation of the accept > socket. At that time, it's value is based on the sk_rcvbuf of the > listening socket. Thus, that would explain the behavior that window > doesn't grow beyond the values used in setsockopt() for the listening > socket, even though the new socket has new (larger) sk_sndbuf and > sk_rcvbuf than the listening socket. > > I realize that send/receive buffer size and window advertisement are > different but they are related in the way that by telling TCP that we > have a certain amount of memory for socket operations, it should try to > open big enough window (provided that there is no congestion). > > Can somebody advise us on how to properly set send/receive buffer > sizes for the NFSD in the kernel such that (1) the window is not bound > by the TCP's default sysctl value and (2) if it is possible to do so for > the accept sockets and not the listening socket. > > I would appreciate if we could be CC-ed on the reply as we are not > subscribed to the netdev mailing list. > > Thank you. > > -Olga > > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: setsockopt() 2008-07-07 21:24 ` setsockopt() Stephen Hemminger @ 2008-07-07 21:30 ` Olga Kornievskaia 2008-07-07 21:33 ` setsockopt() Stephen Hemminger ` (2 more replies) 2008-07-07 21:32 ` setsockopt() J. Bruce Fields 1 sibling, 3 replies; 43+ messages in thread From: Olga Kornievskaia @ 2008-07-07 21:30 UTC (permalink / raw) To: Stephen Hemminger; +Cc: netdev, Jim Rees, J. Bruce Fields Stephen Hemminger wrote: > On Mon, 07 Jul 2008 14:18:38 -0400 > Olga Kornievskaia <aglo@citi.umich.edu> wrote: > > >> Hi, >> >> I'd like to ask a question regarding socket options, more >> specifically send and receive buffer sizes. >> >> One simple question: (on the server-side) is it true that, to set >> send/receive buffer size, setsockopt() can only be called before >> listen()? From what I can tell, if I were to set socket options for the >> listening socket, they get inherited by the socket created during the >> accept(). However, when I try to change send/receive buffer size for the >> new socket, they take no affect. >> >> The server in question is the NFSD server in the kernel. NFSD's code >> tries to adjust the buffer size (in order to have TCP increase the >> window size appropriately) but it does so after the new socket is >> created. It leads to the fact that the TCP window doesn't open beyond >> the TCP's "default" sysctl value (that would be the 2nd value in the >> triple net.ipv4.tcp_rmem, which on our system is set to 64KB). We >> changed the code so that setsockopt() is called for the listening socket >> is created and we set the buffer sizes to something bigger, like 8MB. >> Then we try to increase the buffer size for each socket created by the >> accept() but what is seen on the network trace is that window size >> doesn't open beyond the values used for the listening socket. >> > > It would be better if NFSD stayed out of doign setsockopt and just > let the sender/receiver autotuning work? > Auto-tuning would be guided by the sysctl values that are set for all applications. I could be wrong but what I see is that unless an application does a setsockopt(), its window is bound by the default sysctl value. If it is true, than it is not acceptable. It means that in order for NFSD to achieve a large enough window it needs to modify TCP's sysctl value which will effect all other applications. >> I looked around in the code. There is a variable called >> "window_clamp" that seems to specifies the largest possible window >> advertisement. window_clamp gets set during the creation of the accept >> socket. At that time, it's value is based on the sk_rcvbuf of the >> listening socket. Thus, that would explain the behavior that window >> doesn't grow beyond the values used in setsockopt() for the listening >> socket, even though the new socket has new (larger) sk_sndbuf and >> sk_rcvbuf than the listening socket. >> >> I realize that send/receive buffer size and window advertisement are >> different but they are related in the way that by telling TCP that we >> have a certain amount of memory for socket operations, it should try to >> open big enough window (provided that there is no congestion). >> >> Can somebody advise us on how to properly set send/receive buffer >> sizes for the NFSD in the kernel such that (1) the window is not bound >> by the TCP's default sysctl value and (2) if it is possible to do so for >> the accept sockets and not the listening socket. >> >> I would appreciate if we could be CC-ed on the reply as we are not >> subscribed to the netdev mailing list. 
>> >> Thank you. >> >> -Olga >> >> >> -- >> To unsubscribe from this list: send the line "unsubscribe netdev" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: setsockopt() 2008-07-07 21:30 ` setsockopt() Olga Kornievskaia @ 2008-07-07 21:33 ` Stephen Hemminger 2008-07-07 21:49 ` setsockopt() David Miller 2008-07-07 22:50 ` setsockopt() Rick Jones 2 siblings, 0 replies; 43+ messages in thread From: Stephen Hemminger @ 2008-07-07 21:33 UTC (permalink / raw) To: Olga Kornievskaia; +Cc: netdev, Jim Rees, J. Bruce Fields On Mon, 07 Jul 2008 17:30:49 -0400 Olga Kornievskaia <aglo@citi.umich.edu> wrote: > > > Stephen Hemminger wrote: > > On Mon, 07 Jul 2008 14:18:38 -0400 > > Olga Kornievskaia <aglo@citi.umich.edu> wrote: > > > > > >> Hi, > >> > >> I'd like to ask a question regarding socket options, more > >> specifically send and receive buffer sizes. > >> > >> One simple question: (on the server-side) is it true that, to set > >> send/receive buffer size, setsockopt() can only be called before > >> listen()? From what I can tell, if I were to set socket options for the > >> listening socket, they get inherited by the socket created during the > >> accept(). However, when I try to change send/receive buffer size for the > >> new socket, they take no affect. > >> > >> The server in question is the NFSD server in the kernel. NFSD's code > >> tries to adjust the buffer size (in order to have TCP increase the > >> window size appropriately) but it does so after the new socket is > >> created. It leads to the fact that the TCP window doesn't open beyond > >> the TCP's "default" sysctl value (that would be the 2nd value in the > >> triple net.ipv4.tcp_rmem, which on our system is set to 64KB). We > >> changed the code so that setsockopt() is called for the listening socket > >> is created and we set the buffer sizes to something bigger, like 8MB. > >> Then we try to increase the buffer size for each socket created by the > >> accept() but what is seen on the network trace is that window size > >> doesn't open beyond the values used for the listening socket. > >> > > > > It would be better if NFSD stayed out of doign setsockopt and just > > let the sender/receiver autotuning work? > > > Auto-tuning would be guided by the sysctl values that are set for all > applications. I could be wrong but what I see is that unless an > application does a setsockopt(), its window is bound by the default > sysctl value. If it is true, than it is not acceptable. It means that in > order for NFSD to achieve a large enough window it needs to modify TCP's > sysctl value which will effect all other applications. > Auto tuning starts at the default and will expand to the max allowed. If you set a value with setsockopt, then the kernel just uses that value and does no tuning. ^ permalink raw reply [flat|nested] 43+ messages in thread
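A userspace way to see the behaviour Stephen describes:
getsockopt(SO_SNDBUF) reflects the kernel's current sk_sndbuf, so on a
connection where no explicit SO_SNDBUF has been set the reported size
can be watched growing during a bulk transfer, while a socket that has
called setsockopt() stays pinned at its requested (doubled) value. A
rough sketch; the chunk size and iteration counts are arbitrary:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>

/* Watch the kernel grow the send buffer during a bulk write.  'fd' is a
 * connected TCP socket on which no SO_SNDBUF has been set, so sk_sndbuf
 * is free to expand up to tcp_wmem[2]. */
void watch_sndbuf_growth(int fd)
{
    char chunk[64 * 1024];
    int i, sz;
    socklen_t len = sizeof(sz);

    memset(chunk, 0, sizeof(chunk));
    for (i = 0; i < 1000; i++) {
        if (write(fd, chunk, sizeof(chunk)) < 0)
            break;
        if (i % 100 == 0) {
            getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sz, &len);
            printf("after %d writes: SO_SNDBUF = %d\n", i, sz);
        }
    }
}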
* Re: setsockopt() 2008-07-07 21:30 ` setsockopt() Olga Kornievskaia 2008-07-07 21:33 ` setsockopt() Stephen Hemminger @ 2008-07-07 21:49 ` David Miller 2008-07-08 4:54 ` setsockopt() Evgeniy Polyakov 2008-07-08 20:12 ` setsockopt() Jim Rees 2008-07-07 22:50 ` setsockopt() Rick Jones 2 siblings, 2 replies; 43+ messages in thread From: David Miller @ 2008-07-07 21:49 UTC (permalink / raw) To: aglo; +Cc: shemminger, netdev, rees, bfields From: Olga Kornievskaia <aglo@citi.umich.edu> Date: Mon, 07 Jul 2008 17:30:49 -0400 > Auto-tuning would be guided by the sysctl values that are set for all > applications. I could be wrong but what I see is that unless an > application does a setsockopt(), its window is bound by the default > sysctl value. If it is true, than it is not acceptable. It means that in > order for NFSD to achieve a large enough window it needs to modify TCP's > sysctl value which will effect all other applications. This is nonsense. The kernel autotunes the receive and send buffers based upon the dynamic behavior of the connection. The sysctls only control limits. If you set the socket buffer sizes explicitly, you essentially turn off half of the TCP stack because it won't do dynamic socket buffer sizing afterwards. There is no reason these days to ever explicitly set the socket buffer sizes on TCP sockets under Linux. If something is going wrong it's a bug and we should fix it. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: setsockopt()
2008-07-07 21:49 ` setsockopt() David Miller
@ 2008-07-08 4:54 ` Evgeniy Polyakov
2008-07-08 6:02 ` setsockopt() Bill Fink
2008-07-08 20:12 ` setsockopt() Jim Rees
1 sibling, 1 reply; 43+ messages in thread
From: Evgeniy Polyakov @ 2008-07-08 4:54 UTC (permalink / raw)
To: David Miller; +Cc: aglo, shemminger, netdev, rees, bfields
Hi.
On Mon, Jul 07, 2008 at 02:49:12PM -0700, David Miller (davem@davemloft.net) wrote:
> There is no reason these days to ever explicitly set the socket
> buffer sizes on TCP sockets under Linux.
>
> If something is going wrong it's a bug and we should fix it.
Just for reference: autosizing is (was?) not always working correctly
for some workloads, at least a couple of years ago.
For example, I worked with small embedded systems with 16-32 MB of RAM
where the socket buffer size never grew beyond about 200Kb (100mbit
network), but the workload was very bursty, so if the remote system
froze for several milliseconds (and sometimes up to a couple of
seconds), the socket buffer was completely filled by the new burst of
data and sending either started to sleep or returned EAGAIN, which
resulted in the semi-realtime data being dropped.
Setting the buffer size explicitly to a large enough value like 8Mb
fixed these burst issues. Another fix was to allocate a buffer each
time data became ready and copy a portion into it, but allocation was
quite slow, which led to unneeded latencies, which again could lead to
data loss.
--
Evgeniy Polyakov
^ permalink raw reply [flat|nested] 43+ messages in thread
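To make the failure mode concrete, a hedged sketch of the burst scenario
described above (the helper and its drop policy are illustrative, not
the actual application code):

#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Bursty, semi-realtime sender on a non-blocking socket: when the remote
 * side stalls and the send buffer fills, write() fails with EAGAIN and
 * the fresh burst must be dropped or queued in userspace.  A large
 * explicit SO_SNDBUF simply absorbs the burst instead. */
ssize_t send_burst(int fd, const void *buf, size_t len)
{
    ssize_t n = write(fd, buf, len);

    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;    /* buffer full: caller drops or requeues the burst */
    return n;
}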
* Re: setsockopt() 2008-07-08 4:54 ` setsockopt() Evgeniy Polyakov @ 2008-07-08 6:02 ` Bill Fink 2008-07-08 6:29 ` setsockopt() Roland Dreier 0 siblings, 1 reply; 43+ messages in thread From: Bill Fink @ 2008-07-08 6:02 UTC (permalink / raw) To: Evgeniy Polyakov; +Cc: David Miller, aglo, shemminger, netdev, rees, bfields On Tue, 8 Jul 2008, Evgeniy Polyakov wrote: > On Mon, Jul 07, 2008 at 02:49:12PM -0700, David Miller (davem@davemloft.net) wrote: > > There is no reason these days to ever explicitly set the socket > > buffer sizes on TCP sockets under Linux. > > > > If something is going wrong it's a bug and we should fix it. > > Just for the reference: autosizing is (was?) not always working correctly > for some workloads at least couple of years ago. > For example I worked with small enough embedded systems with 16-32 MB > of RAM where socket buffer size never grew up more than 200Kb (100mbit > network), but workload was very bursty, so if remote system froze for > several milliseconds (and sometimes upto couple of seconds), socket > buffer was completely filled with new burst of data and either sending > started to sleep or returned EAGAIN, which resulted in semi-realtime > data to be dropped. > > Setting buffer size explicitely to large enough value like 8Mb fixed > this burst issues. Another fix was to allocate data each time it becomes > ready and copy portion to this buffer, but allocation was quite slow, > which led to unneded latencies, which again could lead to data loss. I admittedly haven't tested on the latest greatest kernel versions, but the testing I have done on large RTT 10-GigE networks, if I want to get the ultimate TCP performance I still need to explicitly set the socket buffer sizes, although I give kudos to the autotuning which does remarkably well. Here's a comparison across an ~72 ms RTT 10-GigE path (sender is 2.6.20.7 and receiver is 2.6.22.9). 
Autotuning (30-second TCP test with 1-second interval reports): # nuttcp -T30 -i1 192.168.21.82 nuttcp-6.0.1: Using beta version: retrans interface/output subject to change (to suppress this message use "-f-beta") 7.2500 MB / 1.01 sec = 60.4251 Mbps 0 retrans 43.6875 MB / 1.00 sec = 366.4509 Mbps 0 retrans 169.4375 MB / 1.00 sec = 1421.2296 Mbps 0 retrans 475.3125 MB / 1.00 sec = 3986.8873 Mbps 0 retrans 827.6250 MB / 1.00 sec = 6942.0247 Mbps 0 retrans 877.6250 MB / 1.00 sec = 7361.2792 Mbps 0 retrans 878.1250 MB / 1.00 sec = 7365.7750 Mbps 0 retrans 878.4375 MB / 1.00 sec = 7368.2710 Mbps 0 retrans 878.3750 MB / 1.00 sec = 7367.7173 Mbps 0 retrans 878.7500 MB / 1.00 sec = 7370.6932 Mbps 0 retrans 878.8125 MB / 1.00 sec = 7371.6818 Mbps 0 retrans 879.1875 MB / 1.00 sec = 7374.5546 Mbps 0 retrans 878.6875 MB / 1.00 sec = 7370.3754 Mbps 0 retrans 878.2500 MB / 1.00 sec = 7366.3742 Mbps 0 retrans 878.6875 MB / 1.00 sec = 7370.6407 Mbps 0 retrans 878.8125 MB / 1.00 sec = 7371.4239 Mbps 0 retrans 878.5000 MB / 1.00 sec = 7368.8174 Mbps 0 retrans 879.0625 MB / 1.00 sec = 7373.4766 Mbps 0 retrans 878.8125 MB / 1.00 sec = 7371.4386 Mbps 0 retrans 878.3125 MB / 1.00 sec = 7367.2152 Mbps 0 retrans 878.8125 MB / 1.00 sec = 7371.3723 Mbps 0 retrans 878.6250 MB / 1.00 sec = 7369.8585 Mbps 0 retrans 878.8125 MB / 1.00 sec = 7371.4460 Mbps 0 retrans 875.5000 MB / 1.00 sec = 7373.0401 Mbps 0 retrans 878.8125 MB / 1.00 sec = 7371.5123 Mbps 0 retrans 878.3750 MB / 1.00 sec = 7367.5037 Mbps 0 retrans 878.5000 MB / 1.00 sec = 7368.9647 Mbps 0 retrans 879.4375 MB / 1.00 sec = 7376.6073 Mbps 0 retrans 878.8750 MB / 1.00 sec = 7371.8891 Mbps 0 retrans 878.4375 MB / 1.00 sec = 7368.3521 Mbps 0 retrans 23488.6875 MB / 30.10 sec = 6547.0228 Mbps 81 %TX 49 %RX 0 retrans Same test but with explicitly specified 100 MB socket buffer: # nuttcp -T30 -i1 -w100m 192.168.21.82 nuttcp-6.0.1: Using beta version: retrans interface/output subject to change (to suppress this message use "-f-beta") 7.1250 MB / 1.01 sec = 59.4601 Mbps 0 retrans 120.3750 MB / 1.00 sec = 1009.7464 Mbps 0 retrans 859.4375 MB / 1.00 sec = 7208.5832 Mbps 0 retrans 939.3125 MB / 1.00 sec = 7878.9965 Mbps 0 retrans 935.5000 MB / 1.00 sec = 7847.0249 Mbps 0 retrans 934.8125 MB / 1.00 sec = 7841.1248 Mbps 0 retrans 933.8125 MB / 1.00 sec = 7832.7291 Mbps 0 retrans 933.1875 MB / 1.00 sec = 7827.5727 Mbps 0 retrans 932.1875 MB / 1.00 sec = 7819.1300 Mbps 0 retrans 933.1250 MB / 1.00 sec = 7826.8059 Mbps 0 retrans 933.3125 MB / 1.00 sec = 7828.6760 Mbps 0 retrans 933.0000 MB / 1.00 sec = 7825.9608 Mbps 0 retrans 932.6875 MB / 1.00 sec = 7823.1753 Mbps 0 retrans 932.0625 MB / 1.00 sec = 7818.0268 Mbps 0 retrans 931.7500 MB / 1.00 sec = 7815.6088 Mbps 0 retrans 931.0625 MB / 1.00 sec = 7809.7717 Mbps 0 retrans 931.5000 MB / 1.00 sec = 7813.3711 Mbps 0 retrans 931.8750 MB / 1.00 sec = 7816.4931 Mbps 0 retrans 932.0625 MB / 1.00 sec = 7817.8157 Mbps 0 retrans 931.5000 MB / 1.00 sec = 7813.4180 Mbps 0 retrans 931.6250 MB / 1.00 sec = 7814.5134 Mbps 0 retrans 931.6250 MB / 1.00 sec = 7814.4821 Mbps 0 retrans 931.3125 MB / 1.00 sec = 7811.7124 Mbps 0 retrans 930.8750 MB / 1.00 sec = 7808.0818 Mbps 0 retrans 931.0625 MB / 1.00 sec = 7809.6233 Mbps 0 retrans 930.6875 MB / 1.00 sec = 7806.6964 Mbps 0 retrans 931.2500 MB / 1.00 sec = 7811.0164 Mbps 0 retrans 931.3125 MB / 1.00 sec = 7811.9077 Mbps 0 retrans 931.3750 MB / 1.00 sec = 7812.3617 Mbps 0 retrans 931.4375 MB / 1.00 sec = 7812.6750 Mbps 0 retrans 26162.6875 MB / 30.15 sec = 7279.7648 Mbps 93 %TX 54 %RX 0 
retrans As you can see, the autotuned case maxed out at about 7.37 Gbps, whereas by explicitly specifying a 100 MB socket buffer it was possible to achieve a somewhat higher rate of about 7.81 Gbps. Admittedly the autotuning did great, with a difference of only about 6 %, but if you want to squeeze the last drop of performance out of your network, explicitly setting the socket buffer sizes can still be helpful in certain situations (perhaps newer kernels have reduced the gap even more). But I would definitely agree with the general recommendation to just take advantage of the excellent Linux TCP autotuning for most common scenarios. -Bill ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: setsockopt() 2008-07-08 6:02 ` setsockopt() Bill Fink @ 2008-07-08 6:29 ` Roland Dreier 2008-07-08 6:43 ` setsockopt() Evgeniy Polyakov ` (2 more replies) 0 siblings, 3 replies; 43+ messages in thread From: Roland Dreier @ 2008-07-08 6:29 UTC (permalink / raw) To: Bill Fink Cc: Evgeniy Polyakov, David Miller, aglo, shemminger, netdev, rees, bfields Interesting... I'd not tried nuttcp before, and on my testbed, which is a very high-bandwidth, low-RTT network (IP-over-InfiniBand with DDR IB, so the network is capable of 16 Gbps, and the RTT is ~25 microseconds), the difference between autotuning and not for nuttcp is huge (testing with 2.6.26-rc8 plus some pending 2.6.27 patches that add checksum offload, LSO and LRO to the IP-over-IB driver): nuttcp -T30 -i1 ends up with: 14465.0625 MB / 30.01 sec = 4043.6073 Mbps 82 %TX 2 %RX while setting the window even to 128 KB with nuttcp -w128k -T30 -i1 ends up with: 36416.8125 MB / 30.00 sec = 10182.8137 Mbps 90 %TX 96 %RX so it's a factor of 2.5 with nuttcp. I've never seen other apps behave like that -- for example NPtcp (netpipe) only gets slower when explicitly setting the window size. Strange... ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: setsockopt() 2008-07-08 6:29 ` setsockopt() Roland Dreier @ 2008-07-08 6:43 ` Evgeniy Polyakov 2008-07-08 7:03 ` setsockopt() Roland Dreier 2008-07-08 18:48 ` setsockopt() Bill Fink 2008-07-08 20:48 ` setsockopt() Stephen Hemminger 2 siblings, 1 reply; 43+ messages in thread From: Evgeniy Polyakov @ 2008-07-08 6:43 UTC (permalink / raw) To: Roland Dreier Cc: Bill Fink, David Miller, aglo, shemminger, netdev, rees, bfields On Mon, Jul 07, 2008 at 11:29:31PM -0700, Roland Dreier (rdreier@cisco.com) wrote: > nuttcp -T30 -i1 ends up with: > > 14465.0625 MB / 30.01 sec = 4043.6073 Mbps 82 %TX 2 %RX > > while setting the window even to 128 KB with > nuttcp -w128k -T30 -i1 ends up with: > > 36416.8125 MB / 30.00 sec = 10182.8137 Mbps 90 %TX 96 %RX > > so it's a factor of 2.5 with nuttcp. I've never seen other apps behave > like that -- for example NPtcp (netpipe) only gets slower when > explicitly setting the window size. > > Strange... Maybe nuttcp by default sets very small socket buffer size? -- Evgeniy Polyakov ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: setsockopt()
2008-07-08 6:43 ` setsockopt() Evgeniy Polyakov
@ 2008-07-08 7:03 ` Roland Dreier
0 siblings, 0 replies; 43+ messages in thread
From: Roland Dreier @ 2008-07-08 7:03 UTC (permalink / raw)
To: Evgeniy Polyakov
Cc: Bill Fink, David Miller, aglo, shemminger, netdev, rees, bfields
> Maybe nuttcp by default sets very small socket buffer size?
strace shows no setsockopt() calls other than SO_REUSEADDR in the
default case.
- R.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: setsockopt() 2008-07-08 6:29 ` setsockopt() Roland Dreier 2008-07-08 6:43 ` setsockopt() Evgeniy Polyakov @ 2008-07-08 18:48 ` Bill Fink 2008-07-09 18:10 ` setsockopt() Roland Dreier 2008-07-08 20:48 ` setsockopt() Stephen Hemminger 2 siblings, 1 reply; 43+ messages in thread From: Bill Fink @ 2008-07-08 18:48 UTC (permalink / raw) To: Roland Dreier Cc: Evgeniy Polyakov, David Miller, aglo, shemminger, netdev, rees, bfields Hi Roland, I think you set a new nuttcp speed record. :-) I've merely had 10-GigE networks to play with. On Mon, 07 Jul 2008, Roland Dreier wrote: > Interesting... I'd not tried nuttcp before, and on my testbed, which is > a very high-bandwidth, low-RTT network (IP-over-InfiniBand with DDR IB, > so the network is capable of 16 Gbps, and the RTT is ~25 microseconds), > the difference between autotuning and not for nuttcp is huge (testing > with 2.6.26-rc8 plus some pending 2.6.27 patches that add checksum > offload, LSO and LRO to the IP-over-IB driver): > > nuttcp -T30 -i1 ends up with: > > 14465.0625 MB / 30.01 sec = 4043.6073 Mbps 82 %TX 2 %RX > > while setting the window even to 128 KB with > nuttcp -w128k -T30 -i1 ends up with: > > 36416.8125 MB / 30.00 sec = 10182.8137 Mbps 90 %TX 96 %RX > > so it's a factor of 2.5 with nuttcp. I've never seen other apps behave > like that -- for example NPtcp (netpipe) only gets slower when > explicitly setting the window size. > > Strange... It is strange. The first case just uses the TCP autotuning, since as you discovered, nuttcp doesn't make any SO_SNDBUF/SO_RCVBUF setsockopt() calls unless you explicitly set the "-w" option. Perhaps the maximum value for tcp_rmem/tcp_wmem is smallish on your systems (check both client and server). On my system: # cat /proc/sys/net/ipv4/tcp_rmem 4096 524288 104857600 # cat /proc/sys/net/ipv4/tcp_wmem 4096 524288 104857600 IIRC the explicit setting of SO_SNDBUF/SO_RCVBUF is instead governed by rmem_max/wmem_max. # cat /proc/sys/net/core/rmem_max 104857600 # cat /proc/sys/net/core/wmem_max 104857600 The other weird thing about your test is the huge difference in the receiver (and server in this case) CPU utilization between the autotuning and explicit setting cases (2 %RX versus 96 %RX). -Bill ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: setsockopt() 2008-07-08 18:48 ` setsockopt() Bill Fink @ 2008-07-09 18:10 ` Roland Dreier 2008-07-09 18:34 ` setsockopt() Evgeniy Polyakov 0 siblings, 1 reply; 43+ messages in thread From: Roland Dreier @ 2008-07-09 18:10 UTC (permalink / raw) To: Bill Fink Cc: Evgeniy Polyakov, David Miller, aglo, shemminger, netdev, rees, bfields > The other weird thing about your test is the huge difference in > the receiver (and server in this case) CPU utilization between the > autotuning and explicit setting cases (2 %RX versus 96 %RX). I think I found another clue -- it seems that CPU affinity has something to do with the results. Usually I pin the adapter interrupt to CPU 0 and use "taskset 4" to pin the benchmarking process to CPU 2 (this leads to the best performance for these particular systems in almost all benchmarks). But with nuttcp I see the following: with taskset 4: $ taskset 4 nuttcp -T30 192.168.145.73 9911.3125 MB / 30.01 sec = 2770.3202 Mbps 42 %TX 10 %RX $ taskset 4 nuttcp -w128k -T30 192.168.145.73 36241.9375 MB / 30.00 sec = 10133.8931 Mbps 89 %TX 96 %RX with no taskset (ie let kernel schedule as it wants to): $ nuttcp -T30 192.168.145.73 36689.6875 MB / 30.00 sec = 10259.1525 Mbps 99 %TX 96 %RX $ nuttcp -w128k -T30 192.168.145.73 36486.0000 MB / 30.00 sec = 10202.1870 Mbps 74 %TX 95 %RX so somehow setting the window helps with the scheduling of processes... I guess autotuning lets some queue get too long or something like that. The actual window doesn't matter too much, since the delay of the network is low enough that even though the bandwidth is very high, the BDP is quite small. (With a 25 usec RTT, a 128 KB window should be enough for 40 Gbps, well over the raw link speed of 16 Gbps that I have) - R. ^ permalink raw reply [flat|nested] 43+ messages in thread
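For reference, the pinning Roland gets with "taskset 4" can also be done
from inside the benchmark process itself; a sketch using the Linux
affinity API (the CPU number matches the taskset example above; pinning
the NIC interrupt is done separately, e.g. via /proc/irq/*/smp_affinity):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Equivalent of "taskset 4 nuttcp ...": pin the calling process to CPU 2
 * (mask 0x4) before starting the benchmark loop. */
int main(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(2, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }

    /* ... run the sender/receiver here ... */
    return 0;
}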
* Re: setsockopt() 2008-07-09 18:10 ` setsockopt() Roland Dreier @ 2008-07-09 18:34 ` Evgeniy Polyakov 2008-07-10 2:50 ` setsockopt() Bill Fink 0 siblings, 1 reply; 43+ messages in thread From: Evgeniy Polyakov @ 2008-07-09 18:34 UTC (permalink / raw) To: Roland Dreier Cc: Bill Fink, David Miller, aglo, shemminger, netdev, rees, bfields On Wed, Jul 09, 2008 at 11:10:31AM -0700, Roland Dreier (rdreier@cisco.com) wrote: > so somehow setting the window helps with the scheduling of > processes... I guess autotuning lets some queue get too long or > something like that. The actual window doesn't matter too much, since > the delay of the network is low enough that even though the bandwidth is > very high, the BDP is quite small. (With a 25 usec RTT, a 128 KB window > should be enough for 40 Gbps, well over the raw link speed of 16 Gbps > that I have) That may be cache issues: depending on what application does it can be useful or not to be bound to the same CPU. I suppose if benchmark looks into the packet content, then it likely wants to be on the same CPU to aliminate cache line ping-pongs, otherwise it only needs to be awakened to send/receive next chunk, so having it on different CPU may result in better its utilization... -- Evgeniy Polyakov ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: setsockopt() 2008-07-09 18:34 ` setsockopt() Evgeniy Polyakov @ 2008-07-10 2:50 ` Bill Fink 2008-07-10 17:26 ` setsockopt() Rick Jones 0 siblings, 1 reply; 43+ messages in thread From: Bill Fink @ 2008-07-10 2:50 UTC (permalink / raw) To: Evgeniy Polyakov Cc: Roland Dreier, David Miller, aglo, shemminger, netdev, rees, bfields On Wed, 9 Jul 2008, Evgeniy Polyakov wrote: > On Wed, Jul 09, 2008 at 11:10:31AM -0700, Roland Dreier (rdreier@cisco.com) wrote: > > so somehow setting the window helps with the scheduling of > > processes... I guess autotuning lets some queue get too long or > > something like that. The actual window doesn't matter too much, since > > the delay of the network is low enough that even though the bandwidth is > > very high, the BDP is quite small. (With a 25 usec RTT, a 128 KB window > > should be enough for 40 Gbps, well over the raw link speed of 16 Gbps > > that I have) > > That may be cache issues: depending on what application does it can be > useful or not to be bound to the same CPU. I suppose if benchmark looks > into the packet content, then it likely wants to be on the same CPU to > aliminate cache line ping-pongs, otherwise it only needs to be awakened > to send/receive next chunk, so having it on different CPU may result in > better its utilization... In my own network benchmarking experience, I've generally gotten the best performance results when the nuttcp application and the NIC interrupts are on the same CPU, which I understood was because of cache effects. I wonder if the "-w128" forces the socket buffer to a small enough size that it totally fits in cache and this helps the performance. -Bill ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: setsockopt() 2008-07-10 2:50 ` setsockopt() Bill Fink @ 2008-07-10 17:26 ` Rick Jones 2008-07-11 0:50 ` setsockopt() Bill Fink 0 siblings, 1 reply; 43+ messages in thread From: Rick Jones @ 2008-07-10 17:26 UTC (permalink / raw) To: Bill Fink Cc: Evgeniy Polyakov, Roland Dreier, David Miller, aglo, shemminger, netdev, rees, bfields > In my own network benchmarking experience, I've generally gotten the > best performance results when the nuttcp application and the NIC > interrupts are on the same CPU, which I understood was because of > cache effects. Interestingly enough I have a slightly different experience: *) single-transaction, single-stream TCP_RR - best when app and NIC use same core *) bulk transfer - either TCP_STREAM or aggregate TCP_RR: a) enough CPU on one core to reach max tput, best when same core b) not enough, tput max when app and NIC on separate cores, preferably cores sharing some cache That is in the context of either maximizing throughput or minimizing latency. If the context is most efficient transfer, then in all cases my experience thusfar agrees with yours. rick jones ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: setsockopt() 2008-07-10 17:26 ` setsockopt() Rick Jones @ 2008-07-11 0:50 ` Bill Fink 0 siblings, 0 replies; 43+ messages in thread From: Bill Fink @ 2008-07-11 0:50 UTC (permalink / raw) To: Rick Jones Cc: Evgeniy Polyakov, Roland Dreier, David Miller, aglo, shemminger, netdev, rees, bfields On Thu, 10 Jul 2008, Rick Jones wrote: > > In my own network benchmarking experience, I've generally gotten the > > best performance results when the nuttcp application and the NIC > > interrupts are on the same CPU, which I understood was because of > > cache effects. > > Interestingly enough I have a slightly different experience: > > *) single-transaction, single-stream TCP_RR - best when app and NIC use > same core > > *) bulk transfer - either TCP_STREAM or aggregate TCP_RR: > a) enough CPU on one core to reach max tput, best when same core > b) not enough, tput max when app and NIC on separate cores, > preferably cores sharing some cache > > That is in the context of either maximizing throughput or minimizing > latency. If the context is most efficient transfer, then in all cases > my experience thusfar agrees with yours. Yes, I was talking about single stream bulk data transfers, where the CPU was not a limiting factor (just barely when doing full 10-GigE line rate transfers with 9000-byte jumbo frames). On multiple stream tests there can be a benefit to spreading the load across multiple cores. -Bill ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: setsockopt() 2008-07-08 6:29 ` setsockopt() Roland Dreier 2008-07-08 6:43 ` setsockopt() Evgeniy Polyakov 2008-07-08 18:48 ` setsockopt() Bill Fink @ 2008-07-08 20:48 ` Stephen Hemminger 2008-07-08 22:05 ` setsockopt() Bill Fink 2 siblings, 1 reply; 43+ messages in thread From: Stephen Hemminger @ 2008-07-08 20:48 UTC (permalink / raw) To: Roland Dreier Cc: Bill Fink, Evgeniy Polyakov, David Miller, aglo, shemminger, netdev, rees, bfields On Mon, 07 Jul 2008 23:29:31 -0700 Roland Dreier <rdreier@cisco.com> wrote: > Interesting... I'd not tried nuttcp before, and on my testbed, which is > a very high-bandwidth, low-RTT network (IP-over-InfiniBand with DDR IB, > so the network is capable of 16 Gbps, and the RTT is ~25 microseconds), > the difference between autotuning and not for nuttcp is huge (testing > with 2.6.26-rc8 plus some pending 2.6.27 patches that add checksum > offload, LSO and LRO to the IP-over-IB driver): > > nuttcp -T30 -i1 ends up with: > > 14465.0625 MB / 30.01 sec = 4043.6073 Mbps 82 %TX 2 %RX > > while setting the window even to 128 KB with > nuttcp -w128k -T30 -i1 ends up with: > > 36416.8125 MB / 30.00 sec = 10182.8137 Mbps 90 %TX 96 %RX > > so it's a factor of 2.5 with nuttcp. I've never seen other apps behave > like that -- for example NPtcp (netpipe) only gets slower when > explicitly setting the window size. > > Strange... I suspect that the link is so fast that the window growth isn't happening fast enough. With only a 30 second test, you probably barely made it out of TCP slow start. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: setsockopt() 2008-07-08 20:48 ` setsockopt() Stephen Hemminger @ 2008-07-08 22:05 ` Bill Fink 2008-07-09 5:25 ` setsockopt() Evgeniy Polyakov 0 siblings, 1 reply; 43+ messages in thread From: Bill Fink @ 2008-07-08 22:05 UTC (permalink / raw) To: Stephen Hemminger Cc: Roland Dreier, Evgeniy Polyakov, David Miller, aglo, shemminger, netdev, rees, bfields On Tue, 8 Jul 2008, Stephen Hemminger wrote: > On Mon, 07 Jul 2008 23:29:31 -0700 > Roland Dreier <rdreier@cisco.com> wrote: > > > Interesting... I'd not tried nuttcp before, and on my testbed, which is > > a very high-bandwidth, low-RTT network (IP-over-InfiniBand with DDR IB, > > so the network is capable of 16 Gbps, and the RTT is ~25 microseconds), > > the difference between autotuning and not for nuttcp is huge (testing > > with 2.6.26-rc8 plus some pending 2.6.27 patches that add checksum > > offload, LSO and LRO to the IP-over-IB driver): > > > > nuttcp -T30 -i1 ends up with: > > > > 14465.0625 MB / 30.01 sec = 4043.6073 Mbps 82 %TX 2 %RX > > > > while setting the window even to 128 KB with > > nuttcp -w128k -T30 -i1 ends up with: > > > > 36416.8125 MB / 30.00 sec = 10182.8137 Mbps 90 %TX 96 %RX > > > > so it's a factor of 2.5 with nuttcp. I've never seen other apps behave > > like that -- for example NPtcp (netpipe) only gets slower when > > explicitly setting the window size. > > > > Strange... > > I suspect that the link is so fast that the window growth isn't happening > fast enough. With only a 30 second test, you probably barely made it > out of TCP slow start. Nah. 30 seconds is plenty of time. I got up to nearly 8 Gbps in 4 seconds (see my test report in earlier message in this thread), and that was on an ~72 ms RTT network path. Roland's IB network only has a ~25 usec RTT. BTW I believe there is one other important difference between the way the tcp_rmem/tcp_wmem autotuning parameters are handled versus the way the rmem_max/wmem_max parameters are used when explicitly setting the socket buffer sizes. I believe the tcp_rmem/tcp_wmem autotuning maximum parameters are hard limits, with the default maximum tcp_rmem setting being ~170 KB and the default maximum tcp_wmem setting being 128 KB. On the other hand, I believe the rmem_max/wmem_max determines the maximum value allowed to be set via the SO_RCVBUF/SO_SNDBUF setsockopt() call. But then Linux doubles the requested value, so when Roland specified a "-w128" nuttcp parameter, he actually got a socket buffer size of 256 KB, which would thus be double that available in the autotuning case assuming the tcp_rmem/tcp_wmem settings are using their default values. This could then account for a factor of 2 X between the two test cases. The "-v" verbose option to nuttcp might shed some light on this hypothesis. -Bill ^ permalink raw reply [flat|nested] 43+ messages in thread
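A quick way to check the doubling hypothesis on any given kernel (the
requested value is first capped by net.core.rmem_max and then doubled,
so on a typical configuration this prints 262144 for a 128 KB request):

#include <stdio.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Ask for a 128 KB receive buffer and read back what the kernel actually
 * booked for the socket. */
int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int req = 128 * 1024, got;
    socklen_t len = sizeof(got);

    setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &req, sizeof(req));
    getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &got, &len);
    printf("requested %d, kernel reports %d\n", req, got);
    return 0;
}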
* Re: setsockopt()
2008-07-08 22:05 ` setsockopt() Bill Fink
@ 2008-07-09 5:25 ` Evgeniy Polyakov
2008-07-09 5:47 ` setsockopt() Bill Fink
0 siblings, 1 reply; 43+ messages in thread
From: Evgeniy Polyakov @ 2008-07-09 5:25 UTC (permalink / raw)
To: Bill Fink
Cc: Stephen Hemminger, Roland Dreier, David Miller, aglo, shemminger, netdev, rees, bfields
On Tue, Jul 08, 2008 at 06:05:00PM -0400, Bill Fink (billfink@mindspring.com) wrote:
> BTW I believe there is one other important difference between the way
> the tcp_rmem/tcp_wmem autotuning parameters are handled versus the way
> the rmem_max/wmem_max parameters are used when explicitly setting the
> socket buffer sizes. I believe the tcp_rmem/tcp_wmem autotuning maximum
> parameters are hard limits, with the default maximum tcp_rmem setting
> being ~170 KB and the default maximum tcp_wmem setting being 128 KB.
The maximum tcp_wmem depends on the amount of available RAM, but is at
least 64k. Maybe Roland's distro set a hard limit of just 128k...
--
Evgeniy Polyakov
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: setsockopt() 2008-07-09 5:25 ` setsockopt() Evgeniy Polyakov @ 2008-07-09 5:47 ` Bill Fink 2008-07-09 6:03 ` setsockopt() Evgeniy Polyakov 0 siblings, 1 reply; 43+ messages in thread From: Bill Fink @ 2008-07-09 5:47 UTC (permalink / raw) To: Evgeniy Polyakov Cc: Stephen Hemminger, Roland Dreier, David Miller, aglo, shemminger, netdev, rees, bfields On Wed, 9 Jul 2008, Evgeniy Polyakov wrote: > On Tue, Jul 08, 2008 at 06:05:00PM -0400, Bill Fink (billfink@mindspring.com) wrote: > > BTW I believe there is one other important difference between the way > > the tcp_rmem/tcp_wmem autotuning parameters are handled versus the way > > the rmem_max/wmem_max parameters are used when explicitly setting the > > socket buffer sizes. I believe the tcp_rmem/tcp_wmem autotuning maximum > > parameters are hard limits, with the default maximum tcp_rmem setting > > being ~170 KB and the default maximum tcp_wmem setting being 128 KB. > > Maximum tcp_wmem depends on amount of available RAM, but at least 64k. > Maybe Reoland's distro set hard limit just to 128k... Are you sure you're not thinking about tcp_mem, which is a function of available memory, or has this been changed in more recent kernels? The 2.6.22.9 Documentation/networking/ip-sysctl.txt indicates: tcp_wmem - vector of 3 INTEGERs: min, default, max ... max: Maximal amount of memory allowed for automatically selected send buffers for TCP socket. This value does not override net.core.wmem_max, "static" selection via SO_SNDBUF does not use this. Default: 128K I also ran a purely local 10-GigE nuttcp TCP test, with and without autotuning (0.13 ms RTT). Autotuning (standard 10-second TCP test): # nuttcp 192.168.88.13 ... 11818.0625 MB / 10.01 sec = 9906.0223 Mbps 100 %TX 72 %RX 0 retrans Same test but with explicitly specified 1 MB socket buffer: # nuttcp -w1m 192.168.88.13 ... 11818.0000 MB / 10.01 sec = 9902.0102 Mbps 99 %TX 71 %RX 0 retrans The TCP autotuning worked great, with both tests basically achieving full 10-GigE line rate. The test with the TCP autotuning actually did slightly better than the test where an explicitly specified 1 MB socket buffer was used, although this could just be within the margin of error of the testing. -Bill ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: setsockopt() 2008-07-09 5:47 ` setsockopt() Bill Fink @ 2008-07-09 6:03 ` Evgeniy Polyakov 2008-07-09 18:11 ` setsockopt() J. Bruce Fields 0 siblings, 1 reply; 43+ messages in thread From: Evgeniy Polyakov @ 2008-07-09 6:03 UTC (permalink / raw) To: Bill Fink Cc: Stephen Hemminger, Roland Dreier, David Miller, aglo, shemminger, netdev, rees, bfields On Wed, Jul 09, 2008 at 01:47:58AM -0400, Bill Fink (billfink@mindspring.com) wrote: > Are you sure you're not thinking about tcp_mem, which is a function > of available memory, or has this been changed in more recent kernels? > The 2.6.22.9 Documentation/networking/ip-sysctl.txt indicates: In 2.6.25 tcp_mem depends on amount of ram, and third tcp_wmem is calculated based on it: /* Set per-socket limits to no more than 1/128 the pressure threshold */ limit = ((unsigned long)sysctl_tcp_mem[1]) << (PAGE_SHIFT - 7); max_share = min(4UL*1024*1024, limit); sysctl_tcp_wmem[0] = SK_MEM_QUANTUM; sysctl_tcp_wmem[1] = 16*1024; sysctl_tcp_wmem[2] = max(64*1024, max_share); sysctl_tcp_rmem[0] = SK_MEM_QUANTUM; sysctl_tcp_rmem[1] = 87380; sysctl_tcp_rmem[2] = max(87380, max_share); > tcp_wmem - vector of 3 INTEGERs: min, default, max > ... > max: Maximal amount of memory allowed for automatically selected > send buffers for TCP socket. This value does not override > net.core.wmem_max, "static" selection via SO_SNDBUF does not use this. > Default: 128K Yeah, its a bit confusing. It probably was copypasted, there is no default, but minimum possible value. > > The TCP autotuning worked great, with both tests basically achieving > full 10-GigE line rate. The test with the TCP autotuning actually > did slightly better than the test where an explicitly specified 1 MB > socket buffer was used, although this could just be within the margin > of error of the testing. If you will check tcp_wmem on your machine, it will likely show that tcp_wmem[max] is far larger than 128k. It is equal to 1mb on my old laptop with 256mb of ram, I suppose machines equipped with 10gige network adapters usually have slightly more. -- Evgeniy Polyakov ^ permalink raw reply [flat|nested] 43+ messages in thread
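Plugging a value into the snippet above shows how the ceiling scales
with RAM: with PAGE_SHIFT = 12 and a purely hypothetical tcp_mem[1] of
49152 pages (an example figure, not a measured value), the formula gives
a 1.5 MB cap, and max_share saturates at 4 MB on larger machines:

#include <stdio.h>

/* Worked example of the kernel formula quoted above. */
int main(void)
{
    unsigned long tcp_mem_pressure = 49152;              /* pages (example) */
    unsigned long limit = tcp_mem_pressure << (12 - 7);  /* = pages * 32 bytes */
    unsigned long max_share = limit < 4UL * 1024 * 1024 ? limit : 4UL * 1024 * 1024;
    unsigned long wmem_max = max_share > 64 * 1024 ? max_share : 64 * 1024;
    unsigned long rmem_max = max_share > 87380 ? max_share : 87380;

    /* 49152 pages -> limit = 1572864, so both maxima come out at 1.5 MB
     * here; with more RAM max_share saturates at 4 MB. */
    printf("tcp_wmem[2]=%lu tcp_rmem[2]=%lu\n", wmem_max, rmem_max);
    return 0;
}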
* Re: setsockopt() 2008-07-09 6:03 ` setsockopt() Evgeniy Polyakov @ 2008-07-09 18:11 ` J. Bruce Fields 2008-07-09 18:43 ` setsockopt() Evgeniy Polyakov 0 siblings, 1 reply; 43+ messages in thread From: J. Bruce Fields @ 2008-07-09 18:11 UTC (permalink / raw) To: Evgeniy Polyakov Cc: Bill Fink, Stephen Hemminger, Roland Dreier, David Miller, aglo, shemminger, netdev, rees On Wed, Jul 09, 2008 at 10:03:41AM +0400, Evgeniy Polyakov wrote: > On Wed, Jul 09, 2008 at 01:47:58AM -0400, Bill Fink (billfink@mindspring.com) wrote: > > Are you sure you're not thinking about tcp_mem, which is a function > > of available memory, or has this been changed in more recent kernels? > > The 2.6.22.9 Documentation/networking/ip-sysctl.txt indicates: > > In 2.6.25 tcp_mem depends on amount of ram, and third tcp_wmem is > calculated based on it: > > /* Set per-socket limits to no more than 1/128 the pressure threshold */ > limit = ((unsigned long)sysctl_tcp_mem[1]) << (PAGE_SHIFT - 7); > max_share = min(4UL*1024*1024, limit); > > sysctl_tcp_wmem[0] = SK_MEM_QUANTUM; > sysctl_tcp_wmem[1] = 16*1024; > sysctl_tcp_wmem[2] = max(64*1024, max_share); > > sysctl_tcp_rmem[0] = SK_MEM_QUANTUM; > sysctl_tcp_rmem[1] = 87380; > sysctl_tcp_rmem[2] = max(87380, max_share); > > > tcp_wmem - vector of 3 INTEGERs: min, default, max > > ... > > max: Maximal amount of memory allowed for automatically selected > > send buffers for TCP socket. This value does not override > > net.core.wmem_max, "static" selection via SO_SNDBUF does not use this. > > Default: 128K > > Yeah, its a bit confusing. It probably was copypasted, there is no > default, but minimum possible value. I don't understand; what do you mean by "there is no default"? (And if not, what does tcp_wmem[1] mean?) --b. > > > > > The TCP autotuning worked great, with both tests basically achieving > > full 10-GigE line rate. The test with the TCP autotuning actually > > did slightly better than the test where an explicitly specified 1 MB > > socket buffer was used, although this could just be within the margin > > of error of the testing. > > If you will check tcp_wmem on your machine, it will likely show that > tcp_wmem[max] is far larger than 128k. It is equal to 1mb on my old > laptop with 256mb of ram, I suppose machines equipped with 10gige > network adapters usually have slightly more. > > -- > Evgeniy Polyakov ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: setsockopt() 2008-07-09 18:11 ` setsockopt() J. Bruce Fields @ 2008-07-09 18:43 ` Evgeniy Polyakov 2008-07-09 22:28 ` setsockopt() J. Bruce Fields 0 siblings, 1 reply; 43+ messages in thread From: Evgeniy Polyakov @ 2008-07-09 18:43 UTC (permalink / raw) To: J. Bruce Fields Cc: Bill Fink, Stephen Hemminger, Roland Dreier, David Miller, aglo, shemminger, netdev, rees On Wed, Jul 09, 2008 at 02:11:22PM -0400, J. Bruce Fields (bfields@fieldses.org) wrote: > > Yeah, its a bit confusing. It probably was copypasted, there is no > > default, but minimum possible value. > > I don't understand; what do you mean by "there is no default"? (And if > not, what does tcp_wmem[1] mean?) I meant there is no default value for tcp_w/rmem[2], which is calculated based on tcp_mem, which in turn is calculated based on amount RAM of in the system. tcp_wmem[2] will be at least 64k, but its higher limit (calculated by system, which of course can be overwritten) is RAM/256 on x86 (iirc only low mem is counted, although that was different in various kernel versions), but not more than 4Mb. tcp_wmem[1] means initial send buffer size, it can grow up to tcp_wmem[2]. There is a default for this parameter. Actually all this numbers are a bit fluffy, so they are kind of soft rules for socket memory accounting mechanics. -- Evgeniy Polyakov ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: setsockopt() 2008-07-09 18:43 ` setsockopt() Evgeniy Polyakov @ 2008-07-09 22:28 ` J. Bruce Fields 2008-07-10 1:06 ` setsockopt() Evgeniy Polyakov 0 siblings, 1 reply; 43+ messages in thread From: J. Bruce Fields @ 2008-07-09 22:28 UTC (permalink / raw) To: Evgeniy Polyakov Cc: Bill Fink, Stephen Hemminger, Roland Dreier, David Miller, aglo, shemminger, netdev, rees On Wed, Jul 09, 2008 at 10:43:30PM +0400, Evgeniy Polyakov wrote: > On Wed, Jul 09, 2008 at 02:11:22PM -0400, J. Bruce Fields (bfields@fieldses.org) wrote: > > > Yeah, its a bit confusing. It probably was copypasted, there is no > > > default, but minimum possible value. > > > > I don't understand; what do you mean by "there is no default"? (And if > > not, what does tcp_wmem[1] mean?) > > I meant there is no default value for tcp_w/rmem[2], which is calculated > based on tcp_mem, which in turn is calculated based on amount RAM of in > the system. tcp_wmem[2] will be at least 64k, but its higher limit > (calculated by system, which of course can be overwritten) is RAM/256 on > x86 (iirc only low mem is counted, although that was different in > various kernel versions), but not more than 4Mb. > > tcp_wmem[1] means initial send buffer size, it can grow up to tcp_wmem[2]. > There is a default for this parameter. Actually all this numbers are a > bit fluffy, so they are kind of soft rules for socket memory accounting > mechanics. OK, thanks. Would the following be any more clearer and/or accurate? --b. diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 17a6e46..a22af04 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -336,7 +336,7 @@ tcp_rmem - vector of 3 INTEGERs: min, default, max pressure. Default: 8K - default: default size of receive buffer used by TCP sockets. + default: initial size of receive buffer used by TCP sockets. This value overrides net.core.rmem_default used by other protocols. Default: 87380 bytes. This value results in window of 65535 with default setting of tcp_adv_win_scale and tcp_app_win:0 and a bit @@ -344,8 +344,10 @@ tcp_rmem - vector of 3 INTEGERs: min, default, max max: maximal size of receive buffer allowed for automatically selected receiver buffers for TCP socket. This value does not override - net.core.rmem_max, "static" selection via SO_RCVBUF does not use this. - Default: 87380*2 bytes. + net.core.rmem_max. Calling setsockopt() with SO_RCVBUF disables + automatic tuning of that socket's receive buffer size, in which + case this value is ignored. + Default: between 87380B and 4MB, depending on RAM size. tcp_sack - BOOLEAN Enable select acknowledgments (SACKS). @@ -419,19 +421,21 @@ tcp_window_scaling - BOOLEAN Enable window scaling as defined in RFC1323. tcp_wmem - vector of 3 INTEGERs: min, default, max - min: Amount of memory reserved for send buffers for TCP socket. + min: Amount of memory reserved for send buffers for TCP sockets. Each TCP socket has rights to use it due to fact of its birth. Default: 4K - default: Amount of memory allowed for send buffers for TCP socket - by default. This value overrides net.core.wmem_default used - by other protocols, it is usually lower than net.core.wmem_default. + default: initial size of send buffer used by TCP sockets. This + value overrides net.core.wmem_default used by other protocols. + It is usually lower than net.core.wmem_default. Default: 16K - max: Maximal amount of memory allowed for automatically selected - send buffers for TCP socket. 
This value does not override - net.core.wmem_max, "static" selection via SO_SNDBUF does not use this. - Default: 128K + max: Maximal amount of memory allowed for automatically tuned + send buffers for TCP sockets. This value does not override + net.core.wmem_max. Calling setsockopt() with SO_SNDBUF disables + automatic tuning of that socket's send buffer size, in which case + this value is ignored. + Default: between 64K and 4MB, depending on RAM size. tcp_workaround_signed_windows - BOOLEAN If set, assume no receipt of a window scaling option means the ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: setsockopt()
2008-07-09 22:28 ` setsockopt() J. Bruce Fields
@ 2008-07-10 1:06 ` Evgeniy Polyakov
2008-07-10 20:05 ` [PATCH] Documentation: clarify tcp_{r,w}mem sysctl docs J. Bruce Fields
0 siblings, 1 reply; 43+ messages in thread
From: Evgeniy Polyakov @ 2008-07-10 1:06 UTC (permalink / raw)
To: J. Bruce Fields
Cc: Bill Fink, Stephen Hemminger, Roland Dreier, David Miller, aglo, shemminger, netdev, rees
On Wed, Jul 09, 2008 at 06:28:02PM -0400, J. Bruce Fields (bfields@fieldses.org) wrote:
> OK, thanks. Would the following be any more clearer and/or accurate?
Looks good, thank you :)
It is likely the first ever mention of the fact that SO_RCVBUF/SO_SNDBUF
disable autotuning.
--
Evgeniy Polyakov
^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH] Documentation: clarify tcp_{r,w}mem sysctl docs 2008-07-10 1:06 ` setsockopt() Evgeniy Polyakov @ 2008-07-10 20:05 ` J. Bruce Fields 2008-07-10 23:50 ` David Miller 0 siblings, 1 reply; 43+ messages in thread From: J. Bruce Fields @ 2008-07-10 20:05 UTC (permalink / raw) To: Jonathan Corbet Cc: Bill Fink, Stephen Hemminger, Roland Dreier, David Miller, aglo, shemminger, netdev, rees, Evgeniy Polyakov From: J. Bruce Fields <bfields@citi.umich.edu> Date: Wed, 9 Jul 2008 18:28:48 -0400 Fix some of the defaults and attempt to clarify some language. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru> --- Documentation/networking/ip-sysctl.txt | 26 +++++++++++++++----------- 1 files changed, 15 insertions(+), 11 deletions(-) On Thu, Jul 10, 2008 at 05:06:50AM +0400, Evgeniy Polyakov wrote: > On Wed, Jul 09, 2008 at 06:28:02PM -0400, J. Bruce Fields > (bfields@fieldses.org) wrote: > > OK, thanks. Would the following be any more clearer and/or > > accurate? > > Looks good, thank you :) > It is likely the first ever mention of the fact, that SO_RECV/SNDBUF > disables autotuning. OK! Uh, I'm assuming Jon Corbet's documentation tree would be a reasonable route to submit this by.... --b. diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 17a6e46..a22af04 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -336,7 +336,7 @@ tcp_rmem - vector of 3 INTEGERs: min, default, max pressure. Default: 8K - default: default size of receive buffer used by TCP sockets. + default: initial size of receive buffer used by TCP sockets. This value overrides net.core.rmem_default used by other protocols. Default: 87380 bytes. This value results in window of 65535 with default setting of tcp_adv_win_scale and tcp_app_win:0 and a bit @@ -344,8 +344,10 @@ tcp_rmem - vector of 3 INTEGERs: min, default, max max: maximal size of receive buffer allowed for automatically selected receiver buffers for TCP socket. This value does not override - net.core.rmem_max, "static" selection via SO_RCVBUF does not use this. - Default: 87380*2 bytes. + net.core.rmem_max. Calling setsockopt() with SO_RCVBUF disables + automatic tuning of that socket's receive buffer size, in which + case this value is ignored. + Default: between 87380B and 4MB, depending on RAM size. tcp_sack - BOOLEAN Enable select acknowledgments (SACKS). @@ -419,19 +421,21 @@ tcp_window_scaling - BOOLEAN Enable window scaling as defined in RFC1323. tcp_wmem - vector of 3 INTEGERs: min, default, max - min: Amount of memory reserved for send buffers for TCP socket. + min: Amount of memory reserved for send buffers for TCP sockets. Each TCP socket has rights to use it due to fact of its birth. Default: 4K - default: Amount of memory allowed for send buffers for TCP socket - by default. This value overrides net.core.wmem_default used - by other protocols, it is usually lower than net.core.wmem_default. + default: initial size of send buffer used by TCP sockets. This + value overrides net.core.wmem_default used by other protocols. + It is usually lower than net.core.wmem_default. Default: 16K - max: Maximal amount of memory allowed for automatically selected - send buffers for TCP socket. This value does not override - net.core.wmem_max, "static" selection via SO_SNDBUF does not use this. - Default: 128K + max: Maximal amount of memory allowed for automatically tuned + send buffers for TCP sockets. 
This value does not override + net.core.wmem_max. Calling setsockopt() with SO_SNDBUF disables + automatic tuning of that socket's send buffer size, in which case + this value is ignored. + Default: between 64K and 4MB, depending on RAM size. tcp_workaround_signed_windows - BOOLEAN If set, assume no receipt of a window scaling option means the -- 1.5.5.rc1 ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH] Documentation: clarify tcp_{r,w}mem sysctl docs 2008-07-10 20:05 ` [PATCH] Documentation: clarify tcp_{r,w}mem sysctl docs J. Bruce Fields @ 2008-07-10 23:50 ` David Miller 0 siblings, 0 replies; 43+ messages in thread From: David Miller @ 2008-07-10 23:50 UTC (permalink / raw) To: bfields Cc: corbet, billfink, stephen.hemminger, rdreier, aglo, shemminger, netdev, rees, johnpol From: "J. Bruce Fields" <bfields@fieldses.org> Date: Thu, 10 Jul 2008 16:05:10 -0400 > Fix some of the defaults and attempt to clarify some language. > > Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Applied, thanks! ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: setsockopt() 2008-07-07 21:49 ` setsockopt() David Miller 2008-07-08 4:54 ` setsockopt() Evgeniy Polyakov @ 2008-07-08 20:12 ` Jim Rees 2008-07-08 21:54 ` setsockopt() John Heffner 1 sibling, 1 reply; 43+ messages in thread From: Jim Rees @ 2008-07-08 20:12 UTC (permalink / raw) To: netdev; +Cc: aglo, shemminger, bfields David Miller wrote: If you set the socket buffer sizes explicitly, you essentially turn off half of the TCP stack because it won't do dynamic socket buffer sizing afterwards. There is no reason these days to ever explicitly set the socket buffer sizes on TCP sockets under Linux. So it seems clear that nfsd should stop setting the socket buffer sizes. The problem we run into if we try that is that the server won't read any incoming data from its socket until an entire rpc has been assembled and is waiting to be read off the socket. An rpc can be almost any size up to about 1MB, but the socket buffer never grows past about 50KB, so the rpc can never be assembled entirely in the socket buf. Maybe the nfsd needs a way to tell the socket/tcp layers that it wants a minimum size socket buffer. Or maybe nfsd needs to be modified so that it will read partial rpcs. I would appreciate suggestions as to which is the better fix. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: setsockopt()
  2008-07-08 20:12 ` setsockopt() Jim Rees
@ 2008-07-08 21:54 ` John Heffner
  2008-07-08 23:51   ` setsockopt() Jim Rees
  0 siblings, 1 reply; 43+ messages in thread
From: John Heffner @ 2008-07-08 21:54 UTC (permalink / raw)
To: Jim Rees; +Cc: netdev, aglo, shemminger, bfields

On Tue, Jul 8, 2008 at 1:12 PM, Jim Rees <rees@umich.edu> wrote:
> David Miller wrote:
>
> If you set the socket buffer sizes explicitly, you essentially turn
> off half of the TCP stack because it won't do dynamic socket buffer
> sizing afterwards.
>
> There is no reason these days to ever explicitly set the socket
> buffer sizes on TCP sockets under Linux.
>
> So it seems clear that nfsd should stop setting the socket buffer sizes.
>
> The problem we run into if we try that is that the server won't read any
> incoming data from its socket until an entire rpc has been assembled and is
> waiting to be read off the socket.  An rpc can be almost any size up to
> about 1MB, but the socket buffer never grows past about 50KB, so the rpc can
> never be assembled entirely in the socket buf.
>
> Maybe the nfsd needs a way to tell the socket/tcp layers that it wants a
> minimum size socket buffer.  Or maybe nfsd needs to be modified so that it
> will read partial rpcs.  I would appreciate suggestions as to which is the
> better fix.

This is an interesting observation.  It turns out that the best way to
solve send-side autotuning is not to "tune" the send buffer at all, but
to change its semantics.  From your example, we can clearly see that the
send buffer is overloaded.  It's used to buffer data between a scheduled
application and the event-driven kernel, and also to store data that may
need to be retransmitted.  If you separate the socket buffer from the
retransmit queue, you can size the socket buffer based on the
application's needs (e.g., you want about 1 MB), and the retransmit
queue's size will naturally be bound by cwnd.

I implemented this split about six years ago, but never submitted it,
largely because it wasn't clear how to handle backward/cross-platform
compatibility of socket options, and because no one seemed to care about
it too much.  (I think you are the first person I remember to bring up
this issue.)

Unfortunately, where this leaves you is still trying to guess the right
socket buffer size.  I actually like your idea for a "soft" SO_SNDBUF --
ask the kernel for at least that much, but let it autotune higher if
needed.  This is almost trivial to implement -- it's the same as
SO_SNDBUF but don't set the sock sndbuf lock.

One thing to note here.  While this option would solve your problem,
there's another similar issue that would not be addressed.  Some
applications want to "feel" the network -- that is, they want to observe
changes in sending rate as quickly as possible.  (Say you have an
adaptive codec.)  This application would want a small send buffer, but a
larger retransmit queue.  It's not possible to do this without splitting
the send buffer.

-John
^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: setsockopt() 2008-07-08 21:54 ` setsockopt() John Heffner @ 2008-07-08 23:51 ` Jim Rees 2008-07-09 0:07 ` setsockopt() John Heffner 0 siblings, 1 reply; 43+ messages in thread From: Jim Rees @ 2008-07-08 23:51 UTC (permalink / raw) To: John Heffner; +Cc: netdev, aglo, shemminger, bfields John Heffner wrote: I actually like your idea for a "soft" SO_SNDBUF -- ask the kernel for at least that much, but let it autotune higher if needed. This is almost trivial to implement -- it's the same as SO_SNDBUF but don't set the sock sndbuf lock. Which brings me to another issue. The nfs server doesn't call sock_setsockopt(), it diddles sk_sndbuf and sk_rcvbuf directly, so as to get around the max socket buf limit. I don't like this. If this is a legit thing to do, there should be an api. I'm thinking we need a sock_set_min_bufsize(), where the values passed in are minimums, subject to autotuning, and maybe are not limited by the max. It would, as you say, just set sk_sndbuf and sk_rcvbuf without setting the corresponding flags SOCK_SNDBUF_LOCK and SOCK_RCVBUF_LOCK. Would this do the trick, or is there a danger that autotuning would reduce the buffer sizes below the given minimum? If so, we might need sk_min_rcvbuf or something like that. ^ permalink raw reply [flat|nested] 43+ messages in thread
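A rough sketch of what Jim's proposed helper might look like; sock_set_min_bufsize() is hypothetical, not an existing kernel API, and the body below is an assumption drawn from the discussion rather than an actual patch:

    /* Hypothetical helper based on the suggestion above: raise the buffer
     * floors without setting SOCK_SNDBUF_LOCK/SOCK_RCVBUF_LOCK, so
     * autotuning may still grow the buffers later. */
    #include <net/sock.h>

    void sock_set_min_bufsize(struct sock *sk, int sndbuf, int rcvbuf)
    {
            lock_sock(sk);
            sk->sk_sndbuf = max_t(int, sk->sk_sndbuf, sndbuf);
            sk->sk_rcvbuf = max_t(int, sk->sk_rcvbuf, rcvbuf);
            /* deliberately leave sk->sk_userlocks untouched */
            release_sock(sk);
    }

Whether the minimum should also be exempt from net.core.{r,w}mem_max, and whether autotuning should ever be allowed to shrink below it, are exactly the open questions raised in this subthread.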
* Re: setsockopt() 2008-07-08 23:51 ` setsockopt() Jim Rees @ 2008-07-09 0:07 ` John Heffner 0 siblings, 0 replies; 43+ messages in thread From: John Heffner @ 2008-07-09 0:07 UTC (permalink / raw) To: Jim Rees; +Cc: netdev, aglo, shemminger, bfields On Tue, Jul 8, 2008 at 4:51 PM, Jim Rees <rees@umich.edu> wrote: > John Heffner wrote: > > I actually like your idea for a "soft" > SO_SNDBUF -- ask the kernel for at least that much, but let it > autotune higher if needed. This is almost trivial to implement -- > it's the same as SO_SNDBUF but don't set the sock sndbuf lock. > > Which brings me to another issue. The nfs server doesn't call > sock_setsockopt(), it diddles sk_sndbuf and sk_rcvbuf directly, so as to get > around the max socket buf limit. I don't like this. If this is a legit > thing to do, there should be an api. > > I'm thinking we need a sock_set_min_bufsize(), where the values passed in > are minimums, subject to autotuning, and maybe are not limited by the max. > It would, as you say, just set sk_sndbuf and sk_rcvbuf without setting the > corresponding flags SOCK_SNDBUF_LOCK and SOCK_RCVBUF_LOCK. > > Would this do the trick, or is there a danger that autotuning would reduce > the buffer sizes below the given minimum? If so, we might need > sk_min_rcvbuf or something like that. TCP buffer sizes will only be pulled back if the system runs into the global tcp memory limits (sysctl_tcp_mem). I think this is correct behavior regardless of the requested value. -John ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: setsockopt() 2008-07-07 21:30 ` setsockopt() Olga Kornievskaia 2008-07-07 21:33 ` setsockopt() Stephen Hemminger 2008-07-07 21:49 ` setsockopt() David Miller @ 2008-07-07 22:50 ` Rick Jones 2008-07-07 23:00 ` setsockopt() David Miller 2008-07-08 3:33 ` setsockopt() John Heffner 2 siblings, 2 replies; 43+ messages in thread From: Rick Jones @ 2008-07-07 22:50 UTC (permalink / raw) To: Olga Kornievskaia; +Cc: Stephen Hemminger, netdev, Jim Rees, J. Bruce Fields Olga Kornievskaia wrote: > Stephen Hemminger wrote: >> It would be better if NFSD stayed out of doign setsockopt and just >> let the sender/receiver autotuning work? >> > > Auto-tuning would be guided by the sysctl values that are set for all > applications. I could be wrong but what I see is that unless an > application does a setsockopt(), its window is bound by the default > sysctl value. If it is true, than it is not acceptable. It means that in > order for NFSD to achieve a large enough window it needs to modify TCP's > sysctl value which will effect all other applications. My experience thusfar is that the sysctl defaults will allow an autotuned TCP receive window far larger than it will allow with a direct setsockopt() call. I'm still a triffle puzzled/concerned/confused by the extent to which autotuning will allow the receive window to grow, again based on some netperf experience thusfar, and patient explanations provided here and elsewhere, it seems as though autotuning will let things get to 2x what it thinks the sender's cwnd happens to be. So far under netperf testing that seems to be the case, and 99 times out of ten my netperf tests will have the window grow to the max. rick jones ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: setsockopt() 2008-07-07 22:50 ` setsockopt() Rick Jones @ 2008-07-07 23:00 ` David Miller 2008-07-07 23:27 ` setsockopt() Rick Jones 2008-07-08 3:33 ` setsockopt() John Heffner 1 sibling, 1 reply; 43+ messages in thread From: David Miller @ 2008-07-07 23:00 UTC (permalink / raw) To: rick.jones2; +Cc: aglo, shemminger, netdev, rees, bfields From: Rick Jones <rick.jones2@hp.com> Date: Mon, 07 Jul 2008 15:50:21 -0700 > I'm still a triffle puzzled/concerned/confused by the extent to which > autotuning will allow the receive window to grow, again based on some > netperf experience thusfar, and patient explanations provided here and > elsewhere, it seems as though autotuning will let things get to 2x what > it thinks the sender's cwnd happens to be. So far under netperf testing > that seems to be the case, and 99 times out of ten my netperf tests will > have the window grow to the max. We need 2x, in order to have a full window during recovery. There was a measurement bug found a few months ago when the google folks were probing in this area, which was fixed by John Heffner. Most of which had to deal with TSO subtleties. -------------------- commit 246eb2af060fc32650f07203c02bdc0456ad76c7 Author: John Heffner <johnwheffner@gmail.com> Date: Tue Apr 29 03:13:52 2008 -0700 tcp: Limit cwnd growth when deferring for GSO This fixes inappropriately large cwnd growth on sender-limited flows when GSO is enabled, limiting cwnd growth to 64k. Signed-off-by: John Heffner <johnwheffner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> commit ce447eb91409225f8a488f6b7b2a1bdf7b2d884f Author: John Heffner <johnwheffner@gmail.com> Date: Tue Apr 29 03:13:02 2008 -0700 tcp: Allow send-limited cwnd to grow up to max_burst when gso disabled This changes the logic in tcp_is_cwnd_limited() so that cwnd may grow up to tcp_max_burst() even when sk_can_gso() is false, or when sysctl_tcp_tso_win_divisor != 0. Signed-off-by: John Heffner <johnwheffner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> -------------------- Setting TCP socket buffer via setsockopt() is always wrong. If there is a bug, let's fix it. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: setsockopt() 2008-07-07 23:00 ` setsockopt() David Miller @ 2008-07-07 23:27 ` Rick Jones 2008-07-08 1:15 ` setsockopt() Rick Jones 2008-07-08 1:44 ` setsockopt() David Miller 0 siblings, 2 replies; 43+ messages in thread From: Rick Jones @ 2008-07-07 23:27 UTC (permalink / raw) To: David Miller; +Cc: aglo, shemminger, netdev, rees, bfields David Miller wrote: > We need 2x, in order to have a full window during recovery. > > There was a measurement bug found a few months ago when the > google folks were probing in this area, which was fixed > by John Heffner. Most of which had to deal with TSO subtleties. > > -------------------- > commit 246eb2af060fc32650f07203c02bdc0456ad76c7 > Author: John Heffner <johnwheffner@gmail.com> > Date: Tue Apr 29 03:13:52 2008 -0700 > > tcp: Limit cwnd growth when deferring for GSO > > This fixes inappropriately large cwnd growth on sender-limited flows > when GSO is enabled, limiting cwnd growth to 64k. > > Signed-off-by: John Heffner <johnwheffner@gmail.com> > Signed-off-by: David S. Miller <davem@davemloft.net> > > commit ce447eb91409225f8a488f6b7b2a1bdf7b2d884f > Author: John Heffner <johnwheffner@gmail.com> > Date: Tue Apr 29 03:13:02 2008 -0700 > > tcp: Allow send-limited cwnd to grow up to max_burst when gso disabled > > This changes the logic in tcp_is_cwnd_limited() so that cwnd may grow > up to tcp_max_burst() even when sk_can_gso() is false, or when > sysctl_tcp_tso_win_divisor != 0. > > Signed-off-by: John Heffner <johnwheffner@gmail.com> > Signed-off-by: David S. Miller <davem@davemloft.net> > -------------------- I'll try my tests again with newer kernels since I'm not 100% certain I was trying with those commits in place. > Setting TCP socket buffer via setsockopt() is always wrong. Does that apply equally to SO_SNDBUF and SO_RCVBUF? rick jones ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: setsockopt() 2008-07-07 23:27 ` setsockopt() Rick Jones @ 2008-07-08 1:15 ` Rick Jones 2008-07-08 1:48 ` setsockopt() J. Bruce Fields 2008-07-08 1:44 ` setsockopt() David Miller 1 sibling, 1 reply; 43+ messages in thread From: Rick Jones @ 2008-07-08 1:15 UTC (permalink / raw) To: netdev; +Cc: David Miller, aglo, shemminger, rees, bfields Rick Jones wrote: > David Miller wrote: > >> We need 2x, in order to have a full window during recovery. >> >> There was a measurement bug found a few months ago when the >> google folks were probing in this area, which was fixed >> by John Heffner. Most of which had to deal with TSO subtleties. >> >> -------------------- >> commit 246eb2af060fc32650f07203c02bdc0456ad76c7 >> ... >> commit ce447eb91409225f8a488f6b7b2a1bdf7b2d884f >> ... > I'll try my tests again with newer kernels since I'm not 100% certain I > was trying with those commits in place. Did those commits make it into 2.6.26-rc9? (Gentle taps of clue-bat as to how to use git to check commits in various trees would be welcome - to say I am a git noob would be an understatement - the tree from which that kernel was made was cloned from Linus' about 16:00 to 17:00 pacific time) Assuming they did, a pair of systems with tg3-driven BCM5704's: moe:~# ethtool -i eth0 driver: tg3 version: 3.92.1 firmware-version: 5704-v3.27 bus-info: 0000:01:02.0 moe:~# uname -a Linux moe 2.6.26-rc9-raj #1 SMP Mon Jul 7 17:26:15 PDT 2008 ia64 GNU/Linux with TSO enabled still takes the socket buffers all the way out to 4MB for a GbE LAN test: moe:~# netperf -t omni -H manny -- -o foo OMNI TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to manny.west (10.208.0.13) port 0 AF_INET Throughput,Local Send Socket Size Requested,Local Send Socket Size Initial,Local Send Socket Size Final,Remote Recv Socket Size Requested,Remote Recv Socket Size Initial,Remote Recv Socket Size Final 941.41,-1,16384,4194304,-1,87380,4194304 When a 64K socket buffer request was sufficient: moe:~# netperf -t omni -H manny -- -o foo -s 64K -S 64K -m 16K OMNI TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to manny.west (10.208.0.13) port 0 AF_INET Throughput,Local Send Socket Size Requested,Local Send Socket Size Initial,Local Send Socket Size Final,Remote Recv Socket Size Requested,Remote Recv Socket Size Initial,Remote Recv Socket Size Final 941.12,65536,131072,131072,65536,131072,131072 FWIW, disabling TSO via ethtool didn't seem to change the behaviour: moe:~# ethtool -K eth0 tso off moe:~# ethtool -k eth0 Offload parameters for eth0: rx-checksumming: on tx-checksumming: on scatter-gather: on tcp segmentation offload: off udp fragmentation offload: off generic segmentation offload: off moe:~# netperf -t omni -H manny -- -o foo OMNI TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to manny.west (10.208.0.13) port 0 AF_INET Throughput,Local Send Socket Size Requested,Local Send Socket Size Initial,Local Send Socket Size Final,Remote Recv Socket Size Requested,Remote Recv Socket Size Initial,Remote Recv Socket Size Final 941.40,-1,16384,4194304,-1,87380,4194304 If I was cloning off the wrong tree, my apologies and redirects to the correct tree would be gladly accepted. rick jones moe:~# cat foo throughput,lss_size_req,lss_size,lss_size_end,rsr_size_req,rsr_size,rsr_size_end ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: setsockopt()
  2008-07-08  1:15 ` setsockopt() Rick Jones
@ 2008-07-08  1:48 ` J. Bruce Fields
  0 siblings, 0 replies; 43+ messages in thread
From: J. Bruce Fields @ 2008-07-08 1:48 UTC (permalink / raw)
To: Rick Jones; +Cc: netdev, David Miller, aglo, shemminger, rees

On Mon, Jul 07, 2008 at 06:15:01PM -0700, Rick Jones wrote:
> Rick Jones wrote:
>> David Miller wrote:
>>
>>> We need 2x, in order to have a full window during recovery.
>>>
>>> There was a measurement bug found a few months ago when the
>>> google folks were probing in this area, which was fixed
>>> by John Heffner.  Most of which had to deal with TSO subtleties.
>>>
>>> --------------------
>>> commit 246eb2af060fc32650f07203c02bdc0456ad76c7
>>> ...
>>> commit ce447eb91409225f8a488f6b7b2a1bdf7b2d884f
>>> ...
>> I'll try my tests again with newer kernels since I'm not 100% certain I
>> was trying with those commits in place.
>
> Did those commits make it into 2.6.26-rc9?  (Gentle taps of clue-bat as
> to how to use git to check commits in various trees would be welcome -
> to say I am a git noob would be an understatement - the tree from which
> that kernel was made was cloned from Linus' about 16:00 to 17:00 pacific
> time)

"git describe --contains" will tell you the first tag git finds
containing the given commit:

	bfields@pickle:linux$ git describe --contains 246eb2af060
	v2.6.26-rc1~95^2~18
	bfields@pickle:linux$ git describe --contains ce447eb9140
	v2.6.26-rc1~95^2~19

So both were already in 2.6.26-rc1.

Or if you forget that, another trick is to note that "git log A..B"
really means "tell me all the commits contained in B but not in A".  So
"git log A..B" returns output if (and only if) B is not contained in A.
For example, if "git log HEAD..246eb2af060" returns without any output,
then you know that 246eb2af060 is contained in the head of the
currently checked-out branch.

--b.
^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: setsockopt() 2008-07-07 23:27 ` setsockopt() Rick Jones 2008-07-08 1:15 ` setsockopt() Rick Jones @ 2008-07-08 1:44 ` David Miller 1 sibling, 0 replies; 43+ messages in thread From: David Miller @ 2008-07-08 1:44 UTC (permalink / raw) To: rick.jones2; +Cc: aglo, shemminger, netdev, rees, bfields From: Rick Jones <rick.jones2@hp.com> Date: Mon, 07 Jul 2008 16:27:14 -0700 > David Miller wrote: > > Setting TCP socket buffer via setsockopt() is always wrong. > > Does that apply equally to SO_SNDBUF and SO_RCVBUF? Yes. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: setsockopt() 2008-07-07 22:50 ` setsockopt() Rick Jones 2008-07-07 23:00 ` setsockopt() David Miller @ 2008-07-08 3:33 ` John Heffner 2008-07-08 18:16 ` setsockopt() Rick Jones [not found] ` <349f35ee0807090255s58fd040bne265ee117d06d397@mail.gmail.com> 1 sibling, 2 replies; 43+ messages in thread From: John Heffner @ 2008-07-08 3:33 UTC (permalink / raw) To: Rick Jones; +Cc: netdev On Mon, Jul 7, 2008 at 3:50 PM, Rick Jones <rick.jones2@hp.com> wrote: > I'm still a triffle puzzled/concerned/confused by the extent to which > autotuning will allow the receive window to grow, again based on some > netperf experience thusfar, and patient explanations provided here and > elsewhere, it seems as though autotuning will let things get to 2x what it > thinks the sender's cwnd happens to be. So far under netperf testing that > seems to be the case, and 99 times out of ten my netperf tests will have the > window grow to the max. Rick, I thought this was covered pretty thoroughly back in April. The behavior you're seeing is 100% expected, and not likely to change unless Jerry Chu gets his local queued data measurement patch working. I'm not sure what ultimately happened there, but it was a cool idea and I hope he has time to polish it up. It's definitely tricky to get right. Jerry's optimization is a sender-side change. The fact that the receiver announces enough window is almost certainly the right thing for it to do, and (I hope) this will not change. If you're still curious: http://www.psc.edu/networking/ftp/papers/autotune_sigcomm98.ps http://www.lanl.gov/radiant/pubs/drs/lacsi2001.pdf http://staff.psc.edu/jheffner/papers/senior_thesis.pdf -John ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: setsockopt() 2008-07-08 3:33 ` setsockopt() John Heffner @ 2008-07-08 18:16 ` Rick Jones 2008-07-08 19:10 ` setsockopt() John Heffner [not found] ` <349f35ee0807090255s58fd040bne265ee117d06d397@mail.gmail.com> 1 sibling, 1 reply; 43+ messages in thread From: Rick Jones @ 2008-07-08 18:16 UTC (permalink / raw) To: John Heffner; +Cc: netdev John Heffner wrote: > On Mon, Jul 7, 2008 at 3:50 PM, Rick Jones <rick.jones2@hp.com> wrote: > >>I'm still a triffle puzzled/concerned/confused by the extent to which >>autotuning will allow the receive window to grow, again based on some >>netperf experience thusfar, and patient explanations provided here and >>elsewhere, it seems as though autotuning will let things get to 2x what it >>thinks the sender's cwnd happens to be. So far under netperf testing that >>seems to be the case, and 99 times out of ten my netperf tests will have the >>window grow to the max. > > > > Rick, > > I thought this was covered pretty thoroughly back in April. I'll plead bit errors in the dimm wetware memory :( And go back through the archives. > The behavior you're seeing is 100% expected, and not likely to change > unless Jerry Chu gets his local queued data measurement patch > working. I'm not sure what ultimately happened there, but it was a > cool idea and I hope he has time to polish it up. It's definitely > tricky to get right. > > Jerry's optimization is a sender-side change. The fact that the > receiver announces enough window is almost certainly the right thing > for it to do, and (I hope) this will not change. It just seems to be so, well, trusting of the sender. > > If you're still curious: > http://www.psc.edu/networking/ftp/papers/autotune_sigcomm98.ps > http://www.lanl.gov/radiant/pubs/drs/lacsi2001.pdf That one didn't show the effect on LANs, only WANs, although it did say things like this when discussing timer granularity and estimating the sender's window: page 4 - "Thus in no case will the actual window be larger than the measured amount of data recieved during the period. However, the amount of data received during the period may be three times the actual window size when measurements are made across wide-area networks with rtt > 20 ms. Further, local networks with small round-trip delays may be grossly over-estimated." I imagine that the 20 ms bit depends on the release and its timer granularity. It was really that last sentence that caught my eye. Still, rerunning with multiple concurrent tests showed they weren't all going to the limit: moe:~# for i in 1 2 3 4; do netperf -t omni -l 30 -H manny -P 0 -- -o foo & done moe:~# 289.31,-1,16384,2635688,-1,87380,2389248 210.43,-1,16384,2415720,-1,87380,2084736 194.87,-1,16384,1783312,-1,87380,1760704 247.00,-1,16384,2647472,-1,87380,2646912 moe:~# for i in 1 2 3 4; do netperf -t omni -l 120 -H manny -P 0 -- -o foo & done moe:~# 240.19,-1,16384,2761384,-1,87380,2635200 197.78,-1,16384,2337160,-1,87380,2225280 220.47,-1,16384,2867440,-1,87380,2834304 283.08,-1,16384,3244528,-1,87380,3091968 I'm not sure the extent to which skew error might be at issue in those measurements. rick jones ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: setsockopt()
  2008-07-08 18:16 ` setsockopt() Rick Jones
@ 2008-07-08 19:10 ` John Heffner
  0 siblings, 0 replies; 43+ messages in thread
From: John Heffner @ 2008-07-08 19:10 UTC (permalink / raw)
To: Rick Jones; +Cc: netdev

On Tue, Jul 8, 2008 at 11:16 AM, Rick Jones <rick.jones2@hp.com> wrote:
> John Heffner wrote:
>> Jerry's optimization is a sender-side change.  The fact that the
>> receiver announces enough window is almost certainly the right thing
>> for it to do, and (I hope) this will not change.
>
> It just seems to be so, well, trusting of the sender.

Really, it's not trusting the sender at all.  The way it works is that a
receiver sizes its buffer based strictly on *how much its application
has read in an RTT*.  If the application reads slowly, it uses a small
buffer.  If it reads quickly, it increases the size so that the TCP
buffer is big enough to not be the limiting factor (if system limits
allow).  That's about all there is to it.  The only effect the sender
has is that if it sends slowly, it bounds the rate at which the receiver
can read, and consequently results in an appropriately small receive
buffer.

The issue you're talking about is when the RTT gets inflated by filling
a buffer somewhere in that path -- in your case, specifically in the
sender's interface queue.  When the RTT gets inflated, the receiver will
continue to track it, and continue announcing a window large enough that
it doesn't limit the sender's window.  In this case, it is not really
paying a penalty, since it's keeping up and it doesn't actually have to
buffer any data.  It will happily let the sender continue to fill its
own buffers, and the sender will pay the penalty.

The receiver *can* try to do something about this situation, basically
by seeing that the RTT is increasing and not using the higher RTT values
in its calculation.  However, this is a very dangerous game, and comes
with all the issues of delay-based congestion control.  (Basically, you
can't tell if your flow is the one causing the queuing or if it's
cross/reverse-path traffic.  Or if increased delay is caused by a
routing change.  Or wireless link layer games.)  If you're going to try
to solve this problem, the sender is the better place to do it, because
it has better information, and because it pays the higher cost.

-John
^ permalink raw reply	[flat|nested] 43+ messages in thread
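John's description can be boiled down to a toy model of the receiver-side rule, in plain C rather than kernel code; the 2x headroom echoes David's earlier point about keeping a full window available during recovery, and the exact kernel logic is more involved:

    #include <stddef.h>

    /* Toy model: grow the advertised receive buffer toward roughly twice the
     * bytes the application consumed during the last RTT, capped by the
     * tcp_rmem "max" value; never shrink it here. */
    static size_t autotune_rcvbuf(size_t cur_rcvbuf,
                                  size_t bytes_read_last_rtt,
                                  size_t rmem_max)
    {
        size_t want = 2 * bytes_read_last_rtt;   /* headroom for recovery */

        if (want > cur_rcvbuf)
            cur_rcvbuf = want < rmem_max ? want : rmem_max;
        return cur_rcvbuf;
    }

A slow-reading application therefore keeps a small buffer, while a fast reader lets the buffer track the sender's rate up to the sysctl ceiling.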
[parent not found: <349f35ee0807090255s58fd040bne265ee117d06d397@mail.gmail.com>]
* Re: setsockopt()
  [not found] ` <349f35ee0807090255s58fd040bne265ee117d06d397@mail.gmail.com>
@ 2008-07-09 10:38 ` Jerry Chu
  0 siblings, 0 replies; 43+ messages in thread
From: Jerry Chu @ 2008-07-09 10:38 UTC (permalink / raw)
To: johnwheffner; +Cc: netdev, aglo, ranjitm

On Wed, Jul 9, 2008 at 2:55 AM, H.K. Jerry Chu <hkjerry.chu@gmail.com> wrote:
>
>
> ---------- Forwarded message ----------
> From: John Heffner <johnwheffner@gmail.com>
> Date: Mon, Jul 7, 2008 at 8:33 PM
> Subject: Re: setsockopt()
> To: Rick Jones <rick.jones2@hp.com>
> Cc: netdev@vger.kernel.org
>
>
> On Mon, Jul 7, 2008 at 3:50 PM, Rick Jones <rick.jones2@hp.com> wrote:
> > I'm still a triffle puzzled/concerned/confused by the extent to which
> > autotuning will allow the receive window to grow, again based on some
> > netperf experience thusfar, and patient explanations provided here and
> > elsewhere, it seems as though autotuning will let things get to 2x what it
> > thinks the sender's cwnd happens to be.  So far under netperf testing that
> > seems to be the case, and 99 times out of ten my netperf tests will have the
> > window grow to the max.
>
>
> Rick,
>
> I thought this was covered pretty thoroughly back in April.  The
> behavior you're seeing is 100% expected, and not likely to change
> unless Jerry Chu gets his local queued data measurement patch working.
> I'm not sure what ultimately happened there, but it was a cool idea
> and I hope he has time to polish it up.  It's definitely tricky to get
> right.

Yes, most certainly!  I've had the non-TSO code mostly working for the
past couple of months (i.e., cwnd grows only to ~50KB on a local 1GbE
setup), but no such luck with TSO.  Although the idea (excluding pkts
still stuck in some queues inside the sending host from "in_flight"
when deciding whether cwnd needs to grow or not) seems simple, getting
the accounting right for TSO seems impossible.  After catching and
fixing a slew of cases for 1GbE and seemingly getting close to the end
of the tunnel, I moved my tests to 10GbE last month and discovered
accounting leakage again.  Basically my count of all the pkts still
stuck inside the host sometimes becomes larger than the total
in-flight.  I have not figured out what skb paths I might have missed,
or is it possible the over-zealously-tuned 10G drivers are doing
something funky?

Not to mention a number of other tricky scenarios - e.g., when TSO is
enabled on 1GbE, the code works well for a netperf streaming test but
not the RR test with 1MB request size.  After a while I discovered that
tcp_sendmsg() for the 1MB RR tests often runs in a tight loop without
flow control, hence always hitting snd_cwnd, even though acks have come
back.  This is because the socket lock is only released during flow
control.  The problem went away once I checked for and let the return
traffic in inside the tcp_sendmsg() loop.  This kind of stuff can
easily spoil my original simple algorithm.

Jerry

> Jerry's optimization is a sender-side change.  The fact that the
> receiver announces enough window is almost certainly the right thing
> for it to do, and (I hope) this will not change.
>
> If you're still curious:
> http://www.psc.edu/networking/ftp/papers/autotune_sigcomm98.ps
> http://www.lanl.gov/radiant/pubs/drs/lacsi2001.pdf
> http://staff.psc.edu/jheffner/papers/senior_thesis.pdf
>
> -John
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply	[flat|nested] 43+ messages in thread
* Re: setsockopt() 2008-07-07 21:24 ` setsockopt() Stephen Hemminger 2008-07-07 21:30 ` setsockopt() Olga Kornievskaia @ 2008-07-07 21:32 ` J. Bruce Fields 1 sibling, 0 replies; 43+ messages in thread From: J. Bruce Fields @ 2008-07-07 21:32 UTC (permalink / raw) To: Stephen Hemminger; +Cc: Olga Kornievskaia, netdev, Jim Rees On Mon, Jul 07, 2008 at 02:24:08PM -0700, Stephen Hemminger wrote: > On Mon, 07 Jul 2008 14:18:38 -0400 > Olga Kornievskaia <aglo@citi.umich.edu> wrote: > > > Hi, > > > > I'd like to ask a question regarding socket options, more > > specifically send and receive buffer sizes. > > > > One simple question: (on the server-side) is it true that, to set > > send/receive buffer size, setsockopt() can only be called before > > listen()? From what I can tell, if I were to set socket options for the > > listening socket, they get inherited by the socket created during the > > accept(). However, when I try to change send/receive buffer size for the > > new socket, they take no affect. > > > > The server in question is the NFSD server in the kernel. NFSD's code > > tries to adjust the buffer size (in order to have TCP increase the > > window size appropriately) but it does so after the new socket is > > created. It leads to the fact that the TCP window doesn't open beyond > > the TCP's "default" sysctl value (that would be the 2nd value in the > > triple net.ipv4.tcp_rmem, which on our system is set to 64KB). We > > changed the code so that setsockopt() is called for the listening socket > > is created and we set the buffer sizes to something bigger, like 8MB. > > Then we try to increase the buffer size for each socket created by the > > accept() but what is seen on the network trace is that window size > > doesn't open beyond the values used for the listening socket. > > It would be better if NFSD stayed out of doign setsockopt and just > let the sender/receiver autotuning work? Just googling around.... Yes, that's probably exactly what we want, thanks! Any pointers to a good tutorial on the autotuning behavior? So all we should have to do is never mess with setsockopt, and the receive buffer size can increase up to the maximum (the third integer in the tcp_rmem sysctl) if necessary? --b. > > > I looked around in the code. There is a variable called > > "window_clamp" that seems to specifies the largest possible window > > advertisement. window_clamp gets set during the creation of the accept > > socket. At that time, it's value is based on the sk_rcvbuf of the > > listening socket. Thus, that would explain the behavior that window > > doesn't grow beyond the values used in setsockopt() for the listening > > socket, even though the new socket has new (larger) sk_sndbuf and > > sk_rcvbuf than the listening socket. > > > > I realize that send/receive buffer size and window advertisement are > > different but they are related in the way that by telling TCP that we > > have a certain amount of memory for socket operations, it should try to > > open big enough window (provided that there is no congestion). > > > > Can somebody advise us on how to properly set send/receive buffer > > sizes for the NFSD in the kernel such that (1) the window is not bound > > by the TCP's default sysctl value and (2) if it is possible to do so for > > the accept sockets and not the listening socket. > > > > I would appreciate if we could be CC-ed on the reply as we are not > > subscribed to the netdev mailing list. > > > > Thank you. 
> > > > -Olga > > > > > > -- > > To unsubscribe from this list: send the line "unsubscribe netdev" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 43+ messages in thread
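One quick way to check the ceiling Bruce is asking about -- the third field of net.ipv4.tcp_rmem -- from userspace is to read it back directly; a small sketch, assuming the usual Linux procfs path and three-integer format:

    #include <stdio.h>

    int main(void)
    {
        long min, def, max;
        FILE *f = fopen("/proc/sys/net/ipv4/tcp_rmem", "r");

        if (!f || fscanf(f, "%ld %ld %ld", &min, &def, &max) != 3) {
            perror("tcp_rmem");
            return 1;
        }
        fclose(f);
        printf("receive autotuning starts at %ld bytes, may grow to %ld bytes\n",
               def, max);
        return 0;
    }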
* Re: setsockopt() 2008-07-07 18:18 setsockopt() Olga Kornievskaia 2008-07-07 21:24 ` setsockopt() Stephen Hemminger @ 2008-07-08 1:17 ` John Heffner 1 sibling, 0 replies; 43+ messages in thread From: John Heffner @ 2008-07-08 1:17 UTC (permalink / raw) To: Olga Kornievskaia; +Cc: netdev, Jim Rees, J. Bruce Fields On Mon, Jul 7, 2008 at 11:18 AM, Olga Kornievskaia <aglo@citi.umich.edu> wrote: > Can somebody advise us on how to properly set send/receive buffer sizes > for the NFSD in the kernel such that (1) the window is not bound by the > TCP's default sysctl value and (2) if it is possible to do so for the accept > sockets and not the listening socket. As others have said, most likely you'd be better off without calling SO_{SND,RCV}BUF. It's possible but difficult in some circumstances to do better than the kernel's autotuning. If you must do SO_RCVBUF, you also need to set TCP_WINDOW_CLAMP. It would probably be better if the kernel would recalculate window_clamp on an SO_RCVBUF automatically, though this is slightly problematic from a layering point of view. Note, however, that changing SO_RCVBUF after connection establishment is not supported on many OS's, and usually isn't what you want to do. -John ^ permalink raw reply [flat|nested] 43+ messages in thread
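If an application really must pin the receive buffer, John's advice translates into roughly the following sketch.  The values are illustrative, the error handling is minimal, and it assumes the window clamp, like the buffer sizes, is inherited by sockets returned from accept():

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <string.h>
    #include <sys/socket.h>

    /* Fix the receive buffer AND the window clamp on the listening socket,
     * before listen(), so accepted sockets pick up both settings. */
    int listen_with_fixed_rcvbuf(unsigned short port, int bytes)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr;

        if (fd < 0)
            return -1;

        /* Disables receive autotuning on every socket accepted from fd. */
        setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes));
        /* Without this, window_clamp stays derived from the old sk_rcvbuf. */
        setsockopt(fd, IPPROTO_TCP, TCP_WINDOW_CLAMP, &bytes, sizeof(bytes));

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = htons(port);
        addr.sin_addr.s_addr = htonl(INADDR_ANY);

        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
            listen(fd, 64) < 0)
            return -1;
        return fd;
    }

As the thread repeatedly notes, though, the simpler and usually better option is to set neither and let autotuning size the buffers.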
end of thread, other threads: [~2008-07-11  0:51 UTC | newest]

Thread overview: 43+ messages -- links below jump to the message on this page:
2008-07-07 18:18 setsockopt() Olga Kornievskaia
2008-07-07 21:24 ` setsockopt() Stephen Hemminger
2008-07-07 21:30 ` setsockopt() Olga Kornievskaia
2008-07-07 21:33 ` setsockopt() Stephen Hemminger
2008-07-07 21:49 ` setsockopt() David Miller
2008-07-08 4:54 ` setsockopt() Evgeniy Polyakov
2008-07-08 6:02 ` setsockopt() Bill Fink
2008-07-08 6:29 ` setsockopt() Roland Dreier
2008-07-08 6:43 ` setsockopt() Evgeniy Polyakov
2008-07-08 7:03 ` setsockopt() Roland Dreier
2008-07-08 18:48 ` setsockopt() Bill Fink
2008-07-09 18:10 ` setsockopt() Roland Dreier
2008-07-09 18:34 ` setsockopt() Evgeniy Polyakov
2008-07-10 2:50 ` setsockopt() Bill Fink
2008-07-10 17:26 ` setsockopt() Rick Jones
2008-07-11 0:50 ` setsockopt() Bill Fink
2008-07-08 20:48 ` setsockopt() Stephen Hemminger
2008-07-08 22:05 ` setsockopt() Bill Fink
2008-07-09 5:25 ` setsockopt() Evgeniy Polyakov
2008-07-09 5:47 ` setsockopt() Bill Fink
2008-07-09 6:03 ` setsockopt() Evgeniy Polyakov
2008-07-09 18:11 ` setsockopt() J. Bruce Fields
2008-07-09 18:43 ` setsockopt() Evgeniy Polyakov
2008-07-09 22:28 ` setsockopt() J. Bruce Fields
2008-07-10 1:06 ` setsockopt() Evgeniy Polyakov
2008-07-10 20:05 ` [PATCH] Documentation: clarify tcp_{r,w}mem sysctl docs J. Bruce Fields
2008-07-10 23:50 ` David Miller
2008-07-08 20:12 ` setsockopt() Jim Rees
2008-07-08 21:54 ` setsockopt() John Heffner
2008-07-08 23:51 ` setsockopt() Jim Rees
2008-07-09 0:07 ` setsockopt() John Heffner
2008-07-07 22:50 ` setsockopt() Rick Jones
2008-07-07 23:00 ` setsockopt() David Miller
2008-07-07 23:27 ` setsockopt() Rick Jones
2008-07-08 1:15 ` setsockopt() Rick Jones
2008-07-08 1:48 ` setsockopt() J. Bruce Fields
2008-07-08 1:44 ` setsockopt() David Miller
2008-07-08 3:33 ` setsockopt() John Heffner
2008-07-08 18:16 ` setsockopt() Rick Jones
2008-07-08 19:10 ` setsockopt() John Heffner
[not found] ` <349f35ee0807090255s58fd040bne265ee117d06d397@mail.gmail.com>
2008-07-09 10:38 ` setsockopt() Jerry Chu
2008-07-07 21:32 ` setsockopt() J. Bruce Fields
2008-07-08 1:17 ` setsockopt() John Heffner