* TCP_CONGESTION documentation
@ 2008-11-21 16:06 Michael Kerrisk
From: Michael Kerrisk @ 2008-11-21 16:06 UTC (permalink / raw)
To: Stephen Hemminger
Cc: David Miller, linux-net-u79uwXL29TY76Z2rM5mHXA, Andi Kleen,
linux-man-u79uwXL29TY76Z2rM5mHXA, Michael Kerrisk
Hello Stephen,
Back in 2.6.13, you added the TCP_CONGESTION sockopt, but
provided no man-page patch...
Below is my attempt to document this sockopt. Could you please
review. Please don't assume I've well understood the code: I
may well have messed up in my reading of it, so review what
I've written with care.
Also, a question: was the silent truncation of the returned
string on getsockopt() if optlen is too small really intended?
Would it not be/have been better to error on this case?
Cheers,
Michael
TCP_CONGESTION (since Linux 2.6.13)
Get or set the congestion-control algorithm for this
socket. The optval argument is a pointer to a
character-string buffer.
For getsockopt() *optlen specifies the amount of space
available in the buffer pointed to by optval, which
should be at least 16 bytes (defined by the kernel-
internal constant TCP_CA_NAME_MAX). On return, the
buffer pointed to by optval is set to a null-terminated
string containing the name of the congestion-control
algorithm for this socket, and *optlen is set to the
minimum of its original value and TCP_CA_NAME_MAX. If
the value passed in *optlen is too small, then the
string returned in *optval is silently truncated, and no
terminating null byte is added. If an empty string is
returned, then the socket is using the default conges-
tion-control algorithm, determined as described under
tcp_congestion_control above.
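The getsockopt() behaviour described above could be exercised with something like the following. This is only a sketch: get_congestion() and CA_NAME_BUF are hypothetical names, and the value 16 is simply the TCP_CA_NAME_MAX figure quoted in the text. The helper adds its own terminator so that a truncated result can never be read past the end.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

/* Assumption: 16 mirrors the kernel-internal TCP_CA_NAME_MAX quoted above. */
#define CA_NAME_BUF 16

/*
 * Hypothetical helper: copy the congestion-control algorithm name for fd
 * into name (size bytes); the result is always null-terminated.
 * Returns 0 on success, -1 if getsockopt() fails (errno is left set).
 */
int get_congestion(int fd, char *name, size_t size)
{
    char buf[CA_NAME_BUF + 1];      /* one spare byte for our own terminator */
    socklen_t len = CA_NAME_BUF;    /* space offered to the kernel */

    assert(name != NULL && size > 0);
    if (getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, buf, &len) == -1)
        return -1;
    if (len > CA_NAME_BUF)          /* defensive: never index past buf */
        len = CA_NAME_BUF;
    buf[len] = '\0';                /* guards against an unterminated result */
    strncpy(name, buf, size - 1);
    name[size - 1] = '\0';
    return 0;
}
```

A caller would pass a connected or freshly created TCP socket and a buffer of at least 17 bytes.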
For setsockopt() optlen specifies the length of the con-
gestion-control algorithm name contained in the buffer
pointed to by optval; this length need not include any
terminating null byte. The algorithm "reno" is always
permitted; other algorithms may be available, depending
on kernel configuration. Possible errors from setsock-
opt() include: algorithm not found/available (ENOENT);
setting this algorithm requires the CAP_NET_ADMIN capa-
bility (EPERM); and failure getting kernel module
(EBUSY).
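For the setsockopt() side, a minimal sketch might look like this. The helper names set_congestion() and congestion_strerror() are made up for illustration; the error meanings are taken from the draft text above and have not been checked against the kernel source.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: select the congestion-control algorithm for fd.
 * The length passed excludes the terminating null byte, which the
 * draft text says is permitted. Returns 0 on success, -1 on error.
 */
int set_congestion(int fd, const char *name)
{
    assert(name != NULL);
    return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION,
                      name, (socklen_t) strlen(name));
}

/* Map the errors listed in the draft text to short descriptions. */
const char *congestion_strerror(int err)
{
    switch (err) {
    case ENOENT: return "algorithm not found or not available";
    case EPERM:  return "algorithm requires CAP_NET_ADMIN";
    case EBUSY:  return "failure getting kernel module";
    default:     return strerror(err);
    }
}
```

Typical use would be set_congestion(fd, "reno"), followed by congestion_strerror(errno) if the call returns -1.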
--- tcp.7 2008-11-21 10:54:08.000000000 -0500
+++ tcp.7.TCP_CONGESTION.patch 2008-11-21 10:53:36.000000000 -0500
@@ -733,7 +733,58 @@
socket options are valid on TCP sockets.
For more information see
.BR ip (7).
-.\" FIXME Document TCP_CONGESTION (new in 2.6.13)
+.TP
+.BR TCP_CONGESTION " (since Linux 2.6.13)"
+Get or set the congestion-control algorithm for this socket.
+The
+.I optval
+argument is a pointer to a character-string buffer.
+
+For
+.BR getsockopt ()
+.I *optlen
+specifies the amount of space available in the buffer pointed to by
+.IR optval ,
+which should be at least 16 bytes (defined by the kernel-internal constant
+.BR TCP_CA_NAME_MAX ).
+On return, the buffer pointed to by
+.I optval
+is set to a null-terminated string containing the name of the
+congestion-control algorithm for this socket, and
+.I *optlen
+is set to the minimum of its original value and
+.BR TCP_CA_NAME_MAX .
+If the value passed in
+.I *optlen
+is too small, then the string returned in
+.I *optval
+is silently truncated, and no terminating null byte is added.
+If an empty string is returned, then the socket is using the default
+congestion-control algorithm, determined as described under
+.I tcp_congestion_control
+above.
+
+For
+.BR setsockopt ()
+.I optlen
+specifies the length of the congestion-control algorithm name
+contained in the buffer pointed to by
+.IR optval ;
+this length need not include any terminating null byte.
+The algorithm "reno" is always permitted;
+other algorithms may be available, depending on kernel configuration.
+Possible errors from
+.BR setsockopt ()
+include:
+algorithm not found/available
+.RB ( ENOENT );
+setting this algorithm requires the
+.B CAP_NET_ADMIN
+capability
+.RB ( EPERM );
+and failure getting kernel module
+.RB ( EBUSY ).
+.I
.TP
.B TCP_CORK
If set, don't send out partial frames.
--
To unsubscribe from this list: send the line "unsubscribe linux-man" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: TCP_CONGESTION documentation
@ 2008-11-21 16:08 Michael Kerrisk
From: Michael Kerrisk @ 2008-11-21 16:08 UTC
To: Stephen Hemminger
Cc: David Miller, linux-net-u79uwXL29TY76Z2rM5mHXA,
linux-man-u79uwXL29TY76Z2rM5mHXA, Michael Kerrisk, Andi Kleen

[CC+= Andi, this time with the right address]

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
man-pages online: http://www.kernel.org/doc/man-pages/online_pages.html
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html
* Re: TCP_CONGESTION documentation
@ 2008-11-21 20:42 Andi Kleen
From: Andi Kleen @ 2008-11-21 20:42 UTC
To: mtk.manpages
Cc: Stephen Hemminger, David Miller, linux-net, linux-man,
Michael Kerrisk, Andi Kleen

On Fri, Nov 21, 2008 at 11:08:22AM -0500, Michael Kerrisk wrote:
> [CC+= Andi, this time with the right address]

Just a general comment. The initial DESCRIPTION in tcp should
probably be adapted to mention that Linux has pluggable
congestion avoidance modules now and also that the defaults
have changed (from NewReno to CUBIC etc.)

-Andi
* Re: TCP_CONGESTION documentation
@ 2008-11-21 20:44 Michael Kerrisk
From: Michael Kerrisk @ 2008-11-21 20:44 UTC
To: Andi Kleen
Cc: Stephen Hemminger, David Miller, linux-net-u79uwXL29TY76Z2rM5mHXA,
linux-man-u79uwXL29TY76Z2rM5mHXA

Hi Andi,

On Fri, Nov 21, 2008 at 3:42 PM, Andi Kleen <andi-Vw/NltI1exuRpAAqCnN02g@public.gmane.org> wrote:
> On Fri, Nov 21, 2008 at 11:08:22AM -0500, Michael Kerrisk wrote:
>> [CC+= Andi, this time with the right address]
>
> Just a general comment. The initial DESCRIPTION in tcp should
> be probably adapted to mentioned that Linux has pluggable
> congestion avoidance modules now and also that the defaults
> have changed (from NewReno to CUBIC etc.)

If I try to do this, I'm going to create rubbish, because I know next
to nothing about these details...

Could I ask a favor? Below is the DESCRIPTION text. Could you
write some sentences in the rough location where you think they belong,
and I will turn that into a *roff patch.

Thanks,

Michael

This is an implementation of the TCP protocol defined in
RFC 793, RFC 1122 and RFC 2001 with the NewReno and SACK
extensions. It provides a reliable, stream-oriented,
full-duplex connection between two sockets on top of
ip(7), for both v4 and v6 versions. TCP guarantees that
the data arrives in order and retransmits lost packets.
It generates and checks a per-packet checksum to catch
transmission errors. TCP does not preserve record
boundaries.

A newly created TCP socket has no remote or local address
and is not fully specified. To create an outgoing TCP
connection use connect(2) to establish a connection to
another TCP socket. To receive new incoming connections,
first bind(2) the socket to a local address and port and
then call listen(2) to put the socket into the listening
state. After that a new socket for each incoming
connection can be accepted using accept(2). A socket which
has had accept(2) or connect(2) successfully called on it
is fully specified and may transmit data. Data cannot be
transmitted on listening or not yet connected sockets.

Linux supports RFC 1323 TCP high performance extensions.
These include Protection Against Wrapped Sequence Numbers
(PAWS), Window Scaling and Timestamps. Window scaling
allows the use of large (> 64K) TCP windows in order to
support links with high latency or bandwidth. To make use
of them, the send and receive buffer sizes must be
increased. They can be set globally with the
/proc/sys/net/ipv4/tcp_wmem and /proc/sys/net/ipv4/tcp_rmem
files, or on individual sockets by using the SO_SNDBUF and
SO_RCVBUF socket options with the setsockopt(2) call.

The maximum sizes for socket buffers declared via the
SO_SNDBUF and SO_RCVBUF mechanisms are limited by the
values in the /proc/sys/net/core/rmem_max and
/proc/sys/net/core/wmem_max files. Note that TCP actually
allocates twice the size of the buffer requested in the
setsockopt(2) call, and so a succeeding getsockopt(2) call
will not return the same size of buffer as requested in
the setsockopt(2) call. TCP uses the extra space for
administrative purposes and internal kernel structures,
and the /proc file values reflect the larger sizes
compared to the actual TCP windows. On individual
connections, the socket buffer size must be set prior to
the listen(2) or connect(2) calls in order to have it take
effect. See socket(7) for more information.

TCP supports urgent data. Urgent data is used to signal
the receiver that some important message is part of the
data stream and that it should be processed as soon as
possible. To send urgent data specify the MSG_OOB option
to send(2). When urgent data is received, the kernel sends
a SIGURG signal to the process or process group that has
been set as the socket "owner" using the SIOCSPGRP or
FIOSETOWN ioctls (or the POSIX.1-2001-specified fcntl(2)
F_SETOWN operation). When the SO_OOBINLINE socket option
is enabled, urgent data is put into the normal data stream
(a program can test for its location using the SIOCATMARK
ioctl described below), otherwise it can be only received
when the MSG_OOB flag is set for recv(2) or recvmsg(2).

Linux 2.4 introduced a number of changes for improved
throughput and scaling, as well as enhanced functionality.
Some of these features include support for zero-copy
sendfile(2), Explicit Congestion Notification, new
management of TIME_WAIT sockets, keep-alive socket options
and support for Duplicate SACK extensions.
* Re: TCP_CONGESTION documentation
@ 2008-11-21 21:16 Andi Kleen
From: Andi Kleen @ 2008-11-21 21:16 UTC
To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
Cc: Andi Kleen, Stephen Hemminger, David Miller,
linux-net-u79uwXL29TY76Z2rM5mHXA, linux-man-u79uwXL29TY76Z2rM5mHXA

On Fri, Nov 21, 2008 at 03:44:05PM -0500, Michael Kerrisk wrote:
> Hi Andi,
>
> On Fri, Nov 21, 2008 at 3:42 PM, Andi Kleen <andi-Vw/NltI1exuRpAAqCnN02g@public.gmane.org> wrote:
> > On Fri, Nov 21, 2008 at 11:08:22AM -0500, Michael Kerrisk wrote:
> >> [CC+= Andi, this time with the right address]
> >
> > Just a general comment. The initial DESCRIPTION in tcp should
> > be probably adapted to mentioned that Linux has pluggable
> > congestion avoidance modules now and also that the defaults
> > have changed (from NewReno to CUBIC etc.)
>
> If I try to do this, I'm going to create rubbish, because I know next
> to nothing about these details...
>
> Could I ask a favor? Below is the DESCRIPTION text. Could you note
> write some sentences in the rough location where you think they below,
> and I will turn that into a *roff patch.

It would be better if Stephen or David do that. They kept better track
of it than me. In the worst case I could try too.

-Andi
* Re: TCP_CONGESTION documentation
@ 2008-11-22 7:39 Stephen Hemminger
From: Stephen Hemminger @ 2008-11-22 7:39 UTC
To: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w
Cc: mtk.manpages-gM/Ye1E23mwN+BqQ9rBEUg, Andi Kleen, David Miller,
linux-net-u79uwXL29TY76Z2rM5mHXA, linux-man-u79uwXL29TY76Z2rM5mHXA

Does this help get it started in the right direction?

--------------------------------------------

This is an implementation of the TCP protocol defined in
RFC 793, RFC 1122 and RFC 2001 with the NewReno and SACK
extensions. It provides a reliable, stream-oriented,
full-duplex connection between two sockets on top of
ip(7), for both v4 and v6 versions. TCP guarantees that
the data arrives in order and retransmits lost packets.
It generates and checks a per-packet checksum to catch
transmission errors. TCP does not preserve record
boundaries.

A newly created TCP socket has no remote or local address
and is not fully specified. To create an outgoing TCP
connection use connect(2) to establish a connection to
another TCP socket. To receive new incoming connections,
first bind(2) the socket to a local address and port and
then call listen(2) to put the socket into the listening
state. After that a new socket for each incoming
connection can be accepted using accept(2). A socket which
has had accept(2) or connect(2) successfully called on it
is fully specified and may transmit data. Data cannot be
transmitted on listening or not yet connected sockets.

[move buffering ahead of 1323 stuff - more important]

Socket buffers on Linux are automatically tuned by Linux TCP
based on available memory and the throughput of the socket.
The starting value and upper bound of buffer tuning are
determined by tcp_rmem (for receiving) and tcp_wmem (for
sending), as described in the Sysctls section. The buffer
sizes can be fixed with the SO_SNDBUF and SO_RCVBUF
mechanisms. Note that TCP actually allocates twice the size
of the buffer requested in the setsockopt(2) call, and so a
succeeding getsockopt(2) call will not return the same size
of buffer as requested in the setsockopt(2) call. TCP uses
the extra space for administrative purposes and internal
kernel structures, and the /proc file values reflect the
larger sizes compared to the actual TCP windows. The maximum
sizes for socket buffers declared via the SO_SNDBUF and
SO_RCVBUF mechanisms are limited by the net.core.rmem_max and
net.core.wmem_max sysctl values. On individual connections,
the socket buffer size must be set prior to the listen(2) or
connect(2) calls in order to have it take effect. See
socket(7) for more information.

Linux supports RFC 1323 TCP high performance extensions.
These include Protection Against Wrapped Sequence Numbers
(PAWS), Window Scaling and Timestamps. Window scaling
allows the use of large (> 64K) TCP windows in order to
support links with high latency or bandwidth.

TCP supports urgent data. Urgent data is used to signal
the receiver that some important message is part of the
data stream and that it should be processed as soon as
possible. To send urgent data specify the MSG_OOB option
to send(2). When urgent data is received, the kernel sends
a SIGURG signal to the process or process group that has
been set as the socket "owner" using the SIOCSPGRP or
FIOSETOWN ioctls (or the POSIX.1-2001-specified fcntl(2)
F_SETOWN operation). When the SO_OOBINLINE socket option
is enabled, urgent data is put into the normal data stream
(a program can test for its location using the SIOCATMARK
ioctl described below), otherwise it can be only received
when the MSG_OOB flag is set for recv(2) or recvmsg(2).

Linux supports multiple different congestion control
algorithms. The default choice of congestion control is
controlled by the net.ipv4.tcp_congestion_control sysctl.
This value can be overridden by the TCP_CONGESTION socket
option. The actual choices of congestion control available
vary between releases as more are added, and depend on the
configuration choices made when the kernel was built. The
list of congestion control protocols currently loaded is in
net.ipv4.tcp_available_congestion_control.
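As a rough illustration of the last point, a program could check the list of available algorithms before trying TCP_CONGESTION. The sketch below assumes the sysctl is exposed at its usual /proc/sys path; read_available_cc() and cc_list_contains() are hypothetical helper names, not part of any existing API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The /proc/sys form of net.ipv4.tcp_available_congestion_control. */
#define AVAILABLE_CC_PATH \
    "/proc/sys/net/ipv4/tcp_available_congestion_control"

/* Return 1 if name occurs as a whole word in the whitespace-separated
 * list, 0 otherwise. */
int cc_list_contains(const char *list, const char *name)
{
    size_t n = strlen(name);
    const char *p = list;

    assert(n > 0);
    while ((p = strstr(p, name)) != NULL) {
        int starts = (p == list) || strchr(" \t\n", p[-1]) != NULL;
        int ends = (p[n] == '\0') || strchr(" \t\n", p[n]) != NULL;

        if (starts && ends)
            return 1;
        p += 1;                 /* keep searching after a partial match */
    }
    return 0;
}

/* Read the first line of the sysctl file into buf.
 * Returns 0 on success, -1 if the file cannot be opened or read. */
int read_available_cc(char *buf, size_t size)
{
    FILE *f = fopen(AVAILABLE_CC_PATH, "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, (int) size, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return 0;
}
```

A caller would read the list once with read_available_cc() and test a candidate name with cc_list_contains() before calling setsockopt().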
* Re: TCP_CONGESTION documentation
@ 2008-11-22 14:56 Andi Kleen
From: Andi Kleen @ 2008-11-22 14:56 UTC
To: Stephen Hemminger
Cc: mtk.manpages, mtk.manpages, Andi Kleen, David Miller,
linux-net, linux-man

On Fri, Nov 21, 2008 at 11:39:18PM -0800, Stephen Hemminger wrote:
> Does this help get it started in right direction??

Yes.

> --------------------------------------------
>
> This is an implementation of the TCP protocol defined in
> RFC 793, RFC 1122 and RFC 2001 with the NewReno and SACK
> extensions. It provides a reliable, stream-oriented,

Perhaps drop NewReno, it's really obsolete because Linux is so
far beyond.

>
> Note that TCP actually allocates twice the size of the
> buffer requested in

The "twice" is obsolete, it's far more complicated now.
So it should be just "more", I think.

> socket option is enabled, urgent data is put into the
> normal data stream (a program can test for its location
> using the SIOCATMARK ioctl described below), otherwise it
> can be only received when the MSG_OOB flag is set for
> recv(2) or recvmsg(2).
>
> Linux supports multiple different congestion control
> algorithms. The default choice of congestion control is controlled
> by net.ipv4.tcp_congestion_control sysctl. This value can
> be overridden by TCP_CONGESTION socket option. The actual choices
> of congestion control available vary according between release
> as more are added, and depend on the configuration choices

Hmm, perhaps mention the current standard default?

> made when the kernel was built. The list of congestion control
> protocols currently loaded is in net.ipv4.tcp_available_congestion_control.

Best would probably be to have a manpage for each of them, but
I'm not going to write them :)

-Andi

--
ak@linux.intel.com
* Re: TCP_CONGESTION documentation
@ 2008-11-23 6:34 Stephen Hemminger
From: Stephen Hemminger @ 2008-11-23 6:34 UTC
Cc: mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w,
mtk.manpages-gM/Ye1E23mwN+BqQ9rBEUg, Andi Kleen, David Miller,
linux-net-u79uwXL29TY76Z2rM5mHXA, linux-man-u79uwXL29TY76Z2rM5mHXA

On Sat, 22 Nov 2008 15:56:32 +0100
Andi Kleen <andi-Vw/NltI1exuRpAAqCnN02g@public.gmane.org> wrote:
> On Fri, Nov 21, 2008 at 11:39:18PM -0800, Stephen Hemminger wrote:
> > Does this help get it started in right direction??
>
> Yes.
>
> > --------------------------------------------
> >
> > This is an implementation of the TCP protocol defined in
> > RFC 793, RFC 1122 and RFC 2001 with the NewReno and SACK
> > extensions. It provides a reliable, stream-oriented,
>
> Perhaps drop NewReno, it's really obsolete because Linux is so
> far beyond.
>
> >
> > Note that TCP actually allocates twice the size of the
> > buffer requested in
>
> The "twice" is obsolete, it's far more complicated now.
> So it should be just "more" I think
>
> > socket option is enabled, urgent data is put into the
> > normal data stream (a program can test for its location
> > using the SIOCATMARK ioctl described below), otherwise it
> > can be only received when the MSG_OOB flag is set for
> > recv(2) or recvmsg(2).
> >
> > Linux supports multiple different congestion control
> > algorithms. The default choice of congestion control is controlled
> > by net.ipv4.tcp_congestion_control sysctl. This value can
> > be overridden by TCP_CONGESTION socket option. The actual choices
> > of congestion control available vary according between release
> > as more are added, and depend on the configuration choices
>
> Hmm perhaps mention the current standard default?

There is no "standard default"; it is a kernel config option.
And it may change in future, so writing it into the manual page
seems short-sighted.
* Re: TCP_CONGESTION documentation
@ 2008-11-23 20:06 Andi Kleen
From: Andi Kleen @ 2008-11-23 20:06 UTC
To: Stephen Hemminger
Cc: Andi Kleen, mtk.manpages, mtk.manpages, David Miller,
linux-net, linux-man

> There is no "standard default" it is a kernel config option.

There is one default y and very strong suggestions in Kconfig.
I bet 99+% of the users use that, which means CUBIC now since
kernel version number XX.YY (I forgot)

Also the "depends on kernel config option" argument seems a poor one.
A lot of things can be disabled with uncommon CONFIG options, but
the man pages still document the standard defaults used by
nearly all people.

> And it may change in future, so writing it into the manual page
> seems being short sighted.

That just means that users will never know.

-Andi

--
ak@linux.intel.com
* Re: TCP_CONGESTION documentation
@ 2008-11-21 19:57 Stephen Hemminger
From: Stephen Hemminger @ 2008-11-21 19:57 UTC
Cc: David Miller, linux-net-u79uwXL29TY76Z2rM5mHXA, Andi Kleen,
linux-man-u79uwXL29TY76Z2rM5mHXA, Michael Kerrisk

On Fri, 21 Nov 2008 11:06:19 -0500
Michael Kerrisk <mtk.manpages-gM/Ye1E23mwN+BqQ9rBEUg@public.gmane.org> wrote:
> Hello Stephen,
>
> Back in 2.6.13, you added the TCP_CONGESTION sockopt, but
> provided no man-page patch...
>
> Below is my attempt to document this sockopt. Could you please
> review. Please don't assume I've well understood the code: I
> may well have messed up in my reading of it, so review what
> I've written with care.
>
> Also, a question: was the silent truncation of the returned
> string on getsockopt() if optlen is too small really intended?
> Would it not be/have been better to error on this case?
>
> Cheers,
>
> Michael
>
>
> TCP_CONGESTION (since Linux 2.6.13)
> Get or set the congestion-control algorithm for this
> socket. The optval argument is a pointer to a
> character-string buffer.
>
> For getsockopt() *optlen specifies the amount of space
> available in the buffer pointed to by optval, which
> should be at least 16 bytes (defined by the kernel-
> internal constant TCP_CA_NAME_MAX). On return, the
> buffer pointed to by optval is set to a null-terminated
> string containing the name of the congestion-control
> algorithm for this socket, and *optlen is set to the
> minimum of its original value and TCP_CA_NAME_MAX. If
> the value passed in *optlen is too small, then the
> string returned in *optval is silently truncated, and no
> terminating null byte is added. If an empty string is
> returned, then the socket is using the default conges-
> tion-control algorithm, determined as described under
> tcp_congestion_control above.
>
> For setsockopt() optlen specifies the length of the con-
> gestion-control algorithm name contained in the buffer
> pointed to by optval; this length need not include any
> terminating null byte. The algorithm "reno" is always
> permitted; other algorithms may be available, depending
> on kernel configuration. Possible errors from setsock-
> opt() include: algorithm not found/available (ENOENT);
> setting this algorithm requires the CAP_NET_ADMIN capa-
> bility (EPERM); and failure getting kernel module
> (EBUSY).

The tcp(7) man page is related and seems out of date as well.
At least on this system (Ubuntu 8.04), it seems to be stuck in a
pre-2.6.12 timewarp (see tcp_bic, tcp_bic_low_window, ...), with
values that no longer exist.

Should be updated as well.
* Re: TCP_CONGESTION documentation
@ 2008-11-21 20:32 Michael Kerrisk
From: Michael Kerrisk @ 2008-11-21 20:32 UTC
To: Stephen Hemminger
Cc: David Miller, linux-net, Andi Kleen, linux-man

Hi Stephen,

On Fri, Nov 21, 2008 at 2:57 PM, Stephen Hemminger
<shemminger@linux-foundation.org> wrote:
> On Fri, 21 Nov 2008 11:06:19 -0500
> Michael Kerrisk <mtk.manpages@googlemail.com> wrote:
>
>> Hello Stephen,
>>
>> Back in 2.6.13, you added the TCP_CONGESTION sockopt, but
>> provided no man-page patch...
>>
>> Below is my attempt to document this sockopt. Could you please
>> review. Please don't assume I've well understood the code: I
>> may well have messed up in my reading of it, so review what
>> I've written with care.
>>
>> Also, a question: was the silent truncation of the returned
>> string on getsockopt() if optlen is too small really intended?
>> Would it not be/have been better to error on this case?

You added some other stuff below, but I got no response to the review
request and the question above?

>> TCP_CONGESTION (since Linux 2.6.13)
>> Get or set the congestion-control algorithm for this
>> socket. The optval argument is a pointer to a
>> character-string buffer.
>>
>> For getsockopt() *optlen specifies the amount of space
>> available in the buffer pointed to by optval, which
>> should be at least 16 bytes (defined by the kernel-
>> internal constant TCP_CA_NAME_MAX). On return, the
>> buffer pointed to by optval is set to a null-terminated
>> string containing the name of the congestion-control
>> algorithm for this socket, and *optlen is set to the
>> minimum of its original value and TCP_CA_NAME_MAX. If
>> the value passed in *optlen is too small, then the
>> string returned in *optval is silently truncated, and no
>> terminating null byte is added. If an empty string is
>> returned, then the socket is using the default conges-
>> tion-control algorithm, determined as described under
>> tcp_congestion_control above.
>>
>> For setsockopt() optlen specifies the length of the con-
>> gestion-control algorithm name contained in the buffer
>> pointed to by optval; this length need not include any
>> terminating null byte. The algorithm "reno" is always
>> permitted; other algorithms may be available, depending
>> on kernel configuration. Possible errors from setsock-
>> opt() include: algorithm not found/available (ENOENT);
>> setting this algorithm requires the CAP_NET_ADMIN capa-
>> bility (EPERM); and failure getting kernel module
>> (EBUSY).

> The tcp(7) man page is related

(I'm a little confused by that remark: the patch at the end of my mail
was *for* the tcp(7) man page; maybe you missed that.)

> and seems out of date as well.

Yes, many things on it are out of date. (I've made occasional requests
to linux-net for help on this point, but have had little response.)

> At least on this sytem (Ubuntu 8.04).it seems to be stuck back in pre 2.6.12

Yes, 2.6.12 sounds about right.

> timewarp (see tcp_bic, tcp_bic_low_window, ...) values that no
> longer exist.

And when they disappear, no one CCs the man-page maintainer :-(.

Anyway, thanks for the heads up
tcp_bic* is now fixed (disappeared in 2.6.13)
and tcp_westwood is now fixed (disappeared in 2.6.13)
and tcp_vegas_cong_avoid is now fixed (disappeared in 2.6.13)

> Should be updated as well.

I'm working on it. Many updates to tcp(7) will be in man-pages-3.14.

Cheers,

Michael
* Re: TCP_CONGESTION documentation
  2008-11-21 20:32 ` Michael Kerrisk
@ 2008-11-21 20:34 ` Michael Kerrisk
  0 siblings, 0 replies; 12+ messages in thread
From: Michael Kerrisk @ 2008-11-21 20:34 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: David Miller, linux-net, linux-man, Andi Kleen

Bother.  CC += Andi again.

On Fri, Nov 21, 2008 at 3:32 PM, Michael Kerrisk
<mtk.manpages@googlemail.com> wrote:
> [...]

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
git://git.kernel.org/pub/scm/docs/man-pages/man-pages.git
man-pages online: http://www.kernel.org/doc/man-pages/online_pages.html
Found a bug? http://www.kernel.org/doc/man-pages/reporting_bugs.html
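For readers who want to try the behaviour described in the draft
TCP_CONGESTION text, here is a minimal sketch in C.  It is not taken
from the thread.  The helper names get_congestion() and
set_congestion() are made up for illustration, and the local
TCP_CA_NAME_MAX definition simply mirrors the kernel-internal value of
16 quoted in the draft.  Because the draft says a too-small buffer is
silently truncated without a terminating null byte, the sketch reads
into a zero-filled buffer one byte larger than the length it passes,
so the result is always terminated.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Assumption: mirrors the kernel-internal constant (16) cited in the
 * draft man-page text; only defined here if no header provides it. */
#ifndef TCP_CA_NAME_MAX
#define TCP_CA_NAME_MAX 16
#endif

/* Hypothetical helper: copy the socket's congestion-control algorithm
 * name into out (always null-terminated).  Returns 0 on success, or -1
 * with errno set by getsockopt(). */
int get_congestion(int fd, char *out, size_t out_size)
{
    char buf[TCP_CA_NAME_MAX + 1];
    socklen_t len = TCP_CA_NAME_MAX;

    assert(out != NULL && out_size > 0);

    /* Zero-fill so the name is terminated even if the kernel does not
     * add a null byte. */
    memset(buf, 0, sizeof(buf));
    if (getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, buf, &len) == -1)
        return -1;

    snprintf(out, out_size, "%s", buf);
    return 0;
}

/* Hypothetical helper: select an algorithm by name.  The length passed
 * excludes the terminating null byte, which the draft text says is
 * allowed.  Returns 0 on success, or -1 with errno set (the draft
 * lists ENOENT, EPERM and EBUSY as possible errors). */
int set_congestion(int fd, const char *name)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION,
                      name, (socklen_t)strlen(name));
}
```

A caller would typically create a TCP socket, call
set_congestion(fd, "reno") (the one algorithm the draft says is always
permitted), and then confirm the setting with get_congestion().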
end of thread, other threads:[~2008-11-23 20:06 UTC | newest]
Thread overview: 12+ messages
2008-11-21 16:06 TCP_CONGESTION documentation Michael Kerrisk
[not found] ` <4926DC7B.7020203-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2008-11-21 16:08 ` Michael Kerrisk
2008-11-21 20:42 ` Andi Kleen
[not found] ` <20081121204210.GG6703-qrUzlfsMFqo/4alezvVtWx2eb7JE58TQ@public.gmane.org>
2008-11-21 20:44 ` Michael Kerrisk
[not found] ` <cfd18e0f0811211244v5d391f8du3380332a721ed33-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-11-21 21:16 ` Andi Kleen
2008-11-22 7:39 ` Stephen Hemminger
2008-11-22 14:56 ` Andi Kleen
[not found] ` <20081122145632.GQ6703-qrUzlfsMFqo/4alezvVtWx2eb7JE58TQ@public.gmane.org>
2008-11-23 6:34 ` Stephen Hemminger
2008-11-23 20:06 ` Andi Kleen
2008-11-21 19:57 ` Stephen Hemminger
2008-11-21 20:32 ` Michael Kerrisk
2008-11-21 20:34 ` Michael Kerrisk