* want opinions on possible glitch in 2.4 network error reporting
From: Chris Friesen @ 2002-02-06 20:31 UTC
To: linux-kernel

I've been looking around in the 2.4 networking stack, and I noticed that when the tulip driver (and no doubt many others) cannot put any more outgoing packets on the queue, it calls netif_stop_queue(). Then, in dev_queue_xmit(), we check this flag by calling netif_queue_stopped(). My concern is that if this flag is true, we return -ENETDOWN.

Is this really the proper return code for this? If anything, the network is too active. It seems to me that it would make more sense to have some kind of congestion return code rather than claiming that the network is down.

I think it would make sense to return -ENOBUFS in this case, as it's already listed in the sendto() man page, and the description matches the error because the command could succeed if retried.

I ran into a somewhat related issue on a 2.2.16 system, where I had an app that was calling sendto() on 217000 packets/sec, even though the wire could only handle about 127000 packets/sec. I got no errors at all in sendto, even though over a third of the packets were not actually being sent.

-- 
Chris Friesen                    | MailStop: 043/33/F10
Nortel Networks                  | work: (613) 765-0557
3500 Carling Avenue              | fax:  (613) 765-2986
Nepean, ON K2H 8E9 Canada        | email: cfriesen@nortelnetworks.com

* Re: want opinions on possible glitch in 2.4 network error reporting
From: Richard B. Johnson @ 2002-02-06 20:56 UTC
To: Chris Friesen; +Cc: linux-kernel

On Wed, 6 Feb 2002, Chris Friesen wrote:

[SNIPPED...]

> I ran into a somewhat related issue on a 2.2.16 system, where I had an app that
> was calling sendto() on 217000 packets/sec, even though the wire could only
> handle about 127000 packets/sec. I got no errors at all in sendto, even though
> over a third of the packets were not actually being sent.

In principle, sendto() will always succeed unless you provided the wrong parameters in the function call, or the machine crashes, at which time your task won't be there to receive the error code anyway.

Hackers code sendto as:
    sendto(s,...);
Professional programmers use:
    (void)sendto(s,...);

Checking the return value is useless.

Note that the man-page specifically states that ENOBUFS can't happen.

You cannot assume that any sendto() data actually gets on the wire, much less to its destination. With any user-datagram-protocol, both ends, sender and receiver, have to work out what they will do with missing packets and packets received out-of-order.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (797.90 BogoMips).

I was going to compile a list of innovations that could be attributed to Microsoft. Once I realized that Ctrl-Alt-Del was handled in the BIOS, I found that there aren't any.

* Re: want opinions on possible glitch in 2.4 network error reporting
From: Ben Greear @ 2002-02-06 21:45 UTC
To: root; +Cc: Chris Friesen, linux-kernel

However, if you use non-blocking IO you will get EAGAIN if there is no buffer space. Blocking calls should always block until there is buffer space.

Also, just because select/poll says the socket is writable, it may not be (immediately), because you can send UDP packets that are larger than 2048 bytes, and that is the cutoff that tells select the socket is writable...

I've actually sent a patch to Dave Miller to make select/poll wait until there is 64k of buffer space (the maximum size of a UDP packet), but he is still reviewing the issue.

Enjoy,
Ben

Richard B. Johnson wrote:

> On Wed, 6 Feb 2002, Chris Friesen wrote:
>
> [SNIPPED...]
>
>>I ran into a somewhat related issue on a 2.2.16 system, where I had an app that
>>was calling sendto() on 217000 packets/sec, even though the wire could only
>>handle about 127000 packets/sec. I got no errors at all in sendto, even though
>>over a third of the packets were not actually being sent.
>
> In principle, sendto() will always succeed unless you provided the
> wrong parameters in the function call, or the machine crashes, at
> which time your task won't be there to receive the error code anyway.
>
> Hackers code sendto as:
>     sendto(s,...);
> Professional programmers use:
>     (void)sendto(s,...);
>
> checking the return value is useless.
>
> Note that the man-page specifically states that ENOBUFS can't happen.
>
> You cannot assume that any sendto() data actually gets on the wire, much
> less to its destination. With any user-datagram-protocol, both ends,
> sender and receiver, have to work out what they will do with missing
> packets and packets received out-of-order.
>
> Cheers,
> Dick Johnson

-- 
Ben Greear <greearb@candelatech.com> <Ben_Greear AT excite.com>
President of Candela Technologies Inc http://www.candelatech.com
ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear

* Re: want opinions on possible glitch in 2.4 network error reporting
From: Chris Friesen @ 2002-02-06 22:23 UTC
To: root; +Cc: linux-kernel

"Richard B. Johnson" wrote:

[snip]
> Hackers code sendto as:
>     sendto(s,...);
> Professional programmers use:
>     (void)sendto(s,...);
>
> checking the return value is useless.
>
> Note that the man-page specifically states that ENOBUFS can't happen.

I don't know what your manpage says, but my manpage doesn't say anything about ENOBUFS not being possible. From the man page:

"ENOBUFS The system was unable to allocate an internal memory block. The operation may succeed when buffers become available."

> You cannot assume that any sendto() data actually gets on the wire, much
> less to its destination. With any user-datagram-protocol, both ends,
> sender and receiver, have to work out what they will do with missing
> packets and packets received out-of-order.

Hmm. I knew you couldn't assume it was delivered (the man page says so), but I didn't know it doesn't guarantee it getting to the wire. The man page says that "locally detected errors are indicated by a return value of -1". Furthermore, it also says "When the message does not fit into the send buffer of the socket, send normally blocks, unless the socket has been placed in non-blocking I/O mode."

I would suggest that if the packet doesn't make it onto the wire, sendto() should either
a) block until it can send the packet (or return with EAGAIN, as appropriate), or
b) return an error.

Chris

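For readers who want to see what checking the sendto() return value might look like in practice, here is a minimal sketch in C. It only illustrates the behaviour discussed in the messages above (-1 with EAGAIN on a non-blocking socket, and ENOBUFS described by the man page as an error that "may succeed when buffers become available"). The names classify_send, set_nonblocking and send_datagram are made up for this example, and the choice of which errno values count as "retry later" is an assumption, not something the thread settles.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical outcome classes for one sendto() call. */
enum send_result { SEND_OK, SEND_RETRY, SEND_FAILED };

/* Map a sendto() return value, plus the errno observed right after the
 * call, to an outcome. Treating ENOBUFS and EINTR as retryable is an
 * assumption made for this sketch. */
enum send_result classify_send(ssize_t ret, int err)
{
    if (ret >= 0)
        return SEND_OK;          /* accepted by the local stack only */
    switch (err) {
    case EAGAIN:
#if EWOULDBLOCK != EAGAIN
    case EWOULDBLOCK:
#endif
    case ENOBUFS:
    case EINTR:
        return SEND_RETRY;       /* transient: try again later */
    default:
        return SEND_FAILED;      /* e.g. EINVAL, ENETDOWN */
    }
}

/* Put a descriptor into non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Send one datagram and report what the local stack said about it. */
enum send_result send_datagram(int fd, const void *buf, size_t len,
                               const struct sockaddr *dst, socklen_t dstlen)
{
    assert(buf != NULL && dst != NULL);
    ssize_t ret = sendto(fd, buf, len, 0, dst, dstlen);
    return classify_send(ret, errno);
}
```

A caller would create a SOCK_DGRAM socket, call set_nonblocking() on it, and count the SEND_OK, SEND_RETRY and SEND_FAILED results of send_datagram(). Note that SEND_OK only means the local stack accepted the datagram; as the rest of the thread argues, it says nothing about delivery.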
* Re: want opinions on possible glitch in 2.4 network error reporting
From: Richard B. Johnson @ 2002-02-07 13:44 UTC
To: Chris Friesen; +Cc: linux-kernel

On Wed, 6 Feb 2002, Chris Friesen wrote:

> "Richard B. Johnson" wrote:
>
> [snip]
> > Hackers code sendto as:
> >     sendto(s,...);
> > Professional programmers use:
> >     (void)sendto(s,...);
> >
> > checking the return value is useless.
> >
> > Note that the man-page specifically states that ENOBUFS can't happen.
>
> I don't know what your manpage says, but my manpage doesn't say anything about
> ENOBUFS not being possible. From the man page:
>
> "ENOBUFS The system was unable to allocate an internal memory block. The
> operation may succeed when buffers become available."

ENOBUFS
    The output queue for a network interface was full.
    This generally indicates that the interface has
    stopped sending, but may be caused by transient
    congestion. (This cannot occur in Linux, packets
    are just silently dropped when a device queue
    overflows.)

Linux Man Page                  July 1999                  1

Script done on Thu Feb 7 08:35:39 2002

Distributed with RedHat 7

Cheers,
Dick Johnson

* Re: want opinions on possible glitch in 2.4 network error reporting
From: Gerold Jury @ 2002-02-07 16:33 UTC
To: root; +Cc: linux-kernel

This is off topic but close.

UDP packets are not dropped when the UDP socket is shut down for receive with shutdown(socket,0), at least on 2.4.17. Attached is a small proof.

Anyone with a hint or a fix?

Thanks
Gerold

On Thursday 07 February 2002 14:44, Richard B. Johnson wrote:
>
> ENOBUFS
>     The output queue for a network interface was full.
>     This generally indicates that the interface has
>     stopped sending, but may be caused by transient
>     congestion. (This cannot occur in Linux, packets
>     are just silently dropped when a device queue
>     overflows.)
>
> Linux Man Page                  July 1999                  1

[-- Attachment: udpshutdown.c --]

#include <stdio.h>
#include <stdlib.h>     /* exit() */
#include <string.h>     /* strerror(), memset() */
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

char hello[] = "hello socket";
char buf[2048];

int main( void )
{
    int iFd, ret;
    /* recvfrom() reads fromlen as the size of the address buffer,
       so it must be initialized before the call */
    socklen_t fromlen = sizeof( struct sockaddr_in );
    short port = 15678;
    struct sockaddr_in addr, from;

    if( (iFd = socket( AF_INET, SOCK_DGRAM, 0 )) < 0 ) {
        perror("create socket");
        exit( 1 );
    }

    /* bind to the loopback address so the socket can send to itself */
    memset( &addr, 0, sizeof( addr ) );
    addr.sin_family = AF_INET;
    addr.sin_port = htons( port );
    addr.sin_addr.s_addr = htonl( INADDR_LOOPBACK );
    if( bind( iFd, (struct sockaddr *)&addr, sizeof( addr ) ) < 0 ) {
        fprintf(stderr,"bind to %d failed %s\n",port,strerror(errno));
        exit( 1 );
    }

    shutdown( iFd, 0 ); // shutdown receive
    // addr.sin_addr.s_addr = htonl( INADDR_LOOPBACK );

    /* send a datagram to ourselves after the receive side was shut down */
    if( sendto( iFd, hello, sizeof( hello ), 0,
                (struct sockaddr *)&addr, sizeof( addr ) ) < 0 ) {
        perror("sendto");
        exit( 1 );
    }

    /* the point of the test: does this still deliver the datagram? */
    ret = recvfrom( iFd, buf, sizeof( buf ), 0,
                    (struct sockaddr *)&from, &fromlen );
    if( ret < 0 ) {
        perror("recvfrom");
        exit( 1 );
    }
    printf( "received %d bytes [%s]\n", ret, buf );

    shutdown( iFd, 1 ); // shutdown send

    /* sending after the send side was shut down */
    if( sendto( iFd, hello, sizeof( hello ), 0,
                (struct sockaddr *)&addr, sizeof( addr ) ) < 0 ) {
        perror("sendto");
        exit( 1 );
    }
    return 0;
}

* Re: want opinions on possible glitch in 2.4 network error reporting
From: Alan Cox @ 2002-02-07 0:24 UTC
To: root; +Cc: Chris Friesen, linux-kernel

> Hackers code sendto as:
>     sendto(s,...);
> Professional programmers use:
>     (void)sendto(s,...);

Remind me never to hire one of your professional programmers.

> checking the return value is useless.

Not so. For a large number of situations it's extremely informative.

* Re: want opinions on possible glitch in 2.4 network error reporting
From: Alan Cox @ 2002-02-07 0:26 UTC
To: Chris Friesen; +Cc: linux-kernel

> I ran into a somewhat related issue on a 2.2.16 system, where I had an app that
> was calling sendto() on 217000 packets/sec, even though the wire could only
> handle about 127000 packets/sec. I got no errors at all in sendto, even though
> over a third of the packets were not actually being sent.

That is correct UDP behaviour

* Re: want opinions on possible glitch in 2.4 network error reporting
From: Ion Badulescu @ 2002-02-07 1:51 UTC
To: Alan Cox; +Cc: linux-kernel, Chris Friesen

On Thu, 7 Feb 2002 00:26:20 +0000 (GMT), Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:

>> I ran into a somewhat related issue on a 2.2.16 system, where I had an app that
>> was calling sendto() on 217000 packets/sec, even though the wire could only
>> handle about 127000 packets/sec. I got no errors at all in sendto, even though
>> over a third of the packets were not actually being sent.
>
> That is correct UDP behaviour

This is totally untrue, unless the socket is doing non-blocking I/O -- and even then you get -1 and EAGAIN from sendto.

Otherwise sendto is very much a blocking operation which will block until there is enough space in the socket buffer to store the data. From there, there is no way to "lose" that data before it hits the wire, unless of course the network driver is broken and doesn't plug the upper layers when its TX queue is full.

Think of it: if what you said were true, NFS over UDP would be totally useless. But it's not, so if UDP data gets lost before it hits the wire, it's usually a bug in the network driver.

From the limited testing I just ran, it appears that starfire and 3c59x handle this correctly, whereas tulip always loses a small number of packets during a UDP storm. ttcp -us[rt] is very useful for such testing...

Ion

-- 
It is better to keep your mouth shut and be thought a fool,
than to open it and remove all doubt.

* Re: want opinions on possible glitch in 2.4 network error reporting
From: Alan Cox @ 2002-02-07 2:08 UTC
To: Ion Badulescu; +Cc: Alan Cox, linux-kernel, Chris Friesen

> > That is correct UDP behaviour
>
> This is totally untrue, unless the socket doing non-blocking I/O -- and
> even then you get -1 and EAGAIN from sendto.

Not the case.

> there is no way to "lose" that data before it hits the wire, unless of
> course the network driver is broken and doesn't plug the upper layers when
> its TX queue is full.

UDP is not flow controlled.

> Think of it: if what you said were true, NFS over UDP would be totally
> useless. But it's not, so if UDP data gets lost before it hits the wire,
> it's usually a bug in the network driver.

NFS does UDP flow control of its own. If it didn't it would indeed be broken.

* Re: want opinions on possible glitch in 2.4 network error reporting
From: Ion Badulescu @ 2002-02-07 2:09 UTC
To: Alan Cox; +Cc: linux-kernel, Chris Friesen

On Thu, 7 Feb 2002, Alan Cox wrote:

> > there is no way to "lose" that data before it hits the wire, unless of
> > course the network driver is broken and doesn't plug the upper layers when
> > its TX queue is full.
>
> UDP is not flow controlled.

No, of course not, but this has *nothing* to do with UDP. The IP socket itself is flow controlled, and so is the TX queue of the network driver.

Let me give you another example: ping -f. If what you said were true, ping -f would send packets as fast as the CPU can generate into the black hole called an IP raw socket, right? Well, that just doesn't happen, because sendto/sendmsg will block until there is enough space in the TX queue of the raw socket.

I'll state again: if data (UDP or otherwise) is lost after sendto() returns success but before it hits the wire, something is BROKEN in that IP stack.

Ion

* Re: want opinions on possible glitch in 2.4 network error reporting
From: Alan Cox @ 2002-02-07 2:34 UTC
To: Ion Badulescu; +Cc: Alan Cox, linux-kernel, Chris Friesen

> > UDP is not flow controlled.
>
> No, of course not, but this has *nothing* to do with UDP. The IP socket
> itself is flow controlled, and so is the TX queue of the network driver.

It is not flow controlled

> Let me give you another example: ping -f. If what you said were true, ping -f
> would send packets as fast as the CPU can generate into the black hole
> called an IP raw socket, right? Well, that just doesn't happen, because

Wrong. man ping. ping -f doesn't do what you apparently think it does.

Alan

* Re: want opinions on possible glitch in 2.4 network error reporting
From: Ion Badulescu @ 2002-02-07 2:54 UTC
To: Alan Cox; +Cc: linux-kernel, Chris Friesen

On Thu, 7 Feb 2002, Alan Cox wrote:

> Wrong. man ping. ping -f doesn't do what you apparently think it does.

strace ping, you'll see it doing a

    setsockopt(7, SOL_SOCKET, SO_SNDTIMEO, [1], 8) = 0

on its socket. That's about the only way (aside from using a TBF queue, and other non-FIFO queues) you can lose data from a socket's queue.

Getting back to the NFS/UDP example: yes, NFS has its own flow control, but that's not the point. The reason NFS/UDP works so well with large NFS packets over a fully-switched *local* subnet is precisely because NFS's flow control is almost never exercised in that case. Data simply doesn't get lost -- never in the UDP socket's queue, and very rarely on the wire.

But you don't need to believe me. Just run the ttcp -uts test and explain how come all the data makes it to the other end (again, over a fully-switched local subnet) if:

1. ttcp has no clue about the wire speed (which it obviously doesn't) so it can't do rate limiting
2. the UDP socket simply discards data when some internal queue fills up, without blocking sendto() and without returning an error.

Moreover: please strace -T that ttcp -uts test, and notice how the time for the system call goes up by 2 orders of magnitude (i.e. it blocks) as soon as the socket queue fills up.

Ion

* Re: want opinions on possible glitch in 2.4 network error reporting
From: Alan Cox @ 2002-02-07 11:11 UTC
To: Ion Badulescu; +Cc: Alan Cox, linux-kernel, Chris Friesen

> > Wrong. man ping. ping -f doesn't do what you apparently think it does.
>
> strace ping, you'll see it doing a
>     setsockopt(7, SOL_SOCKET, SO_SNDTIMEO, [1], 8) = 0
>
> on its socket.

Read the ping manual page. Then when you understand what ping -f does come back and have a useful conversation.

* Re: want opinions on possible glitch in 2.4 network error reporting
From: Pavel Machek @ 2002-02-08 16:11 UTC
To: Alan Cox; +Cc: Ion Badulescu, linux-kernel, Chris Friesen

Hi!

> > > Wrong. man ping. ping -f doesn't do what you apparently think it does.
> >
> > strace ping, you'll see it doing a
> >     setsockopt(7, SOL_SOCKET, SO_SNDTIMEO, [1], 8) = 0
> >
> > on its socket.
>
> Read the ping manual page. Then when you understand what ping -f does
> come back and have a useful conversation.

But I guess it *would* be useful to have a -F option saying "feed data as fast as possible", right? And it would be nice if this option did not eat 100% cpu when possible, right?

So what he is asking for is pretty useful behaviour.

Pavel

-- 
(about SSSCA) "I don't say this lightly. However, I really think that the U.S. no longer is classifiable as a democracy, but rather as a plutocracy." --hpa

* Re: want opinions on possible glitch in 2.4 network error reporting
From: Ion Badulescu @ 2002-02-08 21:39 UTC
To: Pavel Machek; +Cc: Alan Cox, linux-kernel, Chris Friesen

On Fri, 8 Feb 2002, Pavel Machek wrote:

> Hi!
>
> > > > Wrong. man ping. ping -f doesn't do what you apparently think it does.
> > >
> > > strace ping, you'll see it doing a
> > >     setsockopt(7, SOL_SOCKET, SO_SNDTIMEO, [1], 8) = 0
> > >
> > > on its socket.
> >
> > Read the ping manual page. Then when you understand what ping -f does
> > come back and have a useful conversation.
>
> But I guess it *would* be useful to have -F option saying "feed data
> as fast as possible", right? And it would be nice if this option did
> not eat 100% cpu when possible, right?
>
> So what he is asking for is pretty useful behaviour.

I'm not asking for it. I'm saying this is what we already have. Too bad people won't listen -- and yes I know ping -f was a bad example.

A blocking sendto() *will* block (surprise surprise), even though it *might* throw the data away later on. Indeed, as Davem stated, a UDP socket will lose data under memory pressure. In real life this hardly ever happens, however, even with large message sizes: I just tested with sizes up to 52000, which is just about as large as you'll ever see in real environments.

Also: I'm just dying to be enlightened about how a dumb program like ttcp -u, doing a totally dumb "while (1) sendto();", can manage to score sending rates identical to the raw wire speed, if indeed sendto() never blocks and simply throws away the data:

apollo:/# ttcp -utsl 53000 zeus
ttcp-t: buflen=53000, nbuf=2048, align=16384/0, port=5001  udp  -> sybase2
ttcp-t: socket
ttcp-t: 108544000 bytes in 9.02 real seconds = 11745.26 KB/sec +++
ttcp-t: 2054 I/O calls, msec/call = 4.50, calls/sec = 227.59
ttcp-t: 0.0user 0.2sys 0:09real 2% 0i+0d 0maxrss 0+13pf 0+0csw

zeus:/var/lib/pgsql# ttcp -ursl 53000
ttcp-r: buflen=53000, nbuf=2048, align=16384/0, port=5001  udp
ttcp-r: socket
ttcp-r: 108544000 bytes in 9.03 real seconds = 11741.76 KB/sec +++
ttcp-r: 2050 I/O calls, msec/call = 4.51, calls/sec = 227.08
ttcp-r: 0.0user 0.1sys 0:09real 1% 0i+0d 0maxrss 0+12pf 0+0csw

11745KB/sec sounds suspiciously close to the 100Mb/sec wire speed.

And, for reference, just to make sure ttcp wasn't lying to me:

zeus:/var/lib/pgsql# iptables -L -n -v
Chain INPUT (policy ACCEPT 7217K packets, 3137M bytes)
 pkts bytes target  prot opt in  out  source        destination
 2051  108M         udp  --  *   *    10.2.10.216   0.0.0.0/0    udp dpt:5001

But no, it's so much easier to incompletely quote a message and then claim the other person has no idea about what he's talking about. Yes, Alan, that's precisely what you did.

Ion

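As a cross-check of the "suspiciously close to wire speed" remark, the ttcp figures can be converted to Mbit/s and compared with what a 100 Mbit/s Ethernet link can carry. The sketch below is only illustrative arithmetic: the function names are made up, a KB is taken as 1024 bytes (which matches the printed ttcp numbers to within the rounding of the elapsed time), and the ceiling assumes standard Ethernet framing with a 1500-byte MTU, 38 bytes of per-frame overhead and a 20-byte IPv4 header without options. None of these assumptions comes from the thread itself.

```c
#include <assert.h>

/* Throughput in KB/s, with 1 KB = 1024 bytes (assumed ttcp convention). */
double kbytes_per_sec(double bytes, double seconds)
{
    assert(seconds > 0.0);
    return bytes / seconds / 1024.0;
}

/* Throughput in Mbit/s, with 1 Mbit = 10^6 bits. */
double mbits_per_sec(double bytes, double seconds)
{
    assert(seconds > 0.0);
    return bytes * 8.0 / seconds / 1e6;
}

/* Upper bound on IP payload throughput over Ethernet with full-size frames.
 * Assumed per-frame overhead: preamble and start delimiter (8), Ethernet
 * header (14), frame check sequence (4), inter-frame gap (12) = 38 bytes,
 * plus a 20-byte IPv4 header inside each frame. */
double max_ip_payload_mbits(double link_mbits, double mtu)
{
    const double ip_header = 20.0;
    const double frame_overhead = 38.0;
    assert(mtu > ip_header);
    return link_mbits * (mtu - ip_header) / (mtu + frame_overhead);
}
```

With the sender's figures, mbits_per_sec(108544000, 9.02) comes out at roughly 96.3 Mbit/s, and the receiver's 9.03 seconds gives roughly 96.2 Mbit/s, while max_ip_payload_mbits(100, 1500) is also roughly 96.2 Mbit/s. Under these assumptions the reported rate is essentially the most a 100 Mbit/s link can carry, which is consistent with the sender being paced by the wire.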
* Re: want opinions on possible glitch in 2.4 network error reporting
From: Ben Greear @ 2002-02-07 4:21 UTC
To: Alan Cox; +Cc: Ion Badulescu, linux-kernel, Chris Friesen

Alan Cox wrote:

>>>That is correct UDP behaviour
>>
>>This is totally untrue, unless the socket doing non-blocking I/O -- and
>>even then you get -1 and EAGAIN from sendto.
>
> Not the case.

Are you claiming that you will never see -1 and EAGAIN on a nonblocking UDP socket with sendto? If so, I'll bet you a kernel patch that you are not correct (I get to write the patch and you include it :) )

>>there is no way to "lose" that data before it hits the wire, unless of
>>course the network driver is broken and doesn't plug the upper layers when
>>its TX queue is full.
>
> UDP is not flow controlled.

If it makes it through sendto, where can it be dropped before it hits the wire? I doubt the socket buffers are anything other than FIFO, and the same goes for the ethernet/device queue. Since we (can) know at sendto whether or not the PDU was enqueued for transmit, it seems trivial to notify user space of success/failure of the local network stack, and I believe this is what is done.

Now granted, it can be dropped anywhere outside of the machine, but I can see no good reason to drop it inside the machine.

Ben

* Re: want opinions on possible glitch in 2.4 network error reporting
From: David S. Miller @ 2002-02-07 4:38 UTC
To: greearb; +Cc: alan, ionut, linux-kernel, cfriesen

   From: Ben Greear <greearb@candelatech.com>
   Date: Wed, 06 Feb 2002 21:21:09 -0700

   Alan Cox wrote:
   > UDP is not flow controlled.

   If it makes it through sendto, where can it be dropped before it
   hits the wire?

If the packet ends up being fragmented on the way out and the socket cannot take on the allocation against its buffer space.

* Re: want opinions on possible glitch in 2.4 network error reporting
From: Ben Greear @ 2002-02-07 4:56 UTC
To: David S. Miller; +Cc: alan, ionut, linux-kernel, cfriesen

David S. Miller wrote:

>    From: Ben Greear <greearb@candelatech.com>
>    Date: Wed, 06 Feb 2002 21:21:09 -0700
>
>    Alan Cox wrote:
>    > UDP is not flow controlled.
>
>    If it makes it through sendto, where can it be dropped before it
>    hits the wire?
>
> If the packet ends up being fragmented on the way out and the socket
> cannot take on the allocation against its buffer space.

In the fragmentation case (at least over 1500 MTU ethernet), the headers are a relatively small portion of the total PDU, right? So, if we reserved 10-15% (or whatever it works out to), that should make it so we never drop the packet due to fragmentation, right?

I can't see any reason not to reserve this space, because sending a little later is definitely better than going through the work to send it sooner but then having to drop it down in the local kernel. We may only want to reserve the buffers when they are fairly large (i.e. not on your very small and slow embedded devices where memory is very precious).

Ben

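To put a rough number on "whatever it works out to", the sketch below counts the IPv4 fragments needed for one UDP datagram and the header bytes they add on the wire. It is a back-of-the-envelope model only: the function names are made up, it assumes 20-byte IPv4 headers without options and an 8-byte UDP header, and it does not model whatever per-buffer bookkeeping the kernel charges against the socket's buffer space, which is the quantity David Miller's reply is actually about and which will be larger.

```c
#include <assert.h>
#include <stddef.h>

enum { IPV4_HEADER = 20, UDP_HEADER = 8 };

/* Number of IPv4 fragments needed to carry one UDP datagram holding
 * udp_payload bytes of data over a link with the given MTU. */
size_t fragments_needed(size_t udp_payload, size_t mtu)
{
    assert(mtu >= (size_t)IPV4_HEADER + 8);
    /* Every fragment except the last must carry a multiple of 8 data bytes. */
    size_t per_fragment = ((mtu - IPV4_HEADER) / 8) * 8;
    size_t ip_payload = udp_payload + UDP_HEADER;
    return (ip_payload + per_fragment - 1) / per_fragment;
}

/* IPv4 header bytes added on the wire, as a percentage of the UDP payload. */
double header_overhead_percent(size_t udp_payload, size_t mtu)
{
    assert(udp_payload > 0);
    size_t header_bytes = fragments_needed(udp_payload, mtu) * IPV4_HEADER;
    return 100.0 * (double)header_bytes / (double)udp_payload;
}
```

For the largest possible UDP payload (65507 bytes) over a 1500-byte MTU, fragments_needed() gives 45 fragments, and header_overhead_percent() is a little under 1.4%. For the 53000-byte messages used in the ttcp test elsewhere in this thread it gives 36 fragments. So the wire-level headers alone are small; any larger reservation would have to be justified by the in-kernel accounting, which this sketch does not attempt to model.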
* Re: want opinions on possible glitch in 2.4 network error reporting
From: Ben Greear @ 2002-02-07 4:23 UTC
To: Ion Badulescu; +Cc: Alan Cox, linux-kernel, Chris Friesen

> From the limited testing I just ran, it appears that starfire and 3c59x
> handle this correctly, whereas tulip always loses a small number of
> packets during a UDP storm. ttcp -us[rt] is very useful for such
> testing...

It would be interesting to see which side is dropping. Have you correlated ethernet driver counters to your sendto count?

> Ion

Ben

* Re: want opinions on possible glitch in 2.4 network error reporting
From: Ion Badulescu @ 2002-02-07 4:37 UTC
To: Ben Greear; +Cc: Alan Cox, linux-kernel, Chris Friesen

On Wed, 6 Feb 2002, Ben Greear wrote:

> > From the limited testing I just ran, it appears that starfire and 3c59x
> > handle this correctly, whereas tulip always loses a small number of
> > packets during a UDP storm. ttcp -us[rt] is very useful for such
> > testing...
>
> It would be interesting to see which side is dropping? Have you
> correlated ethernet driver counters to your sendto count?

It's hard for me to do it right now, because I don't have them in isolation (they do NFS and other stuff), and I don't have iptables support compiled into the kernel running the tulip.

However:

    starfire -> 3c59x
    3c59x -> starfire
    tulip -> 3c59x
    tulip -> starfire

never lose data on a quiescent network:

ttcp-t: 83886080 bytes in 7.04 real seconds = 11640.36 KB/sec +++
ttcp-r: 83886080 bytes in 7.04 real seconds = 11641.10 KB/sec +++

whereas

    3c59x -> tulip
    starfire -> tulip

*always* lose several packets:

ttcp-t: 16777216 bytes in 1.40 real seconds = 11717.40 KB/sec +++
ttcp-r: 16769024 bytes in 1.40 real seconds = 11679.39 KB/sec +++

and

ttcp-t: 33554432 bytes in 2.80 real seconds = 11714.81 KB/sec +++
ttcp-r: 33456128 bytes in 2.80 real seconds = 11660.28 KB/sec +++

and

ttcp-t: 83886080 bytes in 7.00 real seconds = 11704.40 KB/sec +++
ttcp-r: 83722240 bytes in 7.00 real seconds = 11674.67 KB/sec +++

So I would tend to blame it on the tulip -- but the Rx side of it, not the Tx, which this discussion was about...

Ion

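The size of the loss in the three lossy runs can be worked out directly from the byte counts that ttcp printed. The helper names below are made up for illustration. The datagram size is passed in as a parameter because the thread does not state the ttcp buffer length for these runs; 8192 bytes is only a guess suggested by the fact that every sent/received difference is a multiple of 8192.

```c
#include <assert.h>

/* Bytes reported by the sender that the receiver never counted. */
unsigned long long lost_bytes(unsigned long long sent, unsigned long long received)
{
    assert(received <= sent);
    return sent - received;
}

/* Loss as a percentage of the bytes sent. */
double loss_percent(unsigned long long sent, unsigned long long received)
{
    assert(sent > 0);
    return 100.0 * (double)lost_bytes(sent, received) / (double)sent;
}

/* Whole datagrams lost, for an assumed datagram size in bytes. */
unsigned long long lost_datagrams(unsigned long long sent,
                                  unsigned long long received,
                                  unsigned long long datagram_size)
{
    assert(datagram_size > 0);
    return lost_bytes(sent, received) / datagram_size;
}
```

Applied to the runs above, the differences are 8192, 98304 and 163840 bytes, or about 0.05%, 0.29% and 0.20% of the data sent. If each write really was 8192 bytes, that would be 1, 12 and 20 datagrams respectively, which fits the description "a small number of packets".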
* Re: want opinions on possible glitch in 2.4 network error reporting
From: Luis Garces @ 2002-02-07 9:22 UTC
To: linux-kernel

Alan Cox wrote:

>> I ran into a somewhat related issue on a 2.2.16 system, where I
>> had an app that was calling sendto() on 217000 packets/sec, even
>> though the wire could only handle about 127000 packets/sec. I
>> got no errors at all in sendto, even though over a third of the
>> packets were not actually being sent.
>
> That is correct UDP behaviour

Yes, TCP provides a reliable point-to-point path, and UDP doesn't. The problem is deciding where this unreliability starts in the UDP path. In Alan's opinion (I think) it starts at the very moment data is passed to the call to sendto() (i.e., it includes the kernel in the unreliable UDP path).

Perhaps it is a little sad to see the kernel as something lossy, but I think it's the nature of UDP.

-- 
Luis

[parent not found: <3C6192A5.911D5B4F@nortelnetworks.com.suse.lists.linux.kernel>]
* Re: want opinions on possible glitch in 2.4 network error reporting
  [not found] <3C6192A5.911D5B4F@nortelnetworks.com.suse.lists.linux.kernel>
@ 2002-02-07 0:06 ` Andi Kleen
  2002-02-07 15:59 ` Chris Friesen
  0 siblings, 1 reply; 27+ messages in thread
From: Andi Kleen @ 2002-02-07 0:06 UTC
  To: Chris Friesen; +Cc: linux-kernel

Chris Friesen <cfriesen@nortelnetworks.com> writes:

> I've been looking around in the 2.4 networking stack, and I noticed that when
> the tulip (and no doubt many other) driver cannot put any more outgoing packets
> on the queue, it calls netif_stop_queue(). Then, in dev_queue_xmit() we check
> this flag by calling netif_queue_stopped(). My concern is that if this flag is
> true, we return -ENETDOWN. Is this really the proper return code for this? If
> anything, the network is too active. It seems to me that it would make more
> sense to have some kind of congestion return code rather than claiming that the
> network is down.

The ENETDOWN path you're seeing only applies to queueless devices (like
loopback or a tunnel device). These should only set the queue stopped
flag when something is terribly wrong.

All real network devices have a queue and go through the qdisc.

>
> I think it would make sense to return -ENOBUFS in this case, as its already
> listed in the sendto() man page, and the description matches the error because
> the command could succeed if retried.
>
> I ran into a somewhat related issue on a 2.2.16 system, where I had an app that
> was calling sendto() on 217000 packets/sec, even though the wire could only
> handle about 127000 packets/sec. I got no errors at all in sendto, even though
> over a third of the packets were not actually being sent.

The qdisc queue acts like an IP network and drops excess packets.
There is no provision to block when it fills, because that would have many
side effects and complicate the stack a lot.

There is a return code, though, that is passed up when the queue fills
(NET_XMIT_DROP or NET_XMIT_CN), but it is currently used only by TCP and is
not passed to user space for UDP/RAW. It could probably be done with a
special socket option if there is a clear need.

-Andi
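The behaviour Andi describes (the queue reports a drop, but for UDP/RAW that verdict never reaches the caller) can be illustrated with a small user-space model. This is a toy sketch, not kernel code: `TailDropQueue`, `udp_send`, and the `limit` parameter are hypothetical names, and the `NET_XMIT_*` values are symbolic placeholders rather than the kernel's numeric constants.

```python
from collections import deque

# Symbolic stand-ins for the kernel return codes; the strings are
# placeholders, not the kernel's numeric values.
NET_XMIT_SUCCESS = "NET_XMIT_SUCCESS"
NET_XMIT_DROP = "NET_XMIT_DROP"


class TailDropQueue:
    """Toy bounded FIFO: when full, the newly offered packet is dropped."""

    def __init__(self, limit):
        self.limit = limit
        self.packets = deque()
        self.dropped = 0

    def enqueue(self, packet):
        if len(self.packets) >= self.limit:
            self.dropped += 1
            return NET_XMIT_DROP
        self.packets.append(packet)
        return NET_XMIT_SUCCESS


def udp_send(queue, packet):
    """Model of the UDP/RAW path described in the message: the queue's
    verdict is discarded, so the caller always sees success."""
    queue.enqueue(packet)   # verdict is not propagated to the caller
    return len(packet)      # like sendto(): number of bytes accepted


queue = TailDropQueue(limit=3)
results = [udp_send(queue, b"x" * 100) for _ in range(5)]
print(results, "queued:", len(queue.packets), "dropped:", queue.dropped)
```

Every call reports 100 bytes accepted even though the model queue only holds three packets, which mirrors the "no errors at all in sendto" observation that started the thread.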
* Re: want opinions on possible glitch in 2.4 network error reporting
  2002-02-07 0:06 ` Andi Kleen
@ 2002-02-07 15:59 ` Chris Friesen
  2002-02-07 16:01 ` Andi Kleen
  0 siblings, 1 reply; 27+ messages in thread
From: Chris Friesen @ 2002-02-07 15:59 UTC
  To: Andi Kleen; +Cc: linux-kernel

Andi Kleen wrote:
>
> Chris Friesen <cfriesen@nortelnetworks.com> writes:
>
> > I've been looking around in the 2.4 networking stack, and I noticed that when
> > the tulip (and no doubt many other) driver cannot put any more outgoing packets
> > on the queue, it calls netif_stop_queue(). Then, in dev_queue_xmit() we check
> > this flag by calling netif_queue_stopped(). My concern is that if this flag is
> > true, we return -ENETDOWN. Is this really the proper return code for this? If
> > anything, the network is too active. It seems to me that it would make more
> > sense to have some kind of congestion return code rather than claiming that the
> > network is down.
>
> The ENETDOWN path you're seeing only applies to queueless devices (like
> loopback or a tunnel device). These should only set the queue stopped
> flag when something is terribly wrong.
>
> All real network devices have a queue and go through the qdisc.

Okay, I must be missing something, so can you enlighten me? I can't figure out
where the qdisc is attached to the ethernet device.

Chris

--
Chris Friesen | MailStop: 043/33/F10
Nortel Networks | work: (613) 765-0557
3500 Carling Avenue | fax: (613) 765-2986
Nepean, ON K2H 8E9 Canada | email: cfriesen@nortelnetworks.com
* Re: want opinions on possible glitch in 2.4 network error reporting
  2002-02-07 15:59 ` Chris Friesen
@ 2002-02-07 16:01 ` Andi Kleen
  0 siblings, 0 replies; 27+ messages in thread
From: Andi Kleen @ 2002-02-07 16:01 UTC
  To: Chris Friesen; +Cc: Andi Kleen, linux-kernel

On Thu, Feb 07, 2002 at 10:59:56AM -0500, Chris Friesen wrote:
> Okay, I must be missing something, so can you enlighten me? I can't figure out
> where the qdisc is attached to the ethernet device.

net/core/dev.c:dev_open -> dev_activate.

-Andi
[parent not found: <E16Ydys-0007D6-00@the-village.bc.nu.suse.lists.linux.kernel>]
[parent not found: <Pine.LNX.4.44.0202062101390.4832-100000@age.cs.columbia.edu.suse.lists.linux.kernel>]
* Re: want opinions on possible glitch in 2.4 network error reporting
  [not found] ` <Pine.LNX.4.44.0202062101390.4832-100000@age.cs.columbia.edu.suse.lists.linux.kernel>
@ 2002-02-07 2:47 ` Andi Kleen
  2002-02-07 6:25 ` Chris Friesen
  0 siblings, 1 reply; 27+ messages in thread
From: Andi Kleen @ 2002-02-07 2:47 UTC
  To: Ion Badulescu; +Cc: linux-kernel

Ion Badulescu <ionut@cs.columbia.edu> writes:

> I'll state again: if data (UDP or otherwise) is lost after sendto()
> returns success but before it hits the wire, something is BROKEN in that
> IP stack.

Your proposal would break select(). It would require UDP sendmsg to block
when the TX queue is full. Most applications using select() do
not set the socket non-blocking. If they select for writing and the
kernel signals the socket writable, they expect not to block in the write.
As long as the only thing controlling the blocking is the per-socket
send buffer, that works out, provided the application is careful
not to fill its send buffer. If you put the TX queue into the
blocking equation too, this can no longer be guaranteed, because the TX queue
is shared between all local processes and even forwarding. You would
get random blocking in select-based applications, breaking them.

BTW, I had a proposal for blocking the sender in TX some time ago, but it
was luckily shot down by people who knew better than me.

-Andi
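The application pattern Andi refers to (select for writability on a blocking socket, then write and assume the call returns promptly) looks roughly like the sketch below. It is a minimal user-space illustration under stated assumptions: the loopback address, ephemeral port, payload, and one-second timeout are arbitrary choices made only to keep the example self-contained.

```python
import select
import socket

# Receiver bound to an ephemeral loopback port so the example needs no
# external host; the address and payload are arbitrary.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))

# The sender is deliberately left in blocking mode, as in the
# applications described in the message.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# Wait (up to 1 second) until the kernel reports the socket writable.
_, writable, _ = select.select([], [sender], [], 1.0)
is_writable = sender in writable

sent = 0
if is_writable:
    # The application assumes this call will not block, because
    # select() has just reported the socket as writable.
    sent = sender.sendto(b"ping", receiver.getsockname())

print("writable:", is_writable, "bytes accepted:", sent)
sender.close()
receiver.close()
```

The point of the objection is that this assumption holds only while writability depends on the socket's own send buffer; if a shared device queue could also make the write block, the program above could stall after select() said it would not.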
* Re: want opinions on possible glitch in 2.4 network error reporting
  2002-02-07 2:47 ` Andi Kleen
@ 2002-02-07 6:25 ` Chris Friesen
  0 siblings, 0 replies; 27+ messages in thread
From: Chris Friesen @ 2002-02-07 6:25 UTC
  To: Andi Kleen; +Cc: Ion Badulescu, linux-kernel

Andi Kleen wrote:
>
> Ion Badulescu <ionut@cs.columbia.edu> writes:
> > I'll state again: if data (UDP or otherwise) is lost after sendto()
> > returns success but before it hits the wire, something is BROKEN in that
> > IP stack.
>
> Your proposal would break select(). It would require UDP sendmsg to block
> when the TX queue is full. Most applications using select do
> not set the socket non blocking. If they select for writing and the
> kernel signals the socket writable they expect not to block in the write.
> As long as the only thing controlling the blocking is the per socket
> send buffer that works out as long as the application is careful enough
> not to fill its send buffer. If you would put the TX queue into the
> blocking equation too this cannot be guaranteed anymore because the TX queue
> is shared between all local processes and even forwarding. You would
> get random blocking on select based applications, breaking them.

I don't see the problem.

So sendto() blocks if there is no room on the socket buffer. Fine. So if
there's room on the socket buffer we take the packet and put it on the buffer,
and sendto() returns.

Now, for each socket we've got a buffer of packets that want to get onto the
device driver tx queue. So we use some kind of algorithm to pick which packets
to move from the group of socket buffers to the device driver tx queue.

If the app calls sendto() before there is space in the socket buffer, then
sendto() blocks. select() should return whether or not there is space in the
socket buffer. Eventually, every packet that gets put into a socket buffer
makes it out onto the wire.

Congestion is dealt with by leaving packets in the socket buffers until they can
be guaranteed a spot in the device tx queue. I assume we would try to add it to
the tx queue, and remove it from the socket buffer if the add succeeds.

I just don't see why sendto() would accept the packet and then later on it gets
dropped.

Chris

--
Chris Friesen | MailStop: 043/33/F10
Nortel Networks | work: (613) 765-0557
3500 Carling Avenue | fax: (613) 765-2986
Nepean, ON K2H 8E9 Canada | email: cfriesen@nortelnetworks.com
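The scheme Chris outlines can be sketched as a toy user-space model: packets stay in per-socket buffers and are moved to a shared device queue only while it has room, so nothing is dropped after being accepted. This is an illustration of the proposal as described in the message, not of how the Linux stack works; the class `Socket`, the function `fill_tx_queue`, the round-robin policy, and the `sndbuf`/`tx_limit` parameters are all hypothetical.

```python
from collections import deque


class Socket:
    """Toy socket with a bounded send buffer (sndbuf counts packets)."""

    def __init__(self, name, sndbuf):
        self.name = name
        self.sndbuf = sndbuf
        self.buffer = deque()

    def writable(self):
        # What select() would report under this scheme.
        return len(self.buffer) < self.sndbuf

    def sendto(self, packet):
        # Returns False at the point where a blocking sendto() would sleep.
        if not self.writable():
            return False
        self.buffer.append(packet)
        return True


def fill_tx_queue(sockets, tx_queue, tx_limit):
    """Round-robin over sockets, moving packets only while the device
    queue has room; packets that do not fit stay in their socket buffer."""
    moved = True
    while len(tx_queue) < tx_limit and moved:
        moved = False
        for sock in sockets:
            if sock.buffer and len(tx_queue) < tx_limit:
                tx_queue.append(sock.buffer.popleft())
                moved = True


a, b = Socket("a", sndbuf=2), Socket("b", sndbuf=2)
accepted = [a.sendto("a1"), a.sendto("a2"), a.sendto("a3"), b.sendto("b1")]
tx_queue = deque()
fill_tx_queue([a, b], tx_queue, tx_limit=2)
print(accepted, list(tx_queue), list(a.buffer), list(b.buffer))
```

In this model the third send on socket `a` is refused rather than silently lost, and `a2` waits in the socket buffer until the device queue drains. Note that how quickly a socket becomes writable again then depends on the shared device queue, which is the coupling Andi's quoted objection is about.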
end of thread, newest message ~2002-02-08 21:40 UTC
Thread overview: 27+ messages
2002-02-06 20:31 want opinions on possible glitch in 2.4 network error reporting Chris Friesen
2002-02-06 20:56 ` Richard B. Johnson
2002-02-06 21:45 ` Ben Greear
2002-02-06 22:23 ` Chris Friesen
2002-02-07 13:44 ` Richard B. Johnson
2002-02-07 16:33 ` Gerold Jury
2002-02-07 0:24 ` Alan Cox
2002-02-07 0:26 ` Alan Cox
2002-02-07 1:51 ` Ion Badulescu
2002-02-07 2:08 ` Alan Cox
2002-02-07 2:09 ` Ion Badulescu
2002-02-07 2:34 ` Alan Cox
2002-02-07 2:54 ` Ion Badulescu
2002-02-07 11:11 ` Alan Cox
2002-02-08 16:11 ` Pavel Machek
2002-02-08 21:39 ` Ion Badulescu
2002-02-07 4:21 ` Ben Greear
2002-02-07 4:38 ` David S. Miller
2002-02-07 4:56 ` Ben Greear
2002-02-07 4:23 ` Ben Greear
2002-02-07 4:37 ` Ion Badulescu
2002-02-07 9:22 ` Luis Garces
[not found] <3C6192A5.911D5B4F@nortelnetworks.com.suse.lists.linux.kernel>
2002-02-07 0:06 ` Andi Kleen
2002-02-07 15:59 ` Chris Friesen
2002-02-07 16:01 ` Andi Kleen
[not found] <E16Ydys-0007D6-00@the-village.bc.nu.suse.lists.linux.kernel>
[not found] ` <Pine.LNX.4.44.0202062101390.4832-100000@age.cs.columbia.edu.suse.lists.linux.kernel>
2002-02-07 2:47 ` Andi Kleen
2002-02-07 6:25 ` Chris Friesen