want opinions on possible glitch in 2.4 network error reporting


* want opinions on possible glitch in 2.4 network error reporting
@ 2002-02-06 20:31 Chris Friesen
  2002-02-06 20:56 ` Richard B. Johnson
  2002-02-07  0:26 ` Alan Cox
  0 siblings, 2 replies; 27+ messages in thread
From: Chris Friesen @ 2002-02-06 20:31 UTC (permalink / raw)
  To: linux-kernel

I've been looking around in the 2.4 networking stack, and I noticed that when
the tulip (and no doubt many other) driver cannot put any more outgoing packets
on the queue, it calls netif_stop_queue().  Then, in dev_queue_xmit() we check
this flag by calling netif_queue_stopped().  My concern is that if this flag is
true, we return -ENETDOWN.  Is this really the proper return code for this? If
anything, the network is too active.  It seems to me that it would make more
sense to have some kind of congestion return code rather than claiming that the
network is down.

I think it would make sense to return -ENOBUFS in this case, as it's already
listed in the sendto() man page, and the description matches the error because
the command could succeed if retried.
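
The practical difference shows up in how a caller reacts to the errno. A
minimal sketch, assuming a hypothetical helper named
send_error_is_transient() and assuming the application chooses to treat
ENOBUFS, EAGAIN and EINTR as retryable (that policy is an illustration,
not something the 2.4 stack prescribes):

```c
#include <errno.h>

/* Hypothetical helper: returns 1 if a failed sendto() might succeed when
 * retried, 0 otherwise. Which codes count as "transient" is an assumption
 * made for this sketch. */
int send_error_is_transient(int err)
{
    switch (err) {
    case ENOBUFS:       /* queue or buffer exhaustion */
    case EAGAIN:        /* non-blocking socket would have blocked */
#if defined(EWOULDBLOCK) && (EWOULDBLOCK != EAGAIN)
    case EWOULDBLOCK:
#endif
    case EINTR:         /* interrupted by a signal */
        return 1;
    default:
        return 0;       /* everything else, including ENETDOWN */
    }
}
```

With a classification like this, a stopped queue reported as ENETDOWN
looks permanent to the caller, while the same condition reported as
ENOBUFS would be retried.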

I ran into a somewhat related issue on a 2.2.16 system, where I had an app that
was calling sendto() on 217000 packets/sec, even though the wire could only
handle about 127000 packets/sec.  I got no errors at all in sendto, even though
over a third of the packets were not actually being sent.

-- 
Chris Friesen                    | MailStop: 043/33/F10  
Nortel Networks                  | work: (613) 765-0557
3500 Carling Avenue              | fax:  (613) 765-2986
Nepean, ON K2H 8E9 Canada        | email: cfriesen@nortelnetworks.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: want opinions on possible glitch in 2.4 network error reporting
  2002-02-06 20:31 want opinions on possible glitch in 2.4 network error reporting Chris Friesen
@ 2002-02-06 20:56 ` Richard B. Johnson
  2002-02-06 21:45   ` Ben Greear
                     ` (2 more replies)
  2002-02-07  0:26 ` Alan Cox
  1 sibling, 3 replies; 27+ messages in thread
From: Richard B. Johnson @ 2002-02-06 20:56 UTC (permalink / raw)
  To: Chris Friesen; +Cc: linux-kernel

On Wed, 6 Feb 2002, Chris Friesen wrote:

[SNIPPED...]

> 
> I ran into a somewhat related issue on a 2.2.16 system, where I had an app that
> was calling sendto() on 217000 packets/sec, even though the wire could only
> handle about 127000 packets/sec.  I got no errors at all in sendto, even though
> over a third of the packets were not actually being sent.
> 

In principle, sendto() will always succeed unless you provided the
wrong parameters in the function call, or the machine crashes, at
which time your task won't be there to receive the error code anyway.

Hackers code sendto as:
	sendto(s,...);
Professional programmers use:
	(void)sendto(s,...);

checking the return value is useless.

Note that the man-page specifically states that ENOBUFS can't happen.

You cannot assume that any sendto() data actually gets on the wire, much
less to its destination. With any user-datagram-protocol, both ends,
sender and receiver, have to work out what they will do with missing
packets and packets received out-of-order.

Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (797.90 BogoMips).

    I was going to compile a list of innovations that could be
    attributed to Microsoft. Once I realized that Ctrl-Alt-Del
    was handled in the BIOS, I found that there aren't any.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: want opinions on possible glitch in 2.4 network error reporting
  2002-02-06 20:56 ` Richard B. Johnson
@ 2002-02-06 21:45   ` Ben Greear
  2002-02-06 22:23   ` Chris Friesen
  2002-02-07  0:24   ` Alan Cox
  2 siblings, 0 replies; 27+ messages in thread
From: Ben Greear @ 2002-02-06 21:45 UTC (permalink / raw)
  To: root; +Cc: Chris Friesen, linux-kernel

However, if you use non-blocking IO you will get EAGAIN if
there is no buffer space.  Blocking calls should always
block until there is buffer space.

Also, just because select/poll says the socket is writable, it
may not be (immediately) because you can send UDP packets
that are larger than 2048 bytes, and that is the cutoff that
tells select the socket is writable...

I've actually sent a patch to Dave Miller to make select/poll
wait until there is 64k of buffer space (the maximum size of
a UDP packet), but he is still reviewing the issue.
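
From the application side, the EAGAIN case can be handled roughly as
follows. This is only a sketch: send_dgram_retry() is a hypothetical
helper name, and the timeout policy is an assumption, not anything the
kernel or the patch mentioned above defines.

```c
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: send one datagram, waiting with poll() and retrying
 * whenever sendto() reports EAGAIN/EWOULDBLOCK on a non-blocking socket.
 * Returns the number of bytes sent, or -1 with errno set
 * (ETIMEDOUT if poll() timed out before the socket became writable). */
ssize_t send_dgram_retry(int fd, const void *buf, size_t len,
                         const struct sockaddr *to, socklen_t tolen,
                         int timeout_ms)
{
    for (;;) {
        ssize_t n = sendto(fd, buf, len, 0, to, tolen);
        if (n >= 0)
            return n;
        if (errno == EINTR)
            continue;                   /* interrupted: just try again */
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                  /* a real error: report it */

        struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };
        int rc = poll(&pfd, 1, timeout_ms);
        if (rc == 0) {
            errno = ETIMEDOUT;
            return -1;
        }
        if (rc < 0 && errno != EINTR)
            return -1;
        /* Writable (or interrupted): loop and call sendto() again. */
    }
}
```

Because "writable" does not guarantee room for a large datagram, the
loop may go around several times before sendto() succeeds; a production
version would probably want to bound the number of retries.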


Enjoy,
Ben

Richard B. Johnson wrote:

> On Wed, 6 Feb 2002, Chris Friesen wrote:
> 
> [SNIPPED...]
> 
> 
> 
>>I ran into a somewhat related issue on a 2.2.16 system, where I had an app that
>>was calling sendto() on 217000 packets/sec, even though the wire could only
>>handle about 127000 packets/sec.  I got no errors at all in sendto, even though
>>over a third of the packets were not actually being sent.
>>
>>
> 
> In principle, sendto() will always succeed unless you provided the
> wrong parameters in the function call, or the machine crashes, at
> which time your task won't be there to receive the error code anyway.
> 
> Hackers code sendto as:
> 	sendto(s,...);
> Professional programmers use:
> 	(void)sendto(s,...);
> 
> checking the return value is useless.
> 
> Note that the man-page specifically states that ENOBUFS can't happen.
> 
> You cannot assume that any sendto() data actually gets on the wire, much
> less to its destination. With any user-datagram-protocol, both ends,
> sender and receiver, have to work out what they will do with missing
> packets and packets received out-of-order.
> 
> 
> Cheers,
> Dick Johnson


-- 
Ben Greear <greearb@candelatech.com>       <Ben_Greear AT excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: want opinions on possible glitch in 2.4 network error reporting
  2002-02-06 20:56 ` Richard B. Johnson
  2002-02-06 21:45   ` Ben Greear
@ 2002-02-06 22:23   ` Chris Friesen
  2002-02-07 13:44     ` Richard B. Johnson
  2002-02-07  0:24   ` Alan Cox
  2 siblings, 1 reply; 27+ messages in thread
From: Chris Friesen @ 2002-02-06 22:23 UTC (permalink / raw)
  To: root; +Cc: linux-kernel

"Richard B. Johnson" wrote:

[snip]
> Hackers code sendto as:
>         sendto(s,...);
> Professional programmers use:
>         (void)sendto(s,...);
> 
> checking the return value is useless.
> 
> Note that the man-page specifically states that ENOBUFS can't happen.

I don't know what your manpage says, but my manpage doesn't say anything about
ENOBUFS not being possible.  From the man page: 

"ENOBUFS The system was unable to allocate an internal memory block.  The
operation may succeed when buffers become available."

> You cannot assume that any sendto() data actually gets on the wire, much
> less to its destination. With any user-datagram-protocol, both ends,
> sender and receiver, have to work out what they will do with missing
> packets and packets received out-of-order.

Hmm.  I knew you couldn't assume it was delivered (the man page says so), but I
didn't know it doesn't guarantee it getting to the wire.  The man page says that
"locally detected errors are indicated by a return value  of -1".  Furthermore,
it also says "When the  message does not fit into the send buffer of the socket,
send normally blocks, unless the socket has been placed in non-blocking I/O
mode."

I would suggest that if the packet doesn't make it onto the wire, sendto()
should either a) block until it can send the packet (or return with EAGAIN, as
appropriate), or b) return an error.

Chris

-- 
Chris Friesen                    | MailStop: 043/33/F10  
Nortel Networks                  | work: (613) 765-0557
3500 Carling Avenue              | fax:  (613) 765-2986
Nepean, ON K2H 8E9 Canada        | email: cfriesen@nortelnetworks.com

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: want opinions on possible glitch in 2.4 network error reporting
  2002-02-06 22:23   ` Chris Friesen
@ 2002-02-07 13:44     ` Richard B. Johnson
  2002-02-07 16:33       ` Gerold Jury
  0 siblings, 1 reply; 27+ messages in thread
From: Richard B. Johnson @ 2002-02-07 13:44 UTC (permalink / raw)
  To: Chris Friesen; +Cc: linux-kernel

On Wed, 6 Feb 2002, Chris Friesen wrote:

> "Richard B. Johnson" wrote:
> 
> [snip]
> > Hackers code sendto as:
> >         sendto(s,...);
> > Professional programmers use:
> >         (void)sendto(s,...);
> > 
> > checking the return value is useless.
> > 
> > Note that the man-page specifically states that ENOBUFS can't happen.
> 
> I don't know what your manpage says, but my manpage doesn't say anything about
> ENOBUFS not being possible.  From the man page: 
> 
> "ENOBUFS The system was unable to allocate an internal memory block.  The
> operation may succeed when buffers become available."



       ENOBUFS
              The output queue for a network interface was  full.
              This  generally  indicates  that  the interface has
              stopped sending, but may  be  caused  by  transient
              congestion.   (This  cannot occur in Linux, packets
              are just silently dropped when a device queue
              overflows.)


Linux Man Page              July 1999                           1

Script done on Thu Feb  7 08:35:39 2002

Distributed with RedHat 7




Cheers,
Dick Johnson

Penguin : Linux version 2.4.1 on an i686 machine (797.90 BogoMips).

    I was going to compile a list of innovations that could be
    attributed to Microsoft. Once I realized that Ctrl-Alt-Del
    was handled in the BIOS, I found that there aren't any.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: want opinions on possible glitch in 2.4 network error reporting
  2002-02-07 13:44     ` Richard B. Johnson
@ 2002-02-07 16:33       ` Gerold Jury
  0 siblings, 0 replies; 27+ messages in thread
From: Gerold Jury @ 2002-02-07 16:33 UTC (permalink / raw)
  To: root; +Cc: linux-kernel


This is off topic but close.
UDP packets are not dropped after the UDP socket has been shut down for
receive with shutdown(socket, 0), at least on 2.4.17.
Attached is a small proof.
Does anyone have a hint or a fix?

Thanks
Gerold


On Thursday 07 February 2002 14:44, Richard B. Johnson wrote:
>
>        ENOBUFS
>               The output queue for a network interface was  full.
>               This  generally  indicates  that  the interface has
>               stopped sending, but may  be  caused  by  transient
>               congestion.   (This  cannot occur in Linux, packets
>               are just silently dropped when a device queue
>               overflows.)
>
>
> Linux Man Page              July 1999                           1
>

[-- Attachment #2: udpshutdown.c --]
[-- Type: text/x-c++, Size: 1275 bytes --]

#include <stdio.h>
#include <stdlib.h>   // exit()
#include <string.h>   // strerror(), memset()
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

char hello[] = "hello socket";
char buf[2048];

int main( int argc, char *argv[] )
{
  int iFd, ret;
  // recvfrom() reads fromlen as the size of the address buffer, so it must be set
  socklen_t fromlen = sizeof( struct sockaddr_in );
  short port = 15678;
  struct sockaddr_in addr, from;

  if((iFd = socket(AF_INET, SOCK_DGRAM, 0)) < 0) {
    perror("create socket");
    exit( 1 );
  }
  memset( &addr, 0, sizeof( addr ) ); // clear the sin_zero padding
  addr.sin_family = AF_INET;
  addr.sin_port = htons( port );
  addr.sin_addr.s_addr = htonl( INADDR_LOOPBACK );
  if (bind( iFd, (struct sockaddr *)&addr, sizeof( addr ) ) < 0) {
    fprintf(stderr,"bind to %d failed %s\n",port,strerror(errno));
    exit( 1 );
  }
  shutdown( iFd, 0 ); // shutdown receive
  // addr.sin_addr.s_addr = htonl( INADDR_LOOPBACK );
  // send a datagram to our own address; the report is that it still arrives
  if( sendto( iFd, hello, sizeof( hello ), 0, (struct sockaddr *)&addr, sizeof( addr ) ) < 0 ) {
    perror("sendto");
    exit( 1 );
  }
  ret = recvfrom( iFd,  buf, sizeof( buf ), 0, (struct sockaddr *)&from, &fromlen );
  if( ret < 0 ) {
    perror("recvfrom");
    exit( 1 );
  }
  printf( "received %d bytes [%s]\n", ret, buf );

  shutdown( iFd, 1 ); // shutdown send
  // check whether sending still works after the send side has been shut down
  if( sendto( iFd, hello, sizeof( hello ), 0, (struct sockaddr *)&addr, sizeof( addr ) ) < 0 ) {
    perror("sendto");
    exit( 1 );
  }
  return 0;
}

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: want opinions on possible glitch in 2.4 network error reporting
  2002-02-06 20:56 ` Richard B. Johnson
  2002-02-06 21:45   ` Ben Greear
  2002-02-06 22:23   ` Chris Friesen
@ 2002-02-07  0:24   ` Alan Cox
  2 siblings, 0 replies; 27+ messages in thread
From: Alan Cox @ 2002-02-07  0:24 UTC (permalink / raw)
  To: root; +Cc: Chris Friesen, linux-kernel

> Hackers code sendto as:
> 	sendto(s,...);
> Professional programmers use:
> 	(void)sendto(s,...);

Remind me never to hire one of your professional programmers

> checking the return value is useless.

Not so. For a large number of situations it's extremely informative.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: want opinions on possible glitch in 2.4 network error reporting
  2002-02-06 20:31 want opinions on possible glitch in 2.4 network error reporting Chris Friesen
  2002-02-06 20:56 ` Richard B. Johnson
@ 2002-02-07  0:26 ` Alan Cox
  2002-02-07  1:51   ` Ion Badulescu
  2002-02-07  9:22   ` Luis Garces
  1 sibling, 2 replies; 27+ messages in thread
From: Alan Cox @ 2002-02-07  0:26 UTC (permalink / raw)
  To: Chris Friesen; +Cc: linux-kernel

> I ran into a somewhat related issue on a 2.2.16 system, where I had an app that
> was calling sendto() on 217000 packets/sec, even though the wire could only
> handle about 127000 packets/sec.  I got no errors at all in sendto, even though
> over a third of the packets were not actually being sent.

That is correct UDP behaviour

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: want opinions on possible glitch in 2.4 network error reporting
  2002-02-07  0:26 ` Alan Cox
@ 2002-02-07  1:51   ` Ion Badulescu
  2002-02-07  2:08     ` Alan Cox
  2002-02-07  4:23     ` Ben Greear
  2002-02-07  9:22   ` Luis Garces
  1 sibling, 2 replies; 27+ messages in thread
From: Ion Badulescu @ 2002-02-07  1:51 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel, Chris Friesen

On Thu, 7 Feb 2002 00:26:20 +0000 (GMT), Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
>> I ran into a somewhat related issue on a 2.2.16 system, where I had an app that
>> was calling sendto() on 217000 packets/sec, even though the wire could only
>> handle about 127000 packets/sec.  I got no errors at all in sendto, even though
>> over a third of the packets were not actually being sent.
> 
> That is correct UDP behaviour

This is totally untrue, unless the socket is doing non-blocking I/O -- and
even then you get -1 and EAGAIN from sendto.

Otherwise sendto is very much a blocking operation which will block until
there is enough space in the socket buffer to store the data. From there,
there is no way to "lose" that data before it hits the wire, unless of
course the network driver is broken and doesn't plug the upper layers when
its TX queue is full.

Think of it: if what you said were true, NFS over UDP would be totally
useless. But it's not, so if UDP data gets lost before it hits the wire,
it's usually a bug in the network driver.

From the limited testing I just ran, it appears that starfire and 3c59x
handle this correctly, whereas tulip always loses a small number of 
packets during a UDP storm. ttcp -us[rt] is very useful for such 
testing...

Ion

-- 
  It is better to keep your mouth shut and be thought a fool,
            than to open it and remove all doubt.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: want opinions on possible glitch in 2.4 network error reporting
  2002-02-07  1:51   ` Ion Badulescu
@ 2002-02-07  2:08     ` Alan Cox
  2002-02-07  2:09       ` Ion Badulescu
  2002-02-07  4:21       ` Ben Greear
  2002-02-07  4:23     ` Ben Greear
  1 sibling, 2 replies; 27+ messages in thread
From: Alan Cox @ 2002-02-07  2:08 UTC (permalink / raw)
  To: Ion Badulescu; +Cc: Alan Cox, linux-kernel, Chris Friesen

> > That is correct UDP behaviour
> 
> This is totally untrue, unless the socket is doing non-blocking I/O -- and
> even then you get -1 and EAGAIN from sendto.

Not the case.

> there is no way to "lose" that data before it hits the wire, unless of
> course the network driver is broken and doesn't plug the upper layers when
> its TX queue is full.

UDP is not flow controlled.

> Think of it: if what you said were true, NFS over UDP would be totally
> useless. But it's not, so if UDP data gets lost before it hits the wire,
> it's usually a bug in the network driver.

NFS does UDP flow control of its own. If it didn't it would indeed be
broken.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: want opinions on possible glitch in 2.4 network error reporting
  2002-02-07  2:08     ` Alan Cox
@ 2002-02-07  2:09       ` Ion Badulescu
  2002-02-07  2:34         ` Alan Cox
  2002-02-07  4:21       ` Ben Greear
  1 sibling, 1 reply; 27+ messages in thread
From: Ion Badulescu @ 2002-02-07  2:09 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel, Chris Friesen

On Thu, 7 Feb 2002, Alan Cox wrote:

> > there is no way to "lose" that data before it hits the wire, unless of
> > course the network driver is broken and doesn't plug the upper layers when
> > its TX queue is full.
> 
> UDP is not flow controlled.

No, of course not, but this has *nothing* to do with UDP. The IP socket 
itself is flow controlled, and so is the TX queue of the network driver.

Let me give you another example: ping -f. If what you said were true, ping -f 
would send packets as fast as the CPU can generate into the black hole 
called an IP raw socket, right? Well, that just doesn't happen, because 
sendto/sendmsg will block until there is enough space in the TX queue of 
the raw socket.

I'll state again: if data (UDP or otherwise) is lost after sendto() 
returns success but before it hits the wire, something is BROKEN in that 
IP stack.

Ion

-- 
  It is better to keep your mouth shut and be thought a fool,
            than to open it and remove all doubt.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: want opinions on possible glitch in 2.4 network error reporting
  2002-02-07  2:09       ` Ion Badulescu
@ 2002-02-07  2:34         ` Alan Cox
  2002-02-07  2:54           ` Ion Badulescu
  0 siblings, 1 reply; 27+ messages in thread
From: Alan Cox @ 2002-02-07  2:34 UTC (permalink / raw)
  To: Ion Badulescu; +Cc: Alan Cox, linux-kernel, Chris Friesen

> > UDP is not flow controlled.
> 
> No, of course not, but this has *nothing* to do with UDP. The IP socket 
> itself is flow controlled, and so is the TX queue of the network driver.

It is not flow controlled

> Let me give you another example: ping -f. If what you said were true, ping -f 
> would send packets as fast as the CPU can generate into the black hole 
> called an IP raw socket, right? Well, that just doesn't happen, because 

Wrong. man ping. ping -f doesn't do what you apparently think it does.

Alan

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: want opinions on possible glitch in 2.4 network error reporting
  2002-02-07  2:34         ` Alan Cox
@ 2002-02-07  2:54           ` Ion Badulescu
  2002-02-07 11:11             ` Alan Cox
  0 siblings, 1 reply; 27+ messages in thread
From: Ion Badulescu @ 2002-02-07  2:54 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel, Chris Friesen

On Thu, 7 Feb 2002, Alan Cox wrote:

> Wrong. man ping. ping -f doesn't do what you apparently think it does.

strace ping, you'll see it doing a 

	setsockopt(7, SOL_SOCKET, SO_SNDTIMEO, [1], 8) = 0

on its socket.

That's about the only way (aside from using a TBF queue, and other non
FIFO queues) you can lose data from a socket's queue.
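
For reference, the setsockopt() call in that strace line can be written
in C roughly as below. The 8-byte option length matches a struct timeval
on a 32-bit system; reading the `[1]` as a one-second timeout is an
assumption, and set_send_timeout() is a hypothetical helper name.

```c
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper: bound how long a blocking send may wait for
 * send-buffer space. Returns 0 on success, -1 with errno set on failure. */
int set_send_timeout(int fd, long seconds, long microseconds)
{
    struct timeval tv;

    tv.tv_sec = seconds;
    tv.tv_usec = microseconds;
    return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
}
```

Once the timeout expires, a blocking send gives up with an error instead
of waiting indefinitely for buffer space.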

Getting back to the NFS/UDP example: yes, NFS has its own flow control, 
but that's not the point. The reason NFS/UDP works so well with large NFS 
packets over a fully-switched *local* subnet is precisely because NFS's 
flow control is almost never exercised in that case. Data simply doesn't 
get lost -- never in the UDP socket's queue, and very rarely on the wire.

But you don't need to believe me. Just run the ttcp -uts test and explain 
how come all the data makes it to the other end (again, over a 
fully-switched local subnet) if:
1. ttcp has no clue about the wire speed (which it obviously doesn't) so 
it can't do rate limiting
2. the UDP socket simply discards data when some internal queue fills up, 
without blocking sendto() and without returning an error.

Moreover: please strace -T that ttcp -uts test, and notice how the time 
for the system call goes up by 2 orders of magnitude (i.e. it blocks) as 
soon as the socket queue fills up.

Ion

-- 
  It is better to keep your mouth shut and be thought a fool,
            than to open it and remove all doubt.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: want opinions on possible glitch in 2.4 network error reporting
  2002-02-07  2:54           ` Ion Badulescu
@ 2002-02-07 11:11             ` Alan Cox
  2002-02-08 16:11               ` Pavel Machek
  0 siblings, 1 reply; 27+ messages in thread
From: Alan Cox @ 2002-02-07 11:11 UTC (permalink / raw)
  To: Ion Badulescu; +Cc: Alan Cox, linux-kernel, Chris Friesen

> > Wrong. man ping. ping -f doesn't do what you apparently think it does.
> 
> strace ping, you'll see it doing a 
> 	setsockopt(7, SOL_SOCKET, SO_SNDTIMEO, [1], 8) = 0
> 
> on its socket.

Read the ping manual page. Then when you understand what ping -f does 
come back and have a useful conversation.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: want opinions on possible glitch in 2.4 network error reporting
  2002-02-07 11:11             ` Alan Cox
@ 2002-02-08 16:11               ` Pavel Machek
  2002-02-08 21:39                 ` Ion Badulescu
  0 siblings, 1 reply; 27+ messages in thread
From: Pavel Machek @ 2002-02-08 16:11 UTC (permalink / raw)
  To: Alan Cox; +Cc: Ion Badulescu, linux-kernel, Chris Friesen

Hi!

> > > Wrong. man ping. ping -f doesn't do what you apparently think it does.
> > 
> > strace ping, you'll see it doing a 
> > 	setsockopt(7, SOL_SOCKET, SO_SNDTIMEO, [1], 8) = 0
> > 
> > on its socket.
> 
> Read the ping manual page. Then when you understand what ping -f does 
> come back and have a useful conversation.

But I guess it *would* be useful to have -F option saying "feed data
as fast as possible", right? And it would be nice if this option did
not eat 100% cpu when possible, right?

So what he is asking for is pretty useful behaviour.
									Pavel
-- 
(about SSSCA) "I don't say this lightly.  However, I really think that the U.S.
no longer is classifiable as a democracy, but rather as a plutocracy." --hpa

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: want opinions on possible glitch in 2.4 network error reporting
  2002-02-08 16:11               ` Pavel Machek
@ 2002-02-08 21:39                 ` Ion Badulescu
  0 siblings, 0 replies; 27+ messages in thread
From: Ion Badulescu @ 2002-02-08 21:39 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Alan Cox, linux-kernel, Chris Friesen

On Fri, 8 Feb 2002, Pavel Machek wrote:

> Hi!
> 
> > > > Wrong. man ping. ping -f doesn't do what you apparently think it does.
> > > 
> > > strace ping, you'll see it doing a 
> > > 	setsockopt(7, SOL_SOCKET, SO_SNDTIMEO, [1], 8) = 0
> > > 
> > > on its socket.
> > 
> > Read the ping manual page. Then when you understand what ping -f does 
> > come back and have a useful conversation.
> 
> But I guess it *would* be useful to have -F option saying "feed data
> as fast as possible", right? And it would be nice if this option did
> not eat 100% cpu when possible, right?
> 
> So what he is asking for is pretty useful behaviour.

I'm not asking for it. I'm saying this is what we already have. Too bad 
people won't listen -- and yes I know ping -f was a bad example. A 
blocking sendto() *will* block (surprise surprise), even though it *might* 
throw the data away later on.

Indeed, as Davem stated, a UDP socket will lose data under memory
pressure.  In real life this hardly ever happens, however, even with large 
message sizes: I just tested with sizes up to 52000, which is just about 
as large as you'll ever see in real environments.

Also: I'm just dying to be enlightened about how a dumb program like
ttcp -u, doing a totally dumb "while (1) sendto();", can manage to score 
sending rates identical to the raw wire speed, if indeed sendto() never 
blocks and simply throws away the data:

apollo:/# ttcp -utsl 53000 zeus
ttcp-t: buflen=53000, nbuf=2048, align=16384/0, port=5001  udp  -> sybase2
ttcp-t: socket
ttcp-t: 108544000 bytes in 9.02 real seconds = 11745.26 KB/sec +++
ttcp-t: 2054 I/O calls, msec/call = 4.50, calls/sec = 227.59
ttcp-t: 0.0user 0.2sys 0:09real 2% 0i+0d 0maxrss 0+13pf 0+0csw
zeus:/var/lib/pgsql# ttcp -ursl 53000
ttcp-r: buflen=53000, nbuf=2048, align=16384/0, port=5001  udp
ttcp-r: socket
ttcp-r: 108544000 bytes in 9.03 real seconds = 11741.76 KB/sec +++
ttcp-r: 2050 I/O calls, msec/call = 4.51, calls/sec = 227.08
ttcp-r: 0.0user 0.1sys 0:09real 1% 0i+0d 0maxrss 0+12pf 0+0csw

11745KB/sec sounds suspiciously close to the 100Mb/sec wire speed.
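
As a rough cross-check of that comparison, the ttcp-t figures above
(108544000 bytes in 9.02 seconds) can be converted to bits per second.
The helper name payload_mbit_per_sec() is hypothetical, and the result
counts UDP payload only, so UDP, IP and Ethernet framing overhead comes
on top of it:

```c
/* Payload throughput in Mbit/s, with 1 Mbit taken as 10^6 bits. */
double payload_mbit_per_sec(double bytes, double seconds)
{
    return bytes * 8.0 / seconds / 1e6;
}
```

For the numbers above this works out to roughly 96.3 Mbit/s of payload
on a 100 Mbit/s link.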

and, for reference, just to make sure ttcp wasn't lying to me:

zeus:/var/lib/pgsql# iptables -L -n -v
Chain INPUT (policy ACCEPT 7217K packets, 3137M bytes)
 pkts bytes target     prot opt in     out     source               destination         
 2051  108M            udp  --  *      *       10.2.10.216          0.0.0.0/0          udp dpt:5001 


But no, it's so much easier to incompletely quote a message and then claim 
the other person has no idea about what he's talking about. Yes, Alan, 
that's precisely what you did.

Ion

-- 
  It is better to keep your mouth shut and be thought a fool,
            than to open it and remove all doubt.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: want opinions on possible glitch in 2.4 network error reporting
  2002-02-07  2:08     ` Alan Cox
  2002-02-07  2:09       ` Ion Badulescu
@ 2002-02-07  4:21       ` Ben Greear
  2002-02-07  4:38         ` David S. Miller
  1 sibling, 1 reply; 27+ messages in thread
From: Ben Greear @ 2002-02-07  4:21 UTC (permalink / raw)
  To: Alan Cox; +Cc: Ion Badulescu, linux-kernel, Chris Friesen

Alan Cox wrote:

>>>That is correct UDP behaviour
>>>
>>This is totally untrue, unless the socket is doing non-blocking I/O -- and
>>even then you get -1 and EAGAIN from sendto.
>>
> 
> Not the case.

Are you claiming that you will never see -1 and EAGAIN on a nonblocking
UDP socket with sendto?  If so, I'll bet you a kernel patch that you are not
correct (I get to write the patch and you include it :) )

> 
> 
>>there is no way to "lose" that data before it hits the wire, unless of
>>course the network driver is broken and doesn't plug the upper layers when
>>its TX queue is full.
>>
> 
> UDP is not flow controlled.

If it makes it through sendto, where can it be dropped before it
hits the wire?  I doubt the socket buffers are anything other than FIFO,
and the same goes for the ethernet/device queue.  Since we (can) know
at sendto whether or not the PDU was enqueued for transmit, it seems
trivial to notify user space of success/failure of the local network
stack, and I believe this is what is done.

Now granted, it can be dropped anywhere outside of the machine, but
I can see no good reason to drop it inside the machine.

-- 
Ben Greear <greearb@candelatech.com>       <Ben_Greear AT excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: want opinions on possible glitch in 2.4 network error reporting
  2002-02-07  4:21       ` Ben Greear
@ 2002-02-07  4:38         ` David S. Miller
  2002-02-07  4:56           ` Ben Greear
  0 siblings, 1 reply; 27+ messages in thread
From: David S. Miller @ 2002-02-07  4:38 UTC (permalink / raw)
  To: greearb; +Cc: alan, ionut, linux-kernel, cfriesen

   From: Ben Greear <greearb@candelatech.com>
   Date: Wed, 06 Feb 2002 21:21:09 -0700
   
   Alan Cox wrote:
   
   > UDP is not flow controlled.
   
   If it makes it through sendto, where can it be dropped before it
   hits the wire?

If the packet ends up being fragmented on the way out and the socket
cannot take on the allocation against its buffer space.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: want opinions on possible glitch in 2.4 network error reporting
  2002-02-07  4:38         ` David S. Miller
@ 2002-02-07  4:56           ` Ben Greear
  0 siblings, 0 replies; 27+ messages in thread
From: Ben Greear @ 2002-02-07  4:56 UTC (permalink / raw)
  To: David S. Miller; +Cc: alan, ionut, linux-kernel, cfriesen

David S. Miller wrote:

>    From: Ben Greear <greearb@candelatech.com>
>    Date: Wed, 06 Feb 2002 21:21:09 -0700
>    
>    Alan Cox wrote:
>    
>    > UDP is not flow controlled.
>    
>    If it makes it through sendto, where can it be dropped before it
>    hits the wire?
> 
> If the packet ends up being fragmented on the way out and the socket
> cannot take on the allocation against its buffer space.

In the fragmentation case (at least over 1500 MTU ethernet), the
headers are a relatively small portion of the total PDU, right?
So, if we reserved 10-15% (or whatever it works out to) that should
make it so we never drop the packet due to fragmentation, right?  I
can't see any reason not to reserve this space, because sending a
little later is definitely better than going through the work to send
it sooner but then having to drop it down in the local kernel.  We may
only want to reserve the buffers when they are fairly large (i.e. not
on your very small and slow embedded devices where memory is very
precious).
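
To put a number on the header share, the fragment count for one UDP
datagram can be estimated as below. This is a sketch under stated
assumptions: a 20-byte IPv4 header without options, an 8-byte UDP header,
and a hypothetical helper name ipv4_fragment_count(). It counts protocol
headers only; whatever per-fragment bookkeeping the kernel charges
against the socket buffer is not modelled here.

```c
#include <stddef.h>

/* Number of IPv4 fragments needed to carry one UDP datagram with the
 * given payload size over a link with the given MTU.
 * Assumes a 20-byte IPv4 header (no options) and an 8-byte UDP header.
 * Returns 0 if the MTU is too small to carry any fragment data. */
size_t ipv4_fragment_count(size_t udp_payload, size_t mtu)
{
    const size_t ip_hdr = 20, udp_hdr = 8;
    size_t max_last, per_frag, total;

    if (mtu < ip_hdr + 8)
        return 0;

    max_last = mtu - ip_hdr;            /* last fragment may use it all */
    per_frag = (max_last / 8) * 8;      /* others carry multiples of 8 */
    total = udp_payload + udp_hdr;      /* bytes IP has to carry */

    if (total <= max_last)
        return 1;                       /* fits without fragmentation */
    return 1 + (total - max_last + per_frag - 1) / per_frag;
}
```

For a 53000-byte payload and a 1500-byte MTU this gives 36 fragments,
so about 720 bytes of IP headers, which is well under 2% of the payload.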

-- 
Ben Greear <greearb@candelatech.com>       <Ben_Greear AT excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: want opinions on possible glitch in 2.4 network error reporting
  2002-02-07  1:51   ` Ion Badulescu
  2002-02-07  2:08     ` Alan Cox
@ 2002-02-07  4:23     ` Ben Greear
  2002-02-07  4:37       ` Ion Badulescu
  1 sibling, 1 reply; 27+ messages in thread
From: Ben Greear @ 2002-02-07  4:23 UTC (permalink / raw)
  To: Ion Badulescu; +Cc: Alan Cox, linux-kernel, Chris Friesen




> From the limited testing I just ran, it appears that starfire and 3c59x
> handle this correctly, whereas tulip always loses a small number of 
> packets during a UDP storm. ttcp -us[rt] is very useful for such 
> testing...


It would be interesting to see which side is dropping?  Have you
correlated ethernet driver counters to your sendto count?


> 
> Ion
> 
> 


-- 
Ben Greear <greearb@candelatech.com>       <Ben_Greear AT excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: want opinions on possible glitch in 2.4 network error reporting
  2002-02-07  4:23     ` Ben Greear
@ 2002-02-07  4:37       ` Ion Badulescu
  0 siblings, 0 replies; 27+ messages in thread
From: Ion Badulescu @ 2002-02-07  4:37 UTC (permalink / raw)
  To: Ben Greear; +Cc: Alan Cox, linux-kernel, Chris Friesen

On Wed, 6 Feb 2002, Ben Greear wrote:

> > From the limited testing I just ran, it appears that starfire and 3c59x
> > handle this correctly, whereas tulip always loses a small number of 
> > packets during a UDP storm. ttcp -us[rt] is very useful for such 
> > testing...
> 
> It would be interesting to see which side is dropping.  Have you
> correlated the ethernet driver counters with your sendto count?

It's hard for me to do it right now, because I don't have them in 
isolation (they do NFS and other stuff), and I don't have iptables support 
compiled into the kernel running the tulip. However:

starfire -> 3c59x
3c59x -> starfire
tulip -> 3c59x
tulip -> starfire

never lose data on a quiescent network:

ttcp-t: 83886080 bytes in 7.04 real seconds = 11640.36 KB/sec +++
ttcp-r: 83886080 bytes in 7.04 real seconds = 11641.10 KB/sec +++

whereas

3c59x -> tulip
starfire -> tulip

*always* lose several packets:

ttcp-t: 16777216 bytes in 1.40 real seconds = 11717.40 KB/sec +++
ttcp-r: 16769024 bytes in 1.40 real seconds = 11679.39 KB/sec +++

and

ttcp-t: 33554432 bytes in 2.80 real seconds = 11714.81 KB/sec +++
ttcp-r: 33456128 bytes in 2.80 real seconds = 11660.28 KB/sec +++

and

ttcp-t: 83886080 bytes in 7.00 real seconds = 11704.40 KB/sec +++
ttcp-r: 83722240 bytes in 7.00 real seconds = 11674.67 KB/sec +++

So I would tend to blame it on the tulip -- but the Rx side of it, not the 
Tx, which this discussion was about...
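
The size of the loss in the three "-> tulip" runs can be read off the ttcp
totals above. The following is only a small sketch of that arithmetic; the
function names are made up for illustration and are not part of ttcp.

```c
#include <assert.h>
#include <stdio.h>

/* Bytes the receiver never reported, given the ttcp-t and ttcp-r totals. */
long lost_bytes(long sent, long received)
{
    assert(received <= sent);   /* a receiver cannot see more than was sent */
    return sent - received;
}

/* Loss as a percentage of the bytes sent. */
double loss_percent(long sent, long received)
{
    return 100.0 * (double)lost_bytes(sent, received) / (double)sent;
}

/* Print one line per run for the three "-> tulip" results quoted above. */
void report_tulip_runs(void)
{
    static const long runs[3][2] = {
        {16777216L, 16769024L},
        {33554432L, 33456128L},
        {83886080L, 83722240L},
    };
    int i;

    for (i = 0; i < 3; i++)
        printf("sent %ld, received %ld, lost %ld bytes (%.3f%%)\n",
               runs[i][0], runs[i][1],
               lost_bytes(runs[i][0], runs[i][1]),
               loss_percent(runs[i][0], runs[i][1]));
}
```

The three runs lose 8192, 98304 and 163840 bytes, i.e. roughly 0.05%, 0.29%
and 0.20% of what was sent. If the 8192-byte ttcp buffer size was in use
(an assumption, the command line is not shown), that would be 1, 12 and 20
datagrams.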

Ion

-- 
  It is better to keep your mouth shut and be thought a fool,
            than to open it and remove all doubt.



* Re: want opinions on possible glitch in 2.4 network error reporting
  2002-02-07  0:26 ` Alan Cox
  2002-02-07  1:51   ` Ion Badulescu
@ 2002-02-07  9:22   ` Luis Garces
  1 sibling, 0 replies; 27+ messages in thread
From: Luis Garces @ 2002-02-07  9:22 UTC (permalink / raw)
  To: linux-kernel

Alan Cox wrote:

 >> I ran into a somewhat related issue on a 2.2.16 system, where I
 >> had an app that was calling sendto() on 217000 packets/sec, even
 >> though the wire could only handle about 127000 packets/sec.  I
 >> got no errors at all in sendto, even though over a third of the
 >> packets were not actually being sent.
 >>
 >
 > That is correct UDP behaviour -
 >

Yes, TCP provides a reliable point-to-point path, and UDP doesn't. The 
problem is deciding where this unreliability starts in the UDP path. 
In Alan's opinion (I think) it starts at the very moment data is 
passed to sendto() (i.e., it includes the kernel in the unreliable 
UDP path). Perhaps it is a little sad to see the kernel as 
something lossy, but I think it's the nature of UDP.

-- 
Luis
****


[parent not found: <3C6192A5.911D5B4F@nortelnetworks.com.suse.lists.linux.kernel>]

* Re: want opinions on possible glitch in 2.4 network error reporting
       [not found] <3C6192A5.911D5B4F@nortelnetworks.com.suse.lists.linux.kernel>
@ 2002-02-07  0:06 ` Andi Kleen
  2002-02-07 15:59   ` Chris Friesen
  0 siblings, 1 reply; 27+ messages in thread
From: Andi Kleen @ 2002-02-07  0:06 UTC (permalink / raw)
  To: Chris Friesen; +Cc: linux-kernel

Chris Friesen <cfriesen@nortelnetworks.com> writes:

> I've been looking around in the 2.4 networking stack, and I noticed that when
> the tulip (and no doubt many other) driver cannot put any more outgoing packets
> on the queue, it calls netif_stop_queue().  Then, in dev_queue_xmit() we check
> this flag by calling netif_queue_stopped().  My concern is that if this flag is
> true, we return -ENETDOWN.  Is this really the proper return code for this? If
> anything, the network is too active.  It seems to me that it would make more
> sense to have some kind of congestion return code rather than claiming that the
> network is down.

The ENETDOWN path you're seeing only applies to queueless devices (like
loopback or a tunnel device). These should only set the queue stopped
flag when something is terribly wrong. 

All real network devices have a queue and go through the qdisc. 

> 
> I think it would make sense to return -ENOBUFS in this case, as it's already
> listed in the sendto() man page, and the description matches the error because
> the command could succeed if retried.
> 
> I ran into a somewhat related issue on a 2.2.16 system, where I had an app that
> was calling sendto() on 217000 packets/sec, even though the wire could only
> handle about 127000 packets/sec.  I got no errors at all in sendto, even though
> over a third of the packets were not actually being sent.

The qdisc queue acts like an IP network and drops excess packets. 
There is no provision to block when it fills because that would have
many side effects and complicate the stack a lot. There is a return
code though that is passed up when the queue fills (NET_XMIT_DROP or
NET_XMIT_CN), but it's currently only used by TCP and not passed to 
user space for UDP/RAW. It could probably be done with a special
socket option if there is a clear need.
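
To make the drop-on-full behaviour concrete, here is a toy user-space model,
not kernel code. The SKETCH_* names, the queue layout and the queue length
are all invented for this sketch; only the idea (a full queue drops the
packet and reports it, and the UDP-like caller discards that report) comes
from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes; they mirror the idea of NET_XMIT_DROP but are
 * not the kernel's definitions. */
enum sketch_xmit { SKETCH_XMIT_SUCCESS = 0, SKETCH_XMIT_DROP = 1 };

#define SKETCH_QLEN 4   /* arbitrary queue length for the sketch */

struct sketch_queue {
    int           pkts[SKETCH_QLEN];  /* ring buffer of packet ids */
    size_t        head;               /* index of the oldest packet */
    size_t        count;              /* packets currently queued */
    unsigned long dropped;            /* packets refused because full */
};

/* Tail-drop enqueue: never blocks, reports a drop when the queue is full. */
enum sketch_xmit sketch_enqueue(struct sketch_queue *q, int pkt)
{
    assert(q != NULL);
    if (q->count == SKETCH_QLEN) {
        q->dropped++;
        return SKETCH_XMIT_DROP;
    }
    q->pkts[(q->head + q->count) % SKETCH_QLEN] = pkt;
    q->count++;
    return SKETCH_XMIT_SUCCESS;
}

/* A UDP-like sender as described above: the queue status is discarded,
 * so the caller always sees success. */
int sketch_udp_send(struct sketch_queue *q, int pkt)
{
    (void)sketch_enqueue(q, pkt);
    return 0;
}
```

Sending six packets into an empty queue through sketch_udp_send() returns 0
six times, while q.count ends at 4 and q.dropped at 2.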

-Andi


* Re: want opinions on possible glitch in 2.4 network error reporting
  2002-02-07  0:06 ` Andi Kleen
@ 2002-02-07 15:59   ` Chris Friesen
  2002-02-07 16:01     ` Andi Kleen
  0 siblings, 1 reply; 27+ messages in thread
From: Chris Friesen @ 2002-02-07 15:59 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel

Andi Kleen wrote:
> 
> Chris Friesen <cfriesen@nortelnetworks.com> writes:
> 
> > I've been looking around in the 2.4 networking stack, and I noticed that when
> > the tulip (and no doubt many other) driver cannot put any more outgoing packets
> > on the queue, it calls netif_stop_queue().  Then, in dev_queue_xmit() we check
> > this flag by calling netif_queue_stopped().  My concern is that if this flag is
> > true, we return -ENETDOWN.  Is this really the proper return code for this? If
> > anything, the network is too active.  It seems to me that it would make more
> > sense to have some kind of congestion return code rather than claiming that the
> > network is down.
> 
> The ENETDOWN path you're seeing only applies to queueless devices (like
> loopback or a tunnel device). These should only set the queue stopped
> flag when something is terribly wrong.
> 
> All real network devices have a queue and go through the qdisc.

Okay, I must be missing something, so can you enlighten me?  I can't figure out
where the qdisc is attached to the ethernet device.

Chris

-- 
Chris Friesen                    | MailStop: 043/33/F10  
Nortel Networks                  | work: (613) 765-0557
3500 Carling Avenue              | fax:  (613) 765-2986
Nepean, ON K2H 8E9 Canada        | email: cfriesen@nortelnetworks.com


* Re: want opinions on possible glitch in 2.4 network error reporting
  2002-02-07 15:59   ` Chris Friesen
@ 2002-02-07 16:01     ` Andi Kleen
  0 siblings, 0 replies; 27+ messages in thread
From: Andi Kleen @ 2002-02-07 16:01 UTC (permalink / raw)
  To: Chris Friesen; +Cc: Andi Kleen, linux-kernel

On Thu, Feb 07, 2002 at 10:59:56AM -0500, Chris Friesen wrote:
> Okay, I must be missing something, so can you enlighten me?  I can't figure out
> where the qdisc is attached to the ethernet device.

net/core/dev.c:dev_open -> dev_activate. 

-Andi


[parent not found: <E16Ydys-0007D6-00@the-village.bc.nu.suse.lists.linux.kernel>]

[parent not found: <Pine.LNX.4.44.0202062101390.4832-100000@age.cs.columbia.edu.suse.lists.linux.kernel>]

* Re: want opinions on possible glitch in 2.4 network error reporting
       [not found] ` <Pine.LNX.4.44.0202062101390.4832-100000@age.cs.columbia.edu.suse.lists.linux.kernel>
@ 2002-02-07  2:47   ` Andi Kleen
  2002-02-07  6:25     ` Chris Friesen
  0 siblings, 1 reply; 27+ messages in thread
From: Andi Kleen @ 2002-02-07  2:47 UTC (permalink / raw)
  To: Ion Badulescu; +Cc: linux-kernel

Ion Badulescu <ionut@cs.columbia.edu> writes:
> I'll state again: if data (UDP or otherwise) is lost after sendto() 
> returns success but before it hits the wire, something is BROKEN in that 
> IP stack.

Your proposal would break select(). It would require UDP sendmsg to block
when the TX queue is full. Most applications using select do
not set the socket non-blocking. If they select for writing and the 
kernel signals the socket writable, they expect not to block in the write. 
While the only thing controlling the blocking is the per-socket
send buffer, that works out as long as the application is careful enough
not to fill its send buffer. If you put the TX queue into the 
blocking equation too, this cannot be guaranteed anymore because the TX queue
is shared between all local processes and even forwarding. You would
get random blocking on select-based applications, breaking them.
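
For reference, the application pattern in question looks roughly like the
sketch below: a blocking UDP socket, select() for writability, then a
sendto() that is expected to return immediately. The helper names
send_when_writable() and loopback_dest() are invented for this example; the
socket calls themselves are the standard ones.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Hypothetical helper: build 127.0.0.1:port as a destination address. */
struct sockaddr_in loopback_dest(unsigned short port)
{
    struct sockaddr_in sa;

    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    return sa;
}

/* Wait until select() reports fd writable, then send one datagram.
 * Returns the sendto() result, or -1 with errno set (EAGAIN on timeout). */
ssize_t send_when_writable(int fd, const void *buf, size_t len,
                           const struct sockaddr_in *dest, long timeout_sec)
{
    fd_set wfds;
    struct timeval tv;
    int ready;

    assert(buf != NULL && dest != NULL);

    FD_ZERO(&wfds);
    FD_SET(fd, &wfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    ready = select(fd + 1, NULL, &wfds, NULL, &tv);
    if (ready < 0)
        return -1;              /* select() failed, errno already set */
    if (ready == 0) {
        errno = EAGAIN;         /* not writable within the timeout */
        return -1;
    }

    /* The application assumes this call will not block, because the
     * socket was just reported writable. */
    return sendto(fd, buf, len, 0,
                  (const struct sockaddr *)dest, sizeof *dest);
}
```

The socket is deliberately left in blocking mode here, since that is the
case the argument above is about.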

BTW, I had a proposal for blocking the sender in TX some time ago, but
luckily it was shot down by people who knew better than me. 

-Andi


* Re: want opinions on possible glitch in 2.4 network error reporting
  2002-02-07  2:47   ` Andi Kleen
@ 2002-02-07  6:25     ` Chris Friesen
  0 siblings, 0 replies; 27+ messages in thread
From: Chris Friesen @ 2002-02-07  6:25 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Ion Badulescu, linux-kernel

Andi Kleen wrote:
> 
> Ion Badulescu <ionut@cs.columbia.edu> writes:
> > I'll state again: if data (UDP or otherwise) is lost after sendto()
> > returns success but before it hits the wire, something is BROKEN in that
> > IP stack.
> 
> Your proposal would break select(). It would require UDP sendmsg to block
> when the TX queue is full. Most applications using select do
> not set the socket non-blocking. If they select for writing and the
> kernel signals the socket writable, they expect not to block in the write.
> While the only thing controlling the blocking is the per-socket
> send buffer, that works out as long as the application is careful enough
> not to fill its send buffer. If you put the TX queue into the
> blocking equation too, this cannot be guaranteed anymore because the TX queue
> is shared between all local processes and even forwarding. You would
> get random blocking on select-based applications, breaking them.

I don't see the problem.  So sendto() blocks if there is no room on the socket
buffer.  Fine.  So if there's room on the socket buffer we take the packet and
put it on the buffer, and sendto() returns.

Now, for each socket we've got a buffer of packets that want to get onto the
device driver tx queue.  So we use some kind of algorithm to pick which packets
to move from the group of socket buffers to the device driver tx queue.  If the
app calls sendto() before there is space in the socket buffer, then sendto()
blocks.  select() should return whether or not there is space in the socket
buffer.  Eventually, every packet that gets put into a socket buffer makes it
out onto the wire.  Congestion is dealt with by leaving packets in the socket
buffers until they can be guaranteed a spot in the device tx queue.  I assume we
would try and add it to the tx queue, and remove it from the socket buffer if
the add succeeds.
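
The scheme can be written down as a toy model to see what it claims. This
is only a sketch with a single socket and made-up sizes and names (struct
model, model_sendto() and so on); it does not model the shared tx queue
that the objection quoted above is about.

```c
#include <assert.h>
#include <stddef.h>

#define SOCKBUF_SLOTS 8   /* arbitrary per-socket buffer size */
#define TXQ_SLOTS     2   /* arbitrary device tx queue size */

struct model {
    size_t        sockbuf;  /* packets waiting in the socket buffer */
    size_t        txq;      /* packets sitting in the device tx queue */
    unsigned long on_wire;  /* packets actually transmitted */
};

/* Returns 1 if the packet was accepted, 0 if the caller would block. */
int model_sendto(struct model *m)
{
    assert(m != NULL);
    if (m->sockbuf == SOCKBUF_SLOTS)
        return 0;
    m->sockbuf++;
    return 1;
}

/* What select() for writing would report: room in the socket buffer. */
int model_writable(const struct model *m)
{
    return m->sockbuf < SOCKBUF_SLOTS;
}

/* Move packets to the tx queue only while it has room; a packet leaves
 * the socket buffer only once the add has succeeded. */
void model_fill_txq(struct model *m)
{
    while (m->sockbuf > 0 && m->txq < TXQ_SLOTS) {
        m->sockbuf--;
        m->txq++;
    }
}

/* The device puts one packet on the wire, if it has any. */
void model_tx_one(struct model *m)
{
    if (m->txq > 0) {
        m->txq--;
        m->on_wire++;
    }
}
```

In this model nothing is ever dropped after model_sendto() returns 1: a
sender that offers more than the device can transmit is pushed back at the
socket buffer, and once both queues drain the accepted count equals on_wire.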

I just don't see why sendto() would accept the packet, only for it to be
dropped later on.

Chris

-- 
Chris Friesen                    | MailStop: 043/33/F10  
Nortel Networks                  | work: (613) 765-0557
3500 Carling Avenue              | fax:  (613) 765-2986
Nepean, ON K2H 8E9 Canada        | email: cfriesen@nortelnetworks.com


end of thread, other threads:[~2002-02-08 21:40 UTC | newest]

Thread overview: 27+ messages
2002-02-06 20:31 want opinions on possible glitch in 2.4 network error reporting Chris Friesen
2002-02-06 20:56 ` Richard B. Johnson
2002-02-06 21:45   ` Ben Greear
2002-02-06 22:23   ` Chris Friesen
2002-02-07 13:44     ` Richard B. Johnson
2002-02-07 16:33       ` Gerold Jury
2002-02-07  0:24   ` Alan Cox
2002-02-07  0:26 ` Alan Cox
2002-02-07  1:51   ` Ion Badulescu
2002-02-07  2:08     ` Alan Cox
2002-02-07  2:09       ` Ion Badulescu
2002-02-07  2:34         ` Alan Cox
2002-02-07  2:54           ` Ion Badulescu
2002-02-07 11:11             ` Alan Cox
2002-02-08 16:11               ` Pavel Machek
2002-02-08 21:39                 ` Ion Badulescu
2002-02-07  4:21       ` Ben Greear
2002-02-07  4:38         ` David S. Miller
2002-02-07  4:56           ` Ben Greear
2002-02-07  4:23     ` Ben Greear
2002-02-07  4:37       ` Ion Badulescu
2002-02-07  9:22   ` Luis Garces
     [not found] <3C6192A5.911D5B4F@nortelnetworks.com.suse.lists.linux.kernel>
2002-02-07  0:06 ` Andi Kleen
2002-02-07 15:59   ` Chris Friesen
2002-02-07 16:01     ` Andi Kleen
     [not found] <E16Ydys-0007D6-00@the-village.bc.nu.suse.lists.linux.kernel>
     [not found] ` <Pine.LNX.4.44.0202062101390.4832-100000@age.cs.columbia.edu.suse.lists.linux.kernel>
2002-02-07  2:47   ` Andi Kleen
2002-02-07  6:25     ` Chris Friesen
