* Network compatibility and performance
@ 2006-08-10 15:34 linux-os (Dick Johnson)
2006-08-10 17:28 ` Stephen Hemminger
2006-08-12 19:21 ` Ben Greear
0 siblings, 2 replies; 9+ messages in thread
From: linux-os (Dick Johnson) @ 2006-08-10 15:34 UTC (permalink / raw)
To: Linux kernel
Hello,
Network throughput is seriously defective with linux-2.6.16.24
if the length given to 'write()' is a large number.
Given this code on a connected socket........
//-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
//
// Copyright(c) 2005 Analogic Corporation (rjohnson@analogic.com)
//
// This program may be distributed under the GNU Public License
// version 2, as published by the Free Software Foundation, Inc.,
// 59 Temple Place, Suite 330 Boston, MA, 02111.
//
//-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdint.h>
#include <signal.h>
#include <string.h>
#include <stdarg.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/poll.h>
#include <sys/param.h>  // MIN()
#define BUF_LEN 0x1000
#define FAIL -1
//-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
//
// This sends a message that could exceed the size of the network buffers.
// It returns 0 if everything went okay, and FAIL if not.
//
int32_t sender(int32_t fd, void *buf, size_t len)
{
    int32_t ret_val;
    uint8_t *cp;
    cp = (uint8_t *) buf;
    while(len) {
        if((ret_val = write(fd, cp, MIN(len, BUF_LEN))) == FAIL) {
            if(errno == EAGAIN)
                continue;
            return ret_val;
        }
        len -= ret_val;
        cp += ret_val;
    }
    return 0;
}
It used to work quite well with:
    while(len) {
        if((ret_val = write(fd, cp, len)) == FAIL) {
            return ret_val;
        }
        len -= ret_val;
        cp += ret_val;
    }
The network socket layer would return the amount of bytes
actually sent and the code would walk its way up through the
buffer. This was the expected behavior for many years.
Then after about Linux-2.6.8, I needed to do:
    while(len) {
        if((ret_val = write(fd, cp, len)) == FAIL) {
            if(errno == EAGAIN)
                continue;
            return ret_val;
        }
        len -= ret_val;
        cp += ret_val;
    }
This was because Linux would claim to run out of resources
even though there was nothing else running on the system.
Now at Linux-2.6.16.24, the code needed to be further modified
to:
    while(len) {
        if((ret_val = write(fd, cp, MIN(len, 0x1000))) == FAIL) {
            if(errno == EAGAIN)
                continue;
            return ret_val;
        }
        len -= ret_val;
        cp += ret_val;
    }
... or else it would spin <forever> returning 0 with no errno set.
In all cases, these problems exist when 'len' is a large value, perhaps
0x01000000, much greater than Linux could ever find an available
buffer for. Linux used to send what it could. Now it will just fail to
send anything at all, returning 0 if it 'feels' like it doesn't want
to bother. This is not good. With the hacked code, the data throughput
is about 100,000 bytes per second on a dedicated link. The previous
code ran 112,000 bytes per second. Once the 'errno' happens, the
network stumbles to a measly 12,000 bytes per second. This
breaks our applications.
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.40GHz
stepping : 9
cpu MHz : 2399.809
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid xtpr
bogomips : 4804.62
processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.40GHz
stepping : 9
cpu MHz : 2399.809
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid xtpr
bogomips : 4798.10
processor : 2
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.40GHz
stepping : 9
cpu MHz : 2399.809
cache size : 512 KB
physical id : 3
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid xtpr
bogomips : 4798.06
processor : 3
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Xeon(TM) CPU 2.40GHz
stepping : 9
cpu MHz : 2399.809
cache size : 512 KB
physical id : 3
siblings : 2
core id : 0
cpu cores : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid xtpr
bogomips : 4797.98
Module Size Used by
parport_pc 26948 1
lp 13480 0
parport 33608 2 parport_pc,lp
autofs4 19460 0
sunrpc 124476 1
e1000 100276 0 <--- Intel network controller
floppy 55524 0
sg 33180 0
microcode 10912 0
dm_mod 49816 0
uhci_hcd 31248 0
button 9488 0
battery 12164 0
ac 7940 0
ipv6 229728 18
ext3 114568 2
jbd 50324 1 ext3
aic79xx 182104 3
scsi_transport_spi 22144 1 aic79xx
sd_mod 17792 4
scsi_mod 116360 4 sg,aic79xx,scsi_transport_spi,sd_mod
MemTotal: 1164292 kB
MemFree: 438272 kB
Buffers: 131024 kB
Cached: 383516 kB
SwapCached: 0 kB
Active: 125260 kB
Inactive: 407276 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 1164292 kB
LowFree: 438272 kB
SwapTotal: 2040244 kB
SwapFree: 2040244 kB
Dirty: 0 kB
Writeback: 0 kB
Mapped: 31780 kB
Slab: 185896 kB
CommitLimit: 2622388 kB
Committed_AS: 27872 kB
PageTables: 764 kB
VmallocTotal: 122576 kB
VmallocUsed: 11036 kB
VmallocChunk: 105676 kB
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 4096 kB
Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.24 on an i686 machine (5592.62 BogoMips).
New book: http://www.AbominableFirebug.com/
****************************************************************
The information transmitted in this message is confidential and may be privileged. Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited. If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to DeliveryErrors@analogic.com - and destroy all copies of this information, including any attachments, without reading or disclosing them.
Thank you.
* Re: Network compatibility and performance
2006-08-10 15:34 Network compatibility and performance linux-os (Dick Johnson)
@ 2006-08-10 17:28 ` Stephen Hemminger
2006-08-10 18:09 ` linux-os (Dick Johnson)
2006-08-12 19:21 ` Ben Greear
1 sibling, 1 reply; 9+ messages in thread
From: Stephen Hemminger @ 2006-08-10 17:28 UTC (permalink / raw)
To: linux-kernel
On Thu, 10 Aug 2006 11:34:23 -0400
"linux-os \(Dick Johnson\)" <linux-os@analogic.com> wrote:
>
> Hello,
>
> Network throughput is seriously defective with linux-2.6.16.24
> if the length given to 'write()' is a large number.
>
> Given this code on a connected socket........

What protocol (TCP?) and what Ethernet hardware (does it support TSO)?
Did you set non-blocking?
* Re: Network compatibility and performance
2006-08-10 17:28 ` Stephen Hemminger
@ 2006-08-10 18:09 ` linux-os (Dick Johnson)
2006-08-10 18:14 ` Stephen Hemminger
0 siblings, 1 reply; 9+ messages in thread
From: linux-os (Dick Johnson) @ 2006-08-10 18:09 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: linux-kernel
On Thu, 10 Aug 2006, Stephen Hemminger wrote:
> On Thu, 10 Aug 2006 11:34:23 -0400
> "linux-os \(Dick Johnson\)" <linux-os@analogic.com> wrote:
>
>>
>> Hello,
>>
>> Network throughput is seriously defective with linux-2.6.16.24
>> if the length given to 'write()' is a large number.
>>
>> Given this code on a connected socket........
>
> What protocol (TCP?) and what Ethernet hardware (does it support TSO)?
> Did you set non-blocking?

A connected TCP socket. The Ethernet hardware was also
described (Intel using e1000 as shown). It's on PCI-X 133MHz, two
devices on the motherboard, not really relevant because it worked
previously as described. TSO? No ARPA virtual terminals here.
They went away in 1972. The socket was set to non-blocking because the
same socket is used for reading (not at the same time), using poll()
to find when data are supposed to be available. BTW, read() code
used to use poll() to find out when data were available, but if
poll returned POLLIN, sometimes data would NOT be available and
the code would hang <forever>. Therefore a work-around was to set
the socket non-blocking. Under the conditions where poll() would
return POLLIN and a read of a non-blocking socket returned no data,
the errno was 3 (no such process) which seems really strange.
Nevertheless, this has worked in the field for quite some time.

There are no threads or child processes sharing data. Just a task
receiving messages (commands) and then sending data (responses).
Nothing in any user code overlaps.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.24 on an i686 machine (5592.62 BogoMips).
New book: http://www.AbominableFirebug.com/
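The message above says the socket was set to non-blocking but does not show how. A minimal sketch of the usual fcntl() approach follows; set_nonblocking() is a hypothetical helper name, not something taken from the thread, and the poster's actual setup code may differ.

```c
#include <assert.h>
#include <fcntl.h>

// Hypothetical helper (not from the thread): switch a descriptor to
// non-blocking mode while preserving its other status flags.
// Returns 0 on success, -1 on failure with errno set by fcntl().
int set_nonblocking(int fd)
{
    int flags;

    assert(fd >= 0);                    // caller must pass an open descriptor
    flags = fcntl(fd, F_GETFL, 0);      // read the current status flags
    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;
    return 0;
}
```

Once O_NONBLOCK is set, read() and write() on that descriptor can fail with EAGAIN instead of sleeping, which is why the later messages in the thread discuss waiting in poll() or select().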
* Re: Network compatibility and performance
2006-08-10 18:09 ` linux-os (Dick Johnson)
@ 2006-08-10 18:14 ` Stephen Hemminger
2006-08-10 18:32 ` linux-os (Dick Johnson)
0 siblings, 1 reply; 9+ messages in thread
From: Stephen Hemminger @ 2006-08-10 18:14 UTC (permalink / raw)
To: linux-os (Dick Johnson); +Cc: linux-kernel
On Thu, 10 Aug 2006 14:09:34 -0400
"linux-os \(Dick Johnson\)" <linux-os@analogic.com> wrote:
>
> On Thu, 10 Aug 2006, Stephen Hemminger wrote:
>
> > On Thu, 10 Aug 2006 11:34:23 -0400
> > "linux-os \(Dick Johnson\)" <linux-os@analogic.com> wrote:
> >
> >>
> >> Hello,
> >>
> >> Network throughput is seriously defective with linux-2.6.16.24
> >> if the length given to 'write()' is a large number.
> >>
> >> Given this code on a connected socket........
> >
> > What protocol (TCP?) and what Ethernet hardware (does it support TSO)?
> > Did you set non-blocking?
>
> A connected TCP socket. The Ethernet hardware was also
> described (Intel using e1000 as shown) It's on PCI-X 133MHz, two
> devices on the motherboard, not really relevant because it worked
> previously as described. TSO?

TSO = TCP Segmentation Offload; if you are using e1000 it gets enabled.
Only slightly relevant to this, because it would change the timing.

> They went away in 1972. The socket was set to non-blocking because the
> same socket is used for reading (not at the same time), using poll()
> to find when data are supposed to be available. BTW, read() code
> used to use poll() to find out when data were available, but if
> poll returned POLLIN, sometimes data would NOT be available and
> the code would hang <forever>. Therefore a work-around was to set
> the socket non-blocking. Under the conditions where poll() would
> return POLLIN and a read of a non-blocking socket returned no data,

Basic unix programming, errno only has meaning if system call returns -1.

Basic network programming. If read returns 0 it means other side
has disconnected.
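The two rules in the reply above (errno is meaningful only after a -1 return, and a 0 return from read() on a stream socket means the peer has closed) can be sketched as a small classifier. This is an illustration only; read_once() and the read_status names are hypothetical and do not appear in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

// Hypothetical result codes for one read() attempt.
enum read_status { READ_DATA, READ_RETRY, READ_EOF, READ_ERROR };

// Hypothetical helper: perform one read() and classify the outcome.
// *got receives the raw return value of read().
enum read_status read_once(int fd, void *buf, size_t len, ssize_t *got)
{
    ssize_t n;

    assert(buf != NULL && got != NULL && len > 0);
    n = read(fd, buf, len);
    *got = n;
    if (n > 0)
        return READ_DATA;           // n bytes are now in buf
    if (n == 0)
        return READ_EOF;            // end of stream; errno is not meaningful here
    // Only now (n == -1) is errno worth inspecting.
    if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
        return READ_RETRY;          // nothing available yet, or interrupted
    return READ_ERROR;
}
```

A caller that sees READ_RETRY would normally go back to poll() rather than call read() again immediately.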
* Re: Network compatibility and performance
2006-08-10 18:14 ` Stephen Hemminger
@ 2006-08-10 18:32 ` linux-os (Dick Johnson)
0 siblings, 0 replies; 9+ messages in thread
From: linux-os (Dick Johnson) @ 2006-08-10 18:32 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: linux-kernel
On Thu, 10 Aug 2006, Stephen Hemminger wrote:
> On Thu, 10 Aug 2006 14:09:34 -0400
> "linux-os \(Dick Johnson\)" <linux-os@analogic.com> wrote:
>
>>
>> On Thu, 10 Aug 2006, Stephen Hemminger wrote:
>>
>>> On Thu, 10 Aug 2006 11:34:23 -0400
>>> "linux-os \(Dick Johnson\)" <linux-os@analogic.com> wrote:
>>>
>>>>
>>>> Hello,
>>>>
>>>> Network throughput is seriously defective with linux-2.6.16.24
>>>> if the length given to 'write()' is a large number.
>>>>
>>>> Given this code on a connected socket........
>>>
>>> What protocol (TCP?) and what Ethernet hardware (does it support TSO)?
>>> Did you set non-blocking?
>>
>> A connected TCP socket. The Ethernet hardware was also
>> described (Intel using e1000 as shown) It's on PCI-X 133MHz, two
>> devices on the motherboard, not really relevant because it worked
>> previously as described. TSO?
>
> TSO = TCP segmentation Offload, if you are using e1000 it gets enabled.
> Only slightly relevant to this, because it would change the timing.
>
>> They went away in 1972. The socket was set to non-blocking because the
>> same socket is used for reading (not at the same time), using poll()
>> to find when data are supposed to be available. BTW, read() code
>> used to use poll() to find out when data were available, but if
>> poll returned POLLIN, sometimes data would NOT be available and
>> the code would hang <forever>. Therefore a work-around was to set
>> the socket non-blocking. Under the conditions where poll() would
>> return POLLIN and a read of a non-blocking socket returned no data,
>
> Basic unix programming, errno only has meaning if system call returns -1.
>

True. So what?

> Basic network programming. If read returns 0 it means other side
> has disconnected.
>

So, the other side did not disconnect or shutdown. This is a previously
reported problem that has required work-arounds because it has never been
fixed. The read() problem may not be relevant to the spin-forever when
the kernel doesn't try to send even 1440 bytes of the 0x01000000 byte
buffer. That's the new problem being reported at this time.

Remember we (me and the others whose code I help review and sometimes
fix) have been doing socket programming since the days BSD first
added them to Unix, and have been coming up with work-arounds for
things that keep changing ever since. We have socket programming
running on Interactive Unix (now that's old), SCO Unix (sorry for
the swear word), and SunOS. I am reporting that something has CHANGED
and, even with a work-around in place, the result is a severe
reduction in throughput.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.24 on an i686 machine (5592.62 BogoMips).
New book: http://www.AbominableFirebug.com/
* Re: Network compatibility and performance
2006-08-10 15:34 Network compatibility and performance linux-os (Dick Johnson)
2006-08-10 17:28 ` Stephen Hemminger
@ 2006-08-12 19:21 ` Ben Greear
2006-08-14 11:30 ` linux-os (Dick Johnson)
1 sibling, 1 reply; 9+ messages in thread
From: Ben Greear @ 2006-08-12 19:21 UTC (permalink / raw)
To: linux-os (Dick Johnson); +Cc: Linux kernel
linux-os (Dick Johnson) wrote:
> Hello,
>
> Network throughput is seriously defective with linux-2.6.16.24
> if the length given to 'write()' is a large number.
>
> Given this code on a connected socket........
>
> //-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> //
> // Copyright(c) 2005 Analogic Corporation (rjohnson@analogic.com)
> //
> // This program may be distributed under the GNU Public License
> // version 2, as published by the Free Software Foundation, Inc.,
> // 59 Temple Place, Suite 330 Boston, MA, 02111.
> //
> //-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>
> #include <stdio.h>
> #include <unistd.h>
> #include <stdlib.h>
> #include <stdint.h>
> #include <signal.h>
> #include <string.h>
> #include <stdarg.h>
> #include <errno.h>
> #include <fcntl.h>
> #include <netinet/in.h>
> #include <netinet/tcp.h>
> #include <sys/poll.h>
>
> #define BUF_LEN 0x1000
> #define FAIL -1
>
> //-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
> //
> // This sends a message that could exceed the size of the network buffers.
> // It returns 0 if everything went okay, and FAIL if not.
> //
> int32_t sender(int32_t fd, void *buf, size_t len)
> {
>     int32_t ret_val;
>     uint8_t *cp;
>     cp = (uint8_t *) buf;
>     while(len) {
>         if((ret_val = write(fd, cp, MIN(len, BUF_LEN))) == FAIL) {
>             if(errno == EAGAIN)
>                 continue;
>             return ret_val;
>         }
>         len -= ret_val;
>         cp += ret_val;
>     }
>     return 0;
> }
>
> It used to work quite well with:
>
>     while(len) {
>         if((ret_val = write(fd, cp, len)) == FAIL) {
>             return ret_val;
>         }
>         len -= ret_val;
>         cp += ret_val;
>     }
>
> The network socket layer would return the amount of bytes
> actually sent and the code would walk its way up through the
> buffer. This was the expected behavior for many years.
>
> Then after about Linux-2.6.8, I needed to do:
>
>     while(len) {
>         if((ret_val = write(fd, cp, len)) == FAIL) {
>             if(errno == EAGAIN)
>                 continue;
>             return ret_val;
>         }
>         len -= ret_val;
>         cp += ret_val;
>     }
>
> This was because Linux would claim to run out of resources
> even though there was nothing else running on the system.
>
> Now at Linux-2.6.16.24, the code needed to be further modified
> to:
>     while(len) {
>         if((ret_val = write(fd, cp, MIN(len, 0x1000))) == FAIL) {
>             if(errno == EAGAIN)
>                 continue;
>             return ret_val;
>         }
>         len -= ret_val;
>         cp += ret_val;
>     }

In the case where you are getting EAGAIN, this is a busy-spin. You
might want to sleep in a select() or similar call as soon as you get
EAGAIN on this socket..or go off and do other work while the OS clears
out the send queue.

Also, from your description, this code should return 0 on success. It
is returning 'ret_val' instead, which should be > 0.

I have no idea why you need to add the MIN() logic..and that seems like
something that should not be required.

> ... or else it would spin <forever> returning 0 with no errno set.
> In all cases, these problems exist when 'len' is a large value, perhaps
> 0x01000000, much greater than Linux could ever find an available
> buffer for. Linux used to send what it could. Now it will just fail to
> send anything at all, returning 0 if it 'feels' like it doesn't want
> to bother. This is not good. With the hacked code, the data throughput
> is about 100,000 bytes per second on a dedicated link. The previous
> code ran 112,000 bytes per second. Once the 'errno' happens, the
> network stumbles to a measly 12,000 bytes per second. This
> breaks our applications.

Even 112kbps sucks on a decent network. What is the speed of your
network, what protocol are you using, if tcp, what is the latency
of your network?

Ben
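The suggestion above, sleeping until the socket is writable instead of retrying write() immediately on EAGAIN, might look roughly like the following. This is a minimal sketch under stated assumptions: the descriptor is already non-blocking, send_all() is a hypothetical name rather than the poster's sender(), and poll() stands in for the "select() or similar call" mentioned in the message.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

// Sketch only: send_all() is a hypothetical name, not the thread's sender().
// Writes len bytes to a (non-blocking) descriptor. Returns 0 once every
// byte has been accepted by the kernel, -1 on error with errno set.
int send_all(int fd, const void *buf, size_t len)
{
    const uint8_t *cp = buf;
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    ssize_t n;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        n = write(fd, cp, len);
        if (n > 0) {                    // partial or full write: advance
            cp += n;
            len -= (size_t)n;
            continue;
        }
        if (n == -1 && errno == EINTR)
            continue;                   // interrupted by a signal: retry
        if (n == -1 && errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;                  // real error
        // Send queue full (or write() returned 0): sleep until the
        // kernel reports the descriptor writable instead of spinning.
        if (poll(&pfd, 1, -1) == -1 && errno != EINTR)
            return -1;
    }
    return 0;
}
```

The loop passes the full remaining length to write() and relies on the short-write return value, so it needs no MIN() cap. Whether that avoids the 0-return behavior reported in this thread is not something the sketch can confirm.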
* Re: Network compatibility and performance
2006-08-12 19:21 ` Ben Greear
@ 2006-08-14 11:30 ` linux-os (Dick Johnson)
2006-08-14 21:25 ` Ben Greear
0 siblings, 1 reply; 9+ messages in thread
From: linux-os (Dick Johnson) @ 2006-08-14 11:30 UTC (permalink / raw)
To: Ben Greear; +Cc: Linux kernel
On Sat, 12 Aug 2006, Ben Greear wrote:
> linux-os (Dick Johnson) wrote:
>> Hello,
>>
>> Network throughput is seriously defective with linux-2.6.16.24
>> if the length given to 'write()' is a large number.
>>
>> Given this code on a connected socket........
>>
>> //-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>> //
>> // Copyright(c) 2005 Analogic Corporation (rjohnson@analogic.com)
>> //
>> // This program may be distributed under the GNU Public License
>> // version 2, as published by the Free Software Foundation, Inc.,
>> // 59 Temple Place, Suite 330 Boston, MA, 02111.
>> //
>> //-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>>
>> #include <stdio.h>
>> #include <unistd.h>
>> #include <stdlib.h>
>> #include <stdint.h>
>> #include <signal.h>
>> #include <string.h>
>> #include <stdarg.h>
>> #include <errno.h>
>> #include <fcntl.h>
>> #include <netinet/in.h>
>> #include <netinet/tcp.h>
>> #include <sys/poll.h>
>>
>> #define BUF_LEN 0x1000
>> #define FAIL -1
>>
>> //-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
>> //
>> // This sends a message that could exceed the size of the network buffers.
>> // It returns 0 if everything went okay, and FAIL if not.
>> //
>> int32_t sender(int32_t fd, void *buf, size_t len)
>> {
>>     int32_t ret_val;
>>     uint8_t *cp;
>>     cp = (uint8_t *) buf;
>>     while(len) {
>>         if((ret_val = write(fd, cp, MIN(len, BUF_LEN))) == FAIL) {
>>             if(errno == EAGAIN)
>>                 continue;
>>             return ret_val;
>>         }
>>         len -= ret_val;
>>         cp += ret_val;
>>     }
>>     return 0;
>> }
>>
>> It used to work quite well with:
>>
>>     while(len) {
>>         if((ret_val = write(fd, cp, len)) == FAIL) {
>>             return ret_val;
>>         }
>>         len -= ret_val;
>>         cp += ret_val;
>>     }
>>
>> The network socket layer would return the amount of bytes
>> actually sent and the code would walk its way up through the
>> buffer. This was the expected behavior for many years.
>>
>> Then after about Linux-2.6.8, I needed to do:
>>
>>     while(len) {
>>         if((ret_val = write(fd, cp, len)) == FAIL) {
>>             if(errno == EAGAIN)
>>                 continue;
>>             return ret_val;
>>         }
>>         len -= ret_val;
>>         cp += ret_val;
>>     }
>>
>> This was because Linux would claim to run out of resources
>> even though there was nothing else running on the system.
>>
>> Now at Linux-2.6.16.24, the code needed to be further modified
>> to:
>>     while(len) {
>>         if((ret_val = write(fd, cp, MIN(len, 0x1000))) == FAIL) {
>>             if(errno == EAGAIN)
>>                 continue;
>>             return ret_val;
>>         }
>>         len -= ret_val;
>>         cp += ret_val;
>>     }
>
> In the case where you are getting EAGAIN, this is a busy-spin. You
> might want to sleep in a select() or similar call as soon as you get
> EAGAIN on this socket..or go off and do other work while the OS clears
> out the send queue.
>
> Also, from your description, this code should return 0 on success. It
> is returning 'ret_val' instead, which should be > 0.

No, it will return FAIL (-1) on an error and 0 (the bottom of the procedure)
if the whole thing went. It is mandatory that the whole thing goes
so this procedure should handle any intermediate actions.

Upon your advice, I may try to add select() although, on a write it
seems to be putting in user-space something that used to be handled
quite well in the kernel. I don't think the user should really care
about the kernel internals, whether or not the kernel happens to have
a buffer available.

>
> I have no idea why you need to add the MIN() logic..and that seems like
> something that should not be required.
>

It seems that some code 'thinks' that a large buffer of data is
an error and won't even try to send some anymore.

>> ... or else it would spin <forever> returning 0 with no errno set.
>> In all cases, these problems exist when 'len' is a large value, perhaps
>> 0x01000000, much greater than Linux could ever find an available
>> buffer for. Linux used to send what it could. Now it will just fail to
>> send anything at all, returning 0 if it 'feels' like it doesn't want
>> to bother. This is not good. With the hacked code, the data throughput
>> is about 100,000 bytes per second on a dedicated link. The previous
>> code ran 112,000 bytes per second. Once the 'errno' happens, the
>> network stumbles to a measly 12,000 bytes per second. This
>> breaks our applications.
>
> Even 112kbps sucks on a decent network. What is the speed of your
> network, what protocol are you using, if tcp, what is the latency
> of your network?
>

The network is a single wire about 8 feet long, connecting Intel gigabit
links on two identical computers (crossover cable). This link is TCP.
For high-speed data, I use UDP and I get a higher throughput because
there is no handshake. The latency is the latency of Linux. BTW, it's
only a gigaBIT link, you can divide that by 8 for gigabytes. I don't
know the actual bit-rate on the wires, if we assume 1GHz, the byte-rate
is only 125,000 bytes per second. Being able to use 89.6 percent of
that isn't bad at all.

> Ben
>
>
>

Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.24 on an i686 machine (5592.62 BogoMips).
New book: http://www.AbominableFirebug.com/
* Re: Network compatibility and performance
2006-08-14 11:30 ` linux-os (Dick Johnson)
@ 2006-08-14 21:25 ` Ben Greear
2006-08-15 11:34 ` linux-os (Dick Johnson)
0 siblings, 1 reply; 9+ messages in thread
From: Ben Greear @ 2006-08-14 21:25 UTC (permalink / raw)
To: linux-os (Dick Johnson); +Cc: Linux kernel
linux-os (Dick Johnson) wrote:
> No, it will return FAIL (-1) on an error and 0 (the bottom of the procedure)
> if the whole thing went. It is mandatory that the whole thing goes
> so this procedure should handle any intermediate actions.

I see..I missed that part.

> Upon your advice, I may try to add select() although, on a write it
> seems to be putting in user-space something that used to be handled
> quite well in the kernel. I don't think the user should really care
> about the kernel internals, whether or not the kernel happens to have
> a buffer available.

Since you put it in non-blocking mode, you need the select() to throttle
unless you want to busy spin. Whether you should have to actually put
it in non-blocking mode or not is a different question.

>>I have no idea why you need to add the MIN() logic..and that seems like
>>something that should not be required.
>>
>
> It seems that some code 'thinks' that a large buffer of data is
> an error and won't even try to send some anymore.

I have seen a problem where I can repeatedly hang a TCP connection
when running at high speed. The tx queue is full or mostly full, and
on the wire I only see 200kpps of duplicate acks. Can't reproduce it
with anything other than my big complicated proprietary app, so it
remains unfixed.

I am not sure if this is related to what you see or not..but could you
check to see if there is lots of acks on the wire when this hang happens?

>>Even 112kbps sucks on a decent network. What is the speed of your
>>network, what protocol are you using, if tcp, what is the latency
>>of your network?
>>
>
>
> The network is a single wire about 8 feet long, connecting Intel gigabit
> links on two identical computers (crossover cable). This link is TCP.
> For high-speed data, I use UDP and I get a higher throughput because
> there is no handshake. The latency is the latency of Linux. BTW, it's
> only a gigaBIT link, you can divide that by 8 for gigabytes. I don't
> know the actual bit-rate on the wires, if we assume 1GHz, the byte-rate
> is only 125,000 bytes per second. Being able to use 89.6 percent of
> that isn't bad at all.

You must be meaning to add a few more zeros to that number. If you
are getting ~125,000,000 Bytes per second then you are doing OK.

Ben
* Re: Network compatibility and performance
2006-08-14 21:25 ` Ben Greear
@ 2006-08-15 11:34 ` linux-os (Dick Johnson)
0 siblings, 0 replies; 9+ messages in thread
From: linux-os (Dick Johnson) @ 2006-08-15 11:34 UTC (permalink / raw)
To: Ben Greear; +Cc: Linux kernel
On Mon, 14 Aug 2006, Ben Greear wrote:
> linux-os (Dick Johnson) wrote:
>
>> No, it will return FAIL (-1) on an error and 0 (the bottom of the procedure)
>> if the whole thing went. It is mandatory that the whole thing goes
>> so this procedure should handle any intermediate actions.
>
> I see..I missed that part.
>
>> Upon your advice, I may try to add select() although, on a write it
>> seems to be putting in user-space something that used to be handled
>> quite well in the kernel. I don't think the user should really care
>> about the kernel internals, whether or not the kernel happens to have
>> a buffer available.
>
> Since you put it in non-blocking mode, you need the select() to throttle
> unless you want to busy spin. Whether you should have to actually put
> it in non-blocking mode or not is a different question.
>
>>> I have no idea why you need to add the MIN() logic..and that seems like
>>> something that should not be required.
>>>
>>
>> It seems that some code 'thinks' that a large buffer of data is
>> an error and won't even try to send some anymore.
>
> I have seen a problem where I can repeatedly hang a TCP connection
> when running at high speed. The tx queue is full or mostly full, and
> on the wire I only see 200kpps of duplicate acks. Can't reproduce it
> with anything other than my big complicated proprietary app, so it
> remains unfixed.
>
> I am not sure if this is related to what you see or not..but could you
> check to see if there is lots of acks on the wire when this hang happens?

I will check to see what it's doing on the wire and get back.

>
>>> Even 112kbps sucks on a decent network. What is the speed of your
>>> network, what protocol are you using, if tcp, what is the latency
>>> of your network?
>>>
>>
>>
>> The network is a single wire about 8 feet long, connecting Intel gigabit
>> links on two identical computers (crossover cable). This link is TCP.
>> For high-speed data, I use UDP and I get a higher throughput because
>> there is no handshake. The latency is the latency of Linux. BTW, it's
>> only a gigaBIT link, you can divide that by 8 for gigabytes. I don't
>> know the actual bit-rate on the wires, if we assume 1GHz, the byte-rate
>> is only 125,000 bytes per second. Being able to use 89.6 percent of
>> that isn't bad at all.
>
> You must be meaning to add a few more zeros to that number. If you
> are getting ~125,000,000 Bytes per second then you are doing OK.
>

ACK that!

> Ben
>
>

Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.24 on an i686 machine (5592.62 BogoMips).
New book: http://www.AbominableFirebug.com/
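The corrected arithmetic from the exchange above can be worked through explicitly. The sketch below assumes a raw line rate of 1,000,000,000 bits per second and ignores Ethernet and TCP/IP framing overhead; the function names are hypothetical, and the 112,000,000 figure in the usage note is the thread's 112,000 with the missing zeros restored, which is an assumption rather than a measured value.

```c
#include <assert.h>
#include <stdint.h>

// Raw byte rate of a link with the given bit rate (framing overhead ignored).
uint64_t bytes_per_second(uint64_t bits_per_second)
{
    return bits_per_second / 8;
}

// Link utilization in tenths of a percent, e.g. 896 means 89.6 percent.
uint32_t utilization_permille(uint64_t achieved_bytes, uint64_t link_bytes)
{
    assert(link_bytes > 0);
    return (uint32_t)(achieved_bytes * 1000 / link_bytes);
}
```

With these definitions, bytes_per_second(1000000000) gives 125,000,000 bytes per second, and utilization_permille(112000000, 125000000) gives 896, which matches the 89.6 percent quoted in the thread.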
end of thread, other threads:[~2006-08-15 11:34 UTC | newest]
Thread overview: 9+ messages
2006-08-10 15:34 Network compatibility and performance linux-os (Dick Johnson)
2006-08-10 17:28 ` Stephen Hemminger
2006-08-10 18:09 ` linux-os (Dick Johnson)
2006-08-10 18:14 ` Stephen Hemminger
2006-08-10 18:32 ` linux-os (Dick Johnson)
2006-08-12 19:21 ` Ben Greear
2006-08-14 11:30 ` linux-os (Dick Johnson)
2006-08-14 21:25 ` Ben Greear
2006-08-15 11:34 ` linux-os (Dick Johnson)