* Re: webserver stalls [was Re: bug in (linux) slattach]
From: jb1 @ 2002-10-20  9:34 UTC
To: Harry Kalogirou; +Cc: Linux-8086

On 19 Oct 2002, Harry Kalogirou wrote:
>
> Since I had used ELKS for a long time on my network and I hardly had
> checksum errors, had other bigger problems 8), I think that this has

From what I've seen in the mailing list archives, I think other people have
had similar problems. They probably just gave up when no one answered
their vague, sometimes irrelevant, questions.

> something to do with the serial line altering bytes that ELKS transmits.
> Can you send me the output of "stty -a -F /dev/ttySX" after you set up
> the connection. Maybe the line is not correctly set up on the Linux side
> (XOFF, XON and stuff).

After issuing "/bin/stty -F /dev/ttyS0 4800" on the Red Hat 7.0 Linux box,
"stty -a -F /dev/ttyS0" displays:

speed 4800 baud; rows 0; columns 0; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = <undef>;
eol2 = <undef>; start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R; werase = ^W;
lnext = ^V; flush = ^O; min = 1; time = 0;
-parenb -parodd cs8 hupcl -cstopb cread clocal -crtscts
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl ixon -ixoff
-iuclc -ixany -imaxbel
opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt
echoctl echoke

> If the above proves the setup of the serial line to be OK, then ELKS has
> a problem. Maybe then the problem is in the assembly-optimized checksum
> routines I wrote. Disabling them by undefining USE_ASM in ip.c will show
> that.

Both the C and ASM routines in ip_calc_chksum() in elksnet/ktcp/ip.c from
elksnet-0.1.1.tar.gz look like they should have worked correctly for the
packet with the bad IP Header checksum. The C routine has a lurking bug; it
doesn't account for a possible carry in
    return ~((sum & 0xffff) + ((sum >> 16) & 0xffff));
but even if USE_ASM were undefined it wouldn't have affected that packet.

Even if my serial port handshaking is incorrectly set up, the difference
between the packet that fails and the one that succeeds is trivial. The
former is 63 bytes long with the data "Content-Length: 100^M^J^M^J"; the
latter is 62 bytes long with the data "Content-Length: 99^M^J^M^J" and
(after the ACK by the Linux box) is followed by a *successful* 139-byte
packet containing the entire 99-byte webpage file. Also, the erroneous
checksum is exactly the same and in exactly the same packet even for the
266-byte original file tcpdump'ed several days earlier
("Content-Length: 266^M^J^M^J").

The Linux box's /proc/cpuinfo says its AMD-K6 is running at 360.800 MHz; I
wouldn't be surprised if it could run 4800 baud with no handshaking at all,
and I didn't notice any XON or XOFF characters mixed in with the data.

I wonder if there's any significance in the fact that the problem occurs
precisely at the boundary between data obviously generated by the server
itself and the contents of the webpage file.
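The lurking carry in the quoted return expression can be shown with a short C sketch. This is an illustration only, not the code in ip.c: the names fold_once and fold_full are hypothetical, and the 32-bit accumulator is an assumption about how the C routine collects its sum.

```c
#include <stdint.h>

/* Single fold, shaped like the expression quoted above. If adding the
 * high half to the low half carries out of bit 15, that carry is lost
 * when the result is truncated to 16 bits. */
uint16_t fold_once(uint32_t sum)
{
    return (uint16_t)~((sum & 0xffff) + ((sum >> 16) & 0xffff));
}

/* Fold repeatedly until nothing is left above bit 15, so the carry
 * produced by the first fold is added back in as well. */
uint16_t fold_full(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

The two functions differ only when the first fold itself overflows, for example for an accumulated sum of 0x1FFFF. For the stalled packet's header the accumulated sum is 0x209FE, the first fold gives 0x0A00 without overflow, and both versions agree, which matches the remark that this bug would not have affected that packet.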
* Re: webserver stalls [was Re: bug in (linux) slattach]
From: Harry Kalogirou @ 2002-10-20 17:06 UTC
To: jb1; +Cc: Linux-8086

> On 19 Oct 2002, Harry Kalogirou wrote:
> >
> > Since I had used ELKS for a long time on my network and I hardly had
> > checksum errors, had other bigger problems 8), I think that this has
>
> From what I've seen in the mailing list archives, I think other people have
> had similar problems. They probably just gave up when no one answered
> their vague, sometimes irrelevant, questions.
>
> > something to do with the serial line altering bytes that ELKS transmits.
> > Can you send me the output of "stty -a -F /dev/ttySX" after you set up
> > the connection. Maybe the line is not correctly set up on the Linux side
> > (XOFF, XON and stuff).
>
> After issuing "/bin/stty -F /dev/ttyS0 4800" on the Red Hat 7.0 Linux box,
> "stty -a -F /dev/ttyS0" displays:
>
> speed 4800 baud; rows 0; columns 0; line = 0;
> intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = <undef>;
> eol2 = <undef>; start = ^Q; stop = ^S; susp = ^Z; rprnt = ^R; werase = ^W;
> lnext = ^V; flush = ^O; min = 1; time = 0;
> -parenb -parodd cs8 hupcl -cstopb cread clocal -crtscts
> -ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl ixon -ixoff
> -iuclc -ixany -imaxbel
> opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
> isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop -echoprt
> echoctl echoke

As I suspected, a misconfigured line. Configure yours like this:

intr = <undef>; quit = <undef>; erase = <undef>; kill = <undef>; eof = <undef>;
eol = <undef>; eol2 = <undef>; start = <undef>; stop = <undef>; susp = <undef>;
rprnt = <undef>; werase = <undef>; lnext = <undef>; flush = <undef>;
min = 1; time = 0;
-parenb -parodd cs8 hupcl -cstopb cread clocal -crtscts
ignbrk -brkint ignpar -parmrk -inpck -istrip -inlcr -igncr -icrnl -ixon -ixoff
-iuclc -ixany -imaxbel
-opost -olcuc -ocrnl -onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
-isig -icanon -iexten -echo -echoe -echok -echonl -noflsh -xcase -tostop
-echoprt -echoctl -echoke

Basically the above configuration is done by the -L parameter of
slattach, except for -crtscts. What I do just to be sure is:

# slattach -p [c]slip -L -e /dev/ttyS0
# stty -F /dev/ttyS0 -crtscts
# slattach -p [c]slip -s 4800 -m /dev/ttyS0 &

Harry
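The same raw, 8-bit, no-flow-control line settings can also be expressed with the POSIX termios interface. The sketch below is an assumption-laden illustration, not what slattach does internally: slip_raw_4800 is a hypothetical helper, it covers only a subset of the flags listed above, and it modifies a struct termios in memory rather than touching a device.

```c
#include <termios.h>
#include <unistd.h>

/* Put a termios structure into a raw 8N1 state at 4800 baud, roughly
 * mirroring the stty settings listed above (subset only). */
void slip_raw_4800(struct termios *t)
{
    /* Input: no break signalling, parity marking, stripping, CR/NL
     * translation or XON/XOFF; ignore breaks and parity errors. */
    t->c_iflag &= ~(tcflag_t)(BRKINT | PARMRK | INPCK | ISTRIP |
                              INLCR | IGNCR | ICRNL |
                              IXON | IXOFF | IXANY);
    t->c_iflag |= IGNBRK | IGNPAR;

    /* Output: no post-processing, so NL is never expanded to CR NL. */
    t->c_oflag &= ~(tcflag_t)OPOST;

    /* Local: no signals, line editing or echo. */
    t->c_lflag &= ~(tcflag_t)(ISIG | ICANON | IEXTEN |
                              ECHO | ECHOE | ECHOK | ECHONL);

    /* Control: 8 data bits, no parity, one stop bit, local line. */
    t->c_cflag &= ~(tcflag_t)(CSIZE | PARENB | PARODD | CSTOPB);
    t->c_cflag |= CS8 | HUPCL | CREAD | CLOCAL;
#ifdef CRTSCTS
    /* CRTSCTS is not in POSIX; clear it only where it is defined. */
    t->c_cflag &= ~(tcflag_t)CRTSCTS;
#endif

    /* Disable the special characters most likely to eat data bytes. */
    t->c_cc[VINTR]  = _POSIX_VDISABLE;
    t->c_cc[VQUIT]  = _POSIX_VDISABLE;
    t->c_cc[VSTART] = _POSIX_VDISABLE;
    t->c_cc[VSTOP]  = _POSIX_VDISABLE;
    t->c_cc[VSUSP]  = _POSIX_VDISABLE;

    /* Block until at least one byte arrives, with no timeout. */
    t->c_cc[VMIN]  = 1;
    t->c_cc[VTIME] = 0;

    cfsetispeed(t, B4800);
    cfsetospeed(t, B4800);
}
```

To apply it to a real port one would fetch the current settings with tcgetattr() on an open descriptor for the serial device, call the helper, and write them back with tcsetattr(fd, TCSANOW, &t).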
* Re: webserver stalls [was Re: bug in (linux) slattach]
From: jb1 @ 2002-10-21  9:44 UTC
To: Harry Kalogirou; +Cc: Linux-8086

On 20 Oct 2002, Harry Kalogirou wrote:
> > On 19 Oct 2002, Harry Kalogirou wrote:
...
> As I suspected, a misconfigured line. Configure yours like this:
>
> intr = <undef>; quit = <undef>; erase = <undef>; kill = <undef>; eof = <undef>;
> eol = <undef>; eol2 = <undef>; start = <undef>; stop = <undef>; susp = <undef>;
> rprnt = <undef>; werase = <undef>; lnext = <undef>; flush = <undef>;
> min = 1; time = 0;
> -parenb -parodd cs8 hupcl -cstopb cread clocal -crtscts
> ignbrk -brkint ignpar -parmrk -inpck -istrip -inlcr -igncr -icrnl -ixon -ixoff
> -iuclc -ixany -imaxbel
> -opost -olcuc -ocrnl -onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
> -isig -icanon -iexten -echo -echoe -echok -echonl -noflsh -xcase -tostop
> -echoprt -echoctl -echoke
>
> Basically the above configuration is done by the -L parameter of
> slattach, except for -crtscts. What I do just to be sure is:
>
> # slattach -p [c]slip -L -e /dev/ttyS0
> # stty -F /dev/ttyS0 -crtscts
> # slattach -p [c]slip -s 4800 -m /dev/ttyS0 &

I did as you suggested (with one exception), confirmed that the settings
were exactly like yours, and found *no* difference; the 99-byte webpage
file works, the 100-byte webpage files don't. The exception was:
    stty 4800 -F /dev/ttyS0 -crtscts
because my slattach program doesn't seem to change the baud rate. Also,
I'm still getting frequent seemingly-random errors when I ping the ELKS box.
* Re: webserver stalls [was Re: bug in (linux) slattach]
From: Harry Kalogirou @ 2002-10-21  9:55 UTC
To: jb1; +Cc: Linux-8086

> On 20 Oct 2002, Harry Kalogirou wrote:
> > > On 19 Oct 2002, Harry Kalogirou wrote:
> ...
> > As I suspected, a misconfigured line. Configure yours like this:
> >
> > intr = <undef>; quit = <undef>; erase = <undef>; kill = <undef>; eof = <undef>;
> > eol = <undef>; eol2 = <undef>; start = <undef>; stop = <undef>; susp = <undef>;
> > rprnt = <undef>; werase = <undef>; lnext = <undef>; flush = <undef>;
> > min = 1; time = 0;
> > -parenb -parodd cs8 hupcl -cstopb cread clocal -crtscts
> > ignbrk -brkint ignpar -parmrk -inpck -istrip -inlcr -igncr -icrnl -ixon -ixoff
> > -iuclc -ixany -imaxbel
> > -opost -olcuc -ocrnl -onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
> > -isig -icanon -iexten -echo -echoe -echok -echonl -noflsh -xcase -tostop
> > -echoprt -echoctl -echoke
> >
> > Basically the above configuration is done by the -L parameter of
> > slattach, except for -crtscts. What I do just to be sure is:
> >
> > # slattach -p [c]slip -L -e /dev/ttyS0
> > # stty -F /dev/ttyS0 -crtscts
> > # slattach -p [c]slip -s 4800 -m /dev/ttyS0 &
>
> I did as you suggested (with one exception), confirmed that the settings
> were exactly like yours, and found *no* difference; the 99-byte webpage
> file works, the 100-byte webpage files don't. The exception was:
>     stty 4800 -F /dev/ttyS0 -crtscts
> because my slattach program doesn't seem to change the baud rate. Also,
> I'm still getting frequent seemingly-random errors when I ping the ELKS box.

Mmm.. weird.. I probably got you tired with all this but can you try and
see if the failures are really random? A good aid for this is the -p
parameter of ping. I'm just convinced that there is a problem after the
packets leave ELKS.

Harry
* Re: webserver stalls [was Re: bug in (linux) slattach]
From: jb1 @ 2002-10-22 10:16 UTC
To: Harry Kalogirou; +Cc: Linux-8086

On 21 Oct 2002, Harry Kalogirou wrote:
> Mmm.. weird.. I probably got you tired with all this but can you try and
> see if the failures are really random? A good aid for this is the -p
> parameter of ping.

100 pings (200 packets) each of patterns 00, 55, aa, and ff had zero to
five errors, too few to account for the 100 percent failure rate of
certain webpage files. 55 had the most errors and was the only one with an
error in the pattern data. Most of the other errors were something about
the time-of-day going back; 00 had one extremely long response time
(1074131 mS).

I think I can now prove that there's at least one IP Header sum-with-carry
that results in a reproducible checksum error. I discovered that if the
ELKS IP address were 192.168.1.135, all my test files could be read; large
files required a few tries, but I was even able to read one 4369 (0x1111)
bytes long! The unique property of the packets that never got ACK'ed is
that their checksum-field contains 0xF6FF instead of the correct value
0xF5FF (the complement of 0x0A00).

Each of the webpage files that stall produces a defective packet with this
IP Header (the first twenty bytes of the packet):
    4500 003f 0000 0000 4006 f6ff c0a8 0164 c0a8 0205
The corresponding packet in the 99-byte file is one byte shorter (003e
instead of 003f), consequently having a different IP Header Checksum
(f600 instead of the erroneous f6ff):
    4500 003e 0000 0000 4006 f600 c0a8 0164 c0a8 0205

Ping uses Protocol 01 instead of Protocol 06, so by changing the ELKS IP
address from 192.168.1.100 to 192.168.1.105 I was able to produce the
identical erroneous IP Header Checksum with the command:
    ping -s 35 192.168.1.105
resulting in the IP Header:
    4500 003f 0000 0000 4001 f6ff c0a8 0169 c0a8 0205

To demonstrate that the problem is not the total packet size I added 1 to
the packetsize and subtracted 1 from the ELKS IP address:
    ping -s 36 192.168.1.104
resulting in the IP Header:
    4500 0040 0000 0000 4001 f6ff c0a8 0168 c0a8 0205

Just for symmetry, I produced the same checksum as that for the 99-byte
webpage file, but the same length as the 100- and 266-byte webpage files
with:
    ping -s 35 192.168.1.104
resulting in the IP Header:
    4500 003f 0000 0000 4001 f600 c0a8 0168 c0a8 0205

In all cases, the pings with the defective checksum had 100% loss, while
those with the good checksum succeeded. I didn't try manipulating the
source IP address (c0a8 0205 = 192.168.2.5). If you can manipulate the
packetsize and ELKS IP address so that the sum-with-carry of this header
sans checksum-field is 0x09C1 you should be able to reproduce my results;
otherwise it's probably a quirk in Red Hat 7.0 Linux (or you're using a
different version of some critical ELKS file).

SOURCE PACKAGES: elks-0.1.1.tar.gz, elkscmd_20020501.tar.gz,
                 elksnet-0.1.1.tar.gz, Dev86src-0.16.0.tar.gz
CVS PATCHES:     (none)
COMPILED UNDER:  Red Hat 7.0 Linux, kernel 2.2.16-22

Note: I think bad packets consume memory. After several unsuccessful
transfers I started seeing "Cannot fork" on the ELKS box when I issued
commands ... eventually I'd have to reboot it. It might be a good idea to
purge them after a minute or two.

Does anything other than the system time depend upon the CMOS clock? It
obviously hasn't been read on any of the four machines on which I tried
ELKS (yes, they all *have* standard, working CMOS clocks).

By the way, I received two copies of this message in addition to the copy
sent from the mailing list.
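The f5ff and f6ff values reported in this message can be checked with a small model. The sketch below is an assumption about the failure mode, not the ELKS routine itself: it loads the header as little-endian 16-bit words, as an 8086 would, chains add-with-carry operations, and optionally drops the carry left over after the last addition. The names adc_chain_checksum and stalled_header are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* IP header of the stalled TCP packet, with the checksum field zeroed. */
const uint8_t stalled_header[20] = {
    0x45, 0x00, 0x00, 0x3f, 0x00, 0x00, 0x00, 0x00,
    0x40, 0x06, 0x00, 0x00, 0xc0, 0xa8, 0x01, 0x64,
    0xc0, 0xa8, 0x02, 0x05
};

/* Model a 16-bit add-with-carry chain over an even-length buffer.
 * The carry is assumed clear on entry. When add_final_carry is zero,
 * the carry out of the last addition is discarded (the suspected bug).
 * The result is returned in the order it appears on the wire. */
uint16_t adc_chain_checksum(const uint8_t *hdr, size_t len,
                            int add_final_carry)
{
    uint16_t acc = 0;
    unsigned carry = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        /* Little-endian load, as "adc ax, [di]" would see memory. */
        uint16_t word = (uint16_t)(hdr[i] | (hdr[i + 1] << 8));
        uint32_t t = (uint32_t)acc + word + carry;
        acc = (uint16_t)t;
        carry = (unsigned)(t >> 16);
    }

    if (add_final_carry)
        acc = (uint16_t)(acc + carry);

    /* Complement, then swap bytes: the low byte is stored first. */
    uint16_t inv = (uint16_t)~acc;
    return (uint16_t)((inv << 8) | (inv >> 8));
}
```

Under this model, keeping the final carry yields f5ff for the stalled header and dropping it yields f6ff, the value seen on the wire. That agreement is suggestive rather than conclusive, since the original assembly is not reproduced here.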
* Re: webserver stalls [was Re: bug in (linux) slattach]
From: Harry Kalogirou @ 2002-10-22 13:56 UTC
To: jb1; +Cc: Linux-8086

> On 21 Oct 2002, Harry Kalogirou wrote:
>
> > Mmm.. weird.. I probably got you tired with all this but can you try and
> > see if the failures are really random? A good aid for this is the -p
> > parameter of ping.
>
> 100 pings (200 packets) each of patterns 00, 55, aa, and ff had zero to
> five errors, too few to account for the 100 percent failure rate of
> certain webpage files. 55 had the most errors and was the only one with an
> error in the pattern data. Most of the other errors were something about
> the time-of-day going back; 00 had one extremely long response time
> (1074131 mS).
>
> I think I can now prove that there's at least one IP Header sum-with-carry
> that results in a reproducible checksum error. I discovered that if the
> ELKS IP address were 192.168.1.135, all my test files could be read; large
> files required a few tries, but I was even able to read one 4369 (0x1111)
> bytes long! The unique property of the packets that never got ACK'ed is
> that their checksum-field contains 0xF6FF instead of the correct value
> 0xF5FF (the complement of 0x0A00).
>
> Each of the webpage files that stall produces a defective packet with this
> IP Header (the first twenty bytes of the packet):
>     4500 003f 0000 0000 4006 f6ff c0a8 0164 c0a8 0205
> The corresponding packet in the 99-byte file is one byte shorter (003e
> instead of 003f), consequently having a different IP Header Checksum
> (f600 instead of the erroneous f6ff):
>     4500 003e 0000 0000 4006 f600 c0a8 0164 c0a8 0205
>
> Ping uses Protocol 01 instead of Protocol 06, so by changing the ELKS IP
> address from 192.168.1.100 to 192.168.1.105 I was able to produce the
> identical erroneous IP Header Checksum with the command:
>     ping -s 35 192.168.1.105
> resulting in the IP Header:
>     4500 003f 0000 0000 4001 f6ff c0a8 0169 c0a8 0205
>
> To demonstrate that the problem is not the total packet size I added 1 to
> the packetsize and subtracted 1 from the ELKS IP address:
>     ping -s 36 192.168.1.104
> resulting in the IP Header:
>     4500 0040 0000 0000 4001 f6ff c0a8 0168 c0a8 0205
>
> Just for symmetry, I produced the same checksum as that for the 99-byte
> webpage file, but the same length as the 100- and 266-byte webpage files
> with:
>     ping -s 35 192.168.1.104
> resulting in the IP Header:
>     4500 003f 0000 0000 4001 f600 c0a8 0168 c0a8 0205
>
> In all cases, the pings with the defective checksum had 100% loss, while
> those with the good checksum succeeded. I didn't try manipulating the
> source IP address (c0a8 0205 = 192.168.2.5). If you can manipulate the
> packetsize and ELKS IP address so that the sum-with-carry of this header
> sans checksum-field is 0x09C1 you should be able to reproduce my results;
> otherwise it's probably a quirk in Red Hat 7.0 Linux (or you're using a
> different version of some critical ELKS file).

I'll check and get back to you...

> Note: I think bad packets consume memory. After several unsuccessful
> transfers I started seeing "Cannot fork" on the ELKS box when I issued
> commands ... eventually I'd have to reboot it. It might be a good idea to
> purge them after a minute or two.

These are the web servers that wait for the data to be transmitted; when
they exit, the memory will be freed.

> Does anything other than the system time depend upon the CMOS clock? It
> obviously hasn't been read on any of the four machines on which I tried
> ELKS (yes, they all *have* standard, working CMOS clocks).

I don't think so.

Harry
* [SOLVED] Re: webserver stalls [was Re: bug in (linux) slattach]
From: Harry Kalogirou @ 2002-10-22 13:57 UTC
To: jb1; +Cc: Linux-8086

> On 21 Oct 2002, Harry Kalogirou wrote:
>
> > Mmm.. weird.. I probably got you tired with all this but can you try and
> > see if the failures are really random? A good aid for this is the -p
> > parameter of ping.
>
> 100 pings (200 packets) each of patterns 00, 55, aa, and ff had zero to
> five errors, too few to account for the 100 percent failure rate of
> certain webpage files. 55 had the most errors and was the only one with an
> error in the pattern data. Most of the other errors were something about
> the time-of-day going back; 00 had one extremely long response time
> (1074131 mS).
>
> I think I can now prove that there's at least one IP Header sum-with-carry
> that results in a reproducible checksum error. I discovered that if the
> ELKS IP address were 192.168.1.135, all my test files could be read; large
> files required a few tries, but I was even able to read one 4369 (0x1111)
> bytes long! The unique property of the packets that never got ACK'ed is
> that their checksum-field contains 0xF6FF instead of the correct value
> 0xF5FF (the complement of 0x0A00).
>
> Each of the webpage files that stall produces a defective packet with this
> IP Header (the first twenty bytes of the packet):
>     4500 003f 0000 0000 4006 f6ff c0a8 0164 c0a8 0205
> The corresponding packet in the 99-byte file is one byte shorter (003e
> instead of 003f), consequently having a different IP Header Checksum
> (f600 instead of the erroneous f6ff):
>     4500 003e 0000 0000 4006 f600 c0a8 0164 c0a8 0205
>
> Ping uses Protocol 01 instead of Protocol 06, so by changing the ELKS IP
> address from 192.168.1.100 to 192.168.1.105 I was able to produce the
> identical erroneous IP Header Checksum with the command:
>     ping -s 35 192.168.1.105
> resulting in the IP Header:
>     4500 003f 0000 0000 4001 f6ff c0a8 0169 c0a8 0205
>
> To demonstrate that the problem is not the total packet size I added 1 to
> the packetsize and subtracted 1 from the ELKS IP address:
>     ping -s 36 192.168.1.104
> resulting in the IP Header:
>     4500 0040 0000 0000 4001 f6ff c0a8 0168 c0a8 0205
>
> Just for symmetry, I produced the same checksum as that for the 99-byte
> webpage file, but the same length as the 100- and 266-byte webpage files
> with:
>     ping -s 35 192.168.1.104
> resulting in the IP Header:
>     4500 003f 0000 0000 4001 f600 c0a8 0168 c0a8 0205
>
> In all cases, the pings with the defective checksum had 100% loss, while
> those with the good checksum succeeded. I didn't try manipulating the
> source IP address (c0a8 0205 = 192.168.2.5). If you can manipulate the
> packetsize and ELKS IP address so that the sum-with-carry of this header
> sans checksum-field is 0x09C1 you should be able to reproduce my results;
> otherwise it's probably a quirk in Red Hat 7.0 Linux (or you're using a
> different version of some critical ELKS file).

OK, the quest is over.

After all it was a problem in the checksum functions written in assembly!
Did you try with USE_ASM undefined? Anyway it works now and I committed
it to the CVS.

Thank you very much for all your effort! Nice work.

Harry
* Re: [SOLVED] Re: webserver stalls [was Re: bug in (linux) slattach]
From: Harry Kalogirou @ 2002-10-22 16:02 UTC
To: Harry Kalogirou; +Cc: jb1, Linux-8086

> OK, the quest is over.
>
> After all it was a problem in the checksum functions written in assembly!
> Did you try with USE_ASM undefined? Anyway it works now and I committed
> it to the CVS.
>
> Thank you very much for all your effort! Nice work.
>
> Harry

Actually the quest is over now... as previously I managed to commit half
the patch to the CVS...

Harry
* Re: [SOLVED] Re: webserver stalls [was Re: bug in (linux) slattach]
From: jb1 @ 2002-10-23  9:37 UTC
To: Harry Kalogirou; +Cc: Harry Kalogirou, Linux-8086

On 22 Oct 2002, Harry Kalogirou wrote:
> Actually the quest is over now... as previously I managed to commit half
> the patch to the CVS...

Maybe not. I found ip.c Version 1.9 by browsing the CVS repository and, as
far as I can tell, the only change was that you moved the first "dec cx";
this will have *no* effect. The algorithm can still fail if the carry flag
happens to be set going into the routine, or if there is a carry generated
the last time "adc [di]" is executed. I suggest something like this for
_ip_calc_chksum:

        push bp
        mov  bp,sp
        push di

        mov  cx, 6[bp]
        sar  cx, 1
        dec  cx
        xor  ax,ax      ; clear carry flag (as well as AX)
        mov  di, 4[bp]
        mov  ax, [di]
        inc  di
        inc  di
loop1:
        adc  ax, [di]
        inc  di
        inc  di

        loop loop1      ; a byte shorter and a clock faster
                        ; than DEC CX/JNZ LOOP1

        adc  ax,0       ; add (just) the final carry
        not  ax

        pop  di
        pop  bp

        ret

Of course, this algorithm is valid only if the length (6[bp]) is an even
number of bytes. While this is always true for IP headers, for TCP packet
checksums there would have to be a final test of the length's low bit and
appropriate handling of an additional odd byte.

I ran the original routine on my "defective" packet IP Header using MSDOS'
"debug" (with the carry initially clear and the data byte-swapped in
memory) and got the correct checksum. Were there any other updated files I
should have downloaded from the CVS repository?
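The odd-length case mentioned for TCP checksums can be illustrated in C. This is a generic sketch of the usual Internet checksum approach, in which a trailing odd byte is treated as the high byte of a word padded with zero; it is not the ELKS TCP routine, and inet_checksum is a hypothetical name. It reads words in network byte order and assumes the buffer is short enough that a 32-bit accumulator cannot overflow.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement checksum over a buffer of any length. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    /* Add all complete 16-bit words, high byte first. */
    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);

    /* Test the low bit of the length: an odd final byte becomes the
     * high byte of a word whose low byte is zero. */
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;

    /* Fold every carry back in, including one produced by folding. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

For an even length such as the 20-byte IP header, the odd-byte branch is never taken and the function reduces to the plain word sum.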
* Re: [SOLVED] Re: webserver stalls [was Re: bug in (linux) slattach]
From: Harry Kalogirou @ 2002-10-23 11:42 UTC
To: jb1; +Cc: Linux-8086

> Maybe not. I found ip.c Version 1.9 by browsing the CVS repository and, as
> far as I can tell, the only change was that you moved the first "dec cx";
> this will have *no* effect. The algorithm can still fail if the carry flag
> happens to be set going into the routine, or if there is a carry generated
> the last time "adc [di]" is executed. I suggest something like this for
> _ip_calc_chksum:
>
>         push bp
>         mov  bp,sp
>         push di
>
>         mov  cx, 6[bp]
>         sar  cx, 1
>         dec  cx
>         xor  ax,ax      ; clear carry flag (as well as AX)
>         mov  di, 4[bp]
>         mov  ax, [di]
>         inc  di
>         inc  di
> loop1:
>         adc  ax, [di]
>         inc  di
>         inc  di
>
>         loop loop1      ; a byte shorter and a clock faster
>                         ; than DEC CX/JNZ LOOP1
>
>         adc  ax,0       ; add (just) the final carry
>         not  ax
>
>         pop  di
>         pop  bp
>
>         ret

You can't be more right 8). I just thought I could get away without
opening the 8086 instruction manual, and I just made bad assumptions about
when the carry flag is cleared.

The CVS now contains all your bugfixes (clearing the carry before entering
the loop, adding the last carry), the use of "loop", and I also unrolled
the loop once. A code review would be gladly appreciated.

> Of course, this algorithm is valid only if the length (6[bp]) is an even
> number of bytes. While this is always true for IP headers, for TCP packet
> checksums there would have to be a final test of the length's low bit and
> appropriate handling of an additional odd byte.

TCP uses another routine.

Harry
* Re: [SOLVED] Re: webserver stalls [was Re: bug in (linux) slattach]
From: jb1 @ 2002-10-24  8:55 UTC
To: Harry Kalogirou; +Cc: Linux-8086

On 23 Oct 2002, Harry Kalogirou wrote:
> The CVS now contains all your bugfixes (clearing the carry before entering
> the loop, adding the last carry), the use of "loop", and I also unrolled
> the loop once. A code review would be gladly appreciated.

The file ip.c Version 1.10 from the CVS repository looks good. I haven't
tried it yet, but a "toy" version of _ip_calc_chksum runs correctly in
DEBUG under MSDOS.

There's a trivial change I'd suggest: "SAR CX,1" to "SHR CX,1". SAR
retains the high bit's value (for signed arithmetic), whereas SHR shifts a
zero into the high bit. Since the IP Internet Header Length from which the
length is derived can be no more than 15, and both instructions are two
bytes and take two clocks, this is just defensive programming against a
spurious call with a length greater than 32767.

Also, the copyright date is still last year's.
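The difference between the two shift instructions can be modelled in C on 16-bit values. The helpers shr16 and sar16 are hypothetical names for this illustration; sar16 copies the sign bit explicitly rather than relying on how a C compiler shifts negative numbers.

```c
#include <stdint.h>

/* SHR by one: logical shift, a zero enters bit 15. */
uint16_t shr16(uint16_t x)
{
    return (uint16_t)(x >> 1);
}

/* SAR by one: arithmetic shift, bit 15 keeps its previous value. */
uint16_t sar16(uint16_t x)
{
    return (uint16_t)((x >> 1) | (x & 0x8000u));
}
```

For any length below 32768 the two agree, for example a 20-byte header gives a word count of 10 either way. They diverge only once bit 15 is set: 0x8002 becomes 0x4001 with the logical shift but 0xC001 with the arithmetic one, which would make the loop count far too large.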
* Re: [SOLVED] Re: webserver stalls [was Re: bug in (linux) slattach]
From: jb1 @ 2002-10-29 10:25 UTC
To: Harry Kalogirou; +Cc: Linux-8086

On 22 Oct 2002, Harry Kalogirou wrote:
>
> Ok the quest is over.

Not yet. I think _tcp_chksumraw in tcp_output.c needs the same fixes as
those you applied to _tcp_chksum. Without them I still got partial files
with telnet/get.

There's *still* something wrong, but it shows up most frequently when I
urlget from one ELKS box to another (yes, they have different IP
addresses). Rarely, all goes as it should; more often, the entire file
comes in a reasonable time, but I never get the command prompt; often,
nothing comes in and I never get the command prompt. Once, nothing seemed
to happen for about 10 minutes, but when I checked the machines about 10
minutes later, the file had come in but there was no command prompt. I had
enabled a second getty on that machine, so I was able to log in and run
netstat on both machines while urlget was hung. Here are the results
(about an hour later):

On the client ("urlget") machine (192.168.1.100) --
  1  ESTABLISHED  4000ms  1025  0.0.0.0         2
  2  ESTABLISHED  2400ms  1024  192.168.1.144  80
  3  LISTEN       4000MS    80  0.0.0.0         0

On the server ("sender") (192.168.1.144) --
  1  ESTABLISHED  4000ms  1024  0.0.0.0         2
  2  LISTEN       4000ms    80  0.0.0.0         0

Obviously, the server has discarded the connection, but the client machine
thinks it's still connected.

I'm also not sure the client port number (the one that's 1024 or greater)
is handled properly. Each time I connect from a Linux box the port number
is incremented, but once I observed that a first, successful, connection
from ELKS was from port 1024, and the next, hanging, attempt was *also*
port 1024. Connections from Linux usually, but not always, work;
connections from ELKS rarely work.

Diagnosing this stuff is very time-consuming because "kill" doesn't seem
to do anything, so I must reboot both machines. Since ELKS "telnet"
doesn't do anything but connect (and logs me out when it terminates!) I
can't compare telnet from Linux and ELKS. I can only compare "urlget" from
both systems, and since there's no tcpdump for ELKS I can't even determine
if a failure is actually due to urlget.
* Re: [SOLVED] Re: webserver stalls [was Re: bug in (linux) slattach]
From: Harry Kalogirou @ 2002-10-29 12:37 UTC
To: jb1; +Cc: Linux-8086

> On 22 Oct 2002, Harry Kalogirou wrote:
> >
> > Ok the quest is over.
>
> Not yet. I think _tcp_chksumraw in tcp_output.c needs the same fixes as
> those you applied to _tcp_chksum. Without them I still got partial files
> with telnet/get.

It is fixed.

> There's *still* something wrong, but it shows up most frequently when I
> urlget from one ELKS box to another (yes, they have different IP
> addresses). Rarely, all goes as it should; more often, the entire file
> comes in a reasonable time, but I never get the command prompt; often,
> nothing comes in and I never get the command prompt. Once, nothing seemed
> to happen for about 10 minutes, but when I checked the machines about 10
> minutes later, the file had come in but there was no command prompt. I had
> enabled a second getty on that machine, so I was able to log in and run
> netstat on both machines while urlget was hung. Here are the results
> (about an hour later):
>
> On the client ("urlget") machine (192.168.1.100) --
>   1  ESTABLISHED  4000ms  1025  0.0.0.0         2
>   2  ESTABLISHED  2400ms  1024  192.168.1.144  80
>   3  LISTEN       4000MS    80  0.0.0.0         0
>
> On the server ("sender") (192.168.1.144) --
>   1  ESTABLISHED  4000ms  1024  0.0.0.0         2
>   2  LISTEN       4000ms    80  0.0.0.0         0
>
> Obviously, the server has discarded the connection, but the client machine
> thinks it's still connected.
>
> I'm also not sure the client port number (the one that's 1024 or greater)
> is handled properly. Each time I connect from a Linux box the port number
> is incremented, but once I observed that a first, successful, connection
> from ELKS was from port 1024, and the next, hanging, attempt was *also*
> port 1024. Connections from Linux usually, but not always, work;
> connections from ELKS rarely work.

ELKS reuses the last used port if it is not still in use. I don't think
that this is a problem.

> Diagnosing this stuff is very time-consuming because "kill" doesn't seem
> to do anything, so I must reboot both machines. Since ELKS "telnet"

The kernel in the CVS will probably handle this more gracefully and
actually [kill] the process.

> doesn't do anything but connect (and logs me out when it terminates!) I
> can't compare telnet from Linux and ELKS. I can only compare "urlget" from
> both systems, and since there's no tcpdump for ELKS I can't even determine
> if a failure is actually due to urlget.

You mean that you do "telnet bla.bla 80" and after you connect you can't
do "get /"?

Harry
[parent not found: <Pine.LNX.4.33.0210300110270.32451-100000@olympus.btstream.com>]
* Re: [SOLVED] Re: webserver stalls [was Re: bug in (linux) slattach]
From: Harry Kalogirou @ 2002-10-30 10:31 UTC
To: jb1; +Cc: Linux-8086

> On 29 Oct 2002, Harry Kalogirou wrote:
>
> > > Diagnosing this stuff is very time-consuming because "kill" doesn't seem
> > > to do anything, so I must reboot both machines. Since ELKS "telnet"
> >
> > The kernel in the CVS will probably handle this more gracefully and
> > actually [kill] the process.
>
> I don't understand your reply. When I issue "kill <process id>" from the
> command line the process is still reported by "ps" and there's no evidence
> that the process has actually been killed. I think kill.c calls the
> kernel function.

Of course "kill" calls the kernel and the kernel kills the process. So
try the latest kernel from the CVS and you might get better results.

Harry