> On 21 Oct 2002, Harry Kalogirou wrote:
>
> > Mmm.. weird.. I probably got you tired with all this but can you try and
> > see if the failures are really random? A good aid for this is the -p
> > parameter of ping.
>
> 100 pings (200 packets) each of patterns 00, 55, aa, and ff had zero to
> five errors, too few to account for the 100 percent failure rate of
> certain webpage files. 55 had the most errors and was the only one with an
> error in the pattern data. Most of the other errors were something about
> the time-of-day going back; 00 had one extremely long response time
> (1074131 ms).
>
>
> I think I can now prove that there's at least one IP Header sum-with-carry
> that results in a reproducible checksum error. I discovered that if the
> ELKS IP address were 192.168.1.135, all my test files could be read; large
> files required a few tries, but I was even able to read one 4369 (0x1111)
> bytes long! The unique property of the packets that never got ACK'ed is
> that their checksum-field contains 0xF6FF instead of the correct value
> 0xF5FF (the complement of 0x0A00).
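
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the usual IPv4 header checksum (ones-complement sum of the ten 16-bit header words with the checksum field taken as zero, then complemented). The function names are illustrative only and are not taken from the ELKS sources; the two header strings are the ones quoted in this message.

```python
def ones_complement_sum(words):
    """Add 16-bit words, folding any carry back into the low 16 bits."""
    total = sum(words)
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ip_header_checksum(header_hex):
    """Return the checksum a 20-byte IPv4 header should carry.

    header_hex is ten space-separated 16-bit hex words; word 5 is the
    checksum field and is treated as zero for the calculation.
    """
    words = [int(w, 16) for w in header_hex.split()]
    words[5] = 0
    return ~ones_complement_sum(words) & 0xFFFF


# Header from the files that stall, and from the 99-byte file that works.
stalled = "4500 003f 0000 0000 4006 f6ff c0a8 0164 c0a8 0205"
working = "4500 003e 0000 0000 4006 f600 c0a8 0164 c0a8 0205"

print(hex(ip_header_checksum(stalled)))  # 0xf5ff, not the f6ff in the packet
print(hex(ip_header_checksum(working)))  # 0xf600, matches the packet
```

The stalled header sums to 0x209fe before folding, so the two carries out of the low 16 bits have to be added back in to reach 0x0a00.
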
>
> Each of the webpage files that stall produces a defective packet with this
> IP Header (the first twenty bytes of the packet):
>   4500 003f 0000 0000 4006 f6ff c0a8 0164 c0a8 0205
> The corresponding packet in the 99-byte file is one byte shorter (003e
> instead of 003f), consequently having a different IP Header Checksum
> (f600 instead of the erroneous f6ff):
>   4500 003e 0000 0000 4006 f600 c0a8 0164 c0a8 0205
>
> Ping uses Protocol 01 instead of Protocol 06, so by changing the ELKS IP
> address from 192.168.1.100 to 192.168.1.105 I was able to produce the
> identical erroneous IP Header Checksum with the command:
>   ping -s 35 192.168.1.105
> resulting in the IP header:
>   4500 003f 0000 0000 4001 f6ff c0a8 0169 c0a8 0205
>
> To demonstrate that the problem is not the total packet size I added 1 to
> the packetsize and subtracted 1 from the ELKS IP address:
>   ping -s 36 192.168.1.104
> resulting in the IP Header:
>   4500 0040 0000 0000 4001 f6ff c0a8 0168 c0a8 0205
>
> Just for symmetry, I produced the same checksum as that for the 99-byte
> webpage file, but the same length as the 100- and 266-byte webpage files
> with:
>   ping -s 35 192.168.1.104
> resulting in the IP Header:
>   4500 003f 0000 0000 4001 f600 c0a8 0168 c0a8 0205
>
> In all cases, the pings with the defective checksum had 100% loss, while
> those with the good checksum succeeded. I didn't try manipulating the
> source IP address (c0a8 0205 = 192.168.2.5). If you can manipulate the
> packetsize and ELKS IP address so that the sum-with-carry of this header
> sans checksum-field is 0x09C1 you should be able to reproduce my results;
> otherwise it's probably a quirk in Red Hat 7.0 Linux (or you're using a
> different version of some critical ELKS file).
>

I'll check and get back to you...

> Note: I think bad packets consume memory. After several unsuccessful
> transfers I started seeing "Cannot fork" on the ELKS box when I issued
> commands ... eventually I'd have to reboot it.
> It might be a good idea to
> purge them after a minute or two.

These are the web servers that wait for the data to be transmitted; when
they exit, the memory will be freed.

> Does anything other than the system time depend upon the CMOS clock? It
> obviously hasn't been read on any of the four machines on which I tried
> ELKS (yes, they all *have* standard, working CMOS clocks).

I don't think so.

Harry
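
To pick other ping settings that should hit the same header sum, a small Python sketch can enumerate combinations of payload size and ELKS address. It assumes what the quoted headers show: a 20-byte header with no options, ID and fragment words of zero, TTL 0x40, protocol 01, and 192.168.2.5 as the other address. It also assumes the total length is 20 + 8 + the `-s` value. The function name and the search ranges are illustrative. The target is 0x0A00, the sum whose complement is the correct checksum 0xF5FF; the 0x09C1 mentioned above appears to be that same sum with the length word 0x003f left out.

```python
def fold(total):
    """Fold carries out of the low 16 bits back in (ones-complement add)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ping_header_sum(payload_size, elks_last_octet):
    """Header sum (checksum field zero) for a ping packet from 192.168.1.x.

    Assumes 20 bytes of IP header plus 8 bytes of ICMP header in front of
    the payload, and the fixed field values seen in the quoted headers.
    """
    total_length = 20 + 8 + payload_size
    words = [
        0x4500, total_length, 0x0000, 0x0000, 0x4001,
        0xC0A8, 0x0100 | elks_last_octet,   # 192.168.1.x (ELKS box)
        0xC0A8, 0x0205,                     # 192.168.2.5
    ]
    return fold(sum(words))


TARGET = 0x0A00  # complement is 0xF5FF

# Search a small, arbitrary window around the values used in this thread.
hits = [(size, octet)
        for size in range(30, 41)
        for octet in range(100, 111)
        if ping_header_sum(size, octet) == TARGET]

for size, octet in hits:
    print(f"ping -s {size} 192.168.1.{octet}")
```

Within this window the sum depends only on the payload size plus the last address octet, so each extra payload byte is offset by lowering the address by one.
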