netdev.vger.kernel.org archive mirror
* 2.4.18 userspace seeing UDP packets with bad checksum?
From: bert hubert @ 2002-07-30 10:48 UTC
  To: netdev

I'm under the strong impression that 2.4.18 lets userspace see packets with
incorrect UDP checksums. The packet attached arrived in userspace after
travelling all the way from Taiwan.

Is this policy?

Regards,

bert

-- 
http://www.PowerDNS.com          Versatile DNS Software & Services
http://www.tk                              the dot in .tk
http://lartc.org           Linux Advanced Routing & Traffic Control HOWTO

* Re: 2.4.18 userspace seeing UDP packets with bad checksum?
From: kuznet @ 2002-07-30 12:40 UTC
  To: bert hubert; +Cc: netdev

Hello!

> I'm under the strong impression that 2.4.18 lets userspace see packets with
> incorrect UDP checksums.

How did you get this impression?

> Is this policy?

This is impossible unless it is requested explicitly with SO_NO_CHECK
or buggy hardware incorrectly reports that the checksum is valid.

Alexey

* Re: 2.4.18 userspace seeing UDP packets with bad checksum?
From: Andi Kleen @ 2002-07-30 12:57 UTC
  To: bert hubert; +Cc: netdev

On Tue, Jul 30, 2002 at 12:48:15PM +0200, bert hubert wrote:
> I'm under the strong impression that 2.4.18 lets userspace see packets with
> incorrect UDP checksums. The packet attached arrived in userspace after
> travelling all the way from Taiwan.
> 
> Is this policy?

Yes. The checksum is checked while doing the copy to user space. When the
checksum fails, the copy has already been done and the kernel does not
try to undo it.

-Andi
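
The combined copy-and-checksum step described above can be sketched in
userspace C. This is an illustration only, not kernel code: the name
copy_and_csum and its return convention are assumptions. It copies a buffer
while accumulating the 16-bit one's-complement Internet checksum, so by the
time a mismatch is known the destination already holds the data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: copy len bytes from src to dst while folding them
 * into a 16-bit one's-complement sum of big-endian words. A buffer that
 * embeds a correct checksum folds to 0xffff. Intended for packet-sized
 * buffers, so the 32-bit accumulator cannot overflow.
 */
static uint16_t copy_and_csum(uint8_t *dst, const uint8_t *src, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(dst != NULL && src != NULL);

	for (i = 0; i + 1 < len; i += 2) {
		dst[i] = src[i];
		dst[i + 1] = src[i + 1];
		sum += ((uint32_t)src[i] << 8) | src[i + 1];
	}
	if (i < len) {			/* odd trailing byte, padded with zero */
		dst[i] = src[i];
		sum += (uint32_t)src[i] << 8;
	}
	while (sum >> 16)		/* fold carries into the low 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```

A caller would compare the result with 0xffff only after the call returns,
at which point dst has already been written whether or not the sum matches.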

* 3c59x 2.4.18 userspace seeing UDP packets with bad checksum?
From: bert hubert @ 2002-07-30 13:14 UTC
  To: kuznet; +Cc: netdev, akpm, jgarzik, becker

On Tue, Jul 30, 2002 at 04:40:38PM +0400, kuznet@ms2.inr.ac.ru wrote:
> Hello!
> 
> > I'm under the strong impression that 2.4.18 lets userspace see packets with
> > incorrect UDP checksums.
> 
> How did you get this impression?

The hardware:

3c59x: Donald Becker and others. www.scyld.com/network/vortex.html
01:02.0: 3Com PCI 3c905C Tornado at 0xd800. Vers LK1.1.16
01:02.0 Ethernet controller: 3Com Corporation 3c905C-TX [Fast Etherlink]
(rev 78)

The packet is subtly corrupted and contains an invalid DNS label which our
nameserver tripped over (oops). It looks like a single-byte error. PowerDNS
lives inside a wrapper; once every second the wrapper calls wait() to see
if the child is well:

Jul 30 07:04:44 knife pdns-powerdns[2983]: Our pdns instance (6595) exited after signal 11

These are the relevant packets, grouped by question/answer.

07:04:42.902162 200.171.175.165.14760 > 213.244.168.217.53:  [udp sum ok] 4170 CNAME? ifm.com.br. [|domain] (ttl 112, id 41767, len 56)

07:04:42.902198 213.244.168.217.53 > 200.171.175.165.14760:  [udp sum ok] 4170*- q: CNAME? ifm.com.br. 0/0/0 (28) (DF) (ttl 64, id 0, len 56)

==

07:04:43.147215 202.239.113.18.56146 > 213.244.168.217.53:  [udp sum ok] 32295 A? failte.powernap.org. [|domain] (ttl 241, id 11501, len 65)

07:04:43.149494 213.244.168.217.53 > 202.239.113.18.56146:  [udp sum ok] 32295*- q: A? failte.powernap.org. 1/0/0 failte.powernap.org. A 213.106.2.65 (53) (DF) (ttl 64, id 0, len 81)

==

This is the packet I mean. Note that no answers are sent out after this
one:

07:04:43.505166 61.222.31.205.62361 > 213.244.168.217.53:  [bad udp cksum 25f1!] 49 op5 [2a][|domain] (ttl 112, id 51330, len 109)

==

07:04:43.698853 194.25.2.147.34441 > 213.244.168.217.53:  [udp sum ok] 53889 [1au] AAAA? DNS-EU1.POWERDNS.NET. . OPT  UDPsize=4096 (49) (DF) (ttl 248, id 23358, len 77)

==

07:04:43.699074 194.25.2.147.34441 > 213.244.168.217.53:  [udp sum ok] 32420 [1au] A6 ? DNS-EU1.POWERDNS.NET. . OPT  UDPsize=4096 (49) (DF) (ttl 248, id 23359, len 77)

==

Until a few seconds later when the parent respawns a new PowerDNS.

> > Is this policy?
> 
> This is impossible unless it is requested explicitly with SO_NO_CHECK
> or buggy hardware incorrectly reports that the checksum is valid.

We don't supply SO_NO_CHECK. As the driver source mentions hardware
checksumming I've cc'd in Andrew, Donald & Jeff.

Regarding Andi's message, isn't it so that recvfrom() may return but in that
case returns -1 and sets errno to EAGAIN? I've seen that when trying to
reproduce this bug by using tcpreplay.
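
The recvfrom() behaviour asked about here matters to any caller, because a
wakeup from select() or poll() may then yield no datagram at all. A minimal
sketch of a receive wrapper that treats EAGAIN as "nothing to read, try
again" follows. The names recv_one_datagram, RECV_RETRY and RECV_ERROR are
hypothetical, and the use of MSG_DONTWAIT is a choice made for this example,
not something taken from PowerDNS.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

#define RECV_ERROR ((ssize_t)-1)	/* hypothetical: hard failure */
#define RECV_RETRY ((ssize_t)-2)	/* hypothetical: nothing usable, poll again */

/*
 * Hypothetical wrapper around recvfrom(). Returns the datagram length,
 * RECV_RETRY when no datagram could be delivered (for example because the
 * queued one was discarded), or RECV_ERROR on any other error.
 */
static ssize_t recv_one_datagram(int fd, void *buf, size_t len)
{
	ssize_t n;

	assert(buf != NULL);

	n = recvfrom(fd, buf, len, MSG_DONTWAIT, NULL, NULL);
	if (n >= 0)
		return n;
	if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
		return RECV_RETRY;
	return RECV_ERROR;
}
```

A server loop would simply go back to waiting on RECV_RETRY instead of
treating it as a fatal condition.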

If there is anything I can do to help, let me know. I get on the order of
20 of these corrupted packets each night from our Taiwanese friends at
Hinet, and only at night.

Netstat -s output after 81 days:

Udp:
    112974784 packets received
    10386 packets to unknown port received.
    3840 packet receive errors
    112889199 packets sent

Regards,

bert

-- 
http://www.PowerDNS.com          Versatile DNS Software & Services
http://www.tk                              the dot in .tk
http://lartc.org           Linux Advanced Routing & Traffic Control HOWTO

* Re: 3c59x 2.4.18 userspace seeing UDP packets with bad checksum?
From: kuznet @ 2002-07-30 13:31 UTC
  To: bert hubert; +Cc: netdev, akpm, jgarzik, becker

Hello!

> Regarding Andi's message, isn't it so that recvfrom() may return but in that
> case returns -1 and sets errno to EAGAIN?

It should if we calculated this checksum. But 3com pretends to do this
in hardware. :-)


> Anything I can do to help,

Well, try to prove that the corrupted packet is really received by the
application. This is not useless work: in any case you have to fix that
code path, since it should not abort because of invalid data. :-)

As a quicker test, try disabling rx checksumming in the driver
and look at the effect. I do not see a module option to do this,
so you probably just have to comment out the place where
skb->ip_summed is set to CHECKSUM_UNNECESSARY.

Alexey
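
Why removing the CHECKSUM_UNNECESSARY assignment changes the outcome can be
shown with a small userspace model. It is a sketch under stated assumptions,
not the 2.4 receive path: the SKETCH_* values stand in for the kernel's
constants, and udp_csum_ok and stack_accepts are hypothetical names. The
software check sums the IPv4 pseudo-header together with the UDP header and
payload, and it only runs when the driver made no claim about the checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the kernel's CHECKSUM_NONE / CHECKSUM_UNNECESSARY values. */
enum sketch_ip_summed { SKETCH_CSUM_NONE, SKETCH_CSUM_UNNECESSARY };

/* Add big-endian 16-bit words of p to a running one's-complement total. */
static uint32_t sum16(const uint8_t *p, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	if (i < len)
		sum += (uint32_t)p[i] << 8;
	return sum;
}

/* Software check: pseudo-header + UDP header + payload must fold to 0xffff. */
static int udp_csum_ok(const uint8_t src[4], const uint8_t dst[4],
		       const uint8_t *udp, size_t udp_len)
{
	uint32_t sum = 17 + (uint32_t)udp_len;	/* protocol 17 and UDP length */

	assert(udp != NULL && udp_len >= 8);
	if (udp[6] == 0 && udp[7] == 0)
		return 1;			/* sender computed no checksum */
	sum = sum16(src, 4, sum);
	sum = sum16(dst, 4, sum);
	sum = sum16(udp, udp_len, sum);
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return sum == 0xffff;
}

/* The software check is skipped when the driver vouched for the packet. */
static int stack_accepts(enum sketch_ip_summed summed,
			 const uint8_t src[4], const uint8_t dst[4],
			 const uint8_t *udp, size_t udp_len)
{
	if (summed == SKETCH_CSUM_UNNECESSARY)
		return 1;
	return udp_csum_ok(src, dst, udp, udp_len);
}
```

In this model a corrupted datagram is rejected under SKETCH_CSUM_NONE but
accepted under SKETCH_CSUM_UNNECESSARY, which is the difference the
suggested driver change is meant to expose.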

* Re: 3c59x 2.4.18 userspace seeing UDP packets with bad checksum?
From: jamal @ 2002-07-30 13:40 UTC
  To: kuznet; +Cc: bert hubert, netdev, akpm, jgarzik, becker

On Tue, 30 Jul 2002 kuznet@ms2.inr.ac.ru wrote:

> Hello!
>
> > Regarding Andi's message, isn't it so that recvfrom() may return but in that
> > case returns -1 and sets errno to EAGAIN?
>
> It should if we calculated this checksum. But 3com pretends to do this
> in hardware. :-)
>
>

It doesn't seem like the NIC is kaput, given that he only sees the problem
with the Taiwanese site (unless everyone else sends no checksums). The more
likely question is what in the data these guys send is activating this.
That will certainly become more obvious when he comments out
CHECKSUM_UNNECESSARY in the driver.

BTW, it would be nice if the driver also reported checksum errors. Right
now it seems to report only checksum hits, on close().
Is this something we need to add into generic stats? It would also
be nice to see a breakdown by TCP and UDP. A NIC/driver that doesn't
implement that feature wouldn't display those parameters.

cheers,
jamal

* Re: 3c59x 2.4.18 userspace seeing UDP packets with bad checksum?
From: bert hubert @ 2002-07-30 13:49 UTC
  To: kuznet; +Cc: netdev, akpm, jgarzik, becker

On Tue, Jul 30, 2002 at 05:31:44PM +0400, kuznet@ms2.inr.ac.ru wrote:

> Well, try to prove that the corrupted packet is really received by the
> application. This is not useless work: in any case you have to fix that
> code path, since it should not abort because of invalid data. :-)

We fixed the bug already :-) Signed/unsigned arithmetic problems.
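
The thread does not show the PowerDNS fix, so the following is only a
generic illustration of how signedness can bite when walking DNS labels. The
function dns_name_wire_len is hypothetical. It reads each length byte as
unsigned and refuses anything that is not a plain label of at most 63 bytes
lying fully inside the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical validator: walk the labels of an uncompressed DNS name that
 * starts at msg[off]. Returns the number of bytes the name occupies on the
 * wire (including the terminating zero byte), or -1 if it is malformed.
 */
static int dns_name_wire_len(const uint8_t *msg, size_t msg_len, size_t off)
{
	size_t pos = off;

	assert(msg != NULL);

	while (pos < msg_len) {
		/* Read the length as unsigned: through a signed char a byte
		 * such as 0xC8 is negative and would move pos backwards. */
		uint8_t label_len = msg[pos];

		if (label_len == 0)
			return (int)(pos + 1 - off);
		if (label_len > 63)		/* pointer or reserved type */
			return -1;
		if (label_len >= msg_len - pos)	/* label runs past the buffer */
			return -1;
		pos += (size_t)label_len + 1;
		if (pos - off >= 255)		/* whole name is at most 255 bytes */
			return -1;
	}
	return -1;				/* no terminating root label */
}
```

Rejecting compression pointers outright is a simplification for the sketch;
a real parser would follow them with a loop guard.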

I have six examples in my log of PowerDNS exiting just after receiving a
packet with a broken checksum. Now, I know that this is particle physics
kind of statistical evidence :-) but I think it is pretty solid.

Furthermore, I see answers being sent out to questions with broken
checksums. Even more telling, all bad checksums come from a small
number of hosts: 61.222.31.205, 64.58.142.2 and 202.106.0.21. That makes
a tcpdump bug pretty unlikely.

Here are some answers to questions with a bad udp checksum. Note that the
DNS question id (8, 9 and 8 in this case) matches over question and answer,
which is pretty conclusive evidence that userspace saw the packet.
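
The ID matching used as evidence here can be written down as a small check.
This is a sketch with hypothetical helper names: it relies only on the DNS
header layout, where the first two bytes carry the 16-bit ID and the top bit
of the third byte is the QR (response) flag. A match is suggestive rather
than a proof, since the ID is only 16 bits wide; in the captures it is the
ID together with the matching addresses and ports that makes the case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DNS_HEADER_LEN 12

/* Transaction ID: first two header bytes, big-endian. */
static unsigned dns_id(const uint8_t *msg)
{
	return ((unsigned)msg[0] << 8) | msg[1];
}

/* Nonzero when the QR bit marks the message as a response. */
static int dns_is_response(const uint8_t *msg)
{
	return (msg[2] & 0x80) != 0;
}

/* Hypothetical check: does response a carry the same ID as query q? */
static int answers_question(const uint8_t *q, size_t qlen,
			    const uint8_t *a, size_t alen)
{
	assert(q != NULL && a != NULL);

	if (qlen < DNS_HEADER_LEN || alen < DNS_HEADER_LEN)
		return 0;
	return !dns_is_response(q) && dns_is_response(a) &&
	       dns_id(q) == dns_id(a);
}
```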

08:06:53.097420 61.222.31.205.63240 > 213.244.168.217.53:  [bad udp cksum 4c83!] 8 op5 [2a][|domain] (ttl 112, id 444, len 109)

08:06:53.097693 213.244.168.217.53 > 61.222.31.205.63240:  [udp sum ok] 8 op5 NotImp*- q:[|domain] (DF) (ttl 64, id 0, len 52)

==

02:48:03.404865 61.222.31.205.62869 > 213.244.168.217.53:  [bad udp cksum 7a20!] 9 op5 [2a][|domain] (ttl 112, id 1277, len 109)

02:48:03.405045 213.244.168.217.53 > 61.222.31.205.62869:  [udp sum ok] 9 op5 NotImp*- q:[|domain] (DF) (ttl 64, id 0, len 52)

==

03:20:36.514897 61.222.31.205.63268 > 213.244.168.217.53:  [bad udp cksum b94d!] 8 op5 [2a][|domain] (ttl 112, id 2317, len 109)

03:20:36.515094 213.244.168.217.53 > 61.222.31.205.63268:  [udp sum ok] 8 op5 NotImp*- q:[|domain] (DF) (ttl 64, id 0, len 80)

> As a quicker test, try disabling rx checksumming in the driver
> and look at the effect. I do not see a module option to do this,
> so you probably just have to comment out the place where
> skb->ip_summed is set to CHECKSUM_UNNECESSARY.

OK, will do when we are near the machine again.

Regards,

bert hubert

-- 
http://www.PowerDNS.com          Versatile DNS Software & Services
http://www.tk                              the dot in .tk
http://lartc.org           Linux Advanced Routing & Traffic Control HOWTO

* Re: 3c59x 2.4.18 userspace seeing UDP packets with bad checksum?
From: Donald Becker @ 2002-07-30 13:57 UTC
  To: kuznet; +Cc: bert hubert, netdev, akpm, jgarzik

On Tue, 30 Jul 2002 kuznet@ms2.inr.ac.ru wrote:

> Subject: Re: 3c59x 2.4.18 userspace seeing UDP packets with bad checksum?
> 
> > Regarding Andi's message, isn't it so that recvfrom() may return but in that
> > case returns -1 and sets errno to EAGAIN?
> 
> It should if we calculated this checksum. But 3com pretends to do this
> in hardware. :-)

I've verified that (at least some) 3Com cards don't just fake the
checksum test.  They pass on packets with invalid checksums, but don't
indicate that the checksum is correct.  The driver code for this is

	int csum_bits = rx_status & 0xee000000;
	if (csum_bits &&
		(csum_bits == (IPChksumValid | TCPChksumValid) ||
		 csum_bits == (IPChksumValid | UDPChksumValid))) {
		skb->ip_summed = CHECKSUM_UNNECESSARY;

Note that this relies on the kernel to make the final decision that the
checksum is invalid.

[[ This is the correct semantics: the driver might indicate that a protocol
   level checksum is correct, but it should not discard packets based
   on its (necessarily limited) understanding of higher level protocols. 
]]


-- 
Donald Becker				becker@scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Second Generation Beowulf Clusters
Annapolis MD 21403			410-990-9993
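
The driver test quoted in this message can be exercised outside the kernel
with a small model. The bit positions below are assumptions: the excerpt
shows only the 0xee000000 mask and the names of the "valid" flags, so the
three "error" bits and all numeric values are guesses chosen to be
consistent with that mask. The SK_ prefix marks every name as hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the checksum bits in the rx status word. */
#define SK_IP_CSUM_ERR		(UINT32_C(1) << 25)
#define SK_TCP_CSUM_ERR		(UINT32_C(1) << 26)
#define SK_UDP_CSUM_ERR		(UINT32_C(1) << 27)
#define SK_IP_CSUM_VALID	(UINT32_C(1) << 29)
#define SK_TCP_CSUM_VALID	(UINT32_C(1) << 30)
#define SK_UDP_CSUM_VALID	(UINT32_C(1) << 31)
#define SK_CSUM_MASK		UINT32_C(0xee000000)

enum sk_summed { SK_CHECKSUM_NONE, SK_CHECKSUM_UNNECESSARY };

/*
 * Vouch for a packet only when the checksum bits are exactly "IP valid +
 * TCP valid" or "IP valid + UDP valid". Any error bit, or a missing
 * transport verdict, leaves the decision to the stack.
 */
static enum sk_summed classify_rx(uint32_t rx_status)
{
	uint32_t csum_bits = rx_status & SK_CSUM_MASK;

	/* The mask must cover exactly the six assumed bits. */
	assert(SK_CSUM_MASK == (SK_IP_CSUM_ERR | SK_TCP_CSUM_ERR |
				SK_UDP_CSUM_ERR | SK_IP_CSUM_VALID |
				SK_TCP_CSUM_VALID | SK_UDP_CSUM_VALID));

	if (csum_bits == (SK_IP_CSUM_VALID | SK_TCP_CSUM_VALID) ||
	    csum_bits == (SK_IP_CSUM_VALID | SK_UDP_CSUM_VALID))
		return SK_CHECKSUM_UNNECESSARY;
	return SK_CHECKSUM_NONE;
}
```

Because the comparison is an exact match on the masked bits, a status word
with "IP valid" plus a UDP error bit falls through to SK_CHECKSUM_NONE, so
the packet is neither dropped by the driver nor marked as verified.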

* Re: 3c59x 2.4.18 userspace seeing UDP packets with bad checksum?
From: Donald Becker @ 2002-07-30 14:09 UTC
  To: bert hubert; +Cc: kuznet, netdev, akpm, jgarzik

On Tue, 30 Jul 2002, bert hubert wrote:
> On Tue, Jul 30, 2002 at 05:31:44PM +0400, kuznet@ms2.inr.ac.ru wrote:
> 
> > Well, try to prove that the corrupted packet is really received by the
> > application. This is not useless work: in any case you have to fix that
> > code path, since it should not abort because of invalid data. :-)
> 
> We fixed the bug already :-) Signed/unsigned arithmetic problems.

Were the arithmetic problems in your code, or in the kernel's checksumming?
(The latter seems unlikely.)

> I have six examples in my log of PowerDNS exiting just after receiving a
> packet with a broken checksum.

Wouldn't this have also happened with valid checksums and invalid data?
This is a good time to fix your application data validity checking.

Note that some network equipment will re-write the IP checksums, just as
some equipment regenerates the link-level frame CRC.  It's not
semantically correct, but claiming that such devices don't exist doesn't
make them go away.

-- 
Donald Becker				becker@scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Second Generation Beowulf Clusters
Annapolis MD 21403			410-990-9993

* Re: 3c59x 2.4.18 userspace seeing UDP packets with bad checksum?
From: bert hubert @ 2002-07-30 14:12 UTC
  To: Donald Becker; +Cc: kuznet, netdev, akpm, jgarzik

On Tue, Jul 30, 2002 at 10:09:13AM -0400, Donald Becker wrote:

> > We fixed the bug already :-) Signed/unsigned arithmetic problems.
> 
> Were the arithmetic problems in your code, or in the kernel's checksumming?
> (The latter seems unlikely.)

We fixed *our* bug. I would never want to rely on checksums for input
validation :-) Our code should handle everything people care to throw at it
and I think that right now, it will.

> This is a good time to fix your application data validity checking.

Has been done. Watch freshmeat for the update :-)

Regards,

bert

-- 
http://www.PowerDNS.com          Versatile DNS Software & Services
http://www.tk                              the dot in .tk
http://lartc.org           Linux Advanced Routing & Traffic Control HOWTO

* Re: 3c59x 2.4.18 userspace seeing UDP packets with bad checksum?
From: jamal @ 2002-07-31 11:46 UTC
  To: Donald Becker; +Cc: kuznet, bert hubert, netdev, akpm, jgarzik

On Tue, 30 Jul 2002, Donald Becker wrote:

> 	int csum_bits = rx_status & 0xee000000;
> 	if (csum_bits &&
> 		(csum_bits == (IPChksumValid | TCPChksumValid) ||
> 		 csum_bits == (IPChksumValid | UDPChksumValid))) {
> 		skb->ip_summed = CHECKSUM_UNNECESSARY;
>
> Note that this relies on the kernel to make the final decision that the
> checksum is invalid.
>
> [[ This is the correct semantics: the driver might indicate that a protocol
>    level checksum is correct, but it should not discard packets based
>    on its (necessarily limited) understanding of higher level protocols.
> ]]
>

Nod.
So is CHECKSUM_WRONG needed to give a hint to the stack? It is possible
that the IP checksum is right but the transport checksum is not; maybe
ip_summed needs to be a bitmap then?
You could use the information to move packets out of the stack fast path,
for example.

cheers,
jamal
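
The bitmap idea floated in this message could look roughly like the sketch
below. Everything in it is hypothetical: no such flags or paths are taken
from the kernel, and the names only illustrate how per-layer verdicts might
steer a packet away from the fast path.

```c
#include <assert.h>

/* Hypothetical per-layer checksum verdicts reported by a driver. */
#define SK_CSUM_IP_OK	0x01u
#define SK_CSUM_IP_BAD	0x02u
#define SK_CSUM_L4_OK	0x04u
#define SK_CSUM_L4_BAD	0x08u

enum sk_path {
	SK_PATH_FAST,		/* both layers vouched for by hardware */
	SK_PATH_VERIFY,		/* no complete verdict: check in software */
	SK_PATH_SUSPECT		/* hardware flagged an error: slow path */
};

static enum sk_path choose_path(unsigned csum_flags)
{
	/* A layer cannot be reported both good and bad. */
	assert(!((csum_flags & SK_CSUM_IP_OK) && (csum_flags & SK_CSUM_IP_BAD)));
	assert(!((csum_flags & SK_CSUM_L4_OK) && (csum_flags & SK_CSUM_L4_BAD)));

	if (csum_flags & (SK_CSUM_IP_BAD | SK_CSUM_L4_BAD))
		return SK_PATH_SUSPECT;
	if ((csum_flags & (SK_CSUM_IP_OK | SK_CSUM_L4_OK)) ==
	    (SK_CSUM_IP_OK | SK_CSUM_L4_OK))
		return SK_PATH_FAST;
	return SK_PATH_VERIFY;
}
```

Even on SK_PATH_SUSPECT the stack would still make the final decision in
software, in line with the point that a driver should not discard packets
on its own.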

Thread overview: 11+ messages
2002-07-30 10:48 2.4.18 userspace seeing UDP packets with bad checksum? bert hubert
2002-07-30 12:40 ` kuznet
2002-07-30 13:14   ` 3c59x " bert hubert
2002-07-30 13:31     ` kuznet
2002-07-30 13:40       ` jamal
2002-07-30 13:49       ` bert hubert
2002-07-30 14:09         ` Donald Becker
2002-07-30 14:12           ` bert hubert
2002-07-30 13:57       ` Donald Becker
2002-07-31 11:46         ` jamal
2002-07-30 12:57 ` Andi Kleen
