* af_packet & CHECKSUM_HW
@ 2003-11-15 2:44 Herbert Xu
2003-11-16 13:44 ` James Morris
0 siblings, 1 reply; 11+ messages in thread
From: Herbert Xu @ 2003-11-15 2:44 UTC (permalink / raw)
To: netdev
Hi:
Should the packet socket compute the checksums of CHECKSUM_HW packets
before sending them to userspace?
In other words, what do I tell confused users who see bad checksums
in the tcpdump output over loopback and interfaces capable of
hardware checksums?
Cheers,
--
Debian GNU/Linux 3.0 is out! ( http://www.debian.org/ )
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: af_packet & CHECKSUM_HW 2003-11-15 2:44 af_packet & CHECKSUM_HW Herbert Xu @ 2003-11-16 13:44 ` James Morris 2003-11-16 20:51 ` Herbert Xu 0 siblings, 1 reply; 11+ messages in thread From: James Morris @ 2003-11-16 13:44 UTC (permalink / raw) To: Herbert Xu; +Cc: netdev On Sat, 15 Nov 2003, Herbert Xu wrote: > Hi: > > Should the packet socket compute the checksums of CHECKSUM_HW packets > before sending them to userspace? > > In other words, what do I tell confused users who see bad checksums > in the tcpdump output over loopback and interfaces capable of > hardware checksums? That this functionality does not belong in the kernel, as it serves no real purpose. - James -- James Morris <jmorris@redhat.com> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: af_packet & CHECKSUM_HW 2003-11-16 13:44 ` James Morris @ 2003-11-16 20:51 ` Herbert Xu 2003-11-17 19:44 ` Petr Vandrovec 0 siblings, 1 reply; 11+ messages in thread From: Herbert Xu @ 2003-11-16 20:51 UTC (permalink / raw) To: James Morris; +Cc: netdev On Sun, Nov 16, 2003 at 08:44:38AM -0500, James Morris wrote: > On Sat, 15 Nov 2003, Herbert Xu wrote: > > > In other words, what do I tell confused users who see bad checksums > > in the tcpdump output over loopback and interfaces capable of > > hardware checksums? > > That this functionality does not belong in the kernel, as it serves no > real purpose. OK, I will quote you on that :) -- Debian GNU/Linux 3.0 is out! ( http://www.debian.org/ ) Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: af_packet & CHECKSUM_HW 2003-11-16 20:51 ` Herbert Xu @ 2003-11-17 19:44 ` Petr Vandrovec 2003-11-17 20:30 ` David S. Miller 0 siblings, 1 reply; 11+ messages in thread From: Petr Vandrovec @ 2003-11-17 19:44 UTC (permalink / raw) To: Herbert Xu; +Cc: James Morris, netdev On Mon, Nov 17, 2003 at 07:51:46AM +1100, Herbert Xu wrote: > On Sun, Nov 16, 2003 at 08:44:38AM -0500, James Morris wrote: > > On Sat, 15 Nov 2003, Herbert Xu wrote: > > > > > In other words, what do I tell confused users who see bad checksums > > > in the tcpdump output over loopback and interfaces capable of > > > hardware checksums? > > > > That this functionality does not belong in the kernel, as it serves no > > real purpose. > > OK, I will quote you on that :) It would be nice if at least kernel did not stomp on skb2->h.raw so ptype callback could cope with situation and report to userspace where to put checksum. I already discussed it here two or so months ago, and only solution I was able to find is that VMware's vmnet have to parse itself network headers to find where to put checksum. Nothing for IPv6 users... Best regards, Petr Vandrovec ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: af_packet & CHECKSUM_HW 2003-11-17 19:44 ` Petr Vandrovec @ 2003-11-17 20:30 ` David S. Miller 2003-11-18 13:21 ` Petr Vandrovec 0 siblings, 1 reply; 11+ messages in thread From: David S. Miller @ 2003-11-17 20:30 UTC (permalink / raw) To: Petr Vandrovec; +Cc: herbert, jmorris, netdev On Mon, 17 Nov 2003 20:44:52 +0100 Petr Vandrovec <vandrove@vc.cvut.cz> wrote: > It would be nice if at least kernel did not stomp on skb2->h.raw so > ptype callback could cope with situation and report to userspace where > to put checksum. I already discussed it here two or so months ago, and > only solution I was able to find is that VMware's vmnet have to parse > itself network headers to find where to put checksum. Nothing for IPv6 > users... Remind me again why the vmnet driver can't just run skb_checksum() on packets that have CHECKSUM_HW set? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: af_packet & CHECKSUM_HW 2003-11-17 20:30 ` David S. Miller @ 2003-11-18 13:21 ` Petr Vandrovec 2003-11-19 1:46 ` David S. Miller 0 siblings, 1 reply; 11+ messages in thread From: Petr Vandrovec @ 2003-11-18 13:21 UTC (permalink / raw) To: David S. Miller; +Cc: herbert, jmorris, netdev On Mon, Nov 17, 2003 at 12:30:47PM -0800, David S. Miller wrote: > On Mon, 17 Nov 2003 20:44:52 +0100 > Petr Vandrovec <vandrove@vc.cvut.cz> wrote: > > > It would be nice if at least kernel did not stomp on skb2->h.raw so > > ptype callback could cope with situation and report to userspace where > > to put checksum. I already discussed it here two or so months ago, and 3 months ago. > > only solution I was able to find is that VMware's vmnet have to parse > > itself network headers to find where to put checksum. Nothing for IPv6 > > users... > > Remind me again why the vmnet driver can't just run skb_checksum() > on packets that have CHECKSUM_HW set? Because vmnet driver (for bridged networking) is hooked through dev_add_pack. Place where to put checksum is defined as skb->h.raw + skb->csum, and what to checksum as skb->h.raw ... skb->data + skb->len. But unfortunately dev_queue_xmit_nit replaces skb2->h.raw with skb2->nh.raw - and so hook has to look at the packet contents to compute h.raw from packet data and protocol type. When I was complaining about this assignment, you told me that AF_PACKET requires skb->h.raw pointing to same place as skb->nh.raw. And I was not able to find place you were referring to, and thread died. Unfortunately I was not able to find netdev public archive, so below are two messages I sent on this subject in august. As far as I know, there was no reply on my second message. Currently vmnet does fixup below at the start of hook. Fortunately no 'Unknown EII protocol' message was reported from the users yet (and we compute checksum while doing copy_to_user, so there is almost no slowdown). /* *---------------------------------------------------------------------- * * VNetBridgeComputeHeaderPos -- * * Compute correct position for UDP/TCP header. * * Results: * None. * * Side effects: * skb->h.raw updated to point to the tcp/udp header. * *---------------------------------------------------------------------- */ static INLINE_SINGLE_CALLER void VNetBridgeComputeHeaderPos(struct sk_buff *skb) { /* Maybe some kernel gets it right... */ if (skb->h.raw != skb->nh.raw) { return; } switch (be16_to_cpu(skb->protocol)) { case ETH_P_IP: skb->h.raw = skb->nh.raw + (skb->nh.raw[0] & 0x0F) * 4; return; default: printk(KERN_DEBUG "Unknown EII protocol %04X: csum at %d\n", be16_to_cpu(skb->protocol), skb->csum); break; } return; } Thanks, Petr Vandrovec ------ Date: Mon, 18 Aug 2003 05:37:11 +0200 From: Petr Vandrovec <vandrove@vc.cvut.cz> To: davem@redhat.com Cc: netdev@oss.sgi.com Subject: dev_add_pack and hardware checksumming Message-ID: <20030818033711.GA21360@vana.vc.cvut.cz> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.4i Status: RO Content-Length: 992 Lines: 23 Hi Dave, VMware uses dev_add_pack() for capturing ethernet traffic generated by the host and forwarding it to the guest. Unfortunately skb at the packet filter input is still in CHECKSUM_HW state, while skb->csum is relative to the skb->h.raw and dev_queue_xmit_nit overwrites skb->h.raw with skb->nh.raw. Due to this I have to peek at packet body to detect protocol & its header length, reinitialize skb->h.raw with this knowledge and then finally do skb_checksum_help(). For me it would be best if dev_queue_xmit_nit could just leave skb->h.raw alone, usable for skb_checksum_help(), but I assume that it is not possible due to backward compatibility? Other choice is doing checksumming in dev_queue_xmit_nit itself, before passing skb down to the registered packet handlers. BTW, it looks to me like that an uninitialized value is leaked in UDP's checksum field to the packets tcpdump on the host can capture... Thanks, Petr Vandrovec vandrove@vc.cvut.cz ------- Date: Mon, 18 Aug 2003 20:27:27 +0200 From: Petr Vandrovec <vandrove@vc.cvut.cz> To: "David S. Miller" <davem@redhat.com> Cc: netdev@oss.sgi.com Subject: Re: dev_add_pack and hardware checksumming Message-ID: <20030818182727.GD999@vana.vc.cvut.cz> References: <20030818033711.GA21360@vana.vc.cvut.cz> <20030817223851.5185bdee.davem@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20030817223851.5185bdee.davem@redhat.com> User-Agent: Mutt/1.5.4i Status: RO Content-Length: 1103 Lines: 29 On Sun, Aug 17, 2003 at 10:38:51PM -0700, David S. Miller wrote: > On Mon, 18 Aug 2003 05:37:11 +0200 > Petr Vandrovec <vandrove@vc.cvut.cz> wrote: > > > For me it would be best if dev_queue_xmit_nit could just leave > > skb->h.raw alone, > > The AF_PACKET layer depends upon skb->h.raw being set a certain > way. Have a look at the dev_add_pack() registered hooks that > net/packet/af_packet.c uses. Well, I do not see skb->h.raw used in any way in AF_PACKET hooks... Probably I missed some deeply nested generic function which accesses it... > > Other choice is doing checksumming in dev_queue_xmit_nit itself, > > before passing skb down to the registered packet handlers. > > This is not an option either. We're specifically not going to > undo checksum offloading just because tcpdump or some other > kind of tap is installed. And what's left, then? I can happilly offload checksum to some other place down in the chain, but for doing that I must know where I should put checksum, and currently this information is unavailable to the packet hook. Best regards, Petr Vandrovec ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: af_packet & CHECKSUM_HW 2003-11-18 13:21 ` Petr Vandrovec @ 2003-11-19 1:46 ` David S. Miller 2003-11-19 3:34 ` James Morris 2003-11-19 12:35 ` Petr Vandrovec 0 siblings, 2 replies; 11+ messages in thread From: David S. Miller @ 2003-11-19 1:46 UTC (permalink / raw) To: Petr Vandrovec; +Cc: herbert, jmorris, netdev On Tue, 18 Nov 2003 14:21:37 +0100 Petr Vandrovec <vandrove@vc.cvut.cz> wrote: > On Mon, Nov 17, 2003 at 12:30:47PM -0800, David S. Miller wrote: > > Remind me again why the vmnet driver can't just run skb_checksum() > > on packets that have CHECKSUM_HW set? > > Because vmnet driver (for bridged networking) is hooked through dev_add_pack. > Place where to put checksum is defined as skb->h.raw + skb->csum, and what > to checksum as skb->h.raw ... skb->data + skb->len. But unfortunately > dev_queue_xmit_nit replaces skb2->h.raw with skb2->nh.raw - and so hook > has to look at the packet contents to compute h.raw from packet data and > protocol type. Thanks so much for reminding me. I'd like to resolve this. One solution is evident, let me know what you think about the following idea. How about we add a flag the the packet_type structure, called PTYPE_FLAG_NEEDCHECKSUM. If it is set, we run skb_checksum() on CHECKSUM_HW SKBs and do not mangle the skb2->h.raw etc. pointers, then we pass it into ptype->func(). In this way all existing ptype->func() implementations get what they expect. Only if the flag is set will the new behavior occur. Comments? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: af_packet & CHECKSUM_HW 2003-11-19 1:46 ` David S. Miller @ 2003-11-19 3:34 ` James Morris 2003-11-19 12:35 ` Petr Vandrovec 1 sibling, 0 replies; 11+ messages in thread From: James Morris @ 2003-11-19 3:34 UTC (permalink / raw) To: David S. Miller; +Cc: Petr Vandrovec, herbert, netdev On Tue, 18 Nov 2003, David S. Miller wrote: > In this way all existing ptype->func() implementations get what > they expect. Only if the flag is set will the new behavior occur. > > Comments? Sounds good. - James -- James Morris <jmorris@redhat.com> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: af_packet & CHECKSUM_HW 2003-11-19 1:46 ` David S. Miller 2003-11-19 3:34 ` James Morris @ 2003-11-19 12:35 ` Petr Vandrovec 2003-11-19 15:01 ` jamal 1 sibling, 1 reply; 11+ messages in thread From: Petr Vandrovec @ 2003-11-19 12:35 UTC (permalink / raw) To: David S. Miller; +Cc: herbert, jmorris, netdev On Tue, Nov 18, 2003 at 05:46:12PM -0800, David S. Miller wrote: > On Tue, 18 Nov 2003 14:21:37 +0100 > Petr Vandrovec <vandrove@vc.cvut.cz> wrote: > > > On Mon, Nov 17, 2003 at 12:30:47PM -0800, David S. Miller wrote: > > > Remind me again why the vmnet driver can't just run skb_checksum() > > > on packets that have CHECKSUM_HW set? > > > > Because vmnet driver (for bridged networking) is hooked through dev_add_pack. > > Place where to put checksum is defined as skb->h.raw + skb->csum, and what > > to checksum as skb->h.raw ... skb->data + skb->len. But unfortunately > > dev_queue_xmit_nit replaces skb2->h.raw with skb2->nh.raw - and so hook > > has to look at the packet contents to compute h.raw from packet data and > > protocol type. > > Thanks so much for reminding me. > > I'd like to resolve this. One solution is evident, let me know > what you think about the following idea. > > How about we add a flag the the packet_type structure, called > PTYPE_FLAG_NEEDCHECKSUM. If it is set, we run skb_checksum() on > CHECKSUM_HW SKBs and do not mangle the skb2->h.raw etc. pointers, then > we pass it into ptype->func(). > > In this way all existing ptype->func() implementations get what > they expect. Only if the flag is set will the new behavior occur. I'm not sure that we should run skb_checksum() on the packet. In vmnet's case about 95% of all packets delivered to the hook is thrown away after checking target MAC address (they are not broadcasts & they are not for the virtual machine), and in these cases computing checksum is just waste of time. So I think that this option should be named PTYPE_FLAG_CANCHECKSUM, and it should affect only h.raw assignment - no checksumming should happen. For AF_PACKET needs we can create PTYPE_FLAG_NEEDCHECKSUM which would call skb_checksum() itself. But I think that adding skb_checksum call to af_packet hook itself is simpler. We can leave default case (no flags) as is, so hooks which expects h.raw==nh.raw still work, and system is not slowed down by doing checksums for hooks which do not need it. I see only one possible benefit from PTYPE_FLAG_NEEDCHECKSUM: we can checksum original skb, potentially saving some cycles in drivers which use skb_csum_and_copy_dev (or when two or more such hooks are registered). Best regards, Petr Vandrovec vandrove@vc.cvut.cz ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: af_packet & CHECKSUM_HW 2003-11-19 12:35 ` Petr Vandrovec @ 2003-11-19 15:01 ` jamal 2003-11-19 15:11 ` Petr Vandrovec 0 siblings, 1 reply; 11+ messages in thread From: jamal @ 2003-11-19 15:01 UTC (permalink / raw) To: Petr Vandrovec; +Cc: David S. Miller, herbert, jmorris, netdev On Wed, 2003-11-19 at 07:35, Petr Vandrovec wrote: > I'm not sure that we should run skb_checksum() on the packet. In vmnet's > case about 95% of all packets delivered to the hook is thrown away after > checking target MAC address (they are not broadcasts & they are not for > the virtual machine), and in these cases computing checksum is just waste > of time. > > So I think that this option should be named PTYPE_FLAG_CANCHECKSUM, and it > should affect only h.raw assignment - no checksumming should happen. > > For AF_PACKET needs we can create PTYPE_FLAG_NEEDCHECKSUM which would call > skb_checksum() itself. But I think that adding skb_checksum call to af_packet > hook itself is simpler. > > We can leave default case (no flags) as is, so hooks which expects h.raw==nh.raw > still work, and system is not slowed down by doing checksums for hooks which > do not need it. > > I see only one possible benefit from PTYPE_FLAG_NEEDCHECKSUM: we can checksum > original skb, potentially saving some cycles in drivers which use > skb_csum_and_copy_dev (or when two or more such hooks are registered). I think this sort of stuff will greatly benefit from the tc extensions that i have been working on. instead of hacking on ptype you can instead write little modules that do speacilized work (like computing checksums) or dropping packets or setting these flags that can be later used for doing extra work. The code executes before the ptype level execution and using the u32 classifier you can grop any part of the packet. I can send you a patch if you want to give it a shot. cheers, jamal ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: af_packet & CHECKSUM_HW 2003-11-19 15:01 ` jamal @ 2003-11-19 15:11 ` Petr Vandrovec 0 siblings, 0 replies; 11+ messages in thread From: Petr Vandrovec @ 2003-11-19 15:11 UTC (permalink / raw) To: jamal; +Cc: David S. Miller, herbert, jmorris, netdev On Wed, Nov 19, 2003 at 10:01:45AM -0500, jamal wrote: > > I think this sort of stuff will greatly benefit from the tc extensions > that i have been working on. > instead of hacking on ptype you can instead write little modules that > do speacilized work (like computing checksums) or dropping packets > or setting these flags that can be later used for doing extra work. > The code executes before the ptype level execution and using the u32 > classifier you can grop any part of the packet. I can send you a patch > if you want to give it a shot. I'll gladly look at it. From your short description it looks like that it could solve also our second longstanding problem: we cannot block host->guest packets from hitting wire, and if guest communicates only with host (this traffic does not hit wire as it is injected as if received from network card) then switches on the network do not know where guest lives, and these packets then flood whole network. Thanks, Petr Vandrovec ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2003-11-19 15:11 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2003-11-15 2:44 af_packet & CHECKSUM_HW Herbert Xu 2003-11-16 13:44 ` James Morris 2003-11-16 20:51 ` Herbert Xu 2003-11-17 19:44 ` Petr Vandrovec 2003-11-17 20:30 ` David S. Miller 2003-11-18 13:21 ` Petr Vandrovec 2003-11-19 1:46 ` David S. Miller 2003-11-19 3:34 ` James Morris 2003-11-19 12:35 ` Petr Vandrovec 2003-11-19 15:01 ` jamal 2003-11-19 15:11 ` Petr Vandrovec
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).