* af_packet & CHECKSUM_HW
@ 2003-11-15 2:44 Herbert Xu
2003-11-16 13:44 ` James Morris
0 siblings, 1 reply; 11+ messages in thread
From: Herbert Xu @ 2003-11-15 2:44 UTC (permalink / raw)
To: netdev
Hi:
Should the packet socket compute the checksums of CHECKSUM_HW packets
before sending them to userspace?
In other words, what do I tell confused users who see bad checksums
in the tcpdump output over loopback and interfaces capable of
hardware checksums?
Cheers,
--
Debian GNU/Linux 3.0 is out! ( http://www.debian.org/ )
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: af_packet & CHECKSUM_HW
2003-11-15 2:44 af_packet & CHECKSUM_HW Herbert Xu
@ 2003-11-16 13:44 ` James Morris
2003-11-16 20:51 ` Herbert Xu
0 siblings, 1 reply; 11+ messages in thread
From: James Morris @ 2003-11-16 13:44 UTC (permalink / raw)
To: Herbert Xu; +Cc: netdev
On Sat, 15 Nov 2003, Herbert Xu wrote:
> Hi:
>
> Should the packet socket compute the checksums of CHECKSUM_HW packets
> before sending them to userspace?
>
> In other words, what do I tell confused users who see bad checksums
> in the tcpdump output over loopback and interfaces capable of
> hardware checksums?
That this functionality does not belong in the kernel, as it serves no
real purpose.
- James
--
James Morris
<jmorris@redhat.com>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: af_packet & CHECKSUM_HW
2003-11-16 13:44 ` James Morris
@ 2003-11-16 20:51 ` Herbert Xu
2003-11-17 19:44 ` Petr Vandrovec
0 siblings, 1 reply; 11+ messages in thread
From: Herbert Xu @ 2003-11-16 20:51 UTC (permalink / raw)
To: James Morris; +Cc: netdev
On Sun, Nov 16, 2003 at 08:44:38AM -0500, James Morris wrote:
> On Sat, 15 Nov 2003, Herbert Xu wrote:
>
> > In other words, what do I tell confused users who see bad checksums
> > in the tcpdump output over loopback and interfaces capable of
> > hardware checksums?
>
> That this functionality does not belong in the kernel, as it serves no
> real purpose.
OK, I will quote you on that :)
--
Debian GNU/Linux 3.0 is out! ( http://www.debian.org/ )
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: af_packet & CHECKSUM_HW
2003-11-16 20:51 ` Herbert Xu
@ 2003-11-17 19:44 ` Petr Vandrovec
2003-11-17 20:30 ` David S. Miller
0 siblings, 1 reply; 11+ messages in thread
From: Petr Vandrovec @ 2003-11-17 19:44 UTC (permalink / raw)
To: Herbert Xu; +Cc: James Morris, netdev
On Mon, Nov 17, 2003 at 07:51:46AM +1100, Herbert Xu wrote:
> On Sun, Nov 16, 2003 at 08:44:38AM -0500, James Morris wrote:
> > On Sat, 15 Nov 2003, Herbert Xu wrote:
> >
> > > In other words, what do I tell confused users who see bad checksums
> > > in the tcpdump output over loopback and interfaces capable of
> > > hardware checksums?
> >
> > That this functionality does not belong in the kernel, as it serves no
> > real purpose.
>
> OK, I will quote you on that :)
It would be nice if at least kernel did not stomp on skb2->h.raw so
ptype callback could cope with situation and report to userspace where
to put checksum. I already discussed it here two or so months ago, and
only solution I was able to find is that VMware's vmnet have to parse
itself network headers to find where to put checksum. Nothing for IPv6
users...
Best regards,
Petr Vandrovec
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: af_packet & CHECKSUM_HW
2003-11-17 19:44 ` Petr Vandrovec
@ 2003-11-17 20:30 ` David S. Miller
2003-11-18 13:21 ` Petr Vandrovec
0 siblings, 1 reply; 11+ messages in thread
From: David S. Miller @ 2003-11-17 20:30 UTC (permalink / raw)
To: Petr Vandrovec; +Cc: herbert, jmorris, netdev
On Mon, 17 Nov 2003 20:44:52 +0100
Petr Vandrovec <vandrove@vc.cvut.cz> wrote:
> It would be nice if at least kernel did not stomp on skb2->h.raw so
> ptype callback could cope with situation and report to userspace where
> to put checksum. I already discussed it here two or so months ago, and
> only solution I was able to find is that VMware's vmnet have to parse
> itself network headers to find where to put checksum. Nothing for IPv6
> users...
Remind me again why the vmnet driver can't just run skb_checksum()
on packets that have CHECKSUM_HW set?
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: af_packet & CHECKSUM_HW
2003-11-17 20:30 ` David S. Miller
@ 2003-11-18 13:21 ` Petr Vandrovec
2003-11-19 1:46 ` David S. Miller
0 siblings, 1 reply; 11+ messages in thread
From: Petr Vandrovec @ 2003-11-18 13:21 UTC (permalink / raw)
To: David S. Miller; +Cc: herbert, jmorris, netdev
On Mon, Nov 17, 2003 at 12:30:47PM -0800, David S. Miller wrote:
> On Mon, 17 Nov 2003 20:44:52 +0100
> Petr Vandrovec <vandrove@vc.cvut.cz> wrote:
>
> > It would be nice if at least kernel did not stomp on skb2->h.raw so
> > ptype callback could cope with situation and report to userspace where
> > to put checksum. I already discussed it here two or so months ago, and
3 months ago.
> > only solution I was able to find is that VMware's vmnet have to parse
> > itself network headers to find where to put checksum. Nothing for IPv6
> > users...
>
> Remind me again why the vmnet driver can't just run skb_checksum()
> on packets that have CHECKSUM_HW set?
Because vmnet driver (for bridged networking) is hooked through dev_add_pack.
Place where to put checksum is defined as skb->h.raw + skb->csum, and what
to checksum as skb->h.raw ... skb->data + skb->len. But unfortunately
dev_queue_xmit_nit replaces skb2->h.raw with skb2->nh.raw - and so hook
has to look at the packet contents to compute h.raw from packet data and
protocol type.
When I was complaining about this assignment, you told me that AF_PACKET
requires skb->h.raw pointing to same place as skb->nh.raw. And I was not able
to find place you were referring to, and thread died.
Unfortunately I was not able to find netdev public archive, so below are two
messages I sent on this subject in august. As far as I know, there was no reply
on my second message.
Currently vmnet does fixup below at the start of hook. Fortunately
no 'Unknown EII protocol' message was reported from the users yet
(and we compute checksum while doing copy_to_user, so there is almost
no slowdown).
/*
*----------------------------------------------------------------------
*
* VNetBridgeComputeHeaderPos --
*
* Compute correct position for UDP/TCP header.
*
* Results:
* None.
*
* Side effects:
* skb->h.raw updated to point to the tcp/udp header.
*
*----------------------------------------------------------------------
*/
static INLINE_SINGLE_CALLER void
VNetBridgeComputeHeaderPos(struct sk_buff *skb)
{
/* Maybe some kernel gets it right... */
if (skb->h.raw != skb->nh.raw) {
return;
}
switch (be16_to_cpu(skb->protocol)) {
case ETH_P_IP:
skb->h.raw = skb->nh.raw + (skb->nh.raw[0] & 0x0F) * 4;
return;
default:
printk(KERN_DEBUG "Unknown EII protocol %04X: csum at %d\n", be16_to_cpu(skb->protocol), skb->csum);
break;
}
return;
}
Thanks,
Petr Vandrovec
------
Date: Mon, 18 Aug 2003 05:37:11 +0200
From: Petr Vandrovec <vandrove@vc.cvut.cz>
To: davem@redhat.com
Cc: netdev@oss.sgi.com
Subject: dev_add_pack and hardware checksumming
Message-ID: <20030818033711.GA21360@vana.vc.cvut.cz>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.5.4i
Status: RO
Content-Length: 992
Lines: 23
Hi Dave,
VMware uses dev_add_pack() for capturing ethernet traffic generated
by the host and forwarding it to the guest.
Unfortunately skb at the packet filter input is still in CHECKSUM_HW
state, while skb->csum is relative to the skb->h.raw and
dev_queue_xmit_nit overwrites skb->h.raw with skb->nh.raw. Due to this
I have to peek at packet body to detect protocol & its header length,
reinitialize skb->h.raw with this knowledge and then finally do
skb_checksum_help().
For me it would be best if dev_queue_xmit_nit could just leave
skb->h.raw alone, usable for skb_checksum_help(), but I assume that it
is not possible due to backward compatibility? Other choice is doing
checksumming in dev_queue_xmit_nit itself, before passing skb down to
the registered packet handlers.
BTW, it looks to me like that an uninitialized value is leaked in UDP's
checksum field to the packets tcpdump on the host can capture...
Thanks,
Petr Vandrovec
vandrove@vc.cvut.cz
-------
Date: Mon, 18 Aug 2003 20:27:27 +0200
From: Petr Vandrovec <vandrove@vc.cvut.cz>
To: "David S. Miller" <davem@redhat.com>
Cc: netdev@oss.sgi.com
Subject: Re: dev_add_pack and hardware checksumming
Message-ID: <20030818182727.GD999@vana.vc.cvut.cz>
References: <20030818033711.GA21360@vana.vc.cvut.cz> <20030817223851.5185bdee.davem@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20030817223851.5185bdee.davem@redhat.com>
User-Agent: Mutt/1.5.4i
Status: RO
Content-Length: 1103
Lines: 29
On Sun, Aug 17, 2003 at 10:38:51PM -0700, David S. Miller wrote:
> On Mon, 18 Aug 2003 05:37:11 +0200
> Petr Vandrovec <vandrove@vc.cvut.cz> wrote:
>
> > For me it would be best if dev_queue_xmit_nit could just leave
> > skb->h.raw alone,
>
> The AF_PACKET layer depends upon skb->h.raw being set a certain
> way. Have a look at the dev_add_pack() registered hooks that
> net/packet/af_packet.c uses.
Well, I do not see skb->h.raw used in any way in AF_PACKET hooks...
Probably I missed some deeply nested generic function which accesses
it...
> > Other choice is doing checksumming in dev_queue_xmit_nit itself,
> > before passing skb down to the registered packet handlers.
>
> This is not an option either. We're specifically not going to
> undo checksum offloading just because tcpdump or some other
> kind of tap is installed.
And what's left, then? I can happilly offload checksum to some
other place down in the chain, but for doing that I must know
where I should put checksum, and currently this information is
unavailable to the packet hook.
Best regards,
Petr Vandrovec
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: af_packet & CHECKSUM_HW
2003-11-18 13:21 ` Petr Vandrovec
@ 2003-11-19 1:46 ` David S. Miller
2003-11-19 3:34 ` James Morris
2003-11-19 12:35 ` Petr Vandrovec
0 siblings, 2 replies; 11+ messages in thread
From: David S. Miller @ 2003-11-19 1:46 UTC (permalink / raw)
To: Petr Vandrovec; +Cc: herbert, jmorris, netdev
On Tue, 18 Nov 2003 14:21:37 +0100
Petr Vandrovec <vandrove@vc.cvut.cz> wrote:
> On Mon, Nov 17, 2003 at 12:30:47PM -0800, David S. Miller wrote:
> > Remind me again why the vmnet driver can't just run skb_checksum()
> > on packets that have CHECKSUM_HW set?
>
> Because vmnet driver (for bridged networking) is hooked through dev_add_pack.
> Place where to put checksum is defined as skb->h.raw + skb->csum, and what
> to checksum as skb->h.raw ... skb->data + skb->len. But unfortunately
> dev_queue_xmit_nit replaces skb2->h.raw with skb2->nh.raw - and so hook
> has to look at the packet contents to compute h.raw from packet data and
> protocol type.
Thanks so much for reminding me.
I'd like to resolve this. One solution is evident, let me know
what you think about the following idea.
How about we add a flag the the packet_type structure, called
PTYPE_FLAG_NEEDCHECKSUM. If it is set, we run skb_checksum() on
CHECKSUM_HW SKBs and do not mangle the skb2->h.raw etc. pointers, then
we pass it into ptype->func().
In this way all existing ptype->func() implementations get what
they expect. Only if the flag is set will the new behavior occur.
Comments?
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: af_packet & CHECKSUM_HW
2003-11-19 1:46 ` David S. Miller
@ 2003-11-19 3:34 ` James Morris
2003-11-19 12:35 ` Petr Vandrovec
1 sibling, 0 replies; 11+ messages in thread
From: James Morris @ 2003-11-19 3:34 UTC (permalink / raw)
To: David S. Miller; +Cc: Petr Vandrovec, herbert, netdev
On Tue, 18 Nov 2003, David S. Miller wrote:
> In this way all existing ptype->func() implementations get what
> they expect. Only if the flag is set will the new behavior occur.
>
> Comments?
Sounds good.
- James
--
James Morris
<jmorris@redhat.com>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: af_packet & CHECKSUM_HW
2003-11-19 1:46 ` David S. Miller
2003-11-19 3:34 ` James Morris
@ 2003-11-19 12:35 ` Petr Vandrovec
2003-11-19 15:01 ` jamal
1 sibling, 1 reply; 11+ messages in thread
From: Petr Vandrovec @ 2003-11-19 12:35 UTC (permalink / raw)
To: David S. Miller; +Cc: herbert, jmorris, netdev
On Tue, Nov 18, 2003 at 05:46:12PM -0800, David S. Miller wrote:
> On Tue, 18 Nov 2003 14:21:37 +0100
> Petr Vandrovec <vandrove@vc.cvut.cz> wrote:
>
> > On Mon, Nov 17, 2003 at 12:30:47PM -0800, David S. Miller wrote:
> > > Remind me again why the vmnet driver can't just run skb_checksum()
> > > on packets that have CHECKSUM_HW set?
> >
> > Because vmnet driver (for bridged networking) is hooked through dev_add_pack.
> > Place where to put checksum is defined as skb->h.raw + skb->csum, and what
> > to checksum as skb->h.raw ... skb->data + skb->len. But unfortunately
> > dev_queue_xmit_nit replaces skb2->h.raw with skb2->nh.raw - and so hook
> > has to look at the packet contents to compute h.raw from packet data and
> > protocol type.
>
> Thanks so much for reminding me.
>
> I'd like to resolve this. One solution is evident, let me know
> what you think about the following idea.
>
> How about we add a flag the the packet_type structure, called
> PTYPE_FLAG_NEEDCHECKSUM. If it is set, we run skb_checksum() on
> CHECKSUM_HW SKBs and do not mangle the skb2->h.raw etc. pointers, then
> we pass it into ptype->func().
>
> In this way all existing ptype->func() implementations get what
> they expect. Only if the flag is set will the new behavior occur.
I'm not sure that we should run skb_checksum() on the packet. In vmnet's
case about 95% of all packets delivered to the hook is thrown away after
checking target MAC address (they are not broadcasts & they are not for
the virtual machine), and in these cases computing checksum is just waste
of time.
So I think that this option should be named PTYPE_FLAG_CANCHECKSUM, and it
should affect only h.raw assignment - no checksumming should happen.
For AF_PACKET needs we can create PTYPE_FLAG_NEEDCHECKSUM which would call
skb_checksum() itself. But I think that adding skb_checksum call to af_packet
hook itself is simpler.
We can leave default case (no flags) as is, so hooks which expects h.raw==nh.raw
still work, and system is not slowed down by doing checksums for hooks which
do not need it.
I see only one possible benefit from PTYPE_FLAG_NEEDCHECKSUM: we can checksum
original skb, potentially saving some cycles in drivers which use
skb_csum_and_copy_dev (or when two or more such hooks are registered).
Best regards,
Petr Vandrovec
vandrove@vc.cvut.cz
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: af_packet & CHECKSUM_HW
2003-11-19 12:35 ` Petr Vandrovec
@ 2003-11-19 15:01 ` jamal
2003-11-19 15:11 ` Petr Vandrovec
0 siblings, 1 reply; 11+ messages in thread
From: jamal @ 2003-11-19 15:01 UTC (permalink / raw)
To: Petr Vandrovec; +Cc: David S. Miller, herbert, jmorris, netdev
On Wed, 2003-11-19 at 07:35, Petr Vandrovec wrote:
> I'm not sure that we should run skb_checksum() on the packet. In vmnet's
> case about 95% of all packets delivered to the hook is thrown away after
> checking target MAC address (they are not broadcasts & they are not for
> the virtual machine), and in these cases computing checksum is just waste
> of time.
>
> So I think that this option should be named PTYPE_FLAG_CANCHECKSUM, and it
> should affect only h.raw assignment - no checksumming should happen.
>
> For AF_PACKET needs we can create PTYPE_FLAG_NEEDCHECKSUM which would call
> skb_checksum() itself. But I think that adding skb_checksum call to af_packet
> hook itself is simpler.
>
> We can leave default case (no flags) as is, so hooks which expects h.raw==nh.raw
> still work, and system is not slowed down by doing checksums for hooks which
> do not need it.
>
> I see only one possible benefit from PTYPE_FLAG_NEEDCHECKSUM: we can checksum
> original skb, potentially saving some cycles in drivers which use
> skb_csum_and_copy_dev (or when two or more such hooks are registered).
I think this sort of stuff will greatly benefit from the tc extensions
that i have been working on.
instead of hacking on ptype you can instead write little modules that
do speacilized work (like computing checksums) or dropping packets
or setting these flags that can be later used for doing extra work.
The code executes before the ptype level execution and using the u32
classifier you can grop any part of the packet. I can send you a patch
if you want to give it a shot.
cheers,
jamal
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: af_packet & CHECKSUM_HW
2003-11-19 15:01 ` jamal
@ 2003-11-19 15:11 ` Petr Vandrovec
0 siblings, 0 replies; 11+ messages in thread
From: Petr Vandrovec @ 2003-11-19 15:11 UTC (permalink / raw)
To: jamal; +Cc: David S. Miller, herbert, jmorris, netdev
On Wed, Nov 19, 2003 at 10:01:45AM -0500, jamal wrote:
>
> I think this sort of stuff will greatly benefit from the tc extensions
> that i have been working on.
> instead of hacking on ptype you can instead write little modules that
> do speacilized work (like computing checksums) or dropping packets
> or setting these flags that can be later used for doing extra work.
> The code executes before the ptype level execution and using the u32
> classifier you can grop any part of the packet. I can send you a patch
> if you want to give it a shot.
I'll gladly look at it. From your short description it looks like that
it could solve also our second longstanding problem: we cannot block
host->guest packets from hitting wire, and if guest communicates only
with host (this traffic does not hit wire as it is injected as if
received from network card) then switches on the network do not know
where guest lives, and these packets then flood whole network.
Thanks,
Petr Vandrovec
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2003-11-19 15:11 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-11-15 2:44 af_packet & CHECKSUM_HW Herbert Xu
2003-11-16 13:44 ` James Morris
2003-11-16 20:51 ` Herbert Xu
2003-11-17 19:44 ` Petr Vandrovec
2003-11-17 20:30 ` David S. Miller
2003-11-18 13:21 ` Petr Vandrovec
2003-11-19 1:46 ` David S. Miller
2003-11-19 3:34 ` James Morris
2003-11-19 12:35 ` Petr Vandrovec
2003-11-19 15:01 ` jamal
2003-11-19 15:11 ` Petr Vandrovec
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).