netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] Fix false positives in can_checksum_protocol()
@ 2015-09-25 12:55 David Woodhouse
  2015-09-28 17:03 ` Tom Herbert
  0 siblings, 1 reply; 15+ messages in thread
From: David Woodhouse @ 2015-09-25 12:55 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 4540 bytes --]

The check in harmonize_features() is supposed to match the skb against
the features of the device in question, and prevent us from handing a
skb to a device which can't handle it.

It doesn't work correctly. A device with NETIF_F_IP_CSUM or
NETIF_F_IPV6_CSUM capabilities is only required to checksum TCP or UDP,
on Legacy IP and IPv6 respectively. But the existing check will allow
us to pass it *any* ETH_P_IP/ETH_P_IPV6 packets for hardware checksum
offload.

Depending on the driver in use, this leads to a BUG, a WARNING, or just
silent data corruption.

This is one approach to fixing that, and my test program at
http://bombadil.infradead.org/~dwmw2/raw.c can no longer trivially
reproduce the problem.

The test does now have false *negatives*, but those shouldn't happen
for locally-generated packets; only packets injected from af_packet,
tun, virtio_net and other places that allow us to inject
CHECKSUM_PARTIAL packets in order to make use of hardware offload
features. And false negatives aren't anywhere near as much of a problem
as false positives are — we just finish the checksum in software and
send the packet anyway.

It would be possible to fix those false negatives, if we really wanted
to — perhaps by adding an additional bit in the skbuff which indicates
that it *is* a TCP or UDP packet, rather than using ->sk->sk_protocol.
Then that bit could be set if appropriate in skb_partial_csum_set(), as
well as the places where we locally generate such packets. And the
check in can_checksum_protocol() would just check for that bit.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
---
Since can_checksum_protocol is inline, the compiler ought to know that
it doesn't even need to dereference skb->sk in the case where the
device has the NETIF_F_GEN_CSUM feature. So the additional check should
not slow down the (hopefully) common case in the fast path.

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2d15e38..76c8330 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3628,15 +3628,23 @@ struct sk_buff *skb_gso_segment(struct sk_buff *skb, netdev_features_t features)
 __be16 skb_network_protocol(struct sk_buff *skb, int *depth);
 
 static inline bool can_checksum_protocol(netdev_features_t features,
-					 __be16 protocol)
-{
-	return ((features & NETIF_F_GEN_CSUM) ||
-		((features & NETIF_F_V4_CSUM) &&
-		 protocol == htons(ETH_P_IP)) ||
-		((features & NETIF_F_V6_CSUM) &&
-		 protocol == htons(ETH_P_IPV6)) ||
-		((features & NETIF_F_FCOE_CRC) &&
-		 protocol == htons(ETH_P_FCOE)));
+					 __be16 protocol, u8 sk_protocol)
+{
+	if ((features & NETIF_F_GEN_CSUM) ||
+	    ((features & NETIF_F_FCOE_CRC) && protocol == htons(ETH_P_FCOE)))
+		return 1;
+
+	/* NETIF_F_V[46]_CSUM are defined to work only on TCP and UDP.
+	 * That is, when it needs to start checksumming at the transport
+	 * header, and place the result at an offset of either 6 (for UDP)
+	 * or 16 (for TCP).
+	 */
+	if ((((features & NETIF_F_V4_CSUM) && protocol == htons(ETH_P_IP)) ||
+	     ((features & NETIF_F_V6_CSUM) && protocol == htons(ETH_P_IPV6))) &&
+	    (sk_protocol == IPPROTO_TCP || sk_protocol == IPPROTO_UDP))
+		return 1;
+
+	return 0;
 }
 
 #ifdef CONFIG_BUG
diff --git a/net/core/dev.c b/net/core/dev.c
index 6bb6470..3c40957 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2613,7 +2613,8 @@ static netdev_features_t harmonize_features(struct sk_buff *skb,
 	features = net_mpls_features(skb, features, type);
 
 	if (skb->ip_summed != CHECKSUM_NONE &&
-	    !can_checksum_protocol(features, type)) {
+	    !can_checksum_protocol(features, type,
+				   skb->sk ? skb->sk->sk_protocol : 0)) {
 		features &= ~NETIF_F_ALL_CSUM;
 	} else if (illegal_highdma(skb->dev, skb)) {
 		features &= ~NETIF_F_SG;
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index dad4dd3..9126c6f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3004,7 +3004,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 		return ERR_PTR(-EINVAL);
 
 	csum = !head_skb->encap_hdr_csum &&
-	    !!can_checksum_protocol(features, proto);
+		!!can_checksum_protocol(features, proto,
+				head_skb->sk ? head_skb->sk->sk_protocol : 0);
 
 	headroom = skb_headroom(head_skb);
 	pos = skb_headlen(head_skb);
-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5691 bytes --]

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH] Fix false positives in can_checksum_protocol()
  2015-09-25 12:55 [RFC PATCH] Fix false positives in can_checksum_protocol() David Woodhouse
@ 2015-09-28 17:03 ` Tom Herbert
  2015-09-28 18:27   ` David Woodhouse
  0 siblings, 1 reply; 15+ messages in thread
From: Tom Herbert @ 2015-09-28 17:03 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Linux Kernel Network Developers

On Fri, Sep 25, 2015 at 5:55 AM, David Woodhouse <dwmw2@infradead.org> wrote:
> The check in harmonize_features() is supposed to match the skb against
> the features of the device in question, and prevent us from handing a
> skb to a device which can't handle it.
>
> It doesn't work correctly. A device with NETIF_F_IP_CSUM or
> NETIF_F_IPV6_CSUM capabilities is only required to checksum TCP or UDP,
> on Legacy IP and IPv6 respectively. But the existing check will allow
> us to pass it *any* ETH_P_IP/ETH_P_IPV6 packets for hardware checksum
> offload.
>
> Depending on the driver in use, this leads to a BUG, a WARNING, or just
> silent data corruption.
>
> This is one approach to fixing that, and my test program at
> http://bombadil.infradead.org/~dwmw2/raw.c can no longer trivially
> reproduce the problem.
>
> The test does now have false *negatives*, but those shouldn't happen
> for locally-generated packets; only packets injected from af_packet,
> tun, virtio_net and other places that allow us to inject
> CHECKSUM_PARTIAL packets in order to make use of hardware offload
> features. And false negatives aren't anywhere near as much of a problem
> as false positives are — we just finish the checksum in software and
> send the packet anyway.
>
> It would be possible to fix those false negatives, if we really wanted
> to — perhaps by adding an additional bit in the skbuff which indicates
> that it *is* a TCP or UDP packet, rather than using ->sk->sk_protocol.
> Then that bit could be set if appropriate in skb_partial_csum_set(), as
> well as the places where we locally generate such packets. And the
> check in can_checksum_protocol() would just check for that bit.
>
> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
> ---
> Since can_checksum_protocol is inline, the compiler ought to know that
> it doesn't even need to dereference skb->sk in the case where the
> device has the NETIF_F_GEN_CSUM feature. So the additional check should
> not slow down the (hopefully) common case in the fast path.
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 2d15e38..76c8330 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -3628,15 +3628,23 @@ struct sk_buff *skb_gso_segment(struct sk_buff *skb, netdev_features_t features)
>  __be16 skb_network_protocol(struct sk_buff *skb, int *depth);
>
>  static inline bool can_checksum_protocol(netdev_features_t features,
> -                                        __be16 protocol)
> -{
> -       return ((features & NETIF_F_GEN_CSUM) ||
> -               ((features & NETIF_F_V4_CSUM) &&
> -                protocol == htons(ETH_P_IP)) ||
> -               ((features & NETIF_F_V6_CSUM) &&
> -                protocol == htons(ETH_P_IPV6)) ||
> -               ((features & NETIF_F_FCOE_CRC) &&
> -                protocol == htons(ETH_P_FCOE)));
> +                                        __be16 protocol, u8 sk_protocol)
> +{
> +       if ((features & NETIF_F_GEN_CSUM) ||
> +           ((features & NETIF_F_FCOE_CRC) && protocol == htons(ETH_P_FCOE)))
> +               return 1;
> +
> +       /* NETIF_F_V[46]_CSUM are defined to work only on TCP and UDP.
> +        * That is, when it needs to start checksumming at the transport
> +        * header, and place the result at an offset of either 6 (for UDP)
> +        * or 16 (for TCP).
> +        */
> +       if ((((features & NETIF_F_V4_CSUM) && protocol == htons(ETH_P_IP)) ||
> +            ((features & NETIF_F_V6_CSUM) && protocol == htons(ETH_P_IPV6))) &&
> +           (sk_protocol == IPPROTO_TCP || sk_protocol == IPPROTO_UDP))
> +               return 1;
> +
Relying on skb->sk->sk_protocol is problematic. This is making the
assumption that the checksum being offloaded for the packet is the
same as that of the protocol for the socket-- this may not be the case
when we are offloading an outer checksum in encapsulation. Currently
this wouldn't a be problem since we're probably only offloading outer
UDP checksums, but if we ever start trying to offload other outer
checksums (e.g. GRE) then this probably doesn't work so well. Also,
this doesn't help those drivers that that can offload TCP and UDP for
IPv6 but only if there are no extension headers, in those case the
driver needs to look at the packet to see if it is a "simple" UDP/TCP
packet.

AFAIK, the only non UDP/TCP transport IP checksum in the stack is GRE
checksum which as I pointed out we don't attempt to offload. So the
only way to trip the bug that you are seeing is probably through a
userspace packet interface like in the test code. I think this
actually might expose a much more serious issue. Looking at tun.c, I
don't see anything that validates that the csum_start and csum_offset
provided by userspace actually refers to a sane checksum offset. Not
only is this a way to ask the stack to perform checksums for non
TCP/UDP, but it actually seems like the interface could be used by a
malicious application to have a device arbitrarily overwrite two bytes
anywhere in the packet with it's own data far below the stack,
netfilter, routing. To really fix this we should probably be doing
validation in tun, if the checksum isn't for TCP or UDP then call
skb_checksum_help before sending the packet into the stack.

Tom

> +       return 0;
>  }
>
>  #ifdef CONFIG_BUG
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 6bb6470..3c40957 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -2613,7 +2613,8 @@ static netdev_features_t harmonize_features(struct sk_buff *skb,
>         features = net_mpls_features(skb, features, type);
>
>         if (skb->ip_summed != CHECKSUM_NONE &&
> -           !can_checksum_protocol(features, type)) {
> +           !can_checksum_protocol(features, type,
> +                                  skb->sk ? skb->sk->sk_protocol : 0)) {
>                 features &= ~NETIF_F_ALL_CSUM;
>         } else if (illegal_highdma(skb->dev, skb)) {
>                 features &= ~NETIF_F_SG;
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index dad4dd3..9126c6f 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3004,7 +3004,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>                 return ERR_PTR(-EINVAL);
>
>         csum = !head_skb->encap_hdr_csum &&
> -           !!can_checksum_protocol(features, proto);
> +               !!can_checksum_protocol(features, proto,
> +                               head_skb->sk ? head_skb->sk->sk_protocol : 0);
>
>         headroom = skb_headroom(head_skb);
>         pos = skb_headlen(head_skb);
> --
> David Woodhouse                            Open Source Technology Centre
> David.Woodhouse@intel.com                              Intel Corporation
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH] Fix false positives in can_checksum_protocol()
  2015-09-28 17:03 ` Tom Herbert
@ 2015-09-28 18:27   ` David Woodhouse
  2015-09-28 19:13     ` Tom Herbert
  0 siblings, 1 reply; 15+ messages in thread
From: David Woodhouse @ 2015-09-28 18:27 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Linux Kernel Network Developers

[-- Attachment #1: Type: text/plain, Size: 2898 bytes --]

On Mon, 2015-09-28 at 10:03 -0700, Tom Herbert wrote:

> > +       if ((((features & NETIF_F_V4_CSUM) && protocol == htons(ETH_P_IP)) ||
> > +            ((features & NETIF_F_V6_CSUM) && protocol == htons(ETH_P_IPV6))) &&
> > +           (sk_protocol == IPPROTO_TCP || sk_protocol == IPPROTO_UDP))
> > +               return 1;
> > +
> Relying on skb->sk->sk_protocol is problematic. This is making the
> assumption that the checksum being offloaded for the packet is the
> same as that of the protocol for the socket-- this may not be the 
> case when we are offloading an outer checksum in encapsulation.

> Currently this wouldn't a be problem since we're probably only 
> offloading outer UDP checksums, but if we ever start trying to 
> offload other outer checksums (e.g. GRE) then this probably doesn't
> work so well.

That makes sense.

>  Also, this doesn't help those drivers that that can offload TCP and 
> UDP for IPv6 but only if there are no extension headers, in those 
> case the driver needs to look at the packet to see if it is a 
> "simple" UDP/TCP packet.

Hm, are such devices even permitted to set NETIF_F_IPV6_CSUM?

> AFAIK, the only non UDP/TCP transport IP checksum in the stack is GRE
> checksum which as I pointed out we don't attempt to offload. So the
> only way to trip the bug that you are seeing is probably through a
> userspace packet interface like in the test code. I think this
> actually might expose a much more serious issue. Looking at tun.c, I
> don't see anything that validates that the csum_start and csum_offset
> provided by userspace actually refers to a sane checksum offset. 

That's handled in skb_partial_csum_set().

> Not only is this a way to ask the stack to perform checksums for non
> TCP/UDP, but it actually seems like the interface could be used by a
> malicious application to have a device arbitrarily overwrite two 
> bytes anywhere in the packet with it's own data far below the stack,
> netfilter, routing. To really fix this we should probably be doing
> validation in tun, if the checksum isn't for TCP or UDP then call
> skb_checksum_help before sending the packet into the stack.

So... if it's never valid to ask for a hardware checksum on anything
but TCP or UDP, why do we bother with NETIF_F_GEN_CSUM at all? Should
we just be removing it entirely? That seems like something of a
retrograde step.

Perhaps a better solution would be a bit in the skbuff which indicates
that it *is* a TCP or UDP checksum. That would be set by our UDP and
TCP sockets, cleared by encapsulation, also set if appropriate by
skb_partial_csum_set().

And then the check in can_checksum_protocol() is trivial and clearly
correct.

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5691 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH] Fix false positives in can_checksum_protocol()
  2015-09-28 18:27   ` David Woodhouse
@ 2015-09-28 19:13     ` Tom Herbert
  2015-09-28 19:26       ` David Woodhouse
  0 siblings, 1 reply; 15+ messages in thread
From: Tom Herbert @ 2015-09-28 19:13 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Linux Kernel Network Developers

>>  Also, this doesn't help those drivers that that can offload TCP and
>> UDP for IPv6 but only if there are no extension headers, in those
>> case the driver needs to look at the packet to see if it is a
>> "simple" UDP/TCP packet.
>
> Hm, are such devices even permitted to set NETIF_F_IPV6_CSUM?
>
Apparently this may be a problem in ixgbe. See "[net-next 05/19]
ixgbe: Add support for UDP-encapsulated tx checksum offload" thread.

>> AFAIK, the only non UDP/TCP transport IP checksum in the stack is GRE
>> checksum which as I pointed out we don't attempt to offload. So the
>> only way to trip the bug that you are seeing is probably through a
>> userspace packet interface like in the test code. I think this
>> actually might expose a much more serious issue. Looking at tun.c, I
>> don't see anything that validates that the csum_start and csum_offset
>> provided by userspace actually refers to a sane checksum offset.
>
> That's handled in skb_partial_csum_set().
>
That only checks that start and offset are within skb_headlen. It
doesn't check that checksum offset refers to TCP/UDP/GRE/ICMP
checksum, or whether to the first two bytes of the IP destination
address. Maybe there's something later in the path that would catch
this, but I didn't readily see it.

>> Not only is this a way to ask the stack to perform checksums for non
>> TCP/UDP, but it actually seems like the interface could be used by a
>> malicious application to have a device arbitrarily overwrite two
>> bytes anywhere in the packet with it's own data far below the stack,
>> netfilter, routing. To really fix this we should probably be doing
>> validation in tun, if the checksum isn't for TCP or UDP then call
>> skb_checksum_help before sending the packet into the stack.
>
> So... if it's never valid to ask for a hardware checksum on anything
> but TCP or UDP, why do we bother with NETIF_F_GEN_CSUM at all? Should
> we just be removing it entirely? That seems like something of a
> retrograde step.
>
No, we want to do the opposite! In your example the request to
checksum is being generated from outside the stack so we need to
verify that for sanity-- requests generated by the stack would be
trusted. Presumably, within the stack we want a generic checksum
offload for new protocols, new extension headers (I am almost certain
that segment routing exthdr would break some NETIF_F_IPV6_CSUM), and
new flavors of encapsulation. NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM
are not generic and are becoming impediments to protocol development--
drivers moving to NETIF_F_HW_CSUM is the answer.

> Perhaps a better solution would be a bit in the skbuff which indicates
> that it *is* a TCP or UDP checksum. That would be set by our UDP and
> TCP sockets, cleared by encapsulation, also set if appropriate by
> skb_partial_csum_set().
>
Yes I agree. What I have been thinking to do is steal two bits from
csum_offset that would indicate that the checksum is IPv4 or IPv6
(specifically that the checksum value is seeded with an IPv4 or IPv6
pseudo header). This information plus the csum_offset would be
sufficient for drivers to identify the checksum as UDP/TCP-IPv4/IPv6.
The other case that needs special handling is inner vs. outer
checksum, but that can be deduced by comparing (inner of outer)
transport offset to checksum start. With this and a couple of utility
functions we should be able to start deprecating NETIF_F_IP_CSUM and
NETIF_F_IPV6_CSUM.

Thanks,
Tom

> And then the check in can_checksum_protocol() is trivial and clearly
> correct.
>
> --
> David Woodhouse                            Open Source Technology Centre
> David.Woodhouse@intel.com                              Intel Corporation
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH] Fix false positives in can_checksum_protocol()
  2015-09-28 19:13     ` Tom Herbert
@ 2015-09-28 19:26       ` David Woodhouse
  2015-09-28 19:37         ` Tom Herbert
  0 siblings, 1 reply; 15+ messages in thread
From: David Woodhouse @ 2015-09-28 19:26 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Linux Kernel Network Developers

[-- Attachment #1: Type: text/plain, Size: 2086 bytes --]

On Mon, 2015-09-28 at 12:13 -0700, Tom Herbert wrote:
> 
> > Perhaps a better solution would be a bit in the skbuff which indicates
> > that it *is* a TCP or UDP checksum. That would be set by our UDP and
> > TCP sockets, cleared by encapsulation, also set if appropriate by
> > skb_partial_csum_set().
> >
> Yes I agree. What I have been thinking to do is steal two bits from
> csum_offset that would indicate that the checksum is IPv4 or IPv6
> (specifically that the checksum value is seeded with an IPv4 or IPv6
> pseudo header). This information plus the csum_offset would be
> sufficient for drivers to identify the checksum as UDP/TCP-IPv4/IPv6.

> The other case that needs special handling is inner vs. outer
> checksum, but that can be deduced by comparing (inner of outer)
> transport offset to checksum start. With this and a couple of utility
> functions we should be able to start deprecating NETIF_F_IP_CSUM and
> NETIF_F_IPV6_CSUM.

You mean drivers which currently set NETIF_F_IP_CSUM would need to
provide a .ndo_features_check() which tolerates only the packets they
can actually handle? And we'd just ensure that the bits are there for
them to use, in the skbuff? That seems reasonable.

Note that 'seeded with an IPv[46] pseudo header' isn't quite
sufficient. Some hardware like 8139cp is explicitly told to do a UDP or
a TCP checksum with a bit in the descriptor, so any UDP-like or TCP
-like checksum works out fine.

Other hardware works out whether to do a UDP or a TCP checksum for
*itself*, so it *can't* cope with other protocols which just happen to
look the same. For those it really *must* be IPPROTO_TCP or IPPROTO_UDP
and they're going to be looking in the IP header for it.

I do suspect we'll want a bit which says it's *actually* TCP or UDP,
not just 'seeded with a pseudo-header'. That's the important
distinction for NETIF_F_IP_CSUM vs. NETIF_F_HW_CSUM.

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5691 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH] Fix false positives in can_checksum_protocol()
  2015-09-28 19:26       ` David Woodhouse
@ 2015-09-28 19:37         ` Tom Herbert
  2015-09-29  1:38           ` Jesse Brandeburg
  2015-09-29  7:08           ` David Woodhouse
  0 siblings, 2 replies; 15+ messages in thread
From: Tom Herbert @ 2015-09-28 19:37 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Linux Kernel Network Developers

On Mon, Sep 28, 2015 at 12:26 PM, David Woodhouse <dwmw2@infradead.org> wrote:
> On Mon, 2015-09-28 at 12:13 -0700, Tom Herbert wrote:
>>
>> > Perhaps a better solution would be a bit in the skbuff which indicates
>> > that it *is* a TCP or UDP checksum. That would be set by our UDP and
>> > TCP sockets, cleared by encapsulation, also set if appropriate by
>> > skb_partial_csum_set().
>> >
>> Yes I agree. What I have been thinking to do is steal two bits from
>> csum_offset that would indicate that the checksum is IPv4 or IPv6
>> (specifically that the checksum value is seeded with an IPv4 or IPv6
>> pseudo header). This information plus the csum_offset would be
>> sufficient for drivers to identify the checksum as UDP/TCP-IPv4/IPv6.
>
>> The other case that needs special handling is inner vs. outer
>> checksum, but that can be deduced by comparing (inner of outer)
>> transport offset to checksum start. With this and a couple of utility
>> functions we should be able to start deprecating NETIF_F_IP_CSUM and
>> NETIF_F_IPV6_CSUM.
>
> You mean drivers which currently set NETIF_F_IP_CSUM would need to
> provide a .ndo_features_check() which tolerates only the packets they
> can actually handle? And we'd just ensure that the bits are there for
> them to use, in the skbuff? That seems reasonable.
>
I think it's easier to just call skb_checksum_help from the driver
when the packet is actually sent to the device (should be no cost for
late binding).

> Note that 'seeded with an IPv[46] pseudo header' isn't quite
> sufficient. Some hardware like 8139cp is explicitly told to do a UDP or
> a TCP checksum with a bit in the descriptor, so any UDP-like or TCP
> -like checksum works out fine.
>
UDP or TCP can be determined from csum_offset, e.g. 16=>TCP 6=>UDP

> Other hardware works out whether to do a UDP or a TCP checksum for
> *itself*, so it *can't* cope with other protocols which just happen to
> look the same. For those it really *must* be IPPROTO_TCP or IPPROTO_UDP
> and they're going to be looking in the IP header for it.
>
In those cases driver can just look into the packet to determine the protocol.

> I do suspect we'll want a bit which says it's *actually* TCP or UDP,
> not just 'seeded with a pseudo-header'. That's the important
> distinction for NETIF_F_IP_CSUM vs. NETIF_F_HW_CSUM.
>
> --
> David Woodhouse                            Open Source Technology Centre
> David.Woodhouse@intel.com                              Intel Corporation
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH] Fix false positives in can_checksum_protocol()
  2015-09-28 19:37         ` Tom Herbert
@ 2015-09-29  1:38           ` Jesse Brandeburg
  2015-09-29  3:04             ` Tom Herbert
  2015-09-29  7:08           ` David Woodhouse
  1 sibling, 1 reply; 15+ messages in thread
From: Jesse Brandeburg @ 2015-09-29  1:38 UTC (permalink / raw)
  To: Tom Herbert; +Cc: David Woodhouse, Linux Kernel Network Developers

On Mon, Sep 28, 2015 at 12:37 PM, Tom Herbert <tom@herbertland.com> wrote:
> On Mon, Sep 28, 2015 at 12:26 PM, David Woodhouse <dwmw2@infradead.org> wrote:
>> On Mon, 2015-09-28 at 12:13 -0700, Tom Herbert wrote:
>>>
>>> > Perhaps a better solution would be a bit in the skbuff which indicates
>>> > that it *is* a TCP or UDP checksum. That would be set by our UDP and
>>> > TCP sockets, cleared by encapsulation, also set if appropriate by
>>> > skb_partial_csum_set().

I've been pondering a bit of a redesign in this space.  I think the
skb struct should be
explicit in its instructions to hardware for which offloads to do for
each packet.

In this way, the stack would be *directly* telling the drivers what to
do (and what not
to do), solving all sorts of bugs and really improving driver
reliability and implementation.

These other solutions you guys are discussing are half solving the
problem, only to
make it worse when the next thing comes along.  Unfortunately it is only an
idea right now and no patches.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH] Fix false positives in can_checksum_protocol()
  2015-09-29  1:38           ` Jesse Brandeburg
@ 2015-09-29  3:04             ` Tom Herbert
  2015-09-29  7:12               ` David Woodhouse
  0 siblings, 1 reply; 15+ messages in thread
From: Tom Herbert @ 2015-09-29  3:04 UTC (permalink / raw)
  To: Jesse Brandeburg; +Cc: David Woodhouse, Linux Kernel Network Developers

On Mon, Sep 28, 2015 at 6:38 PM, Jesse Brandeburg
<jesse.brandeburg@gmail.com> wrote:
> On Mon, Sep 28, 2015 at 12:37 PM, Tom Herbert <tom@herbertland.com> wrote:
>> On Mon, Sep 28, 2015 at 12:26 PM, David Woodhouse <dwmw2@infradead.org> wrote:
>>> On Mon, 2015-09-28 at 12:13 -0700, Tom Herbert wrote:
>>>>
>>>> > Perhaps a better solution would be a bit in the skbuff which indicates
>>>> > that it *is* a TCP or UDP checksum. That would be set by our UDP and
>>>> > TCP sockets, cleared by encapsulation, also set if appropriate by
>>>> > skb_partial_csum_set().
>
> I've been pondering a bit of a redesign in this space.  I think the
> skb struct should be
> explicit in its instructions to hardware for which offloads to do for
> each packet.
>
> In this way, the stack would be *directly* telling the drivers what to
> do (and what not
> to do), solving all sorts of bugs and really improving driver
> reliability and implementation.
>
Doesn't CHECKSUM_PARTIAL with csum_offset and csum_start already tell
the driver unambiguously what to do wrt checksum offload?

> These other solutions you guys are discussing are half solving the
> problem, only to
> make it worse when the next thing comes along.  Unfortunately it is only an
> idea right now and no patches.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH] Fix false positives in can_checksum_protocol()
  2015-09-28 19:37         ` Tom Herbert
  2015-09-29  1:38           ` Jesse Brandeburg
@ 2015-09-29  7:08           ` David Woodhouse
  1 sibling, 0 replies; 15+ messages in thread
From: David Woodhouse @ 2015-09-29  7:08 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Linux Kernel Network Developers

[-- Attachment #1: Type: text/plain, Size: 892 bytes --]

On Mon, 2015-09-28 at 12:37 -0700, Tom Herbert wrote:
> I think it's easier to just call skb_checksum_help from the driver
> when the packet is actually sent to the device (should be no cost for
> late binding).

That's true for checksum. Not for things like TSO though, and I wonder
if it's worth keeping it simple and doing it *all* in
.ndo_features_check()? 

> > Note that 'seeded with an IPv[46] pseudo header' isn't quite
> > sufficient. Some hardware like 8139cp is explicitly told to do a UDP or
> > a TCP checksum with a bit in the descriptor, so any UDP-like or TCP
> > -like checksum works out fine.
> > 
> UDP or TCP can be determined from csum_offset, e.g. 16=>TCP 6=>UDP

Kind of. There'll be false positives there too, though. That was
actually the basis of my first attempt to address this, at
http://lists.openwall.net/netdev/2013/01/14/36

-- 
dwmw2


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5691 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH] Fix false positives in can_checksum_protocol()
  2015-09-29  3:04             ` Tom Herbert
@ 2015-09-29  7:12               ` David Woodhouse
  2015-09-29 22:52                 ` Tom Herbert
  0 siblings, 1 reply; 15+ messages in thread
From: David Woodhouse @ 2015-09-29  7:12 UTC (permalink / raw)
  To: Tom Herbert, Jesse Brandeburg; +Cc: Linux Kernel Network Developers

[-- Attachment #1: Type: text/plain, Size: 1606 bytes --]

On Mon, 2015-09-28 at 20:04 -0700, Tom Herbert wrote:
> 
> > I've been pondering a bit of a redesign in this space.  I think the
> > skb struct should be explicit in its instructions to hardware for
> > which offloads to do for each packet.
> >
> > In this way, the stack would be *directly* telling the drivers what to
> > do (and what not to do), solving all sorts of bugs and really improving
> > driver reliability and implementation.
> >
> Doesn't CHECKSUM_PARTIAL with csum_offset and csum_start already tell
> the driver unambiguously what to do wrt checksum offload?

Right. That's precisely what we *do* have. But as things stand, we
can't *use* it to its full capability.

It's fine for decent devices which can handle such explicit
instructions (advertised by the NETIF_F_HW_CSUM feature).

The problem is the crappy devices that can *only* checksum UDP and TCP
frames, advertised with the NETIF_F_IP{V6,}_CSUM features. We make a
primitive attempt *not* to feed arbitrary checksum requests to such
hardware. But we fail — we end up feeding *all* Legacy IP packets to a
NETIF_F_IP_CSUM device, and *all* IPv6 packets to a NETIF_F_IPV6_CSUM
device, regardless of whether they're *actually* TCP or UDP packets.

That's the problem I'm trying to solve. And then we *can* make full use
of the generic checksum offload (I had it working for ICMPv6 at one
point: http://lists.openwall.net/netdev/2013/01/14/38 ).

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5691 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH] Fix false positives in can_checksum_protocol()
  2015-09-29  7:12               ` David Woodhouse
@ 2015-09-29 22:52                 ` Tom Herbert
  2015-10-05 11:16                   ` David Woodhouse
  0 siblings, 1 reply; 15+ messages in thread
From: Tom Herbert @ 2015-09-29 22:52 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Jesse Brandeburg, Linux Kernel Network Developers

On Tue, Sep 29, 2015 at 12:12 AM, David Woodhouse <dwmw2@infradead.org> wrote:
> On Mon, 2015-09-28 at 20:04 -0700, Tom Herbert wrote:
>>
>> > I've been pondering a bit of a redesign in this space.  I think the
>> > skb struct should be explicit in its instructions to hardware for
>> > which offloads to do for each packet.
>> >
>> > In this way, the stack would be *directly* telling the drivers what to
>> > do (and what not to do), solving all sorts of bugs and really improving
>> > driver reliability and implementation.
>> >
>> Doesn't CHECKSUM_PARTIAL with csum_offset and csum_start already tell
>> the driver unambiguously what to do wrt checksum offload?
>
> Right. That's precisely what we *do* have. But as things stand, we
> can't *use* it to its full capability.
>
> It's fine for decent devices which can handle such explicit
> instructions (advertised by the NETIF_F_HW_CSUM feature).
>
> The problem is the crappy devices that can *only* checksum UDP and TCP
> frames, advertised with the NETIF_F_IP{V6,}_CSUM features. We make a
> primitive attempt *not* to feed arbitrary checksum requests to such
> hardware. But we fail — we end up feeding *all* Legacy IP packets to a
> NETIF_F_IP_CSUM device, and *all* IPv6 packets to a NETIF_F_IPV6_CSUM
> device, regardless of whether they're *actually* TCP or UDP packets.
>
Please look at ixgbe_tx_csum in ixgbe driver. This one example of how
a driver can determine whether the checksum being offloaded is TCP or
UDP. The bug in this driver is that skb_checksum_help is not called
for a protocol the driver isn't looking for. In particular, I believe
this driver will probably send packets with invalid checksums when
TCP/UDP is used with IPv6 packets that contain extension headers.

Tom

> That's the problem I'm trying to solve. And then we *can* make full use
> of the generic checksum offload (I had it working for ICMPv6 at one
> point: http://lists.openwall.net/netdev/2013/01/14/38 ).
>
> --
> David Woodhouse                            Open Source Technology Centre
> David.Woodhouse@intel.com                              Intel Corporation
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH] Fix false positives in can_checksum_protocol()
  2015-09-29 22:52                 ` Tom Herbert
@ 2015-10-05 11:16                   ` David Woodhouse
  2015-10-05 16:23                     ` Tom Herbert
  0 siblings, 1 reply; 15+ messages in thread
From: David Woodhouse @ 2015-10-05 11:16 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Jesse Brandeburg, Linux Kernel Network Developers

[-- Attachment #1: Type: text/plain, Size: 1641 bytes --]

On Tue, 2015-09-29 at 15:52 -0700, Tom Herbert wrote:
> Please look at ixgbe_tx_csum in ixgbe driver. This one example of how
> a driver can determine whether the checksum being offloaded is TCP or
> UDP. The bug in this driver is...

I think it serves better as an example of why we don't *want* drivers
doing that kind of thing for themselves... :)

I propose we steal some high bits from csum_offset, as you suggested,
and use them to indicate a 'checksum type', which will include TCP and
UDP.

Then the filter in netif_skb_features() can trivially do the right
thing for NETIF_F_IP{V6,}_CSUM devices, so avoid feeding them packets
they can't handle.

You mentioned that you actually want to deprecate those feature flags —
which works for me, but it's kind of orthogonal. If we do that we'd
still want to provide generic functions that such drivers can use as
their .ndo_features_check() method. And we'd *still* want to do the
check based on a simple flag, rather than grubbing around in the packet
data. (And the drivers if they *are* asked to do the checksum will
sometimes care whether it's TCP vs. UDP too).

I don't think we want drivers calling skb_checksum_help() for
themselves; we want the pre-filter. Mainly because we *definitely*
don't want drivers calling gso_skb_segment() for themselves in the same
situation — see the comment I posted on Friday about the r8169 instance
of that. ('Re: [PATCH net-next 3/3] r8169: support IPv6').

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5691 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH] Fix false positives in can_checksum_protocol()
  2015-10-05 11:16                   ` David Woodhouse
@ 2015-10-05 16:23                     ` Tom Herbert
  2015-10-05 18:28                       ` Rustad, Mark D
  2015-10-05 20:22                       ` David Woodhouse
  0 siblings, 2 replies; 15+ messages in thread
From: Tom Herbert @ 2015-10-05 16:23 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Jesse Brandeburg, Linux Kernel Network Developers

On Mon, Oct 5, 2015 at 4:16 AM, David Woodhouse <dwmw2@infradead.org> wrote:
> On Tue, 2015-09-29 at 15:52 -0700, Tom Herbert wrote:
>> Please look at ixgbe_tx_csum in ixgbe driver. This one example of how
>> a driver can determine whether the checksum being offloaded is TCP or
>> UDP. The bug in this driver is...
>
> I think it serves better as an example of why we don't *want* drivers
> doing that kind of thing for themselves... :)
>
> I propose we steal some high bits from csum_offset, as you suggested,
> and use them to indicate a 'checksum type', which will include TCP and
> UDP.
>
> Then the filter in netif_skb_features() can trivially do the right
> thing for NETIF_F_IP{V6,}_CSUM devices, so avoid feeding them packets
> they can't handle.
>
> You mentioned that you actually want to deprecate those feature flags —
> which works for me, but it's kind of orthogonal. If we do that we'd
> still want to provide generic functions that such drivers can use as
> their .ndo_features_check() method. And we'd *still* want to do the
> check based on a simple flag, rather than grubbing around in the packet
> data. (And the drivers if they *are* asked to do the checksum will
> sometimes care whether it's TCP vs. UDP too).
>
> I don't think we want drivers calling skb_checksum_help() for
> themselves; we want the pre-filter. Mainly because we *definitely*
> don't want drivers calling gso_skb_segment() for themselves in the same
> situation — see the comment I posted on Friday about the r8169 instance
> of that. ('Re: [PATCH net-next 3/3] r8169: support IPv6').
>
David, here is what I am currently thinking the interface should be:

1) Drivers may advertise NETIF_F_HW_CSUM. The stack will indicate
checksum offload exclusively using the
CHECKSUM_PARTIAL/csum_start/csum_offset interface. No additional
interfaces (bits in skbuff should not be needed)
2) A driver may inspect packets via ndo_check to decide if it wants to
offload the checksum, if not cancels NETIF_F_HW_CSUM in the packet.
3) In driver xmit when CHECKSUM_PARTIAL is set the driver MUST
correctly resolve the checksum-- either by properly offloading to the
device or calling skb_checksum_help.
4) To help drivers for devices with limited offload capabilities we'll
define a helper function to check for typical restrictions (.e.g. IPv4
only, TCP/UDP only. no encapsulation, no IPv6 extension headers,
etc.). I am working on this helper function and will send RFC shortly.

Tom

> --
> David Woodhouse                            Open Source Technology Centre
> David.Woodhouse@intel.com                              Intel Corporation
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH] Fix false positives in can_checksum_protocol()
  2015-10-05 16:23                     ` Tom Herbert
@ 2015-10-05 18:28                       ` Rustad, Mark D
  2015-10-05 20:22                       ` David Woodhouse
  1 sibling, 0 replies; 15+ messages in thread
From: Rustad, Mark D @ 2015-10-05 18:28 UTC (permalink / raw)
  To: Tom Herbert
  Cc: David Woodhouse, Jesse Brandeburg,
	Linux Kernel Network Developers

[-- Attachment #1: Type: text/plain, Size: 1030 bytes --]

> On Oct 5, 2015, at 9:23 AM, Tom Herbert <tom@herbertland.com> wrote:
> 
> 4) To help drivers for devices with limited offload capabilities we'll
> define a helper function to check for typical restrictions (.e.g. IPv4
> only, TCP/UDP only. no encapsulation, no IPv6 extension headers,
> etc.). I am working on this helper function and will send RFC shortly.

FWIW, ixgbe probably won't use the IPv6 extension header check, because I am working on simply adding offload support for it. It seems to me that if segment routing becomes used, in some places there will be enough traffic with it present to want it offloaded. Well, maybe ixgbe should use the extension header check until I get the offload support done, but that is it.

Are there any good tools for testing IPv6 extension headers? It seems that it hasn't been tested up to this point or the ixgbe problem would have been noticed, so it would be good to introduce a good tool to test it properly.

--
Mark Rustad, Networking Division, Intel Corporation


[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 841 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH] Fix false positives in can_checksum_protocol()
  2015-10-05 16:23                     ` Tom Herbert
  2015-10-05 18:28                       ` Rustad, Mark D
@ 2015-10-05 20:22                       ` David Woodhouse
  1 sibling, 0 replies; 15+ messages in thread
From: David Woodhouse @ 2015-10-05 20:22 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Jesse Brandeburg, Linux Kernel Network Developers

[-- Attachment #1: Type: text/plain, Size: 1749 bytes --]

On Mon, 2015-10-05 at 09:23 -0700, Tom Herbert wrote:
> 
> 1) Drivers may advertise NETIF_F_HW_CSUM. The stack will indicate
> checksum offload exclusively using the
> CHECKSUM_PARTIAL/csum_start/csum_offset interface. No additional
> interfaces (bits in skbuff should not be needed)
> 2) A driver may inspect packets via ndo_check to decide if it wants to
> offload the checksum, if not cancels NETIF_F_HW_CSUM in the packet.
> 3) In driver xmit when CHECKSUM_PARTIAL is set the driver MUST
> correctly resolve the checksum-- either by properly offloading to the
> device or calling skb_checksum_help.

In cases where they haven't used .ndo_features_check() to ensure that
they don't *see* such packets, sure. But using .ndo_features_check()
should probably be the preferred method.

> 4) To help drivers for devices with limited offload capabilities we'll
> define a helper function to check for typical restrictions (.e.g. IPv4
> only, TCP/UDP only. no encapsulation, no IPv6 extension headers,
> etc.). I am working on this helper function and will send RFC shortly.

I do suspect that helper function would benefit from seeing TCP/UDP
flags in the high bits of csum_offset, rather than grubbing around in
the packet for itself to see if it's really TCP/UDP. After all, it's
almost free to set those in the first place — at least for locally
-generated packets. And not *so* hard to add them in
skb_partial_csum_set(). But hey, if you can push an implementation
which is grubbing around in the packet then at least *I* don't have to
feel dirty for it... :)

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


[-- Attachment #2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5691 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2015-10-05 20:22 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-09-25 12:55 [RFC PATCH] Fix false positives in can_checksum_protocol() David Woodhouse
2015-09-28 17:03 ` Tom Herbert
2015-09-28 18:27   ` David Woodhouse
2015-09-28 19:13     ` Tom Herbert
2015-09-28 19:26       ` David Woodhouse
2015-09-28 19:37         ` Tom Herbert
2015-09-29  1:38           ` Jesse Brandeburg
2015-09-29  3:04             ` Tom Herbert
2015-09-29  7:12               ` David Woodhouse
2015-09-29 22:52                 ` Tom Herbert
2015-10-05 11:16                   ` David Woodhouse
2015-10-05 16:23                     ` Tom Herbert
2015-10-05 18:28                       ` Rustad, Mark D
2015-10-05 20:22                       ` David Woodhouse
2015-09-29  7:08           ` David Woodhouse

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).