[PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
       [not found] <lsq.1461711744.351546278@decadent.org.uk>
@ 2016-04-26 23:02 ` Ben Hutchings
  2016-04-27 15:59   ` Ben Greear
  0 siblings, 1 reply; 24+ messages in thread
From: Ben Hutchings @ 2016-04-26 23:02 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: akpm, David S. Miller, Vijay Pandurangan, Cong Wang, netdev,
	Evan Jones, Nicolas Dichtel, Phil Sutter, Toshiaki Makita

3.2.80-rc1 review patch.  If anyone has any objections, please let me know.

------------------

From: Vijay Pandurangan <vijayp@vijayp.ca>

[ Upstream commit ce8c839b74e3017996fad4e1b7ba2e2625ede82f ]

Packets that arrive from real hardware devices have ip_summed ==
CHECKSUM_UNNECESSARY if the hardware verified the checksums, or
CHECKSUM_NONE if the packet is bad or it was unable to verify it. The
current version of veth will replace CHECKSUM_NONE with
CHECKSUM_UNNECESSARY, which causes corrupt packets routed from hardware to
a veth device to be delivered to the application. This caused applications
at Twitter to receive corrupt data when network hardware was corrupting
packets.

We believe this was added as an optimization to skip computing and
verifying checksums for communication between containers. However, locally
generated packets have ip_summed == CHECKSUM_PARTIAL, so the code as
written does nothing for them. As far as we can tell, after removing this
code, these packets are transmitted from one stack to another unmodified
(tcpdump shows invalid checksums on both sides, as expected), and they are
delivered correctly to applications. We didn’t test every possible network
configuration, but we tried a few common ones such as bridging containers,
using NAT between the host and a container, and routing from hardware
devices to containers. We have effectively deployed this in production at
Twitter (by disabling RX checksum offloading on veth devices).

This code dates back to the first version of the driver, commit
<e314dbdc1c0dc6a548ecf> ("[NET]: Virtual ethernet device driver"), so I
suspect this bug occurred mostly because the driver API has evolved
significantly since then. Commit <0b7967503dc97864f283a> ("net/veth: Fix
packet checksumming") (in December 2010) fixed this for packets that get
created locally and sent to hardware devices, by not changing
CHECKSUM_PARTIAL. However, the same issue still occurs for packets coming
in from hardware devices.

Co-authored-by: Evan Jones <ej@evanjones.ca>
Signed-off-by: Evan Jones <ej@evanjones.ca>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: Phil Sutter <phil@nwl.cc>
Cc: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Vijay Pandurangan <vijayp@vijayp.ca>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 drivers/net/veth.c | 6 ------
 1 file changed, 6 deletions(-)

--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -126,11 +126,6 @@ static netdev_tx_t veth_xmit(struct sk_b
 	stats = this_cpu_ptr(priv->stats);
 	rcv_stats = this_cpu_ptr(rcv_priv->stats);

-	/* don't change ip_summed == CHECKSUM_PARTIAL, as that
-	   will cause bad checksum on forwarded packets */
-	if (skb->ip_summed == CHECKSUM_NONE &&
-	    rcv->features & NETIF_F_RXCSUM)
-		skb->ip_summed = CHECKSUM_UNNECESSARY;

 	length = skb->len;
 	if (dev_forward_skb(rcv, skb) != NET_RX_SUCCESS)

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
  2016-04-26 23:02 ` [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good Ben Hutchings
@ 2016-04-27 15:59   ` Ben Greear
  2016-04-27 18:07     ` Ben Hutchings
  0 siblings, 1 reply; 24+ messages in thread
From: Ben Greear @ 2016-04-27 15:59 UTC (permalink / raw)
  To: Ben Hutchings, linux-kernel, stable
  Cc: akpm, David S. Miller, Vijay Pandurangan, Cong Wang, netdev,
	Evan Jones, Nicolas Dichtel, Phil Sutter, Toshiaki Makita

On 04/26/2016 04:02 PM, Ben Hutchings wrote:
> 3.2.80-rc1 review patch.  If anyone has any objections, please let me know.

I would be careful about this.  It causes regressions when sending
PACKET_SOCKET buffers from user-space to veth devices.

There was a proposed upstream fix for the regression, but it has not gone
into the tree as far as I know.

http://www.spinics.net/lists/netdev/msg370436.html

Thanks,
Ben

>
> ------------------
>
> From: Vijay Pandurangan <vijayp@vijayp.ca>
>
> [ Upstream commit ce8c839b74e3017996fad4e1b7ba2e2625ede82f ]
>
> Packets that arrive from real hardware devices have ip_summed ==
> CHECKSUM_UNNECESSARY if the hardware verified the checksums, or
> CHECKSUM_NONE if the packet is bad or it was unable to verify it. The
> current version of veth will replace CHECKSUM_NONE with
> CHECKSUM_UNNECESSARY, which causes corrupt packets routed from hardware to
> a veth device to be delivered to the application. This caused applications
> at Twitter to receive corrupt data when network hardware was corrupting
> packets.
>
> We believe this was added as an optimization to skip computing and
> verifying checksums for communication between containers. However, locally
> generated packets have ip_summed == CHECKSUM_PARTIAL, so the code as
> written does nothing for them. As far as we can tell, after removing this
> code, these packets are transmitted from one stack to another unmodified
> (tcpdump shows invalid checksums on both sides, as expected), and they are
> delivered correctly to applications. We didn’t test every possible network
> configuration, but we tried a few common ones such as bridging containers,
> using NAT between the host and a container, and routing from hardware
> devices to containers. We have effectively deployed this in production at
> Twitter (by disabling RX checksum offloading on veth devices).
>
> This code dates back to the first version of the driver, commit
> <e314dbdc1c0dc6a548ecf> ("[NET]: Virtual ethernet device driver"), so I
> suspect this bug occurred mostly because the driver API has evolved
> significantly since then. Commit <0b7967503dc97864f283a> ("net/veth: Fix
> packet checksumming") (in December 2010) fixed this for packets that get
> created locally and sent to hardware devices, by not changing
> CHECKSUM_PARTIAL. However, the same issue still occurs for packets coming
> in from hardware devices.
>
> Co-authored-by: Evan Jones <ej@evanjones.ca>
> Signed-off-by: Evan Jones <ej@evanjones.ca>
> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Cc: Phil Sutter <phil@nwl.cc>
> Cc: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
> Cc: netdev@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Vijay Pandurangan <vijayp@vijayp.ca>
> Acked-by: Cong Wang <cwang@twopensource.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> [bwh: Backported to 3.2: adjust context]
> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
> ---
>   drivers/net/veth.c | 6 ------
>   1 file changed, 6 deletions(-)
>
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -126,11 +126,6 @@ static netdev_tx_t veth_xmit(struct sk_b
>   	stats = this_cpu_ptr(priv->stats);
>   	rcv_stats = this_cpu_ptr(rcv_priv->stats);
>
> -	/* don't change ip_summed == CHECKSUM_PARTIAL, as that
> -	   will cause bad checksum on forwarded packets */
> -	if (skb->ip_summed == CHECKSUM_NONE &&
> -	    rcv->features & NETIF_F_RXCSUM)
> -		skb->ip_summed = CHECKSUM_UNNECESSARY;
>
>   	length = skb->len;
>   	if (dev_forward_skb(rcv, skb) != NET_RX_SUCCESS)
>


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
  2016-04-27 15:59   ` Ben Greear
@ 2016-04-27 18:07     ` Ben Hutchings
  2016-04-28  0:00       ` Hannes Frederic Sowa
  0 siblings, 1 reply; 24+ messages in thread
From: Ben Hutchings @ 2016-04-27 18:07 UTC (permalink / raw)
  To: Ben Greear, linux-kernel, stable
  Cc: akpm, David S. Miller, Vijay Pandurangan, Cong Wang, netdev,
	Evan Jones, Nicolas Dichtel, Phil Sutter, Toshiaki Makita

[-- Attachment #1: Type: text/plain, Size: 631 bytes --]

On Wed, 2016-04-27 at 08:59 -0700, Ben Greear wrote:
> On 04/26/2016 04:02 PM, Ben Hutchings wrote:
> > 
> > 3.2.80-rc1 review patch.  If anyone has any objections, please let me know.
> I would be careful about this.  It causes regressions when sending
> PACKET_SOCKET buffers from user-space to veth devices.
> 
> There was a proposed upstream fix for the regression, but it has not gone
> into the tree as far as I know.
> 
> http://www.spinics.net/lists/netdev/msg370436.html
[...]

OK, I'll drop this for now.

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
  2016-04-27 18:07     ` Ben Hutchings
@ 2016-04-28  0:00       ` Hannes Frederic Sowa
  2016-04-28  0:14         ` Ben Greear
  0 siblings, 1 reply; 24+ messages in thread
From: Hannes Frederic Sowa @ 2016-04-28  0:00 UTC (permalink / raw)
  To: Ben Hutchings, Ben Greear, linux-kernel, stable
  Cc: akpm, David S. Miller, Vijay Pandurangan, Cong Wang, netdev,
	Evan Jones, Nicolas Dichtel, Phil Sutter, Toshiaki Makita

Hi Ben,

On Wed, Apr 27, 2016, at 20:07, Ben Hutchings wrote:
> On Wed, 2016-04-27 at 08:59 -0700, Ben Greear wrote:
> > On 04/26/2016 04:02 PM, Ben Hutchings wrote:
> > > 
> > > 3.2.80-rc1 review patch.  If anyone has any objections, please let me know.
> > I would be careful about this.  It causes regressions when sending
> > PACKET_SOCKET buffers from user-space to veth devices.
> > 
> > There was a proposed upstream fix for the regression, but it has not gone
> > into the tree as far as I know.
> > 
> > http://www.spinics.net/lists/netdev/msg370436.html
> [...]
> 
> OK, I'll drop this for now.

The fall out from not having this patch is in my opinion a bigger
fallout than not having this patch. This patch fixes silent data
corruption vs. the problem Ben Greear is talking about, which might not
be that a common usage.

What do others think?

Bye,
Hannes

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
  2016-04-28  0:00       ` Hannes Frederic Sowa
@ 2016-04-28  0:14         ` Ben Greear
  2016-04-28 10:29           ` Sabrina Dubroca
  0 siblings, 1 reply; 24+ messages in thread
From: Ben Greear @ 2016-04-28  0:14 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Ben Hutchings, linux-kernel, stable
  Cc: akpm, David S. Miller, Vijay Pandurangan, Cong Wang, netdev,
	Evan Jones, Nicolas Dichtel, Phil Sutter, Toshiaki Makita,
	xiyou.wangcong

On 04/27/2016 05:00 PM, Hannes Frederic Sowa wrote:
> Hi Ben,
>
> On Wed, Apr 27, 2016, at 20:07, Ben Hutchings wrote:
>> On Wed, 2016-04-27 at 08:59 -0700, Ben Greear wrote:
>>> On 04/26/2016 04:02 PM, Ben Hutchings wrote:
>>>>
>>>> 3.2.80-rc1 review patch.  If anyone has any objections, please let me know.
>>> I would be careful about this.  It causes regressions when sending
>>> PACKET_SOCKET buffers from user-space to veth devices.
>>>
>>> There was a proposed upstream fix for the regression, but it has not gone
>>> into the tree as far as I know.
>>>
>>> http://www.spinics.net/lists/netdev/msg370436.html
>> [...]
>>
>> OK, I'll drop this for now.
>
> The fall out from not having this patch is in my opinion a bigger
> fallout than not having this patch. This patch fixes silent data
> corruption vs. the problem Ben Greear is talking about, which might not
> be that a common usage.
>
> What do others think?
>
> Bye,
> Hannes
>

This patch from Cong Wang seems to fix the regression for me, I think it should be added and
tested in the main tree, and then apply them to stable as a pair.

http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a



diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index da1ae0e..f8cc758 100644 (file)
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1926,6 +1926,7 @@ retry:
                 goto out_unlock;
         }

+       skb->ip_summed = CHECKSUM_UNNECESSARY;
         skb->protocol = proto;
         skb->dev = dev;
         skb->priority = sk->sk_priority;
@@ -2352,6 +2353,7 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,

         ph.raw = frame;

+       skb->ip_summed = CHECKSUM_UNNECESSARY;
         skb->protocol = proto;
         skb->dev = dev;
         skb->priority = po->sk.sk_priority;
@@ -2776,6 +2778,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
                 goto out_free;
         }

+       skb->ip_summed = CHECKSUM_UNNECESSARY;
         skb->protocol = proto;
         skb->dev = dev;
         skb->priority = sk->sk_priority;

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
  2016-04-28  0:14         ` Ben Greear
@ 2016-04-28 10:29           ` Sabrina Dubroca
  2016-04-28 13:45             ` Ben Greear
                               ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Sabrina Dubroca @ 2016-04-28 10:29 UTC (permalink / raw)
  To: Ben Greear
  Cc: Hannes Frederic Sowa, Ben Hutchings, linux-kernel, stable, akpm,
	David S. Miller, Vijay Pandurangan, Cong Wang, netdev, Evan Jones,
	Nicolas Dichtel, Phil Sutter, Toshiaki Makita, xiyou.wangcong

Hello,

2016-04-27, 17:14:44 -0700, Ben Greear wrote:
> On 04/27/2016 05:00 PM, Hannes Frederic Sowa wrote:
> > Hi Ben,
> > 
> > On Wed, Apr 27, 2016, at 20:07, Ben Hutchings wrote:
> > > On Wed, 2016-04-27 at 08:59 -0700, Ben Greear wrote:
> > > > On 04/26/2016 04:02 PM, Ben Hutchings wrote:
> > > > > 
> > > > > 3.2.80-rc1 review patch.  If anyone has any objections, please let me know.
> > > > I would be careful about this.  It causes regressions when sending
> > > > PACKET_SOCKET buffers from user-space to veth devices.
> > > > 
> > > > There was a proposed upstream fix for the regression, but it has not gone
> > > > into the tree as far as I know.
> > > > 
> > > > http://www.spinics.net/lists/netdev/msg370436.html
> > > [...]
> > > 
> > > OK, I'll drop this for now.
> > 
> > The fall out from not having this patch is in my opinion a bigger
> > fallout than not having this patch. This patch fixes silent data
> > corruption vs. the problem Ben Greear is talking about, which might not
> > be that a common usage.
> > 
> > What do others think?
> > 
> > Bye,
> > Hannes
> > 
> 
> This patch from Cong Wang seems to fix the regression for me, I think it should be added and
> tested in the main tree, and then apply them to stable as a pair.
> 
> http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a

Actually, no, this is not really a regression.

If you capture packets on a device with checksum offloading enabled,
the TCP/UDP checksum isn't filled.  veth also behaves that way.  What
the "veth: don't modify ip_summed" patch does is enable proper
checksum validation on veth.  This really was a bug in veth.

Cong's patch would also break cases where we choose to inject packets
with invalid checksums, and they would now be accepted as correct.

Your use case is invalid, it just happened to work because of a
bug.  If you want the stack to fill checksums so that you want capture
and reinject packets, you have to disable checksum offloading (or
compute the checksum yourself in userspace).

Thanks.

-- 
Sabrina

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
  2016-04-28 10:29           ` Sabrina Dubroca
@ 2016-04-28 13:45             ` Ben Greear
  2016-04-30 19:18               ` Ben Hutchings
  2016-04-30 18:33             ` Ben Hutchings
  2016-04-30 20:15             ` Vijay Pandurangan
  2 siblings, 1 reply; 24+ messages in thread
From: Ben Greear @ 2016-04-28 13:45 UTC (permalink / raw)
  To: Sabrina Dubroca
  Cc: Hannes Frederic Sowa, Ben Hutchings, linux-kernel, stable, akpm,
	David S. Miller, Vijay Pandurangan, Cong Wang, netdev, Evan Jones,
	Nicolas Dichtel, Phil Sutter, Toshiaki Makita, xiyou.wangcong



On 04/28/2016 03:29 AM, Sabrina Dubroca wrote:
> Hello,
>
> 2016-04-27, 17:14:44 -0700, Ben Greear wrote:
>> On 04/27/2016 05:00 PM, Hannes Frederic Sowa wrote:
>>> Hi Ben,
>>>
>>> On Wed, Apr 27, 2016, at 20:07, Ben Hutchings wrote:
>>>> On Wed, 2016-04-27 at 08:59 -0700, Ben Greear wrote:
>>>>> On 04/26/2016 04:02 PM, Ben Hutchings wrote:
>>>>>>
>>>>>> 3.2.80-rc1 review patch.  If anyone has any objections, please let me know.
>>>>> I would be careful about this.  It causes regressions when sending
>>>>> PACKET_SOCKET buffers from user-space to veth devices.
>>>>>
>>>>> There was a proposed upstream fix for the regression, but it has not gone
>>>>> into the tree as far as I know.
>>>>>
>>>>> http://www.spinics.net/lists/netdev/msg370436.html
>>>> [...]
>>>>
>>>> OK, I'll drop this for now.
>>>
>>> The fall out from not having this patch is in my opinion a bigger
>>> fallout than not having this patch. This patch fixes silent data
>>> corruption vs. the problem Ben Greear is talking about, which might not
>>> be that a common usage.
>>>
>>> What do others think?
>>>
>>> Bye,
>>> Hannes
>>>
>>
>> This patch from Cong Wang seems to fix the regression for me, I think it should be added and
>> tested in the main tree, and then apply them to stable as a pair.
>>
>> http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a
>
> Actually, no, this is not really a regression.
>
> If you capture packets on a device with checksum offloading enabled,
> the TCP/UDP checksum isn't filled.  veth also behaves that way.  What
> the "veth: don't modify ip_summed" patch does is enable proper
> checksum validation on veth.  This really was a bug in veth.
>
> Cong's patch would also break cases where we choose to inject packets
> with invalid checksums, and they would now be accepted as correct.
>
> Your use case is invalid, it just happened to work because of a
> bug.  If you want the stack to fill checksums so that you want capture
> and reinject packets, you have to disable checksum offloading (or
> compute the checksum yourself in userspace).

Disabling checksum offloading or computing in user-space (and then
recomputing in veth to verify the checksum?) is a huge performance loss.

Maybe we could add a socket option to enable Cong's patch on a per-socket
basis?  That way my use-case can still work and you can have this new
behaviour by default?

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
  2016-04-28 10:29           ` Sabrina Dubroca
  2016-04-28 13:45             ` Ben Greear
@ 2016-04-30 18:33             ` Ben Hutchings
  2016-04-30 19:40               ` Ben Greear
  2016-04-30 20:15             ` Vijay Pandurangan
  2 siblings, 1 reply; 24+ messages in thread
From: Ben Hutchings @ 2016-04-30 18:33 UTC (permalink / raw)
  To: Sabrina Dubroca, Ben Greear
  Cc: Hannes Frederic Sowa, linux-kernel, stable, akpm, David S. Miller,
	Vijay Pandurangan, Cong Wang, netdev, Evan Jones, Nicolas Dichtel,
	Phil Sutter, Toshiaki Makita, xiyou.wangcong

[-- Attachment #1: Type: text/plain, Size: 1967 bytes --]

On Thu, 2016-04-28 at 12:29 +0200, Sabrina Dubroca wrote:
> Hello,
> 
> 2016-04-27, 17:14:44 -0700, Ben Greear wrote:
> > 
> > On 04/27/2016 05:00 PM, Hannes Frederic Sowa wrote:
> > > 
> > > Hi Ben,
> > > 
> > > On Wed, Apr 27, 2016, at 20:07, Ben Hutchings wrote:
> > > > 
> > > > On Wed, 2016-04-27 at 08:59 -0700, Ben Greear wrote:
> > > > > 
> > > > > On 04/26/2016 04:02 PM, Ben Hutchings wrote:
> > > > > > 
> > > > > > 
> > > > > > 3.2.80-rc1 review patch.  If anyone has any objections, please let me know.
> > > > > I would be careful about this.  It causes regressions when sending
> > > > > PACKET_SOCKET buffers from user-space to veth devices.
> > > > > 
> > > > > There was a proposed upstream fix for the regression, but it has not gone
> > > > > into the tree as far as I know.
> > > > > 
> > > > > http://www.spinics.net/lists/netdev/msg370436.html
> > > > [...]
> > > > 
> > > > OK, I'll drop this for now.
> > > The fall out from not having this patch is in my opinion a bigger
> > > fallout than not having this patch. This patch fixes silent data
> > > corruption vs. the problem Ben Greear is talking about, which might not
> > > be that a common usage.
> > > 
> > > What do others think?
> > > 
> > > Bye,
> > > Hannes
> > > 
> > This patch from Cong Wang seems to fix the regression for me, I think it should be added and
> > tested in the main tree, and then apply them to stable as a pair.
> > 
> > http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a
> Actually, no, this is not really a regression.
[...]

It really is.  Even though the old behaviour was a bug (raw packets
should not be changed), if there are real applications that depend on
that then we have to keep those applications working somehow.

Ben.

-- 
Ben Hutchings
Tomorrow will be cancelled due to lack of interest.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
  2016-04-28 13:45             ` Ben Greear
@ 2016-04-30 19:18               ` Ben Hutchings
  0 siblings, 0 replies; 24+ messages in thread
From: Ben Hutchings @ 2016-04-30 19:18 UTC (permalink / raw)
  To: Ben Greear, Sabrina Dubroca
  Cc: Hannes Frederic Sowa, linux-kernel, stable, akpm, David S. Miller,
	Vijay Pandurangan, Cong Wang, netdev, Evan Jones, Nicolas Dichtel,
	Phil Sutter, Toshiaki Makita, xiyou.wangcong

[-- Attachment #1: Type: text/plain, Size: 1079 bytes --]

On Thu, 2016-04-28 at 06:45 -0700, Ben Greear wrote:
> 
> On 04/28/2016 03:29 AM, Sabrina Dubroca wrote:
[...]
> > Your use case is invalid, it just happened to work because of a
> > bug.  If you want the stack to fill checksums so that you want capture
> > and reinject packets, you have to disable checksum offloading (or
> > compute the checksum yourself in userspace).
> Disabling checksum offloading or computing in user-space (and then
> recomputing in veth to verify the checksum?) is a huge performance loss.
> 
> Maybe we could add a socket option to enable Cong's patch on a per-socket
> basis?  That way my use-case can still work and you can have this new
> behaviour by default?

It does sound like a useful option to have.  If there are other
applications that depend on veth's checksum-fixing behaviour and are
being distributed in binary form, then a per-device option might be
necessary so users can keep those applications working before they're
updated.

Ben.

-- 
Ben Hutchings
Tomorrow will be cancelled due to lack of interest.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
  2016-04-30 18:33             ` Ben Hutchings
@ 2016-04-30 19:40               ` Ben Greear
  2016-04-30 19:54                 ` Tom Herbert
  0 siblings, 1 reply; 24+ messages in thread
From: Ben Greear @ 2016-04-30 19:40 UTC (permalink / raw)
  To: Ben Hutchings, Sabrina Dubroca
  Cc: Hannes Frederic Sowa, linux-kernel, stable, akpm, David S. Miller,
	Vijay Pandurangan, Cong Wang, netdev, Evan Jones, Nicolas Dichtel,
	Phil Sutter, Toshiaki Makita, xiyou.wangcong



On 04/30/2016 11:33 AM, Ben Hutchings wrote:
> On Thu, 2016-04-28 at 12:29 +0200, Sabrina Dubroca wrote:
>> Hello,

>>> http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a
>> Actually, no, this is not really a regression.
> [...]
>
> It really is.  Even though the old behaviour was a bug (raw packets
> should not be changed), if there are real applications that depend on
> that then we have to keep those applications working somehow.

To be honest, I fail to see why the old behaviour is a bug when sending
raw packets from user-space.  If raw packets should not be changed, then
we need some way to specify what the checksum setting is to begin with,
otherwise, user-space has not enough control.

A socket option for new programs, and sysctl configurable defaults for raw sockets
for old binary programs would be sufficient I think.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
  2016-04-30 19:40               ` Ben Greear
@ 2016-04-30 19:54                 ` Tom Herbert
  2016-04-30 20:59                   ` Ben Greear
  0 siblings, 1 reply; 24+ messages in thread
From: Tom Herbert @ 2016-04-30 19:54 UTC (permalink / raw)
  To: Ben Greear
  Cc: Ben Hutchings, Sabrina Dubroca, Hannes Frederic Sowa, LKML,
	stable, akpm, David S. Miller, Vijay Pandurangan, Cong Wang,
	Linux Kernel Network Developers, Evan Jones, Nicolas Dichtel,
	Phil Sutter, Toshiaki Makita, Cong Wang

We've put considerable effort into cleaning up the checksum interface
to make it as unambiguous as possible, please be very careful to
follow it. Broken checksum processing is really hard to detect and
debug.

CHECKSUM_UNNECESSARY means that some number of _specific_ checksums
(indicated by csum_level) have been verified to be correct in a
packet. Blindly promoting CHECKSUM_NONE to CHECKSUM_UNNECESSARY is
never right. If CHECKSUM_UNNECESSARY is set in such a manner but the
checksum it would refer to has not been verified and is incorrect this
is a major bug.

Tom

On Sat, Apr 30, 2016 at 12:40 PM, Ben Greear <greearb@candelatech.com> wrote:
>
>
> On 04/30/2016 11:33 AM, Ben Hutchings wrote:
>>
>> On Thu, 2016-04-28 at 12:29 +0200, Sabrina Dubroca wrote:
>>>
>>> Hello,
>
>
>>>>
>>>> http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a
>>>
>>> Actually, no, this is not really a regression.
>>
>> [...]
>>
>> It really is.  Even though the old behaviour was a bug (raw packets
>> should not be changed), if there are real applications that depend on
>> that then we have to keep those applications working somehow.
>
>
> To be honest, I fail to see why the old behaviour is a bug when sending
> raw packets from user-space.  If raw packets should not be changed, then
> we need some way to specify what the checksum setting is to begin with,
> otherwise, user-space has not enough control.
>
> A socket option for new programs, and sysctl configurable defaults for raw
> sockets
> for old binary programs would be sufficient I think.
>
>
> Thanks,
> Ben
>
> --
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
  2016-04-28 10:29           ` Sabrina Dubroca
  2016-04-28 13:45             ` Ben Greear
  2016-04-30 18:33             ` Ben Hutchings
@ 2016-04-30 20:15             ` Vijay Pandurangan
  2 siblings, 0 replies; 24+ messages in thread
From: Vijay Pandurangan @ 2016-04-30 20:15 UTC (permalink / raw)
  To: Sabrina Dubroca
  Cc: Ben Greear, Hannes Frederic Sowa, Ben Hutchings, LKML, stable,
	akpm, David S. Miller, Cong Wang, Linux Kernel Network Developers,
	Evan Jones, Nicolas Dichtel, Phil Sutter, Toshiaki Makita,
	Cong Wang

[oops – resending this because I was using gmail in HTML mode before
by accident]

There was a discussion on a separate thread about this. I agree with
Sabrina fully. I believe veth should provide an abstraction layer that
correctly emulates a physical network in all ways.

Consider an environment where we have multiple physical computers.
Each computer runs one or more containers, each of which has a
publicly routable ip address.  When adding a new app to the cluster, a
scheduler might decide to run this container on any physical machine
of its choice, assuming that apps have a way of routing traffic to
their backends (we did something similar Google >10 years ago). This
is something we might imagine happening with docker and ipv6 for
instance.

If you have an app, A, which sends raw ethernet frames (the simplest
case I could imagine) with TCP data that has invalid checksums to app
B, which is receiving it, the behaviour of the system _will be
different_ depending upon whether app B is scheduled to run on the
same machine as app A or not. This seems like a clear bug and a broken
abstraction (especially as the default case), and something we should
endeavour to avoid.

I do see Ben's point about enabling sw checksum verification as
potentially incurring a huge performance penalty (I haven't had a
chance to measure it) that is completely wasteful in the vast majority
of cases.

Unfortunately I just don't see how we can solve this problem in a way
that preserves a correct abstraction layer while also avoiding excess
work. I guess the first piece of data that would be helpful is to
determine just how big of a performance penalty this is. If it's
small, then maybe it doesn't matter.

On Thu, Apr 28, 2016 at 6:29 AM, Sabrina Dubroca <sd@queasysnail.net> wrote:
> Hello,
>
> 2016-04-27, 17:14:44 -0700, Ben Greear wrote:
>> On 04/27/2016 05:00 PM, Hannes Frederic Sowa wrote:
>> > Hi Ben,
>> >
>> > On Wed, Apr 27, 2016, at 20:07, Ben Hutchings wrote:
>> > > On Wed, 2016-04-27 at 08:59 -0700, Ben Greear wrote:
>> > > > On 04/26/2016 04:02 PM, Ben Hutchings wrote:
>> > > > >
>> > > > > 3.2.80-rc1 review patch.  If anyone has any objections, please let me know.
>> > > > I would be careful about this.  It causes regressions when sending
>> > > > PACKET_SOCKET buffers from user-space to veth devices.
>> > > >
>> > > > There was a proposed upstream fix for the regression, but it has not gone
>> > > > into the tree as far as I know.
>> > > >
>> > > > http://www.spinics.net/lists/netdev/msg370436.html
>> > > [...]
>> > >
>> > > OK, I'll drop this for now.
>> >
>> > The fall out from not having this patch is in my opinion a bigger
>> > fallout than not having this patch. This patch fixes silent data
>> > corruption vs. the problem Ben Greear is talking about, which might not
>> > be that a common usage.
>> >
>> > What do others think?
>> >
>> > Bye,
>> > Hannes
>> >
>>
>> This patch from Cong Wang seems to fix the regression for me, I think it should be added and
>> tested in the main tree, and then apply them to stable as a pair.
>>
>> http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a
>
> Actually, no, this is not really a regression.
>
> If you capture packets on a device with checksum offloading enabled,
> the TCP/UDP checksum isn't filled.  veth also behaves that way.  What
> the "veth: don't modify ip_summed" patch does is enable proper
> checksum validation on veth.  This really was a bug in veth.
>
> Cong's patch would also break cases where we choose to inject packets
> with invalid checksums, and they would now be accepted as correct.
>
> Your use case is invalid, it just happened to work because of a
> bug.  If you want the stack to fill checksums so that you want capture
> and reinject packets, you have to disable checksum offloading (or
> compute the checksum yourself in userspace).
>
> Thanks.
>
> --
> Sabrina

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
  2016-04-30 19:54                 ` Tom Herbert
@ 2016-04-30 20:59                   ` Ben Greear
  2016-04-30 21:13                     ` Vijay Pandurangan
  2016-04-30 22:42                     ` [PATCH 3.2 085/115] veth: don’t " Tom Herbert
  0 siblings, 2 replies; 24+ messages in thread
From: Ben Greear @ 2016-04-30 20:59 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Ben Hutchings, Sabrina Dubroca, Hannes Frederic Sowa, LKML,
	stable, akpm, David S. Miller, Vijay Pandurangan, Cong Wang,
	Linux Kernel Network Developers, Evan Jones, Nicolas Dichtel,
	Phil Sutter, Toshiaki Makita, Cong Wang


On 04/30/2016 12:54 PM, Tom Herbert wrote:
> We've put considerable effort into cleaning up the checksum interface
> to make it as unambiguous as possible, please be very careful to
> follow it. Broken checksum processing is really hard to detect and
> debug.
>
> CHECKSUM_UNNECESSARY means that some number of _specific_ checksums
> (indicated by csum_level) have been verified to be correct in a
> packet. Blindly promoting CHECKSUM_NONE to CHECKSUM_UNNECESSARY is
> never right. If CHECKSUM_UNNECESSARY is set in such a manner but the
> checksum it would refer to has not been verified and is incorrect this
> is a major bug.

Suppose I know that the packet received on a packet-socket has
already been verified by a NIC that supports hardware checksumming.

Then, I want to transmit it on a veth interface using a second
packet socket.  I do not want veth to recalculate the checksum on
transmit, nor to validate it on the peer veth on receive, because I do
not want to waste the CPU cycles.  I am assuming that my app is not
accidentally corrupting frames, so the checksum can never be bad.

How should the checksumming be configured for the packets going into
the packet-socket from user-space?

Also, I might want to send raw frames that do have
broken checksums (lets assume a real NIC, not veth), and I want them
to hit the wire with those bad checksums.

How do I configure the checksumming in this case?


Thanks,
Ben


>
> Tom
>
> On Sat, Apr 30, 2016 at 12:40 PM, Ben Greear <greearb@candelatech.com> wrote:
>>
>>
>> On 04/30/2016 11:33 AM, Ben Hutchings wrote:
>>>
>>> On Thu, 2016-04-28 at 12:29 +0200, Sabrina Dubroca wrote:
>>>>
>>>> Hello,
>>
>>
>>>>>
>>>>> http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a
>>>>
>>>> Actually, no, this is not really a regression.
>>>
>>> [...]
>>>
>>> It really is.  Even though the old behaviour was a bug (raw packets
>>> should not be changed), if there are real applications that depend on
>>> that then we have to keep those applications working somehow.
>>
>>
>> To be honest, I fail to see why the old behaviour is a bug when sending
>> raw packets from user-space.  If raw packets should not be changed, then
>> we need some way to specify what the checksum setting is to begin with,
>> otherwise, user-space has not enough control.
>>
>> A socket option for new programs, and sysctl configurable defaults for raw
>> sockets
>> for old binary programs would be sufficient I think.
>>
>>
>> Thanks,
>> Ben
>>
>> --
>> Ben Greear <greearb@candelatech.com>
>> Candela Technologies Inc  http://www.candelatech.com
>

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
  2016-04-30 20:59                   ` Ben Greear
@ 2016-04-30 21:13                     ` Vijay Pandurangan
  2016-04-30 21:29                       ` Ben Greear
  2016-04-30 22:42                     ` [PATCH 3.2 085/115] veth: don’t " Tom Herbert
  1 sibling, 1 reply; 24+ messages in thread
From: Vijay Pandurangan @ 2016-04-30 21:13 UTC (permalink / raw)
  To: Ben Greear
  Cc: Tom Herbert, Ben Hutchings, Sabrina Dubroca, Hannes Frederic Sowa,
	LKML, stable, akpm, David S. Miller, Cong Wang,
	Linux Kernel Network Developers, Evan Jones, Nicolas Dichtel,
	Phil Sutter, Toshiaki Makita, Cong Wang

On Sat, Apr 30, 2016 at 4:59 PM, Ben Greear <greearb@candelatech.com> wrote:
>
>
> On 04/30/2016 12:54 PM, Tom Herbert wrote:
>>
>> We've put considerable effort into cleaning up the checksum interface
>> to make it as unambiguous as possible, please be very careful to
>> follow it. Broken checksum processing is really hard to detect and
>> debug.
>>
>> CHECKSUM_UNNECESSARY means that some number of _specific_ checksums
>> (indicated by csum_level) have been verified to be correct in a
>> packet. Blindly promoting CHECKSUM_NONE to CHECKSUM_UNNECESSARY is
>> never right. If CHECKSUM_UNNECESSARY is set in such a manner but the
>> checksum it would refer to has not been verified and is incorrect this
>> is a major bug.
>
>
> Suppose I know that the packet received on a packet-socket has
> already been verified by a NIC that supports hardware checksumming.
>
> Then, I want to transmit it on a veth interface using a second
> packet socket.  I do not want veth to recalculate the checksum on
> transmit, nor to validate it on the peer veth on receive, because I do
> not want to waste the CPU cycles.  I am assuming that my app is not
> accidentally corrupting frames, so the checksum can never be bad.
>
> How should the checksumming be configured for the packets going into
> the packet-socket from user-space?


It seems like that only the receiver should decide whether or not to
checksum packets on the veth, not the sender.

How about:

We could add a receiving socket option for "don't checksum packets
received from a veth when the other side has marked them as
elide-checksum-suggested" (similar to UDP_NOCHECKSUM), and a sending
socket option for "mark all data sent via this socket to a veth as
elide-checksum-suggested".

So the process would be:

Writer:
1. open read socket
2. open write socket, with option elide-checksum-for-veth-suggested
3. write data

Reader:
1. open read socket with "follow-elide-checksum-suggestions-on-veth"
2. read data

The kernel / module would then need to persist the flag on all packets
that traverse a veth, and drop these data when they leave the veth
module.


>
>
> Also, I might want to send raw frames that do have
> broken checksums (lets assume a real NIC, not veth), and I want them
> to hit the wire with those bad checksums.
>
>
> How do I configure the checksumming in this case?


Correct me if I'm wrong but I think this is already possible now. You
can have packets with incorrect checksum hitting the wire as is. What
you cannot do is instruct the receiving end to ignore the checksum
from the sending end when using a physical device (and something I
think we should mimic on the sending device).


>
>
>
> Thanks,
> Ben
>
>
>
>>
>> Tom
>>
>> On Sat, Apr 30, 2016 at 12:40 PM, Ben Greear <greearb@candelatech.com> wrote:
>>>
>>>
>>>
>>> On 04/30/2016 11:33 AM, Ben Hutchings wrote:
>>>>
>>>>
>>>> On Thu, 2016-04-28 at 12:29 +0200, Sabrina Dubroca wrote:
>>>>>
>>>>>
>>>>> Hello,
>>>
>>>
>>>
>>>>>>
>>>>>> http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a
>>>>>
>>>>>
>>>>> Actually, no, this is not really a regression.
>>>>
>>>>
>>>> [...]
>>>>
>>>> It really is.  Even though the old behaviour was a bug (raw packets
>>>> should not be changed), if there are real applications that depend on
>>>> that then we have to keep those applications working somehow.
>>>
>>>
>>>
>>> To be honest, I fail to see why the old behaviour is a bug when sending
>>> raw packets from user-space.  If raw packets should not be changed, then
>>> we need some way to specify what the checksum setting is to begin with,
>>> otherwise, user-space has not enough control.
>>>
>>> A socket option for new programs, and sysctl configurable defaults for raw
>>> sockets
>>> for old binary programs would be sufficient I think.
>>>
>>>
>>> Thanks,
>>> Ben
>>>
>>> --
>>> Ben Greear <greearb@candelatech.com>
>>> Candela Technologies Inc  http://www.candelatech.com
>>
>>
>
> --
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
  2016-04-30 21:13                     ` Vijay Pandurangan
@ 2016-04-30 21:29                       ` Ben Greear
  2016-04-30 21:36                         ` Vijay Pandurangan
  0 siblings, 1 reply; 24+ messages in thread
From: Ben Greear @ 2016-04-30 21:29 UTC (permalink / raw)
  To: Vijay Pandurangan
  Cc: Tom Herbert, Ben Hutchings, Sabrina Dubroca, Hannes Frederic Sowa,
	LKML, stable, akpm, David S. Miller, Cong Wang,
	Linux Kernel Network Developers, Evan Jones, Nicolas Dichtel,
	Phil Sutter, Toshiaki Makita, Cong Wang



On 04/30/2016 02:13 PM, Vijay Pandurangan wrote:
> On Sat, Apr 30, 2016 at 4:59 PM, Ben Greear <greearb@candelatech.com> wrote:
>>
>>
>> On 04/30/2016 12:54 PM, Tom Herbert wrote:
>>>
>>> We've put considerable effort into cleaning up the checksum interface
>>> to make it as unambiguous as possible, please be very careful to
>>> follow it. Broken checksum processing is really hard to detect and
>>> debug.
>>>
>>> CHECKSUM_UNNECESSARY means that some number of _specific_ checksums
>>> (indicated by csum_level) have been verified to be correct in a
>>> packet. Blindly promoting CHECKSUM_NONE to CHECKSUM_UNNECESSARY is
>>> never right. If CHECKSUM_UNNECESSARY is set in such a manner but the
>>> checksum it would refer to has not been verified and is incorrect this
>>> is a major bug.
>>
>>
>> Suppose I know that the packet received on a packet-socket has
>> already been verified by a NIC that supports hardware checksumming.
>>
>> Then, I want to transmit it on a veth interface using a second
>> packet socket.  I do not want veth to recalculate the checksum on
>> transmit, nor to validate it on the peer veth on receive, because I do
>> not want to waste the CPU cycles.  I am assuming that my app is not
>> accidentally corrupting frames, so the checksum can never be bad.
>>
>> How should the checksumming be configured for the packets going into
>> the packet-socket from user-space?
>
>
> It seems like that only the receiver should decide whether or not to
> checksum packets on the veth, not the sender.
>
> How about:
>
> We could add a receiving socket option for "don't checksum packets
> received from a veth when the other side has marked them as
> elide-checksum-suggested" (similar to UDP_NOCHECKSUM), and a sending
> socket option for "mark all data sent via this socket to a veth as
> elide-checksum-suggested".
>
> So the process would be:
>
> Writer:
> 1. open read socket
> 2. open write socket, with option elide-checksum-for-veth-suggested
> 3. write data
>
> Reader:
> 1. open read socket with "follow-elide-checksum-suggestions-on-veth"
> 2. read data
>
> The kernel / module would then need to persist the flag on all packets
> that traverse a veth, and drop these data when they leave the veth
> module.

I'm not sure this works completely.  In my app, the packet flow might be:

eth0 <-> raw-socket <-> user-space-bridge <-> raw-socket <-> vethA <-> vethB <-> [kernel router/bridge logic ...] <-> eth1

There may be no sockets on the vethB port.  And reader/writer is not
a good way to look at it since I am implementing a bi-directional bridge in
user-space and each packet-socket is for both rx and tx.

>> Also, I might want to send raw frames that do have
>> broken checksums (lets assume a real NIC, not veth), and I want them
>> to hit the wire with those bad checksums.
>>
>>
>> How do I configure the checksumming in this case?
>
>
> Correct me if I'm wrong but I think this is already possible now. You
> can have packets with incorrect checksum hitting the wire as is. What
> you cannot do is instruct the receiving end to ignore the checksum
> from the sending end when using a physical device (and something I
> think we should mimic on the sending device).

Yes, it does work currently (or, last I checked)...I just want to make sure it keeps working.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
  2016-04-30 21:29                       ` Ben Greear
@ 2016-04-30 21:36                         ` Vijay Pandurangan
  2016-04-30 21:52                           ` Ben Greear
  0 siblings, 1 reply; 24+ messages in thread
From: Vijay Pandurangan @ 2016-04-30 21:36 UTC (permalink / raw)
  To: Ben Greear
  Cc: Tom Herbert, Ben Hutchings, Sabrina Dubroca, Hannes Frederic Sowa,
	LKML, stable, akpm, David S. Miller, Cong Wang,
	Linux Kernel Network Developers, Evan Jones, Nicolas Dichtel,
	Phil Sutter, Toshiaki Makita, Cong Wang

On Sat, Apr 30, 2016 at 5:29 PM, Ben Greear <greearb@candelatech.com> wrote:
>
>
> On 04/30/2016 02:13 PM, Vijay Pandurangan wrote:
>>
>> On Sat, Apr 30, 2016 at 4:59 PM, Ben Greear <greearb@candelatech.com>
>> wrote:
>>>
>>>
>>>
>>> On 04/30/2016 12:54 PM, Tom Herbert wrote:
>>>>
>>>>
>>>> We've put considerable effort into cleaning up the checksum interface
>>>> to make it as unambiguous as possible, please be very careful to
>>>> follow it. Broken checksum processing is really hard to detect and
>>>> debug.
>>>>
>>>> CHECKSUM_UNNECESSARY means that some number of _specific_ checksums
>>>> (indicated by csum_level) have been verified to be correct in a
>>>> packet. Blindly promoting CHECKSUM_NONE to CHECKSUM_UNNECESSARY is
>>>> never right. If CHECKSUM_UNNECESSARY is set in such a manner but the
>>>> checksum it would refer to has not been verified and is incorrect this
>>>> is a major bug.
>>>
>>>
>>>
>>> Suppose I know that the packet received on a packet-socket has
>>> already been verified by a NIC that supports hardware checksumming.
>>>
>>> Then, I want to transmit it on a veth interface using a second
>>> packet socket.  I do not want veth to recalculate the checksum on
>>> transmit, nor to validate it on the peer veth on receive, because I do
>>> not want to waste the CPU cycles.  I am assuming that my app is not
>>> accidentally corrupting frames, so the checksum can never be bad.
>>>
>>> How should the checksumming be configured for the packets going into
>>> the packet-socket from user-space?
>>
>>
>>
>> It seems like that only the receiver should decide whether or not to
>> checksum packets on the veth, not the sender.
>>
>> How about:
>>
>> We could add a receiving socket option for "don't checksum packets
>> received from a veth when the other side has marked them as
>> elide-checksum-suggested" (similar to UDP_NOCHECKSUM), and a sending
>> socket option for "mark all data sent via this socket to a veth as
>> elide-checksum-suggested".
>>
>> So the process would be:
>>
>> Writer:
>> 1. open read socket
>> 2. open write socket, with option elide-checksum-for-veth-suggested
>> 3. write data
>>
>> Reader:
>> 1. open read socket with "follow-elide-checksum-suggestions-on-veth"
>> 2. read data
>>
>> The kernel / module would then need to persist the flag on all packets
>> that traverse a veth, and drop these data when they leave the veth
>> module.
>
>
> I'm not sure this works completely.  In my app, the packet flow might be:
>
> eth0 <-> raw-socket <-> user-space-bridge <-> raw-socket <-> vethA <-> vethB
> <-> [kernel router/bridge logic ...] <-> eth1

Good point, so if you had:

eth0 <-> raw <-> user space-bridge <-> raw <-> vethA <-> veth B <->
userspace-stub <->eth1

and user-space hub enabled this elide flag, things would work, right?
Then, it seems like what we need is a way to tell the kernel
router/bridge logic to follow elide signals in packets coming from
veth. I'm not sure what the best way to do this is because I'm less
familiar with conventions in that part of the kernel, but assuming
there's a way to do this, would it be acceptable?



>
> There may be no sockets on the vethB port.  And reader/writer is not
> a good way to look at it since I am implementing a bi-directional bridge in
> user-space and each packet-socket is for both rx and tx.

Sure, but we could model a bidrectional connection as two
unidirectional sockets for our discussions here, right?



>
>>> Also, I might want to send raw frames that do have
>>> broken checksums (lets assume a real NIC, not veth), and I want them
>>> to hit the wire with those bad checksums.
>>>
>>>
>>> How do I configure the checksumming in this case?
>>
>>
>>
>> Correct me if I'm wrong but I think this is already possible now. You
>> can have packets with incorrect checksum hitting the wire as is. What
>> you cannot do is instruct the receiving end to ignore the checksum
>> from the sending end when using a physical device (and something I
>> think we should mimic on the sending device).
>
>
> Yes, it does work currently (or, last I checked)...I just want to make sure
> it keeps working.

Yeah, good point. It would be great if we could write a test, or at
the very least, a list of invariants about what kinds of things should
work somewhere.

>
>
> Thanks,
> Ben
>
> --
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
  2016-04-30 21:36                         ` Vijay Pandurangan
@ 2016-04-30 21:52                           ` Ben Greear
  2016-04-30 22:01                             ` Vijay Pandurangan
  0 siblings, 1 reply; 24+ messages in thread
From: Ben Greear @ 2016-04-30 21:52 UTC (permalink / raw)
  To: Vijay Pandurangan
  Cc: Tom Herbert, Ben Hutchings, Sabrina Dubroca, Hannes Frederic Sowa,
	LKML, stable, akpm, David S. Miller, Cong Wang,
	Linux Kernel Network Developers, Evan Jones, Nicolas Dichtel,
	Phil Sutter, Toshiaki Makita, Cong Wang



On 04/30/2016 02:36 PM, Vijay Pandurangan wrote:
> On Sat, Apr 30, 2016 at 5:29 PM, Ben Greear <greearb@candelatech.com> wrote:
>>
>>
>> On 04/30/2016 02:13 PM, Vijay Pandurangan wrote:
>>>
>>> On Sat, Apr 30, 2016 at 4:59 PM, Ben Greear <greearb@candelatech.com>
>>> wrote:
>>>>
>>>>
>>>>
>>>> On 04/30/2016 12:54 PM, Tom Herbert wrote:
>>>>>
>>>>>
>>>>> We've put considerable effort into cleaning up the checksum interface
>>>>> to make it as unambiguous as possible, please be very careful to
>>>>> follow it. Broken checksum processing is really hard to detect and
>>>>> debug.
>>>>>
>>>>> CHECKSUM_UNNECESSARY means that some number of _specific_ checksums
>>>>> (indicated by csum_level) have been verified to be correct in a
>>>>> packet. Blindly promoting CHECKSUM_NONE to CHECKSUM_UNNECESSARY is
>>>>> never right. If CHECKSUM_UNNECESSARY is set in such a manner but the
>>>>> checksum it would refer to has not been verified and is incorrect this
>>>>> is a major bug.
>>>>
>>>>
>>>>
>>>> Suppose I know that the packet received on a packet-socket has
>>>> already been verified by a NIC that supports hardware checksumming.
>>>>
>>>> Then, I want to transmit it on a veth interface using a second
>>>> packet socket.  I do not want veth to recalculate the checksum on
>>>> transmit, nor to validate it on the peer veth on receive, because I do
>>>> not want to waste the CPU cycles.  I am assuming that my app is not
>>>> accidentally corrupting frames, so the checksum can never be bad.
>>>>
>>>> How should the checksumming be configured for the packets going into
>>>> the packet-socket from user-space?
>>>
>>>
>>>
>>> It seems like that only the receiver should decide whether or not to
>>> checksum packets on the veth, not the sender.
>>>
>>> How about:
>>>
>>> We could add a receiving socket option for "don't checksum packets
>>> received from a veth when the other side has marked them as
>>> elide-checksum-suggested" (similar to UDP_NOCHECKSUM), and a sending
>>> socket option for "mark all data sent via this socket to a veth as
>>> elide-checksum-suggested".
>>>
>>> So the process would be:
>>>
>>> Writer:
>>> 1. open read socket
>>> 2. open write socket, with option elide-checksum-for-veth-suggested
>>> 3. write data
>>>
>>> Reader:
>>> 1. open read socket with "follow-elide-checksum-suggestions-on-veth"
>>> 2. read data
>>>
>>> The kernel / module would then need to persist the flag on all packets
>>> that traverse a veth, and drop these data when they leave the veth
>>> module.
>>
>>
>> I'm not sure this works completely.  In my app, the packet flow might be:
>>
>> eth0 <-> raw-socket <-> user-space-bridge <-> raw-socket <-> vethA <-> vethB
>> <-> [kernel router/bridge logic ...] <-> eth1
>
> Good point, so if you had:
>
> eth0 <-> raw <-> user space-bridge <-> raw <-> vethA <-> veth B <->
> userspace-stub <->eth1
>
> and user-space hub enabled this elide flag, things would work, right?
> Then, it seems like what we need is a way to tell the kernel
> router/bridge logic to follow elide signals in packets coming from
> veth. I'm not sure what the best way to do this is because I'm less
> familiar with conventions in that part of the kernel, but assuming
> there's a way to do this, would it be acceptable?

You cannot receive on one veth without transmitting on the other, so
I think the elide csum logic can go on the raw-socket, and apply to packets
in the transmit-from-user-space direction.  Just allowing the socket to make
the veth behave like it used to before this patch in question should be good
enough, since that worked for us for years.  So, just an option to modify the
ip_summed for pkts sent on a socket is probably sufficient.

>> There may be no sockets on the vethB port.  And reader/writer is not
>> a good way to look at it since I am implementing a bi-directional bridge in
>> user-space and each packet-socket is for both rx and tx.
>
> Sure, but we could model a bidrectional connection as two
> unidirectional sockets for our discussions here, right?

Best not to I think, you want to make sure that one socket can
correctly handle tx and rx.  As long as that works, then using
uni-directional sockets should work too.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
  2016-04-30 21:52                           ` Ben Greear
@ 2016-04-30 22:01                             ` Vijay Pandurangan
  2016-04-30 22:43                               ` Ben Greear
  0 siblings, 1 reply; 24+ messages in thread
From: Vijay Pandurangan @ 2016-04-30 22:01 UTC (permalink / raw)
  To: Ben Greear
  Cc: Tom Herbert, Ben Hutchings, Sabrina Dubroca, Hannes Frederic Sowa,
	LKML, stable, akpm, David S. Miller, Cong Wang,
	Linux Kernel Network Developers, Evan Jones, Nicolas Dichtel,
	Phil Sutter, Toshiaki Makita, Cong Wang

On Sat, Apr 30, 2016 at 5:52 PM, Ben Greear <greearb@candelatech.com> wrote:
>>
>> Good point, so if you had:
>>
>> eth0 <-> raw <-> user space-bridge <-> raw <-> vethA <-> veth B <->
>> userspace-stub <->eth1
>>
>> and user-space hub enabled this elide flag, things would work, right?
>> Then, it seems like what we need is a way to tell the kernel
>> router/bridge logic to follow elide signals in packets coming from
>> veth. I'm not sure what the best way to do this is because I'm less
>> familiar with conventions in that part of the kernel, but assuming
>> there's a way to do this, would it be acceptable?
>
>
> You cannot receive on one veth without transmitting on the other, so
> I think the elide csum logic can go on the raw-socket, and apply to packets
> in the transmit-from-user-space direction.  Just allowing the socket to make
> the veth behave like it used to before this patch in question should be good
> enough, since that worked for us for years.  So, just an option to modify
> the
> ip_summed for pkts sent on a socket is probably sufficient.

I don't think this is right. Consider:

- App A  sends out corrupt packets 50% of the time and discards inbound data.
- App B doesn't care about corrupt packets and is happy to receive
them and has some way of dealing with them (special case)
- App C is a regular app, say nc or something.

In your world, where A decides what happens to data it transmits,
then
A<--veth-->B and A<---wire-->B will have the same behaviour

but

A<-- veth --> C and A<-- wire --> C will have _different_ behaviour: C
will behave incorrectly if it's connected over veth but correctly if
connected with a wire. That is a bug.

Since A cannot know what the app it's talking to will desire, I argue
that both sides of a message must be opted in to this optimization.






>
>>> There may be no sockets on the vethB port.  And reader/writer is not
>>> a good way to look at it since I am implementing a bi-directional bridge
>>> in
>>> user-space and each packet-socket is for both rx and tx.
>>
>>
>> Sure, but we could model a bidrectional connection as two
>> unidirectional sockets for our discussions here, right?
>
>
> Best not to I think, you want to make sure that one socket can
> correctly handle tx and rx.  As long as that works, then using
> uni-directional sockets should work too.
>
>
> Thanks,
> Ben
>
> --
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
  2016-04-30 20:59                   ` Ben Greear
  2016-04-30 21:13                     ` Vijay Pandurangan
@ 2016-04-30 22:42                     ` Tom Herbert
  1 sibling, 0 replies; 24+ messages in thread
From: Tom Herbert @ 2016-04-30 22:42 UTC (permalink / raw)
  To: Ben Greear
  Cc: Ben Hutchings, Sabrina Dubroca, Hannes Frederic Sowa, LKML,
	stable, akpm, David S. Miller, Vijay Pandurangan, Cong Wang,
	Linux Kernel Network Developers, Evan Jones, Nicolas Dichtel,
	Phil Sutter, Toshiaki Makita, Cong Wang

On Sat, Apr 30, 2016 at 1:59 PM, Ben Greear <greearb@candelatech.com> wrote:
>
> On 04/30/2016 12:54 PM, Tom Herbert wrote:
>>
>> We've put considerable effort into cleaning up the checksum interface
>> to make it as unambiguous as possible, please be very careful to
>> follow it. Broken checksum processing is really hard to detect and
>> debug.
>>
>> CHECKSUM_UNNECESSARY means that some number of _specific_ checksums
>> (indicated by csum_level) have been verified to be correct in a
>> packet. Blindly promoting CHECKSUM_NONE to CHECKSUM_UNNECESSARY is
>> never right. If CHECKSUM_UNNECESSARY is set in such a manner but the
>> checksum it would refer to has not been verified and is incorrect this
>> is a major bug.
>
>
> Suppose I know that the packet received on a packet-socket has
> already been verified by a NIC that supports hardware checksumming.
>
> Then, I want to transmit it on a veth interface using a second
> packet socket.  I do not want veth to recalculate the checksum on
> transmit, nor to validate it on the peer veth on receive, because I do
> not want to waste the CPU cycles.  I am assuming that my app is not
> accidentally corrupting frames, so the checksum can never be bad.
>
> How should the checksumming be configured for the packets going into
> the packet-socket from user-space?
>
Checksum unnecessary conversion to checksum complete should be done
and then the computed value can be sent to user space. If you want to
take advantage of the value on transmit then we would to change the
interface. But I'm not sure why recalculating the the checksum in the
host is even a problem. With all the advances in checksum offload,
LCO, RCO it seems like that shouldn't be a common case anyway.

> Also, I might want to send raw frames that do have
> broken checksums (lets assume a real NIC, not veth), and I want them
> to hit the wire with those bad checksums.
>
> How do I configure the checksumming in this case?

Just send raw packets with bad checksums (no checksum offload).

Tom

>
>
> Thanks,
> Ben
>
>
>
>>
>> Tom
>>
>> On Sat, Apr 30, 2016 at 12:40 PM, Ben Greear <greearb@candelatech.com>
>> wrote:
>>>
>>>
>>>
>>> On 04/30/2016 11:33 AM, Ben Hutchings wrote:
>>>>
>>>>
>>>> On Thu, 2016-04-28 at 12:29 +0200, Sabrina Dubroca wrote:
>>>>>
>>>>>
>>>>> Hello,
>>>
>>>
>>>
>>>>>>
>>>>>>
>>>>>> http://dmz2.candelatech.com/?p=linux-4.4.dev.y/.git;a=commitdiff;h=8153e983c0e5eba1aafe1fc296248ed2a553f1ac;hp=454b07405d694dad52e7f41af5816eed0190da8a
>>>>>
>>>>>
>>>>> Actually, no, this is not really a regression.
>>>>
>>>>
>>>> [...]
>>>>
>>>> It really is.  Even though the old behaviour was a bug (raw packets
>>>> should not be changed), if there are real applications that depend on
>>>> that then we have to keep those applications working somehow.
>>>
>>>
>>>
>>> To be honest, I fail to see why the old behaviour is a bug when sending
>>> raw packets from user-space.  If raw packets should not be changed, then
>>> we need some way to specify what the checksum setting is to begin with,
>>> otherwise, user-space has not enough control.
>>>
>>> A socket option for new programs, and sysctl configurable defaults for
>>> raw
>>> sockets
>>> for old binary programs would be sufficient I think.
>>>
>>>
>>> Thanks,
>>> Ben
>>>
>>> --
>>> Ben Greear <greearb@candelatech.com>
>>> Candela Technologies Inc  http://www.candelatech.com
>>
>>
>
> --
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
  2016-04-30 22:01                             ` Vijay Pandurangan
@ 2016-04-30 22:43                               ` Ben Greear
  2016-05-01  5:30                                 ` [PATCH 3.2 085/115] veth: don???t " Willy Tarreau
  0 siblings, 1 reply; 24+ messages in thread
From: Ben Greear @ 2016-04-30 22:43 UTC (permalink / raw)
  To: Vijay Pandurangan
  Cc: Tom Herbert, Ben Hutchings, Sabrina Dubroca, Hannes Frederic Sowa,
	LKML, stable, akpm, David S. Miller, Cong Wang,
	Linux Kernel Network Developers, Evan Jones, Nicolas Dichtel,
	Phil Sutter, Toshiaki Makita, Cong Wang



On 04/30/2016 03:01 PM, Vijay Pandurangan wrote:
> On Sat, Apr 30, 2016 at 5:52 PM, Ben Greear <greearb@candelatech.com> wrote:
>>>
>>> Good point, so if you had:
>>>
>>> eth0 <-> raw <-> user space-bridge <-> raw <-> vethA <-> veth B <->
>>> userspace-stub <->eth1
>>>
>>> and user-space hub enabled this elide flag, things would work, right?
>>> Then, it seems like what we need is a way to tell the kernel
>>> router/bridge logic to follow elide signals in packets coming from
>>> veth. I'm not sure what the best way to do this is because I'm less
>>> familiar with conventions in that part of the kernel, but assuming
>>> there's a way to do this, would it be acceptable?
>>
>>
>> You cannot receive on one veth without transmitting on the other, so
>> I think the elide csum logic can go on the raw-socket, and apply to packets
>> in the transmit-from-user-space direction.  Just allowing the socket to make
>> the veth behave like it used to before this patch in question should be good
>> enough, since that worked for us for years.  So, just an option to modify
>> the
>> ip_summed for pkts sent on a socket is probably sufficient.
>
> I don't think this is right. Consider:
>
> - App A  sends out corrupt packets 50% of the time and discards inbound data.
> - App B doesn't care about corrupt packets and is happy to receive
> them and has some way of dealing with them (special case)
> - App C is a regular app, say nc or something.
>
> In your world, where A decides what happens to data it transmits,
> then
> A<--veth-->B and A<---wire-->B will have the same behaviour
>
> but
>
> A<-- veth --> C and A<-- wire --> C will have _different_ behaviour: C
> will behave incorrectly if it's connected over veth but correctly if
> connected with a wire. That is a bug.
>
> Since A cannot know what the app it's talking to will desire, I argue
> that both sides of a message must be opted in to this optimization.

How can you make a generic app C know how to do this?  The path could be,
for instance:

eth0 <-> user-space-A <-> vethA <-> vethB <-> { kernel routing logic } <-> vethC <-> vethD <-> appC

There are no sockets on vethB, but it does need to have special behaviour to elide
csums.  Even if appC is hacked to know how to twiddle some thing on it's veth port,
mucking with vethD will have no effect on vethB.

With regard to your example above, why would A corrupt packets?  My guess:

1)  It has bugs (so, fix the bugs, it could equally create incorrect data with proper checksums,
     so just enabling checksumming adds no useful protection.)

2)  It means to corrupt frames.  In that case, someone must expect that C should receive incorrect
     frames, otherwise why bother making App-A corrupt them in the first place?

3)  You are explicitly trying to test the kernel checksum logic, so you want the kernel to
     detect the bad checksum and throw away the packet.  In this case, just don't set the socket
     option in appA to elide checksums and the packet will be thrown away.

Any other cases you can think of?

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3.2 085/115] veth: don???t modify ip_summed; doing so treats packets with bad checksums as good.
  2016-04-30 22:43                               ` Ben Greear
@ 2016-05-01  5:30                                 ` Willy Tarreau
  2016-05-13 16:57                                   ` Ben Greear
  0 siblings, 1 reply; 24+ messages in thread
From: Willy Tarreau @ 2016-05-01  5:30 UTC (permalink / raw)
  To: Ben Greear
  Cc: Vijay Pandurangan, Tom Herbert, Ben Hutchings, Sabrina Dubroca,
	Hannes Frederic Sowa, LKML, stable, akpm, David S. Miller,
	Cong Wang, Linux Kernel Network Developers, Evan Jones,
	Nicolas Dichtel, Phil Sutter, Toshiaki Makita, Cong Wang

On Sat, Apr 30, 2016 at 03:43:51PM -0700, Ben Greear wrote:
> On 04/30/2016 03:01 PM, Vijay Pandurangan wrote:
> > Consider:
> > 
> > - App A  sends out corrupt packets 50% of the time and discards inbound data.
(...)
> How can you make a generic app C know how to do this?  The path could be,
> for instance:
> 
> eth0 <-> user-space-A <-> vethA <-> vethB <-> { kernel routing logic } <-> vethC <-> vethD <-> appC
> 
> There are no sockets on vethB, but it does need to have special behaviour to elide
> csums.  Even if appC is hacked to know how to twiddle some thing on it's veth port,
> mucking with vethD will have no effect on vethB.
> 
> With regard to your example above, why would A corrupt packets?  My guess:
> 
> 1)  It has bugs (so, fix the bugs, it could equally create incorrect data with proper checksums,
>     so just enabling checksumming adds no useful protection.)

I agree with Ben here, what he needs is the ability for userspace to be
trusted when *forwarding* a packet. Ideally you'd only want to receive
the csum status per packet on the packet socket and pass the same value
on the vethA interface, with this status being kept when the packet
reaches vethB.

If A purposely corrupts packet, it's A's problem. It's similar to designing
a NIC which intentionally corrupts packets and reports "checksum good".

The real issue is that in order to do things right, the userspace bridge
(here, "A") would really need to pass this status. In Ben's case as he
says, bad checksum packets are dropped before reaching A, so that
simplifies the process quite a bit and that might be what causes some
confusion, but ideally we'd rather have recvmsg() and sendmsg() with
these flags.

I faced the exact same issue 3 years ago when playing with netmap, it was
slow as hell because it would lose all checksum information when packets
were passing through userland, resulting in GRO/GSO etc being disabled,
and had to modify it to let userland preserve it. That's especially
important when you have to deal with possibly corrupted packets not yet
detected in the chain because the NIC did not validate their checksums.

Willy

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3.2 085/115] veth: don???t modify ip_summed; doing so treats packets with bad checksums as good.
  2016-05-01  5:30                                 ` [PATCH 3.2 085/115] veth: don???t " Willy Tarreau
@ 2016-05-13 16:57                                   ` Ben Greear
  2016-05-13 18:21                                     ` David Miller
  0 siblings, 1 reply; 24+ messages in thread
From: Ben Greear @ 2016-05-13 16:57 UTC (permalink / raw)
  To: David S. Miller
  Cc: Willy Tarreau, Vijay Pandurangan, Tom Herbert, Ben Hutchings,
	Sabrina Dubroca, Hannes Frederic Sowa, LKML, stable, akpm,
	Cong Wang, Linux Kernel Network Developers, Evan Jones,
	Nicolas Dichtel, Phil Sutter, Toshiaki Makita, Cong Wang

Mr Miller:

How do you feel about a new socket-option to allow a socket to
request the old veth behaviour?

Thanks,
Ben

On 04/30/2016 10:30 PM, Willy Tarreau wrote:
> On Sat, Apr 30, 2016 at 03:43:51PM -0700, Ben Greear wrote:
>> On 04/30/2016 03:01 PM, Vijay Pandurangan wrote:
>>> Consider:
>>>
>>> - App A  sends out corrupt packets 50% of the time and discards inbound data.
> (...)
>> How can you make a generic app C know how to do this?  The path could be,
>> for instance:
>>
>> eth0 <-> user-space-A <-> vethA <-> vethB <-> { kernel routing logic } <-> vethC <-> vethD <-> appC
>>
>> There are no sockets on vethB, but it does need to have special behaviour to elide
>> csums.  Even if appC is hacked to know how to twiddle some thing on it's veth port,
>> mucking with vethD will have no effect on vethB.
>>
>> With regard to your example above, why would A corrupt packets?  My guess:
>>
>> 1)  It has bugs (so, fix the bugs, it could equally create incorrect data with proper checksums,
>>      so just enabling checksumming adds no useful protection.)
>
> I agree with Ben here, what he needs is the ability for userspace to be
> trusted when *forwarding* a packet. Ideally you'd only want to receive
> the csum status per packet on the packet socket and pass the same value
> on the vethA interface, with this status being kept when the packet
> reaches vethB.
>
> If A purposely corrupts packet, it's A's problem. It's similar to designing
> a NIC which intentionally corrupts packets and reports "checksum good".
>
> The real issue is that in order to do things right, the userspace bridge
> (here, "A") would really need to pass this status. In Ben's case as he
> says, bad checksum packets are dropped before reaching A, so that
> simplifies the process quite a bit and that might be what causes some
> confusion, but ideally we'd rather have recvmsg() and sendmsg() with
> these flags.
>
> I faced the exact same issue 3 years ago when playing with netmap, it was
> slow as hell because it would lose all checksum information when packets
> were passing through userland, resulting in GRO/GSO etc being disabled,
> and had to modify it to let userland preserve it. That's especially
> important when you have to deal with possibly corrupted packets not yet
> detected in the chain because the NIC did not validate their checksums.
>
> Willy
>


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3.2 085/115] veth: don???t modify ip_summed; doing so treats packets with bad checksums as good.
  2016-05-13 16:57                                   ` Ben Greear
@ 2016-05-13 18:21                                     ` David Miller
  2016-05-13 18:23                                       ` Ben Greear
  0 siblings, 1 reply; 24+ messages in thread
From: David Miller @ 2016-05-13 18:21 UTC (permalink / raw)
  To: greearb
  Cc: w, vijayp, tom, ben, sd, hannes, linux-kernel, stable, akpm,
	cwang, netdev, ej, nicolas.dichtel, phil, makita.toshiaki,
	xiyou.wangcong

From: Ben Greear <greearb@candelatech.com>
Date: Fri, 13 May 2016 09:57:19 -0700

> How do you feel about a new socket-option to allow a socket to
> request the old veth behaviour?

I depend upon the opinions of the experts who work upstream on and
maintain these components, since it is an area I am not so familiar
with.

Generally speaking asking me directly for opinions on matters like
this isn't the way to go, in fact I kind of find it irritating.  It
can't all be on me.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3.2 085/115] veth: don???t modify ip_summed; doing so treats packets with bad checksums as good.
  2016-05-13 18:21                                     ` David Miller
@ 2016-05-13 18:23                                       ` Ben Greear
  0 siblings, 0 replies; 24+ messages in thread
From: Ben Greear @ 2016-05-13 18:23 UTC (permalink / raw)
  To: David Miller
  Cc: w, vijayp, tom, ben, sd, hannes, linux-kernel, stable, akpm,
	cwang, netdev, ej, nicolas.dichtel, phil, makita.toshiaki,
	xiyou.wangcong

On 05/13/2016 11:21 AM, David Miller wrote:
> From: Ben Greear <greearb@candelatech.com>
> Date: Fri, 13 May 2016 09:57:19 -0700
>
>> How do you feel about a new socket-option to allow a socket to
>> request the old veth behaviour?
>
> I depend upon the opinions of the experts who work upstream on and
> maintain these components, since it is an area I am not so familiar
> with.
>
> Generally speaking asking me directly for opinions on matters like
> this isn't the way to go, in fact I kind of find it irritating.  It
> can't all be on me.
>

Fair enough, thanks for your time.

Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2016-05-13 18:23 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <lsq.1461711744.351546278@decadent.org.uk>
2016-04-26 23:02 ` [PATCH 3.2 085/115] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good Ben Hutchings
2016-04-27 15:59   ` Ben Greear
2016-04-27 18:07     ` Ben Hutchings
2016-04-28  0:00       ` Hannes Frederic Sowa
2016-04-28  0:14         ` Ben Greear
2016-04-28 10:29           ` Sabrina Dubroca
2016-04-28 13:45             ` Ben Greear
2016-04-30 19:18               ` Ben Hutchings
2016-04-30 18:33             ` Ben Hutchings
2016-04-30 19:40               ` Ben Greear
2016-04-30 19:54                 ` Tom Herbert
2016-04-30 20:59                   ` Ben Greear
2016-04-30 21:13                     ` Vijay Pandurangan
2016-04-30 21:29                       ` Ben Greear
2016-04-30 21:36                         ` Vijay Pandurangan
2016-04-30 21:52                           ` Ben Greear
2016-04-30 22:01                             ` Vijay Pandurangan
2016-04-30 22:43                               ` Ben Greear
2016-05-01  5:30                                 ` [PATCH 3.2 085/115] veth: don???t " Willy Tarreau
2016-05-13 16:57                                   ` Ben Greear
2016-05-13 18:21                                     ` David Miller
2016-05-13 18:23                                       ` Ben Greear
2016-04-30 22:42                     ` [PATCH 3.2 085/115] veth: don’t " Tom Herbert
2016-04-30 20:15             ` Vijay Pandurangan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).