Netdev List

Netdev List
 help / color / mirror / Atom feed

* [patch] yam: integer underflow in yam_ioctl()
From: Dan Carpenter @ 2013-10-14 12:28 UTC (permalink / raw)
  To: Jean-Paul Roubelat; +Cc: linux-hams, netdev, kernel-janitors

We cap bitrate at YAM_MAXBITRATE in yam_ioctl(), but it could also be
negative.  I don't know the impact of using a negative bitrate but let's
prevent it.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

diff --git a/include/linux/yam.h b/include/linux/yam.h
index 7fe2822..512cdc2 100644
--- a/include/linux/yam.h
+++ b/include/linux/yam.h
@@ -77,6 +77,6 @@ struct yamdrv_ioctl_cfg {
 
 struct yamdrv_ioctl_mcs {
 	int cmd;
-	int bitrate;
+	unsigned int bitrate;
 	unsigned char bits[YAM_FPGA_SIZE];
 };

^ permalink raw reply related

* Re: [Xen-devel] [PATCH net-next v4 2/5] xen-netback: add support for IPv6 checksum offload from guest
From: David Vrabel @ 2013-10-14 12:19 UTC (permalink / raw)
  To: Wei Liu
  Cc: Paul Durrant, netdev@vger.kernel.org, Ian Campbell, David Vrabel,
	xen-devel@lists.xen.org
In-Reply-To: <20131014105527.GD11739@zion.uk.xensource.com>

On 14/10/13 11:55, Wei Liu wrote:
> On Mon, Oct 14, 2013 at 11:49:20AM +0100, Paul Durrant wrote:
>>> -----Original Message-----
>>> From: Wei Liu [mailto:wei.liu2@citrix.com]
>>> Sent: 14 October 2013 11:43
>>> To: Paul Durrant
>>> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; David Vrabel;
>>> Ian Campbell
>>> Subject: Re: [PATCH net-next v4 2/5] xen-netback: add support for IPv6
>>> checksum offload from guest
>>>
>>> On Fri, Oct 11, 2013 at 04:06:19PM +0100, Paul Durrant wrote:
>>> [...]
>>>> -/*
>>>> - * This is the amount of packet we copy rather than map, so that the
>>>> - * guest can't fiddle with the contents of the headers while we do
>>>> - * packet processing on them (netfilter, routing, etc).
>>>> +/* This is a miniumum size for the linear area to avoid lots of
>>>> + * calls to __pskb_pull_tail() as we set up checksum offsets.
>>>>   */
>>>
>>> You seem to forget to explain why 128 is chosen. :-)
>>
>> Is that not sufficient explanation? What sort of thing are you looking for?
>>
> 
>>From the second version of this patch, we had a conversation.
> 
>> Where does 128 come from?
>>
> 
> "It's just an arbitrary power of 2 that was chosen because it seems to
> cover most likely v6 headers and all v4 headers."
> 
> So something like: "We choose 128 which is likely to cover most V6
> headers and all V4 headers" would be sufficeint.

Is "most IPv6 headers" actually good enough?  Don't we need to ensure
netback copies all IP headers?

David

^ permalink raw reply

* Re: [PATCH] net/ethernet: cpsw: Bugfix interrupts before enabling napi
From: Peter Korsgaard @ 2013-10-14 11:48 UTC (permalink / raw)
  To: Markus Pargmann
  Cc: David S. Miller, Florian Fainelli, Mugunthan V N,
	linux-arm-kernel, netdev, kernel
In-Reply-To: <1381691821-25498-1-git-send-email-mpa@pengutronix.de>

>>>>> "Markus" == Markus Pargmann <mpa@pengutronix.de> writes:

 Markus> If interrupts happen before napi_enable was called, the driver will not
 Markus> work as expected. Network transmissions are impossible in this state.
 Markus> This bug can be reproduced easily by restarting the network interface in
 Markus> a loop. After some time any network transmissions on the network
 Markus> interface will fail.

 Markus> This patch fixes the bug by enabling napi before enabling the network
 Markus> interface interrupts.

 Markus> Signed-off-by: Markus Pargmann <mpa@pengutronix.de>

Acked-by: Peter Korsgaard <jacmet@sunsite.dk>

-- 
Bye, Peter Korsgaard

^ permalink raw reply

* Re: DomU's network interface will hung when Dom0 running 32bit
From: Wei Liu @ 2013-10-14 11:19 UTC (permalink / raw)
  To: jianhai luan; +Cc: Ian Campbell, Wei Liu, xen-devel, netdev
In-Reply-To: <52590DFE.6080203@oracle.com>

On Sat, Oct 12, 2013 at 04:53:18PM +0800, jianhai luan wrote:
> Hi Ian,
>   I meet the DomU's network interface hung issue recently, and have
> been working on the issue from that time. I find that DomU's network
> interface, which send lesser package, will hung if Dom0 running
> 32bit and DomU's up-time is very long.  I think that one jiffies
> overflow bug exist in the function tx_credit_exceeded().
>   I know the inline function time_after_eq(a,b) will process jiffies
> overflow, but the function have one limit a should little that (b +
> MAX_SIGNAL_LONG). If a large than the value, time_after_eq will
> return false. The MAX_SINGNAL_LONG should be 0x7fffffff at 32-bit
> machine.
>   If DomU's network interface send lesser package (<0.5k/s if
> jiffies=250 and credit_bytes=ULONG_MAX), jiffies will beyond out
> (credit_timeout.expires + MAX_SIGNAL_LONG) and time_after_eq(now,
> next_credit) will failure (should be true). So one timer which will
> not be trigger in short time, and later process will be aborted when
> timer_pending(&vif->credit_timeout) is true. The result will be
> DomU's network interface will be hung in long time (> 40days).
>   Please think about the below scenario:
>   Condition:
>     Dom0 running 32-bit and HZ = 1000
>     vif->credit_timeout->expire = 0xffffffff, vif->remaining_credit
> = 0xffffffff, vif->credit_usec=0 jiffies=0
>     vif receive lesser package (DomU send lesser package). If the
> value is litter than 2K/s, consume 4G(0xffffffff) will need 582.55
> hours. jiffies will large than 0x7ffffff. we guess jiffies =
> 0x800000ff, time_after_eq(0x800000ff, 0xffffffff) will failure, and
> one time which expire is 0xfffffff will be pended into system. So
> the interface will hung until jiffies recount 0xffffffff (that will
> need very long time).

If I'm not mistaken you meant time_after_eq(now, next_credit) in
netback. How does next_credit become 0xffffffff?

Wei.

> 
>   If some error exist in above explain, please help me point it out.
> 
> Thanks,
> Jason

^ permalink raw reply

* RE: [PATCH net-next v4 1/5] xen-netback: add support for IPv6 checksum offload to guest
From: Paul Durrant @ 2013-10-14 11:10 UTC (permalink / raw)
  To: Ian Campbell
  Cc: xen-devel@lists.xen.org, netdev@vger.kernel.org, Wei Liu,
	David Vrabel
In-Reply-To: <1381748013.24708.102.camel@kazak.uk.xensource.com>

> -----Original Message-----
> From: Ian Campbell
> Sent: 14 October 2013 11:54
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; David Vrabel
> Subject: Re: [PATCH net-next v4 1/5] xen-netback: add support for IPv6
> checksum offload to guest
> 
> On Fri, 2013-10-11 at 16:06 +0100, Paul Durrant wrote:
> > Check xenstore flag feature-ipv6-csum-offload to determine if a
> > guest is happy to accept IPv6 packets with only partial checksum.
> > Also check analogous feature-ip-csum-offload to determine if a
> > guest is happy to accept IPv4 packets with only partial checksum
> > as a replacement for a negated feature-no-csum-offload value and
> > add a comment to deprecate use of feature-no-csum-offload.
> >
> > Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> > Cc: Wei Liu <wei.liu2@citrix.com>
> > Cc: David Vrabel <david.vrabel@citrix.com>
> > Cc: Ian Campbell <ian.campbell@citrix.com>
> 
> Shouldn't this come later in the series, i.e. after netback is actually
> able to cope with ipv6 offloads?
> 

I guess that's debatable. The patches don't have any dependency relation; offloads to and from the guest are quite independent.

> > diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-
> netback/common.h
> > index 5715318..b4a9a3c 100644
> > --- a/drivers/net/xen-netback/common.h
> > +++ b/drivers/net/xen-netback/common.h
> > @@ -153,7 +153,8 @@ struct xenvif {
> >  	u8 can_sg:1;
> >  	u8 gso:1;
> >  	u8 gso_prefix:1;
> > -	u8 csum:1;
> > +	u8 ip_csum:1;
> > +	u8 ipv6_csum:1;
> 
> Why not ipv4_csum for consistency/unambiguity?
> 

I followed general linux naming conventions e.g. ip_hdr and ipv6_hdr.

> > diff --git a/include/xen/interface/io/netif.h
> b/include/xen/interface/io/netif.h
> > index eb262e3..d9fb44739 100644
> > --- a/include/xen/interface/io/netif.h
> > +++ b/include/xen/interface/io/netif.h
> > @@ -51,6 +51,16 @@
> >   */
> >
> >  /*
> > + * "feature-no-csum-offload" was used to turn off IPv4 TCP/UDP
> checksum
> > + * offload but is now deprecated. Two new feature flags should now be
> used
> > + * to control checksum offload:
> 
> How is a frontend to know which sort of backend it is talking too? Is
> there going to be a feature flag to indicate support for these new
> flags?
> 
> In particular a new frontend running on an old backend needs to know
> that it needs to set no-csum-offload instead of ip-csum-offload somehow.
> 

Good point. Without any version I guess we have to live with the old flag forever. I'll stick with it for v4 and just leave the new one for v6.

  Paul

> > + * "feature-ip-csum-offload" should be used to turn IPv4 TCP/UDP
> checksum
> 
> "ipv4" again?
> 
> > + * offload on or off. If it is missing then the feature is assumed to be on.
> > + * "feature-ipv6-csum-offload" should be used to turn IPv6 TCP/UDP
> checksum
> > + * offload on or off. If it is missing then the feature is assumed to be off.
> > + */
> > +
> > +/*
> >   * This is the 'wire' format for packets:
> >   *  Request 1: xen_netif_tx_request  -- XEN_NETTXF_* (any flags)
> >   * [Request 2: xen_netif_extra_info]    (only if request 1 has
> XEN_NETTXF_extra_info)
> 


^ permalink raw reply

* Re: [PATCH net 2/2] virtio-net: refill only when device is up during setting queues
From: Michael S. Tsirkin @ 2013-10-14 11:09 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <1381744595-26881-2-git-send-email-jasowang@redhat.com>

On Mon, Oct 14, 2013 at 05:56:35PM +0800, Jason Wang wrote:
> We used to schedule the refill work unconditionally after changing the
> number of queues. This may lead an issue if the device is not
> up. Since we only try to cancel the work in ndo_stop(), this may cause
> the refill work still work after removing the device. Fix this by only
> schedule the work when device is up.
> 
> The bug were introduce by commit 9b9cd8024a2882e896c65222aa421d461354e3f2.
> (virtio-net: fix the race between channels setting and refill)
> 
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

It bothers me that we look at the flag without any
locks here.
I think we'll need to take the rtnl lock at least
on restore.

> ---
> The patch were need for 3.10 and above.
> ---
>  drivers/net/virtio_net.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c4bc1cc..92f0096 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -938,7 +938,9 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
>  		return -EINVAL;
>  	} else {
>  		vi->curr_queue_pairs = queue_pairs;
> -		schedule_delayed_work(&vi->refill, 0);
> +		/* virtnet_open() will refill when device is going to up. */
> +		if (dev->flags & IFF_UP)
> +			schedule_delayed_work(&vi->refill, 0);
>  	}
>  
>  	return 0;
> -- 
> 1.8.1.2

^ permalink raw reply

* Re: [PATCH net 1/2] virtio-net: don't respond to cpu hotplug notifier if we're not ready
From: Michael S. Tsirkin @ 2013-10-14 11:06 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, virtualization
In-Reply-To: <1381744595-26881-1-git-send-email-jasowang@redhat.com>

On Mon, Oct 14, 2013 at 05:56:34PM +0800, Jason Wang wrote:
> We're trying to re-configure the affinity unconditionally in cpu hotplug
> callback. This may lead the issue during resuming from s3/s4 since
> 
> - virt queues haven't been allocated at that time.
> - it's unnecessary since thaw method will re-configure the affinity.
> 
> Fix this issue by checking the config_enable and do nothing is we're not ready.
> 
> The bug were introduced by commit 8de4b2f3ae90c8fc0f17eeaab87d5a951b66ee17
> (virtio-net: reset virtqueue affinity when doing cpu hotplug).
> 
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Wanlong Gao <gaowanlong@cn.fujitsu.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
> The patch is need for 3.8 and above.
> ---
>  drivers/net/virtio_net.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index defec2b..c4bc1cc 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1116,6 +1116,11 @@ static int virtnet_cpu_callback(struct notifier_block *nfb,
>  {
>  	struct virtnet_info *vi = container_of(nfb, struct virtnet_info, nb);
>  
> +	mutex_lock(&vi->config_lock);
> +
> +	if (!vi->config_enable)
> +		goto done;
> +
>  	switch(action & ~CPU_TASKS_FROZEN) {
>  	case CPU_ONLINE:
>  	case CPU_DOWN_FAILED:
> @@ -1128,6 +1133,9 @@ static int virtnet_cpu_callback(struct notifier_block *nfb,
>  	default:
>  		break;
>  	}
> +
> +done:
> +	mutex_unlock(&vi->config_lock);
>  	return NOTIFY_OK;
>  }
>  
> -- 
> 1.8.1.2

^ permalink raw reply

* RE: [PATCH net-next v4 3/5] xen-netback: Unconditionally set NETIF_F_RXCSUM
From: Paul Durrant @ 2013-10-14 11:03 UTC (permalink / raw)
  To: Ian Campbell
  Cc: xen-devel@lists.xen.org, netdev@vger.kernel.org, Wei Liu,
	David Vrabel
In-Reply-To: <1381748176.24708.103.camel@kazak.uk.xensource.com>

> -----Original Message-----
> From: Ian Campbell
> Sent: 14 October 2013 11:56
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; David Vrabel
> Subject: Re: [PATCH net-next v4 3/5] xen-netback: Unconditionally set
> NETIF_F_RXCSUM
> 
> On Fri, 2013-10-11 at 16:06 +0100, Paul Durrant wrote:
> > There is no mechanism to insist that a guest always generates a packet
> > with good checksum (at least for IPv4)
> 
> Isn't this what feature-no-csum-offload is?
> 

Theoretically, yes, but netback does not have code to advertise that flag (and never has?) and I don't see anything in xen-netfront that checks such a flag so I think we have to assume it's not going to be honoured even if we were to introduce it now.

  Paul

^ permalink raw reply

* RE: [PATCH net-next v4 2/5] xen-netback: add support for IPv6 checksum offload from guest
From: Paul Durrant @ 2013-10-14 10:57 UTC (permalink / raw)
  To: Wei Liu
  Cc: Wei Liu, xen-devel@lists.xen.org, netdev@vger.kernel.org,
	David Vrabel, Ian Campbell
In-Reply-To: <20131014105527.GD11739@zion.uk.xensource.com>

> -----Original Message-----
> From: Wei Liu [mailto:wei.liu2@citrix.com]
> Sent: 14 October 2013 11:55
> To: Paul Durrant
> Cc: Wei Liu; xen-devel@lists.xen.org; netdev@vger.kernel.org; David Vrabel;
> Ian Campbell
> Subject: Re: [PATCH net-next v4 2/5] xen-netback: add support for IPv6
> checksum offload from guest
> 
> On Mon, Oct 14, 2013 at 11:49:20AM +0100, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Wei Liu [mailto:wei.liu2@citrix.com]
> > > Sent: 14 October 2013 11:43
> > > To: Paul Durrant
> > > Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; David
> Vrabel;
> > > Ian Campbell
> > > Subject: Re: [PATCH net-next v4 2/5] xen-netback: add support for IPv6
> > > checksum offload from guest
> > >
> > > On Fri, Oct 11, 2013 at 04:06:19PM +0100, Paul Durrant wrote:
> > > [...]
> > > > -/*
> > > > - * This is the amount of packet we copy rather than map, so that the
> > > > - * guest can't fiddle with the contents of the headers while we do
> > > > - * packet processing on them (netfilter, routing, etc).
> > > > +/* This is a miniumum size for the linear area to avoid lots of
> > > > + * calls to __pskb_pull_tail() as we set up checksum offsets.
> > > >   */
> > >
> > > You seem to forget to explain why 128 is chosen. :-)
> >
> > Is that not sufficient explanation? What sort of thing are you looking for?
> >
> 
> From the second version of this patch, we had a conversation.
> 
> > Where does 128 come from?
> >
> 
> "It's just an arbitrary power of 2 that was chosen because it seems to
> cover most likely v6 headers and all v4 headers."
> 
> So something like: "We choose 128 which is likely to cover most V6
> headers and all V4 headers" would be sufficeint.
> 

Ok. I figured that was implied by "set up checksum offsets" but I can be more explicit.

  Paul

^ permalink raw reply

* Re: [PATCH net-next v4 3/5] xen-netback: Unconditionally set NETIF_F_RXCSUM
From: Ian Campbell @ 2013-10-14 10:56 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, netdev, Wei Liu, David Vrabel
In-Reply-To: <1381503982-1418-4-git-send-email-paul.durrant@citrix.com>

On Fri, 2013-10-11 at 16:06 +0100, Paul Durrant wrote:
> There is no mechanism to insist that a guest always generates a packet
> with good checksum (at least for IPv4)

Isn't this what feature-no-csum-offload is?

>  so we must handle checksum
> offloading from the guest and hence should set NETIF_F_RXCSUM.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> Cc: David Vrabel <david.vrabel@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>
> ---
>  drivers/net/xen-netback/interface.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
> index 8e92783..cb0d8ea 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -321,7 +321,7 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
>  	dev->hw_features = NETIF_F_SG |
>  		NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
>  		NETIF_F_TSO;
> -	dev->features = dev->hw_features;
> +	dev->features = dev->hw_features | NETIF_F_RXCSUM;
>  	SET_ETHTOOL_OPS(dev, &xenvif_ethtool_ops);
>  
>  	dev->tx_queue_len = XENVIF_QUEUE_LENGTH;

^ permalink raw reply

* Re: [PATCH net-next v4 2/5] xen-netback: add support for IPv6 checksum offload from guest
From: Wei Liu @ 2013-10-14 10:55 UTC (permalink / raw)
  To: Paul Durrant
  Cc: Wei Liu, xen-devel@lists.xen.org, netdev@vger.kernel.org,
	David Vrabel, Ian Campbell
In-Reply-To: <9AAE0902D5BC7E449B7C8E4E778ABCD0138B8D@AMSPEX01CL01.citrite.net>

On Mon, Oct 14, 2013 at 11:49:20AM +0100, Paul Durrant wrote:
> > -----Original Message-----
> > From: Wei Liu [mailto:wei.liu2@citrix.com]
> > Sent: 14 October 2013 11:43
> > To: Paul Durrant
> > Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; David Vrabel;
> > Ian Campbell
> > Subject: Re: [PATCH net-next v4 2/5] xen-netback: add support for IPv6
> > checksum offload from guest
> > 
> > On Fri, Oct 11, 2013 at 04:06:19PM +0100, Paul Durrant wrote:
> > [...]
> > > -/*
> > > - * This is the amount of packet we copy rather than map, so that the
> > > - * guest can't fiddle with the contents of the headers while we do
> > > - * packet processing on them (netfilter, routing, etc).
> > > +/* This is a miniumum size for the linear area to avoid lots of
> > > + * calls to __pskb_pull_tail() as we set up checksum offsets.
> > >   */
> > 
> > You seem to forget to explain why 128 is chosen. :-)
> 
> Is that not sufficient explanation? What sort of thing are you looking for?
> 

>From the second version of this patch, we had a conversation.

> Where does 128 come from?
>

"It's just an arbitrary power of 2 that was chosen because it seems to
cover most likely v6 headers and all v4 headers."

So something like: "We choose 128 which is likely to cover most V6
headers and all V4 headers" would be sufficeint.

Wei.

^ permalink raw reply

* Re: [PATCH net-next v4 1/5] xen-netback: add support for IPv6 checksum offload to guest
From: Ian Campbell @ 2013-10-14 10:53 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, netdev, Wei Liu, David Vrabel
In-Reply-To: <1381503982-1418-2-git-send-email-paul.durrant@citrix.com>

On Fri, 2013-10-11 at 16:06 +0100, Paul Durrant wrote:
> Check xenstore flag feature-ipv6-csum-offload to determine if a
> guest is happy to accept IPv6 packets with only partial checksum.
> Also check analogous feature-ip-csum-offload to determine if a
> guest is happy to accept IPv4 packets with only partial checksum
> as a replacement for a negated feature-no-csum-offload value and
> add a comment to deprecate use of feature-no-csum-offload.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Cc: Wei Liu <wei.liu2@citrix.com>
> Cc: David Vrabel <david.vrabel@citrix.com>
> Cc: Ian Campbell <ian.campbell@citrix.com>

Shouldn't this come later in the series, i.e. after netback is actually
able to cope with ipv6 offloads?

> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
> index 5715318..b4a9a3c 100644
> --- a/drivers/net/xen-netback/common.h
> +++ b/drivers/net/xen-netback/common.h
> @@ -153,7 +153,8 @@ struct xenvif {
>  	u8 can_sg:1;
>  	u8 gso:1;
>  	u8 gso_prefix:1;
> -	u8 csum:1;
> +	u8 ip_csum:1;
> +	u8 ipv6_csum:1;

Why not ipv4_csum for consistency/unambiguity?

> diff --git a/include/xen/interface/io/netif.h b/include/xen/interface/io/netif.h
> index eb262e3..d9fb44739 100644
> --- a/include/xen/interface/io/netif.h
> +++ b/include/xen/interface/io/netif.h
> @@ -51,6 +51,16 @@
>   */
>  
>  /*
> + * "feature-no-csum-offload" was used to turn off IPv4 TCP/UDP checksum
> + * offload but is now deprecated. Two new feature flags should now be used
> + * to control checksum offload:

How is a frontend to know which sort of backend it is talking too? Is
there going to be a feature flag to indicate support for these new
flags?

In particular a new frontend running on an old backend needs to know
that it needs to set no-csum-offload instead of ip-csum-offload somehow.

> + * "feature-ip-csum-offload" should be used to turn IPv4 TCP/UDP checksum

"ipv4" again?

> + * offload on or off. If it is missing then the feature is assumed to be on.
> + * "feature-ipv6-csum-offload" should be used to turn IPv6 TCP/UDP checksum
> + * offload on or off. If it is missing then the feature is assumed to be off.
> + */
> +
> +/*
>   * This is the 'wire' format for packets:
>   *  Request 1: xen_netif_tx_request  -- XEN_NETTXF_* (any flags)
>   * [Request 2: xen_netif_extra_info]    (only if request 1 has XEN_NETTXF_extra_info)

^ permalink raw reply

* Re: [PATCH 2/2] ixgbe: enable l2 forwarding acceleration for macvlans
From: Neil Horman @ 2013-10-14 10:50 UTC (permalink / raw)
  To: John Fastabend; +Cc: netdev, Andy Gospodarek, David Miller
In-Reply-To: <525B0721.7060709@gmail.com>

On Sun, Oct 13, 2013 at 01:48:33PM -0700, John Fastabend wrote:
> On 10/11/2013 11:43 AM, Neil Horman wrote:
> >Now that l2 acceleration ops are in place from the prior patch, enable ixgbe to
> >take advantage of these operations.  Allow it to allocate queues for a macvlan
> >so that when we transmit a frame, we can do the switching in hardware inside the
> >ixgbe card, rather than in software.
> >
> >Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> >CC: john.fastabend@gmail.com
> >CC: Andy Gospodarek <andy@greyhouse.net>
> >CC: "David S. Miller" <davem@davemloft.net>
> >---
> 
> Neil, I'm fairly sure I can simplify this patch further. I'll be able
> to take a shot at it Tuesday if you don't mind waiting.
> 
> I can resubmit the series then and preserve your signed-off on at least
> the first patch. Seem reasonable?
> 
That sounds fine.
Thanks!
Neil

> .John
> 
> -- 
> John Fastabend         Intel Corporation
> 

^ permalink raw reply

* RE: [PATCH net-next v4 2/5] xen-netback: add support for IPv6 checksum offload from guest
From: Paul Durrant @ 2013-10-14 10:49 UTC (permalink / raw)
  To: Wei Liu
  Cc: xen-devel@lists.xen.org, netdev@vger.kernel.org, Wei Liu,
	David Vrabel, Ian Campbell
In-Reply-To: <20131014104235.GB11739@zion.uk.xensource.com>

> -----Original Message-----
> From: Wei Liu [mailto:wei.liu2@citrix.com]
> Sent: 14 October 2013 11:43
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; David Vrabel;
> Ian Campbell
> Subject: Re: [PATCH net-next v4 2/5] xen-netback: add support for IPv6
> checksum offload from guest
> 
> On Fri, Oct 11, 2013 at 04:06:19PM +0100, Paul Durrant wrote:
> [...]
> > -/*
> > - * This is the amount of packet we copy rather than map, so that the
> > - * guest can't fiddle with the contents of the headers while we do
> > - * packet processing on them (netfilter, routing, etc).
> > +/* This is a miniumum size for the linear area to avoid lots of
> > + * calls to __pskb_pull_tail() as we set up checksum offsets.
> >   */
> 
> You seem to forget to explain why 128 is chosen. :-)

Is that not sufficient explanation? What sort of thing are you looking for?

  Paul

^ permalink raw reply

* Re: [PATCH 1/2] net: Add layer 2 hardware acceleration operations for macvlan devices
From: Neil Horman @ 2013-10-14 10:48 UTC (permalink / raw)
  To: John Fastabend; +Cc: netdev, Andy Gospodarek, David Miller
In-Reply-To: <525B0689.4070109@gmail.com>

On Sun, Oct 13, 2013 at 01:46:01PM -0700, John Fastabend wrote:
> On 10/11/2013 11:43 AM, Neil Horman wrote:
> >Add a operations structure that allows a network interface to export the fact
> >that it supports package forwarding in hardware between physical interfaces and
> >other mac layer devices assigned to it (such as macvlans).  this operaions
> >structure can be used by virtual mac devices to bypass software switching so
> >that forwarding can be done in hardware more efficiently.
> >
> >Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> >CC: john.fastabend@gmail.com
> >CC: Andy Gospodarek <andy@greyhouse.net>
> >CC: "David S. Miller" <davem@davemloft.net>
> >---
> 
> [...]
> 
> >
> >diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> >index 2ddb48d..1710fdb 100644
> >--- a/include/linux/skbuff.h
> >+++ b/include/linux/skbuff.h
> >@@ -426,9 +426,9 @@ struct sk_buff {
> >  	char			cb[48] __aligned(8);
> >
> >  	unsigned long		_skb_refdst;
> >-#ifdef CONFIG_XFRM
> >+
> 
> Is this a hold-over from the previous patches? 'sp' isn't touched
> anywhere else so put the ifdef/endif back.
> 
Yeah, my screw up, I wanted to get this out before the weekend and missed that
screw up.  Sorry.

Neil

> >  	struct	sec_path	*sp;
> >-#endif
> >+
> >  	unsigned int		len,
> >  				data_len;
> >  	__u16			mac_len,
> >diff --git a/net/core/dev.c b/net/core/dev.c
> >index 5c713f2..ecad8c2 100644
> 
> -- 
> John Fastabend         Intel Corporation
> 

^ permalink raw reply

* Re: [PATCH net-next v4 2/5] xen-netback: add support for IPv6 checksum offload from guest
From: Wei Liu @ 2013-10-14 10:42 UTC (permalink / raw)
  To: Paul Durrant; +Cc: xen-devel, netdev, Wei Liu, David Vrabel, Ian Campbell
In-Reply-To: <1381503982-1418-3-git-send-email-paul.durrant@citrix.com>

On Fri, Oct 11, 2013 at 04:06:19PM +0100, Paul Durrant wrote:
[...]
> -/*
> - * This is the amount of packet we copy rather than map, so that the
> - * guest can't fiddle with the contents of the headers while we do
> - * packet processing on them (netfilter, routing, etc).
> +/* This is a miniumum size for the linear area to avoid lots of
> + * calls to __pskb_pull_tail() as we set up checksum offsets.
>   */

You seem to forget to explain why 128 is chosen. :-)

Wei.

^ permalink raw reply

* Re: [PATCH] staging: octeon-ethernet: trivial: Avoid OOPS if phydev is not set
From: Dan Carpenter @ 2013-10-14 10:10 UTC (permalink / raw)
  To: Greg KH; +Cc: support, netdev, driverdev-devel, Sebastian Pöhn
In-Reply-To: <20131013212810.GA15167@kroah.com>

On Sun, Oct 13, 2013 at 02:28:10PM -0700, Greg KH wrote:
> On Sun, Oct 13, 2013 at 08:59:54PM +0200, Sebastian Pöhn wrote:
> > A zero pointer deref on priv->phydev->link was causing oops on our systems.
> > Might be related to improper configuration but we should fail gracefully here ;-)
> > 
> > Signed-off-by: Sebastian Poehn <sebastian.poehn@googlemail.com>
> > 
> > ---
> > 
> > diff --git a/drivers/staging/octeon/ethernet-mdio.c b/drivers/staging/octeon/ethernet-mdio.c
> > index 83b1030..bc8c503 100644
> > --- a/drivers/staging/octeon/ethernet-mdio.c
> > +++ b/drivers/staging/octeon/ethernet-mdio.c
> > @@ -121,6 +121,9 @@ static void cvm_oct_adjust_link(struct net_device *dev)
> >         struct octeon_ethernet *priv = netdev_priv(dev);
> >         cvmx_helper_link_info_t link_info;
> >  
> > +       if(!priv->phydev)
> > +               return ;
> 
> Please always run your patches through the scripts/checkpatch.pl tool so
> that maintainers don't have to point out the obvious coding syle errors
> by hand each time :)

Also it's whitespace damaged and doesn't apply.

> 
> Care to try again?
> 
> Also, how was phydev NULL?  What was causing that?

To me it looks like phydev is always NULL.

regards,
dan carpenter

^ permalink raw reply

* [PATCH net 2/2] virtio-net: refill only when device is up during setting queues
From: Jason Wang @ 2013-10-14  9:56 UTC (permalink / raw)
  To: virtualization, netdev, linux-kernel, rusty, mst
In-Reply-To: <1381744595-26881-1-git-send-email-jasowang@redhat.com>

We used to schedule the refill work unconditionally after changing the
number of queues. This may lead an issue if the device is not
up. Since we only try to cancel the work in ndo_stop(), this may cause
the refill work still work after removing the device. Fix this by only
schedule the work when device is up.

The bug were introduce by commit 9b9cd8024a2882e896c65222aa421d461354e3f2.
(virtio-net: fix the race between channels setting and refill)

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
The patch were need for 3.10 and above.
---
 drivers/net/virtio_net.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c4bc1cc..92f0096 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -938,7 +938,9 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
 		return -EINVAL;
 	} else {
 		vi->curr_queue_pairs = queue_pairs;
-		schedule_delayed_work(&vi->refill, 0);
+		/* virtnet_open() will refill when device is going to up. */
+		if (dev->flags & IFF_UP)
+			schedule_delayed_work(&vi->refill, 0);
 	}
 
 	return 0;
-- 
1.8.1.2

^ permalink raw reply related

* [PATCH net 1/2] virtio-net: don't respond to cpu hotplug notifier if we're not ready
From: Jason Wang @ 2013-10-14  9:56 UTC (permalink / raw)
  To: virtualization, netdev, linux-kernel, rusty, mst

We're trying to re-configure the affinity unconditionally in cpu hotplug
callback. This may lead the issue during resuming from s3/s4 since

- virt queues haven't been allocated at that time.
- it's unnecessary since thaw method will re-configure the affinity.

Fix this issue by checking the config_enable and do nothing is we're not ready.

The bug were introduced by commit 8de4b2f3ae90c8fc0f17eeaab87d5a951b66ee17
(virtio-net: reset virtqueue affinity when doing cpu hotplug).

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
The patch is need for 3.8 and above.
---
 drivers/net/virtio_net.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index defec2b..c4bc1cc 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1116,6 +1116,11 @@ static int virtnet_cpu_callback(struct notifier_block *nfb,
 {
 	struct virtnet_info *vi = container_of(nfb, struct virtnet_info, nb);
 
+	mutex_lock(&vi->config_lock);
+
+	if (!vi->config_enable)
+		goto done;
+
 	switch(action & ~CPU_TASKS_FROZEN) {
 	case CPU_ONLINE:
 	case CPU_DOWN_FAILED:
@@ -1128,6 +1133,9 @@ static int virtnet_cpu_callback(struct notifier_block *nfb,
 	default:
 		break;
 	}
+
+done:
+	mutex_unlock(&vi->config_lock);
 	return NOTIFY_OK;
 }
 
-- 
1.8.1.2

^ permalink raw reply related

* Re: [PATCHv2 RESEND] {xfrm, sctp} Stick to software crc32 even if hardware is capable of that
From: Fan Du @ 2013-10-14  8:33 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: vyasevich, nhorman, steffen.klassert, davem, netdev
In-Reply-To: <525BA63A.7040708@redhat.com>



On 2013年10月14日 16:07, Daniel Borkmann wrote:
> On 10/14/2013 09:27 AM, Fan Du wrote:
>> igb/ixgbe have hardware sctp checksum support, when this feature is enabled
>> and also IPsec is armed to protect sctp traffic, ugly things happened as
>> xfrm_output checks CHECKSUM_PARTIAL to do check sum operation(sum every thing
>> up and pack the 16bits result in the checksum field). The result is fail
>> establishment of sctp communication.
>>
>> Signed-off-by: Fan Du <fan.du@windriver.com>
>> Cc: Vlad Yasevich <vyasevich@gmail.com>
>> Cc: Neil Horman <nhorman@tuxdriver.com>
>> Cc: Steffen Klassert <steffen.klassert@secunet.com>
>> Acked-by: Vlad Yasevich <vyasevich@gmail.com>
>> ---
>>   net/sctp/output.c |   14 +++++++++++++-
>>   1 file changed, 13 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/sctp/output.c b/net/sctp/output.c
>> index 0ac3a65..6de6402 100644
>> --- a/net/sctp/output.c
>> +++ b/net/sctp/output.c
>> @@ -372,6 +372,16 @@ static void sctp_packet_set_owner_w(struct sk_buff *skb, struct sock *sk)
>>       atomic_inc(&sk->sk_wmem_alloc);
>>   }
>>
>> +static int is_xfrm_armed(struct dst_entry *dst)
>> +{
>> +#ifdef CONFIG_XFRM
>> +    /* If dst->xfrm is valid, this skb needs to be transformed */
>> +    return dst->xfrm != NULL;
>> +#else
>> +    return 0;
>> +#endif
>> +}
>
> Instead of putting this into SCTP code, isn't the above rather a candidate for
> include/net/xfrm.h, e.g. as ... bool xfrm_is_armed(...) ?

Should be in such style in terms of its name, but this is truly SCTP specific in this scenario.
No one elsewhere barely need this as far as I can tell...

-- 
浮沉随浪只记今朝笑

--fan

^ permalink raw reply

* [PATCH net-next] sctp: Namespacify checksum_disable
From: Fan Du @ 2013-10-14  8:32 UTC (permalink / raw)
  To: vyasevich, nhorman; +Cc: davem, netdev

SCTP CRC32-C checksum computing and verifying should be namespace-sensible,
as each, e.g. tenant instance might need different checksum configuration for
its requirement. So this patch enhance SCTP with this feature.

Signed-off-by: Fan Du <fan.du@windriver.com>
---
 include/net/netns/sctp.h   |    5 +++++
 include/net/sctp/structs.h |    3 ---
 net/sctp/input.c           |    2 +-
 net/sctp/output.c          |    4 +++-
 net/sctp/protocol.c        |    5 +++--
 net/sctp/sysctl.c          |    7 +++++++
 6 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/include/net/netns/sctp.h b/include/net/netns/sctp.h
index 3573a81..704adb3 100644
--- a/include/net/netns/sctp.h
+++ b/include/net/netns/sctp.h
@@ -129,6 +129,11 @@ struct netns_sctp {
 
 	/* Threshold for autoclose timeout, in seconds. */
 	unsigned long max_autoclose;
+
+	/* Set to 1 to disable CRC32-C checksum computing and verifying.
+	 * Default to 0 to enable this feature.
+	 */
+	int checksum_disable;
 };
 
 #endif /* __NETNS_SCTP_H__ */
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 2174d8d..820895e 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -134,9 +134,6 @@ extern struct sctp_globals {
 	__u16 max_instreams;
 	__u16 max_outstreams;
 
-	/* Flag to indicate whether computing and verifying checksum
-	 * is disabled. */
-        bool checksum_disable;
 } sctp_globals;
 
 #define sctp_max_instreams		(sctp_globals.max_instreams)
diff --git a/net/sctp/input.c b/net/sctp/input.c
index 98b69bb..9db2a65 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -134,7 +134,7 @@ int sctp_rcv(struct sk_buff *skb)
 	__skb_pull(skb, skb_transport_offset(skb));
 	if (skb->len < sizeof(struct sctphdr))
 		goto discard_it;
-	if (!sctp_checksum_disable && !skb_csum_unnecessary(skb) &&
+	if (!net->sctp.checksum_disable && !skb_csum_unnecessary(skb) &&
 		  sctp_rcv_checksum(net, skb) < 0)
 		goto discard_it;
 
diff --git a/net/sctp/output.c b/net/sctp/output.c
index 6de6402..5d0a45e 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -395,6 +395,7 @@ int sctp_packet_transmit(struct sctp_packet *packet)
 	struct sk_buff *nskb;
 	struct sctp_chunk *chunk, *tmp;
 	struct sock *sk;
+	struct net *net;
 	int err = 0;
 	int padding;		/* How much padding do we need?  */
 	__u8 has_data = 0;
@@ -411,6 +412,7 @@ int sctp_packet_transmit(struct sctp_packet *packet)
 	/* Set up convenience variables... */
 	chunk = list_entry(packet->chunk_list.next, struct sctp_chunk, list);
 	sk = chunk->skb->sk;
+	net = sock_net(sk);
 
 	/* Allocate the new skb.  */
 	nskb = alloc_skb(packet->size + LL_MAX_HEADER, GFP_ATOMIC);
@@ -545,7 +547,7 @@ int sctp_packet_transmit(struct sctp_packet *packet)
 	 * Note: Adler-32 is no longer applicable, as has been replaced
 	 * by CRC32-C as described in <draft-ietf-tsvwg-sctpcsum-02.txt>.
 	 */
-	if (!sctp_checksum_disable) {
+	if (!net->sctp.checksum_disable) {
 		if ((!(dst->dev->features & NETIF_F_SCTP_CSUM)) ||
 			is_xfrm_armed(dst)) {
 
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 5e17092..b3c51ce 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -1239,6 +1239,9 @@ static int __net_init sctp_net_init(struct net *net)
 	/* Initialize maximum autoclose timeout. */
 	net->sctp.max_autoclose		= INT_MAX / HZ;
 
+	/* Enable checksum by default. */
+	net->sctp.checksum_disable = 0;
+
 	status = sctp_sysctl_net_register(net);
 	if (status)
 		goto err_sysctl_register;
@@ -1543,6 +1546,4 @@ MODULE_ALIAS("net-pf-" __stringify(PF_INET) "-proto-132");
 MODULE_ALIAS("net-pf-" __stringify(PF_INET6) "-proto-132");
 MODULE_AUTHOR("Linux Kernel SCTP developers <linux-sctp@vger.kernel.org>");
 MODULE_DESCRIPTION("Support for the SCTP protocol (RFC2960)");
-module_param_named(no_checksums, sctp_checksum_disable, bool, 0644);
-MODULE_PARM_DESC(no_checksums, "Disable checksums computing and verification");
 MODULE_LICENSE("GPL");
diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
index 6b36561..d6a6cca 100644
--- a/net/sctp/sysctl.c
+++ b/net/sctp/sysctl.c
@@ -290,6 +290,13 @@ static struct ctl_table sctp_net_table[] = {
 		.extra1		= &max_autoclose_min,
 		.extra2		= &max_autoclose_max,
 	},
+	{
+		.procname	= "checksum_disable",
+		.data		= &init_net.sctp.checksum_disable,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
 
 	{ /* sentinel */ }
 };
-- 
1.7.9.5

^ permalink raw reply related

* Re: Is fallback vhost_net to qemu for live migrate available？
From: Qin Chuanyu @ 2013-10-14  8:19 UTC (permalink / raw)
  To: Anthony Liguori, Michael S. Tsirkin, jasowang, Wei Liu
  Cc: KVM list, netdev, qianhuibin, xen-devel@lists.xen.org, wangfuhai,
	liuyongan
In-Reply-To: <CA+aC4kv66bvt5mREv-YXVddD9wdgAiT=heo9xhdkkVmuLOyrBw@mail.gmail.com>

On 2013/8/30 0:08, Anthony Liguori wrote:
> Hi Qin,
>
> KVM and Xen represent memory in a very different way.  KVM can only
> track when guest mode code dirties memory.  It relies on QEMU to track
> when guest memory is dirtied by QEMU.  Since vhost is running outside
> of QEMU, vhost also needs to tell QEMU when it has dirtied memory.
>
> I don't think this is a problem with Xen though.  I believe (although
> could be wrong) that Xen is able to track when either the domain or
> dom0 dirties memory.
>
> So I think you can simply ignore the dirty logging with vhost and it
> should Just Work.
>
Xen track guest's memory when live migrating as what KVM did (I guess it 
rely on EPT)，it couldn't mark dom0's dirty memory automatically.

I did the same dirty log with vhost_net but instead of KVM's api with 
Xen's dirty memory interface，then live migration work.

--------------------------------------------------------------------
There is a bug on the Xen live migration when using qemu emulate 
nic(such as virtio_net).
current flow:
     xc_save->dirty memory copy->suspend->stop_vcpu->last memory copy
     stop_qemu->stop_virtio_net
     save_qemu->save_virtio_net
it means virtio_net would dirty memory after the last memory copy.

I have test it both vhost_on_qemu and virtio_net in qemu,there are same 
problem, the update of vring_index would be mistake and lead network 
unreachable. my solution is:
     xc_save->dirty memory copy->suspend->stop_vcpu->stop_qemu
			->stop_virtio_net->last memory copy
     save_qemu->save_virtio_net

Xen's netfront and netback disconnect and flush IO-ring when live 
migrate，so it is OK.

^ permalink raw reply

* Re: [PATCHv2 RESEND] {xfrm, sctp} Stick to software crc32 even if hardware is capable of that
From: Daniel Borkmann @ 2013-10-14  8:07 UTC (permalink / raw)
  To: Fan Du; +Cc: vyasevich, nhorman, steffen.klassert, davem, netdev
In-Reply-To: <1381735658-15478-1-git-send-email-fan.du@windriver.com>

On 10/14/2013 09:27 AM, Fan Du wrote:
> igb/ixgbe have hardware sctp checksum support, when this feature is enabled
> and also IPsec is armed to protect sctp traffic, ugly things happened as
> xfrm_output checks CHECKSUM_PARTIAL to do check sum operation(sum every thing
> up and pack the 16bits result in the checksum field). The result is fail
> establishment of sctp communication.
>
> Signed-off-by: Fan Du <fan.du@windriver.com>
> Cc: Vlad Yasevich <vyasevich@gmail.com>
> Cc: Neil Horman <nhorman@tuxdriver.com>
> Cc: Steffen Klassert <steffen.klassert@secunet.com>
> Acked-by: Vlad Yasevich <vyasevich@gmail.com>
> ---
>   net/sctp/output.c |   14 +++++++++++++-
>   1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/net/sctp/output.c b/net/sctp/output.c
> index 0ac3a65..6de6402 100644
> --- a/net/sctp/output.c
> +++ b/net/sctp/output.c
> @@ -372,6 +372,16 @@ static void sctp_packet_set_owner_w(struct sk_buff *skb, struct sock *sk)
>   	atomic_inc(&sk->sk_wmem_alloc);
>   }
>
> +static int is_xfrm_armed(struct dst_entry *dst)
> +{
> +#ifdef CONFIG_XFRM
> +	/* If dst->xfrm is valid, this skb needs to be transformed */
> +	return dst->xfrm != NULL;
> +#else
> +	return 0;
> +#endif
> +}

Instead of putting this into SCTP code, isn't the above rather a candidate for
include/net/xfrm.h, e.g. as ... bool xfrm_is_armed(...) ?

>   /* All packets are sent to the network through this function from
>    * sctp_outq_tail().
>    *
> @@ -536,7 +546,9 @@ int sctp_packet_transmit(struct sctp_packet *packet)
>   	 * by CRC32-C as described in <draft-ietf-tsvwg-sctpcsum-02.txt>.
>   	 */
>   	if (!sctp_checksum_disable) {
> -		if (!(dst->dev->features & NETIF_F_SCTP_CSUM)) {
> +		if ((!(dst->dev->features & NETIF_F_SCTP_CSUM)) ||
> +			is_xfrm_armed(dst)) {
> +
>   			__u32 crc32 = sctp_start_cksum((__u8 *)sh, cksum_buf_len);
>
>   			/* 3) Put the resultant value into the checksum field in the
>

^ permalink raw reply

* [PATCHv2 RESEND] {xfrm, sctp} Stick to software crc32 even if hardware is capable of that
From: Fan Du @ 2013-10-14  7:27 UTC (permalink / raw)
  To: vyasevich, nhorman; +Cc: steffen.klassert, davem, netdev

igb/ixgbe have hardware sctp checksum support, when this feature is enabled
and also IPsec is armed to protect sctp traffic, ugly things happened as
xfrm_output checks CHECKSUM_PARTIAL to do check sum operation(sum every thing
up and pack the 16bits result in the checksum field). The result is fail
establishment of sctp communication.

Signed-off-by: Fan Du <fan.du@windriver.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
---
 net/sctp/output.c |   14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/net/sctp/output.c b/net/sctp/output.c
index 0ac3a65..6de6402 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -372,6 +372,16 @@ static void sctp_packet_set_owner_w(struct sk_buff *skb, struct sock *sk)
 	atomic_inc(&sk->sk_wmem_alloc);
 }
 
+static int is_xfrm_armed(struct dst_entry *dst)
+{
+#ifdef CONFIG_XFRM
+	/* If dst->xfrm is valid, this skb needs to be transformed */
+	return dst->xfrm != NULL;
+#else
+	return 0;
+#endif
+}
+
 /* All packets are sent to the network through this function from
  * sctp_outq_tail().
  *
@@ -536,7 +546,9 @@ int sctp_packet_transmit(struct sctp_packet *packet)
 	 * by CRC32-C as described in <draft-ietf-tsvwg-sctpcsum-02.txt>.
 	 */
 	if (!sctp_checksum_disable) {
-		if (!(dst->dev->features & NETIF_F_SCTP_CSUM)) {
+		if ((!(dst->dev->features & NETIF_F_SCTP_CSUM)) ||
+			is_xfrm_armed(dst)) {
+
 			__u32 crc32 = sctp_start_cksum((__u8 *)sh, cksum_buf_len);
 
 			/* 3) Put the resultant value into the checksum field in the
-- 
1.7.9.5

^ permalink raw reply related

* Re: [PATCHv2 2/2] Don't compute checksum value for SCTP skb with, CHECKSUM_PARTIAL set
From: Fan Du @ 2013-10-14  7:16 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: Neil Horman, steffen.klassert, davem, netdev
In-Reply-To: <b4c6105a-057c-4fd7-9b49-99df902447d8@email.android.com>



On 2013年10月12日 21:06, Vlad Yasevich wrote:
>
>
> Fan Du<fan.du@windriver.com>  wrote:
>
>>
>>
>> On 2013年10月11日 22:25, Vlad Yasevich wrote:
>>> On 10/11/2013 03:08 AM, Fan Du wrote:
>>>>
>>>>
>>>> On 2013年10月10日 21:11, Neil Horman wrote:
>>>>> On Thu, Oct 10, 2013 at 01:51:36PM +0800, Fan Du wrote:
>>>>>> igb/ixgbe have hardware sctp checksum support, when this feature
>> is
>>>>>> enabled
>>>>>> and also IPsec is armed to protect sctp traffic, ugly things
>> happened as
>>>>>> xfrm_output checks CHECKSUM_PARTIAL to do check sum operation(sum
>>>>>> every thing
>>>>>> up and pack the 16bits result in the checksum field). The result
>> is fail
>>>>>> establishment of sctp communication.
>>>>>>
>>>>> Shouldn't this be fixed in the xfrm code then? E.g. check the
>> device
>>>>> features
>>>>> for SCTP checksum offloading and and skip the checksum during xfrm
>>>>> output if its
>>>>> available?
>>>>>
>>>>> Or am I missing something?
>>>>> Neil
>>>>>
>>>>>
>>>>
>>>>
>>>>  From cff25810910603ff991f0c56441d6de8dc33a822 Mon Sep 17 00:00:00
>> 2001
>>>> From: Fan Du<fan.du@windriver.com>
>>>> Date: Fri, 11 Oct 2013 14:31:57 +0800
>>>> Subject: [PATCH 2/2] Don't compute checksum value for SCTP skb with
>>>> CHECKSUM_PARTIAL set
>>>> MIME-Version: 1.0
>>>> Content-Type: text/plain; charset=UTF-8
>>>> Content-Transfer-Encoding: 8bit
>>>>
>>>> IPsec is not enabled in this scenario:
>>>>
>>>> SCTP skb set CHECKSUM_PARTIAL to indicate hardware is capable of
>> doing
>>>> SCTP checksum(crc32-c) scoping the whole SCTP packet range. However
>> when
>>>> such kind of skb is delivered through IPv4/v6 output handler,
>> IPv4/v6 stack
>>>> interpret CHECKSUM_PARTIAL by calling skb_checksum_help to compute
>> 16bits
>>>> checksum value by summing everything up, the whole SCTP packet in
>> software
>>>> manner！ After this skb reach NIC, after hardware doing its SCTP
>> checking
>>>> business, a crc32-c value will overwrite the value IPv4/v6 stack
>> computed
>>>> before.
>>>>
>>>> This patch solves this by introducing skb_is_sctpv4/6 to optimize
>> such
>>>> case.
>>>>
>>>> Signed-off-by: Fan Du<fan.du@windriver.com>
>>>> ---
>>>> v2:
>>>> Rework this problem by introducing skb_is_scktv4/6
>>>>
>>>> ---
>>>> include/linux/ip.h | 5 +++++
>>>> include/linux/ipv6.h | 6 ++++++
>>>> include/linux/skbuff.h | 1 -
>>>> net/core/skbuff.c | 1 +
>>>> net/ipv4/ip_output.c | 4 +++-
>>>> net/ipv6/ip6_output.c | 1 +
>>>> net/xfrm/xfrm_output.c | 20 +++++++++++++++-----
>>>> 7 files changed, 31 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/include/linux/ip.h b/include/linux/ip.h
>>>> index 492bc65..f556292 100644
>>>> --- a/include/linux/ip.h
>>>> +++ b/include/linux/ip.h
>>>> @@ -19,6 +19,7 @@
>>>>
>>>> #include<linux/skbuff.h>
>>>> #include<uapi/linux/ip.h>
>>>> +#include<uapi/linux/in.h>
>>>>
>>>> static inline struct iphdr *ip_hdr(const struct sk_buff *skb)
>>>> {
>>>> @@ -34,4 +35,8 @@ static inline struct iphdr *ipip_hdr(const struct
>>>> sk_buff *skb)
>>>> {
>>>> return (struct iphdr *)skb_transport_header(skb);
>>>> }
>>>> +static inline int skb_is_sctpv4(const struct sk_buff *skb)
>>>> +{
>>>> + return ip_hdr(skb)->protocol == IPPROTO_SCTP;
>>>> +}
>>>> #endif /* _LINUX_IP_H */
>>>> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
>>>> index 28ea384..6e17c04 100644
>>>> --- a/include/linux/ipv6.h
>>>> +++ b/include/linux/ipv6.h
>>>> @@ -2,6 +2,7 @@
>>>> #define _IPV6_H
>>>>
>>>> #include<uapi/linux/ipv6.h>
>>>> +#include<uapi/linux/in.h>
>>>>
>>>> #define ipv6_optlen(p) (((p)->hdrlen+1)<<  3)
>>>> /*
>>>> @@ -387,4 +388,9 @@ static inline struct raw6_sock *raw6_sk(const
>> struct
>>>> sock *sk)
>>>> ((__sk)->sk_bound_dev_if == (__dif)))&&  \
>>>> net_eq(sock_net(__sk), (__net)))
>>>>
>>>> +static inline int skb_is_sctpv6(const struct sk_buff *skb)
>>>> +{
>>>> + return ipv6_hdr(skb)->nexthdr == IPPROTO_SCTP;
>>>> +}
>>>> +
>>>> #endif /* _IPV6_H */
>>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>>>> index 2ddb48d..b36d0cc 100644
>>>> --- a/include/linux/skbuff.h
>>>> +++ b/include/linux/skbuff.h
>>>> @@ -2393,7 +2393,6 @@ extern void skb_split(struct sk_buff *skb,
>>>> extern int skb_shift(struct sk_buff *tgt, struct sk_buff *skb,
>>>> int shiftlen);
>>>> extern void skb_scrub_packet(struct sk_buff *skb, bool xnet);
>>>> -
>>>> extern struct sk_buff *skb_segment(struct sk_buff *skb,
>>>> netdev_features_t features);
>>>>
>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>>> index d81cff1..54d6172 100644
>>>> --- a/net/core/skbuff.c
>>>> +++ b/net/core/skbuff.c
>>>> @@ -3526,3 +3526,4 @@ void skb_scrub_packet(struct sk_buff *skb,
>> bool xnet)
>>>> nf_reset_trace(skb);
>>>> }
>>>> EXPORT_SYMBOL_GPL(skb_scrub_packet);
>>>> +
>>>> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
>>>> index a04d872..8676b2c 100644
>>>> --- a/net/ipv4/ip_output.c
>>>> +++ b/net/ipv4/ip_output.c
>>>> @@ -587,7 +587,9 @@ slow_path_clean:
>>>>
>>>> slow_path:
>>>> /* for offloaded checksums cleanup checksum before fragmentation */
>>>> - if ((skb->ip_summed == CHECKSUM_PARTIAL)&&
>> skb_checksum_help(skb))
>>>> + if ((skb->ip_summed == CHECKSUM_PARTIAL)&&
>>>> + !skb_is_sctpv4(skb)&&
>>>> + skb_checksum_help(skb))
>>>> goto fail;
>>>> iph = ip_hdr(skb);
>>>>
>>>> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
>>>> index 3a692d5..9b27d95 100644
>>>> --- a/net/ipv6/ip6_output.c
>>>> +++ b/net/ipv6/ip6_output.c
>>>> @@ -671,6 +671,7 @@ slow_path_clean:
>>>>
>>>> slow_path:
>>>> if ((skb->ip_summed == CHECKSUM_PARTIAL)&&
>>>> + !skb_is_sctpv6(skb)&&
>>>> skb_checksum_help(skb))
>>>> goto fail;
>>>>
>>>
>>> No, this isn't right. This is a case where IP fragmentation is
>>> required, and the above code will cause SCTP checksum to not be
>>> computed.
>>
>> Ok, I got my ball, ten bucks bet this correct ;)
>>
>> IPv4:
>> after skb reach fragmentation part, CHECKSUM_PARTIAL denotes
>> L4 layer protocol need to be checksummed, then IPv4 checksum is
>> recomputed for each fragmented IPv4 packet.
>>
>> IPv6:
>> Here IPv6 doesn't need checksum for its header, again
>> CHECKSUM_PARTIAL denotes L4 layer protocol need to be checksummed.
>>
>> So all in all, this is the right place to distinguish SCTP skb out,
>> and skip checksum operation as hw does it thereafter.
>>
>
>   How does HW compute SCTP checksum when the data is split between skb?
> Each skb will be submitted separately to the HW.
> I think we need to fall back to SW checksum when packer will be fragmented.

Hi, Vlad

I understand your argument now, I finally applied a 82576 NIC with SCTP CHECKSUM
supported to test this. It seems when sending this super-sized packet,
sctp_datamsg_from_user fragments each packet to 1480 size already using pathmtu,
this pathmtu is equal or less than the interface device mtu, which means
ip_fragment/ip6_fragment didn't fragments any SCTP skb.

Host A(with 82576):
sctp_test -h 128.224.162.161 -p 5001 -H 128.224.162.220 -P 500 -x 1 -c 5 -s -T
                                                                      ^^^
                                                 set each packet to 32768 bytes

I cannot picture any scenario where a SCTP skb with CHECK_PARTIAL set to reach
ip_fragment/ip6_fragment. FWIW, the fix for xfrm_output part is definitely a
valid one.


> -vlad
>
>> Q.E.D.
>>
>>
>>> Looks like SCTP needs to compute the checksum in the case where
>>> skb will be fragmented.
>>>
>>> An alternative, that will also allow us to get rid of patch 1
>>> in the serices is to have a checksum handler offload function
>>> that can be used to compute checksum in this case.
>>>
>>> -vlad
>>>
>>>> diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
>>>> index 3bb2cdc..ddef94a 100644
>>>> --- a/net/xfrm/xfrm_output.c
>>>> +++ b/net/xfrm/xfrm_output.c
>>>> @@ -180,6 +180,14 @@ static int xfrm_output_gso(struct sk_buff *skb)
>>>> return 0;
>>>> }
>>>>
>>>> +static int skb_is_sctp(const struct sk_buff *skb)
>>>> +{
>>>> + if (skb->protocol == __constant_htons(ETH_P_IP))
>>>> + return skb_is_sctpv4(skb);
>>>> + else
>>>> + return skb_is_sctpv6(skb);
>>>> +}
>>>> +
>>>> int xfrm_output(struct sk_buff *skb)
>>>> {
>>>> struct net *net = dev_net(skb_dst(skb)->dev);
>>>> @@ -189,11 +197,13 @@ int xfrm_output(struct sk_buff *skb)
>>>> return xfrm_output_gso(skb);
>>>>
>>>> if (skb->ip_summed == CHECKSUM_PARTIAL) {
>>>> - err = skb_checksum_help(skb);
>>>> - if (err) {
>>>> - XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTERROR);
>>>> - kfree_skb(skb);
>>>> - return err;
>>>> + if (!skb_is_sctp(skb)) {
>>>> + err = skb_checksum_help(skb);
>>>> + if (err) {
>>>> + XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTERROR);
>>>> + kfree_skb(skb);
>>>> + return err;
>>>> + }
>>>> }
>>>> }
>>>>
>>>
>>>
>

-- 
浮沉随浪只记今朝笑

--fan

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox