Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v3] mac80211: port CCMP to cryptoapi's CCM driver
From: Johannes Berg @ 2013-10-11 13:44 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, patches-QSEj5FYQhm4dnm+yROfE0A
In-Reply-To: <1381391720-28771-1-git-send-email-ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>

On Thu, 2013-10-10 at 09:55 +0200, Ard Biesheuvel wrote:
> Use the generic CCM aead chaining mode driver rather than a local
> implementation that sits right on top of the core AES cipher.
> 
> This allows the use of accelerated implementations of either
> CCM as a whole or the CTR mode which it encapsulates.

Applied.

johannes

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH 14/14] net: smc91x: dont't use SMC_outw for fixing up halfword-aligned data
From: Matthew Leach @ 2013-10-11 13:52 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: ankit.jindal, steve.mcintyre, tushar.jagad, will.deacon,
	catalin.marinas, netdev, Nicolas Pitre, David S. Miller
In-Reply-To: <1381499540-28794-1-git-send-email-matthew.leach@arm.com>

From: Will Deacon <will.deacon@arm.com>

SMC_outw invokes an endian-aware I/O accessor, which may change the data
endianness before writing to the device. This is not suitable for data
transfers where the memory buffer is simply a string of bytes that does
not require any byte-swapping.

This patches fixes the smc91x SMC_PUSH_DATA macro so that it uses the
string I/O accessor for outputting the leading or trailing halfwords on
halfword-aligned buffers.

Cc: <netdev@vger.kernel.org>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 drivers/net/ethernet/smsc/smc91x.h |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/smsc/smc91x.h b/drivers/net/ethernet/smsc/smc91x.h
index 5730fe2..98eedb9 100644
--- a/drivers/net/ethernet/smsc/smc91x.h
+++ b/drivers/net/ethernet/smsc/smc91x.h
@@ -1124,8 +1124,7 @@ static const char * chip_ids[ 16 ] =  {
 			void __iomem *__ioaddr = ioaddr;		\
 			if (__len >= 2 && (unsigned long)__ptr & 2) {	\
 				__len -= 2;				\
-				SMC_outw(*(u16 *)__ptr, ioaddr,		\
-					DATA_REG(lp));		\
+				SMC_outsw(ioaddr, DATA_REG(lp), __ptr, 1); \
 				__ptr += 2;				\
 			}						\
 			if (SMC_CAN_USE_DATACS && lp->datacs)		\
@@ -1133,8 +1132,7 @@ static const char * chip_ids[ 16 ] =  {
 			SMC_outsl(__ioaddr, DATA_REG(lp), __ptr, __len>>2); \
 			if (__len & 2) {				\
 				__ptr += (__len & ~3);			\
-				SMC_outw(*((u16 *)__ptr), ioaddr,	\
-					 DATA_REG(lp));		\
+				SMC_outsw(ioaddr, DATA_REG(lp), __ptr, 1); \
 			}						\
 		} else if (SMC_16BIT(lp))				\
 			SMC_outsw(ioaddr, DATA_REG(lp), p, (l) >> 1);	\
-- 
1.7.9.5

^ permalink raw reply related

* RE: [PATCH net-next v3 5/5] xen-netback: enable IPv6 TCP GSO to the guest
From: Paul Durrant @ 2013-10-11 14:03 UTC (permalink / raw)
  To: Wei Liu
  Cc: xen-devel@lists.xen.org, netdev@vger.kernel.org, Wei Liu,
	David Vrabel, Ian Campbell
In-Reply-To: <20131011100950.GF15556@zion.uk.xensource.com>

> -----Original Message-----
> From: Wei Liu [mailto:wei.liu2@citrix.com]
> Sent: 11 October 2013 11:10
> To: Paul Durrant
> Cc: xen-devel@lists.xen.org; netdev@vger.kernel.org; Wei Liu; David Vrabel;
> Ian Campbell
> Subject: Re: [PATCH net-next v3 5/5] xen-netback: enable IPv6 TCP GSO to
> the guest
> 
> On Thu, Oct 10, 2013 at 04:25:33PM +0100, Paul Durrant wrote:
> [...]
> >  	return max;
> > @@ -312,6 +312,7 @@ static struct xenvif_rx_meta
> *get_next_rx_buffer(struct xenvif *vif,
> >  	req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
> >
> >  	meta = npo->meta + npo->meta_prod++;
> > +	meta->gso_type = 0;
> 
> Should use XEN_NETIF_GSO_TYPE_NONE here.
> 
> >  	meta->gso_size = 0;
> >  	meta->size = 0;
> [...]
> > +	if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4) {
> > +		gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
> > +		gso_size = skb_shinfo(skb)->gso_size;
> > +	} else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6) {
> > +		gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
> > +		gso_size = skb_shinfo(skb)->gso_size;
> > +	} else {
> > +		gso_type = 0;
> 
> Ditto.
> 
> > +		gso_size = 0;
> > +	}
> > +
> [...]
> > @@ -438,10 +461,13 @@ static int xenvif_gop_skb(struct sk_buff *skb,
> >  	req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
> >  	meta = npo->meta + npo->meta_prod++;
> >
> > -	if (!vif->gso_prefix)
> > -		meta->gso_size = skb_shinfo(skb)->gso_size;
> > -	else
> > +	if ((1 << gso_type) & vif->gso_mask) {
> > +		meta->gso_type = gso_type;
> > +		meta->gso_size = gso_size;
> > +	} else {
> > +		meta->gso_type = 0;
> 
> Ditto.
> 
> >  		meta->gso_size = 0;
> > +	}
> >
> >  	meta->size = 0;
> >  	meta->id = req->id;
> > @@ -587,7 +613,8 @@ void xenvif_rx_action(struct xenvif *vif)
> >
> >  		vif = netdev_priv(skb->dev);
> >
> > -		if (vif->meta[npo.meta_cons].gso_size && vif->gso_prefix) {
> > +		if ((1 << vif->meta[npo.meta_cons].gso_type) &
> > +		    vif->gso_prefix_mask) {
> 
> The logic is changed here. Why do you remove the test on gso_size?
> 

Sorry, I missed this one before so never responded...

The test on gso_size is no longer necessary, because if gso_size is 0 then gso_type will be XEN_NETIF_GSO_TYPE_NONE and that will never appear in the masks.

  Paul

> >  			resp = RING_GET_RESPONSE(&vif->rx,
> >  						 vif->rx.rsp_prod_pvt++);
> >
> > @@ -624,7 +651,8 @@ void xenvif_rx_action(struct xenvif *vif)
> >  					vif->meta[npo.meta_cons].size,
> >  					flags);
> >
> > -		if (vif->meta[npo.meta_cons].gso_size && !vif->gso_prefix)
> {
> > +		if ((1 << vif->meta[npo.meta_cons].gso_type) &
> > +		    vif->gso_mask) {
> 
> Ditto.
> 
> >  			struct xen_netif_extra_info *gso =
> >  				(struct xen_netif_extra_info *)
> >  				RING_GET_RESPONSE(&vif->rx,
> > @@ -632,8 +660,8 @@ void xenvif_rx_action(struct xenvif *vif)
> >
> [...]
> >  	 * was turned on by a zero value in (or lack of)
> > diff --git a/include/xen/interface/io/netif.h
> b/include/xen/interface/io/netif.h
> > index d7dd8d7..41674bc 100644
> > --- a/include/xen/interface/io/netif.h
> > +++ b/include/xen/interface/io/netif.h
> > @@ -113,6 +113,7 @@ struct xen_netif_tx_request {
> >  #define  XEN_NETIF_EXTRA_FLAG_MORE
> 	(1U<<_XEN_NETIF_EXTRA_FLAG_MORE)
> >
> >  /* GSO types */
> > +#define XEN_NETIF_GSO_TYPE_NONE         (0)
> 
> Indentation.
> 
> Wei.

^ permalink raw reply

* Re: [PATCHv2 1/2 ] {xfrm, sctp} Stick to software crc32 even if hardware is capable of that
From: Vlad Yasevich @ 2013-10-11 14:04 UTC (permalink / raw)
  To: Fan Du, Neil Horman, steffen.klassert, davem; +Cc: netdev
In-Reply-To: <5257A349.3010605@windriver.com>

On 10/11/2013 03:05 AM, Fan Du wrote:
>
>
> On 2013年10月10日 21:11, Neil Horman wrote:
>> On Thu, Oct 10, 2013 at 01:51:36PM +0800, Fan Du wrote:
>>> igb/ixgbe have hardware sctp checksum support, when this feature is
>>> enabled
>>> and also IPsec is armed to protect sctp traffic, ugly things happened as
>>> xfrm_output checks CHECKSUM_PARTIAL to do check sum operation(sum
>>> every thing
>>> up and pack the 16bits result in the checksum field). The result is fail
>>> establishment of sctp communication.
>>>
>> Shouldn't this be fixed in the xfrm code then?  E.g. check the device
>> features
>> for SCTP checksum offloading and and skip the checksum during xfrm
>> output if its
>> available?
>>
>> Or am I missing something?
>> Neil
>>
>>
>
>
>  From 014276de0877f11d46e1704114a7d91f19221a63 Mon Sep 17 00:00:00 2001
> From: Fan Du <fan.du@windriver.com>
> Date: Fri, 11 Oct 2013 14:24:33 +0800
> Subject: [PATCH 1/2] {xfrm, sctp} Stick to software crc32 even if
> hardware is
>   capable of that
>
> igb/ixgbe have hardware sctp checksum support, when this feature is enabled
> and also IPsec is armed to protect sctp traffic, ugly things happened as
> xfrm_output checks CHECKSUM_PARTIAL to do check sum operation(sum every
> thing
> up and pack the 16bits result in the checksum field). The result is fail
> establishment of sctp communication.
>
> Signed-off-by: Fan Du <fan.du@windriver.com>
> ---
> v2:
>    Leave ip_summed as CHECKSUM_PARTIAL as before, the second patch will
> fix this.
>
> ---
>   net/sctp/output.c |   14 +++++++++++++-
>   1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/net/sctp/output.c b/net/sctp/output.c
> index 0ac3a65..6de6402 100644
> --- a/net/sctp/output.c
> +++ b/net/sctp/output.c
> @@ -372,6 +372,16 @@ static void sctp_packet_set_owner_w(struct sk_buff
> *skb, struct sock *sk)
>       atomic_inc(&sk->sk_wmem_alloc);
>   }
>
> +static int is_xfrm_armed(struct dst_entry *dst)
> +{
> +#ifdef CONFIG_XFRM
> +    /* If dst->xfrm is valid, this skb needs to be transformed */
> +    return dst->xfrm != NULL;
> +#else
> +    return 0;
> +#endif
> +}
> +

I would really prefer to have an accessor function to dst->xfrm, but
I see that everyone codes it inside the #ifdef.  Gack.

>   /* All packets are sent to the network through this function from
>    * sctp_outq_tail().
>    *
> @@ -536,7 +546,9 @@ int sctp_packet_transmit(struct sctp_packet *packet)
>        * by CRC32-C as described in <draft-ietf-tsvwg-sctpcsum-02.txt>.
>        */
>       if (!sctp_checksum_disable) {
> -        if (!(dst->dev->features & NETIF_F_SCTP_CSUM)) {
> +        if ((!(dst->dev->features & NETIF_F_SCTP_CSUM)) ||
> +            is_xfrm_armed(dst)) {
> +
>               __u32 crc32 = sctp_start_cksum((__u8 *)sh, cksum_buf_len);
>
>               /* 3) Put the resultant value into the checksum field in the

Acked-by: Vlad Yasevich <vyasevich@gmail.com>

-vlad

^ permalink raw reply

* Re: [PATCH net 0/4] bridge: Fix problems around the PVID
From: Vlad Yasevich @ 2013-10-11 14:14 UTC (permalink / raw)
  To: Toshiaki Makita
  Cc: Toshiaki Makita, David Miller, netdev, Fernando Luis Vazquez Cao,
	Patrick McHardy
In-Reply-To: <1381476853.3172.40.camel@ubuntu-vm-makita>

On 10/11/2013 03:34 AM, Toshiaki Makita wrote:
> On Wed, 2013-10-09 at 11:01 -0400, Vlad Yasevich wrote:
>> On 10/01/2013 07:56 AM, Toshiaki Makita wrote:
>>> On Mon, 2013-09-30 at 12:01 -0400, Vlad Yasevich wrote:
>>>> On 09/30/2013 07:46 AM, Toshiaki Makita wrote:
>>>>> On Fri, 2013-09-27 at 14:10 -0400, Vlad Yasevich wrote:
>>>>>> On 09/27/2013 01:11 PM, Toshiaki Makita wrote:
>>>>>>> On Thu, 2013-09-26 at 10:22 -0400, Vlad Yasevich wrote:
>>>>>>>> On 09/26/2013 06:38 AM, Toshiaki Makita wrote:
>>>>>>>>> On Tue, 2013-09-24 at 13:55 -0400, Vlad Yasevich wrote:
>>>>>>>>>> On 09/24/2013 01:30 PM, Toshiaki Makita wrote:
>>>>>>>>>>> On Tue, 2013-09-24 at 09:35 -0400, Vlad Yasevich wrote:
>>>>>>>>>>>> On 09/24/2013 07:45 AM, Toshiaki Makita wrote:
>>>>>>>>>>>>> On Mon, 2013-09-23 at 10:41 -0400, Vlad Yasevich
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> On 09/17/2013 04:12 AM, Toshiaki Makita wrote:
>>>>>>>>>>>>>>> On Mon, 2013-09-16 at 13:49 -0400, Vlad Yasevich
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> On 09/13/2013 08:06 AM, Toshiaki Makita wrote:
>>>>>>>>>>>>>>>>> On Thu, 2013-09-12 at 16:00 -0400, David
>>>>>>>>>>>>>>>>> Miller wrote:
>>>>>>>>>>>>>>>>>> From: Toshiaki Makita
>>>>>>>>>>>>>>>>>> <makita.toshiaki@lab.ntt.co.jp> Date: Tue,
>>>>>>>>>>>>>>>>>> 10 Sep 2013 19:27:54 +0900
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> There seem to be some undesirable
>>>>>>>>>>>>>>>>>>> behaviors related with PVID. 1. It has no
>>>>>>>>>>>>>>>>>>> effect assigning PVID to a port. PVID
>>>>>>>>>>>>>>>>>>> cannot be applied to any frame regardless
>>>>>>>>>>>>>>>>>>> of whether we set it or not. 2. FDB
>>>>>>>>>>>>>>>>>>> entries learned via frames applied PVID
>>>>>>>>>>>>>>>>>>> are registered with VID 0 rather than VID
>>>>>>>>>>>>>>>>>>> value of PVID. 3. We can set 0 or 4095 as
>>>>>>>>>>>>>>>>>>> a PVID that are not allowed in IEEE
>>>>>>>>>>>>>>>>>>> 802.1Q. This leads interoperational
>>>>>>>>>>>>>>>>>>> problems such as sending frames with VID
>>>>>>>>>>>>>>>>>>> 4095, which is not allowed in IEEE
>>>>>>>>>>>>>>>>>>> 802.1Q, and treating frames with VID 0 as
>>>>>>>>>>>>>>>>>>> they belong to VLAN 0, which is expected
>>>>>>>>>>>>>>>>>>> to be handled as they have no VID
>>>>>>>>>>>>>>>>>>> according to IEEE 802.1Q.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Note: 2nd and 3rd problems are potential
>>>>>>>>>>>>>>>>>>> and not exposed unless 1st problem is
>>>>>>>>>>>>>>>>>>> fixed, because we cannot activate PVID
>>>>>>>>>>>>>>>>>>> due to it.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Please work out the issues in patch #2 with
>>>>>>>>>>>>>>>>>> Vlad and resubmit this series.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thank you.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'm hovering between whether we should fix
>>>>>>>>>>>>>>>>> the issue by changing vlan 0 interface
>>>>>>>>>>>>>>>>> behavior in 8021q module or enabling a bridge
>>>>>>>>>>>>>>>>> port to sending priority-tagged frames, or
>>>>>>>>>>>>>>>>> another better way.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If you could comment it, I'd appreciate it
>>>>>>>>>>>>>>>>> :)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> BTW, I think what is discussed in patch #2 is
>>>>>>>>>>>>>>>>> another problem about handling priority-tags,
>>>>>>>>>>>>>>>>> and it exists without this patch set
>>>>>>>>>>>>>>>>> applied. It looks like that we should prepare
>>>>>>>>>>>>>>>>> another patch set than this to fix that
>>>>>>>>>>>>>>>>> problem.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Should I include patches that fix the
>>>>>>>>>>>>>>>>> priority-tags problem in this patch set and
>>>>>>>>>>>>>>>>> resubmit them all together?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am thinking that we might need to do it in
>>>>>>>>>>>>>>>> bridge and it looks like the simplest way to do
>>>>>>>>>>>>>>>> it is to have default priority regeneration
>>>>>>>>>>>>>>>> table (table 6-5 from 802.1Q doc).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> That way I think we would conform to the spec.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -vlad
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Unfortunately I don't think the default priority
>>>>>>>>>>>>>>> regeneration table resolves the problem because
>>>>>>>>>>>>>>> IEEE 802.1Q says that a VLAN-aware bridge can
>>>>>>>>>>>>>>> transmit untagged or VLAN-tagged frames only (the
>>>>>>>>>>>>>>> end of section 7.5 and 8.1.7).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> No mechanism to send priority-tagged frames is
>>>>>>>>>>>>>>> found as far as I can see the standard. I think
>>>>>>>>>>>>>>> the regenerated priority is used for outgoing
>>>>>>>>>>>>>>> PCP field only if egress policy is not untagged
>>>>>>>>>>>>>>> (i.e. transmitting as VLAN-tagged), and unused if
>>>>>>>>>>>>>>> untagged (Section 6.9.2 3rd/4th Paragraph).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If we want to transmit priority-tagged frames
>>>>>>>>>>>>>>> from a bridge port, I think we need to implement
>>>>>>>>>>>>>>> a new (optional) feature that is above the
>>>>>>>>>>>>>>> standard, as I stated previously.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> How do you feel about adding a per-port policy
>>>>>>>>>>>>>>> that enables a bridge to send priority-tagged
>>>>>>>>>>>>>>> frames instead of untagged frames when egress
>>>>>>>>>>>>>>> policy for the port is untagged? With this
>>>>>>>>>>>>>>> change, we can transmit frames for a given vlan
>>>>>>>>>>>>>>> as either all untagged, all priority-tagged or
>>>>>>>>>>>>>>> all VLAN-tagged.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> That would work.  What I am thinking is that we do
>>>>>>>>>>>>>> it by special casing the vid 0 egress policy
>>>>>>>>>>>>>> specification.  Let it be untagged by default and
>>>>>>>>>>>>>> if it is tagged, then we preserve the priority
>>>>>>>>>>>>>> field and forward it on.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This keeps the API stable and doesn't require
>>>>>>>>>>>>>> user/admin from knowing exactly what happens.
>>>>>>>>>>>>>> Default operation conforms to the spec and allows
>>>>>>>>>>>>>> simple change to make it backward-compatible.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What do you think.  I've done a simple prototype of
>>>>>>>>>>>>>> this an it seems to work with the VMs I am testing
>>>>>>>>>>>>>> with.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Are you saying that - by default, set the 0th bit of
>>>>>>>>>>>>> untagged_bitmap; and - if we unset the 0th bit and
>>>>>>>>>>>>> set the "vid"th bit, we transmit frames classified as
>>>>>>>>>>>>> belonging to VLAN "vid" as priority-tagged?
>>>>>>>>>>>>>
>>>>>>>>>>>>> If so, though it's attractive to keep current API,
>>>>>>>>>>>>> I'm worried about if it could be a bit confusing and
>>>>>>>>>>>>> not intuitive for kernel/iproute2 developers that VID
>>>>>>>>>>>>> 0 has a special meaning only in the egress policy.
>>>>>>>>>>>>> Wouldn't it be better to adding a new member to
>>>>>>>>>>>>> struct net_port_vlans instead of using VID 0 of
>>>>>>>>>>>>> untagged_bitmap?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Or are you saying that we use a new flag in struct
>>>>>>>>>>>>> net_port_vlans but use the BRIDGE_VLAN_INFO_UNTAGGED
>>>>>>>>>>>>> bit with VID 0 in netlink to set the flag?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Even in that case, I'm afraid that it might be
>>>>>>>>>>>>> confusing for developers for the same reason. We are
>>>>>>>>>>>>> going to prohibit to specify VID with 0 (and 4095) in
>>>>>>>>>>>>> adding/deleting a FDB entry or a vlan filtering
>>>>>>>>>>>>> entry, but it would allow us to use VID 0 only when a
>>>>>>>>>>>>> vlan filtering entry is configured. I am thinking a
>>>>>>>>>>>>> new nlattr is a straightforward approach to
>>>>>>>>>>>>> configure it.
>>>>>>>>>>>>
>>>>>>>>>>>> By making this an explicit attribute it makes vid 0 a
>>>>>>>>>>>> special case for any automatic tool that would
>>>>>>>>>>>> provision such filtering.  Seeing vid 0 would mean that
>>>>>>>>>>>> these tools would have to know that this would have to
>>>>>>>>>>>> be translated to a different attribute instead of
>>>>>>>>>>>> setting the policy values.
>>>>>>>>>>>
>>>>>>>>>>> Yes, I agree with you that we can do it by the way you
>>>>>>>>>>> explained. What I don't understand is the advantage of
>>>>>>>>>>> using vid 0 over another way such as adding a new
>>>>>>>>>>> nlattr. I think we can indicate transmitting
>>>>>>>>>>> priority-tags explicitly by such a nlattr. Using vid 0
>>>>>>>>>>> seems to be easier to implement than a new nlattr, but,
>>>>>>>>>>> for me, it looks less intuitive and more difficult to
>>>>>>>>>>> maintain because we have to care about vid 0 instead of
>>>>>>>>>>> simply ignoring it.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The point I am trying to make is that regardless of the
>>>>>>>>>> approach someone has to know what to do when enabling
>>>>>>>>>> priority tagged frames.  You proposal would require the
>>>>>>>>>> administrator or config tool to have that knowledge.
>>>>>>>>>> Example is: Admin does: bridge vlan set priority on dev
>>>>>>>>>> eth0 Automated app: if (vid == 0) /* Turn on priority
>>>>>>>>>> tagged frame support */
>>>>>>>>>>
>>>>>>>>>> My proposal would require the bridge filtering
>>>>>>>>>> implementation to have it. user tool: bridge vlan add vid 0
>>>>>>>>>> tagged Automated app:  No special case.
>>>>>>>>>>
>>>>>>>>>> IMO its better to have 1 piece code handling the special
>>>>>>>>>> case then putting it multiple places.
>>>>>>>>>
>>>>>>>>> Thank you for the detailed explanation. Now I understand your
>>>>>>>>> intention.
>>>>>>>>>
>>>>>>>>> I have one question about your proposal. I guess the way to
>>>>>>>>> enable priority-tagged is something like bridge vlan add vid
>>>>>>>>> 10 dev eth0 bridge vlan add vid 10 dev vnet0 pvid untagged
>>>>>>>>> bridge vlan add vid 0 dev vnet0 tagged where vnet0 has sub
>>>>>>>>> interface vnet0.0.
>>>>>>>>>
>>>>>>>>> Here the admin have to know the egress policy is applied to a
>>>>>>>>> frame twice in a certain order when it is transmitted from
>>>>>>>>> the port vnet0 attached, that is, first, a frame with vid 10
>>>>>>>>> get untagged, and then, an untagged frame get
>>>>>>>>> priority-tagged.
>>>>>>>>>
>>>>>>>>> This behavior looks difficult to know without previous
>>>>>>>>> knowledge. Any good idea to avoid such a need for the admin's
>>>>>>>>> additional knowledge?
>>>>>>>>
>>>>>>>> To me, the fact that there is vnet0.0 (or typically, there is
>>>>>>>> eth0.0 in the guest or on the remote host) already tells the
>>>>>>>> admin vlan 0 has to be tagged.  The fact that we codify this in
>>>>>>>> the policy makes it explicit.
>>>>>>>
>>>>>>> My worry is that the admin might not be able to guess how to use
>>>>>>> bridge commands to enable priority-tag without any additional
>>>>>>> hint in "man bridge", "bridge vlan help", etc. I actually
>>>>>>> couldn't hit upon such a usage before seeing example commands you
>>>>>>> gave, because I had never think the egress policy could be
>>>>>>> applied twice.
>>>>>>>
>>>>>>>>
>>>>>>>> However, I can see strong argument to be made for an addition
>>>>>>>> egress policy attribute that could be for instance:
>>>>>>>>
>>>>>>>> bridge vlan add vid 10 dev eth0 pvid bridge vlan add vid 10 dev
>>>>>>>> vnet0 pvid untagged prio_tag
>>>>>>>>
>>>>>>>> But this has the same connotations as wrt to egress policy.
>>>>>>>> The 2 policies are applied: (1) untag the frame. (2) add
>>>>>>>> priority_tag.
>>>>>>>>
>>>>>>>> (2) only happens if initial fame received on eth0 was priority
>>>>>>>> tagged.
>>>>>>>
>>>>>>> If we do so, we will not be able to communicate using vlan 0
>>>>>>> interface under a certain circumstance. Eth0 can be receive mixed
>>>>>>> untagged and priority-tagged frames according to the network
>>>>>>> element it is connected to: for example, Open vSwitch can send
>>>>>>> such two kinds of frames from the same port even if original
>>>>>>> incoming frames belong to the same vlan.
>>>>>>
>>>>>> Which priority would you assign to the frame that was received
>>>>>> untagged?
>>>>>
>>>>> Untagged frame's priority is by default 0, so I think 0 is likely.
>>>>>
>>>>> 802.1Q 6.9.1 i) The received priority value and the drop_eligible
>>>>> parameter value are the values in the M_UNITDATA.indication.
>>>>>
>>>>> M_UNITDATA is passed from ISS.
>>>>>
>>>>> 802.1Q 6.7.1 The priority parameter provided in a data indication
>>>>> primitive shall take the value of the Default User Priority parameter
>>>>> for the Port through which the MAC frame was received. The default
>>>>> value of this parameter is 0, it may be set by management in which
>>>>> case the capability to set it to any of the values 0 through 7 shall
>>>>> be provided.
>>>>>
>>>>>>
>>>>>>> In this situation, we can only receive frames that is
>>>>>>> priority-tagged when received on eth0.
>>>>>>
>>>>>> Not sure I understand.  Let's look at this config: bridge vlan add
>>>>>> vid 10 dev eth0 pvid bridge vlan add vid 10 dev vnet0 pvid untagged
>>>>>> prio_tag
>>>>>>
>>>>>> Here, eth0 is allowed to receive vid 10 tagged, untagged, and
>>>>>> prio_tagged (if we look at the patch 2 from this set). Now, frame
>>>>>> is forwarded to vnet0. 1) if the frame had vid 10 in the tag or was
>>>>>> untagged, it should probably be sent untagged. 2) if the frame had
>>>>>> a priority tag, it should probably be sent as such.
>>>>>>
>>>>>> Now, I think a case could be made that if the frame had any
>>>>>> priority markings in the vlan header, we should try to preserve
>>>>>> those markings if prio_tag is turned on.  We can assume value of 0
>>>>>> means not set.
>>>>>
>>>>> If we don't insert prio_tag when PCP is 0, we might receive mixed
>>>>> priority-tagged and untagged frames on eth0.
>>>>
>>>> Right, and that's what you were trying to handle in your patch:
>>>>
>>>>> +		/* PVID is set on this port.  Any untagged or priority-tagged +
>>>>> * ingress frame is considered to belong to this vlan. */
>>>>
>>>> So, in this case we are prepared to handle the "mixed" scenario on ingress.
>>>>
>>>>> Even if we are sending frames from eth0.0 with some priority other
>>>>> than 0, we could receive frames with priority 0 or untagged on the
>>>>> other side of the bridge.
>>>>> For example, if we receive untagged arp reply on the bridge port, we
>>>>> migit not be able to communicate with such an end station, because
>>>>> untagged reply will not be passed to eth0.0.
>>>>
>>>> So the ARP request was sent tagged, but the reply came back untagged?
>>>
>>> Yes, it can happen.
>>> These are problematic cases.
>>>
>>> Example 1:
>>>               prio_tagged         prio_tagged
>>> +------------+ ---> +------------+ ---> +----------+
>>> |guest eth0.0|------|host1 Bridge|------|host2 eth0|
>>> +------------+ <--- +------------+ <--- +----------+
>>>                untagged            untagged
>>>
>>> Note: Host2 eth0, which is an interface on Linux, can receive
>>> priority-tagged frames if it doesn't have vlan 0 interface (eth0.0).
>>
>> Hmm..  Just to see if this works, I ran the this scenario with
>> a dumb switch in the middle, and it did not work as you noted.
>> I then realized that one of the kernels was rather old and after
>> updating it, behaved differently.  The communication still didn't
>> work, but host2 behaved properly.
>>
>>>>
>>>> How does that work when the end station is attached directly to the
>>>> HW switch instead of a linux bridge?
>>>>
>>>> The station configures eth0.0 and sends priority-tagged traffic to
>>>> the HW switch.  If the HW switch sends back untagged traffic, then
>>>> the untagged traffic will never reach eth0.0.
>>>
>>> Currently we cannot communicate using eth0.0 via directly connected
>>> 802.1Q conformed switch, because we never receive priority-tagged frames
>>> from the switch.
>>> It is not a problem of Linux bridge and is why I wondered whether it
>>> should be fixed by bridge or vlan 0 interface.
>>>
>>>>
>>>>>
>>>>>>
>>>>>>> IMO, if prio_tag is configured, the bridge should send any
>>>>>>> untagged frame as priority-tagged regardless of whatever it is on
>>>>>>> eth0.
>>>>>>
>>>>>> Which priority would you use, 0?  You are not guaranteed to
>>>>>> properly deliver the traffic then for a configuration such as: VM:
>>>>>> eth0: 10.0.0.1/24 eth0.0: 10.0.1.1/24
>>>>>
>>>>> I'd like to use priority 0 for untagged frames.
>>>>>
>>>>> I am assuming that one of our goals is at least that eth0.0 comes to
>>>>> be able to communicate with another end station. It seems to be hard
>>>>> to use both eth0 and eth0.0 simultaneously.
>>>>
>>>> I understand, but I don't agree that we should always tag.
>>>>
>>>> Consider config:
>>>>
>>>>       hw switch <---> (eth0: Linux Bridge: eth1) <--- (em1.0:end station)
>>>>
>>>> If the end station sends priority-tagged traffic it should receive
>>>> priority tagged traffic back.  Otherwise, untagged traffic may be
>>>> dropped by the end station.  This is true whether it is connected to
>>>> the hw switch or Linux bridge.
>>>
>>> Though such a behavior is generally not necessary as far as I can read
>>> 802.1Q spec, it is essential for vlan 0 interface on Linux, I think.
>>> My proposal aims to resolve it at least when we use Linux bridge.
>>>
>>> Example configuration:
>>> 	bridge vlan add vid 10 dev eth1 pvid untagged
>>> 	bridge vlan add vid 10 dev eth0
>>> 	bridge vlan set prio_tag on dev eth1
>>>
>>> Intended behavior:
>>>
>>>           VID10-tagged                     prio_tagged
>>> +---------+ <--- +------------------------+ <--- +-----------------+
>>> |hw switch|------|eth0: Linux Bridge: eth1|------|em1.0:end station|
>>> +---------+ ---> +------------------------+ ---> +-----------------+
>>>           VID10-tagged                     prio_tagged
>>>                                 (always if egress policy untagged)
>>
>> Ok, I think you've convinced me that this is the right approach. The
>> only thing that I am not crazy about is the API.  I'd almost want to
>> introduce a new flag that can be set in a 'vlan add' or 'vlan set'
>> command that communicates a new policy.
>
> I'm glad that we reached a consensus on the approach :)
>
> I agree with you that the API is flag based.
> I'm guessing your intention is that 'vlan add' means a per vlan per port
> policy and 'vlan set' means a per port one, that is,
> 	'vlan add': bridge vlan add vid 10 dev eth1 prio_tag
> 	'vlan set': bridge vlan set prio_tag on dev eth1
>
> I think they can behave differently only when we set untagged to
> multiple vlans on the same port.
>
> 'vlan add' example with vid 10 and 20:
> 	bridge vlan add vid 10 dev eth1 pvid untagged prio_tag
> 	bridge vlan add vid 10 dev eth0
> 	bridge vlan add vid 20 dev eth1 untagged
> 	bridge vlan add vid 20 dev eth2
>
>           VID10-tagged                  prio_tagged (from eth0)
> +---------+ ---> +------------------------+ ---> +-----------------+
> |hw switch|------|eth0                eth1|------|em1.0:end station|
> +---------+      |      Linux Bridge      | ---> +-----------------+
> +---------+      |                        | *untagged*
> |hw switch|------|eth2                    | (from eth2)
> +---------+ ---> +------------------------+
>           VID20-tagged
>

This is what I was thinking of, but I was actually considering that
untagged and prio_tag can not co-exist for the same vlan as they don't
really make sence together anymore.

So one can do:
	bridge vlan add vid 10 dev eth1 pvid prio_tag
	bridge vlan add vid 20 dev eth1 untagged

and recieve VLAN 10 as priority tagged and vlan 20 as untagged.

-vlad

>
> 'vlan set' example with vid 10 and 20:
> 	bridge vlan add vid 10 dev eth1 pvid untagged
> 	bridge vlan add vid 10 dev eth0
> 	bridge vlan add vid 20 dev eth1 untagged
> 	bridge vlan add vid 20 dev eth2
> 	bridge vlan set prio_tag on dev eth1
>
>           VID10-tagged                  prio_tagged (from eth0)
> +---------+ ---> +------------------------+ ---> +-----------------+
> |hw switch|------|eth0                eth1|------|em1.0:end station|
> +---------+      |      Linux Bridge      | ---> +-----------------+
> +---------+      |                        | prio_tagged
> |hw switch|------|eth2                    | (from eth2)
> +---------+ ---> +------------------------+
>           VID20-tagged
>
> Em1.0 can always receive traffic from eth1 if we adopt 'vlan set'.
> However, I cannot imagine when multiple untagged vlans is required, so
> cannot figure out whether 'vlan add' is useful or harmful.
> Anyway, both of approaches are OK with me.
>
> Thanks,
> Toshiaki Makita
>
>>
>> Thanks
>> -vlad
>>
>>>
>>> Thanks,
>>>
>>> Toshiaki Makita
>>>
>>>>
>>>> -vlad
>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Toshiaki Makita
>>>>>
>>>>>>
>>>>>> -vlad
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> I think I am ok with either approach.  Explicit vid 0 policy is
>>>>>>>> easier for automatic provisioning.   The flag based one is
>>>>>>>> easier for admin/ manual provisioning.
>>>>>>>
>>>>>>> Supposing we have to add something to help or man in any case, I
>>>>>>> think flag based is better. The format below seems to suitable
>>>>>>> for a per-port policy. bridge vlan set prio_tag on dev vnet0
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Toshiaki Makita
>>>>>>>
>>>>>>>>
>>>>>>>> -vlad.
>>>>>>>>
>>>>>>>> -vlad
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks -vlad
>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Toshiaki Makita
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> How it is implemented internally in the kernel isn't as
>>>>>>>>>>>> big of an issue. We can do it as a separate flag or as
>>>>>>>>>>>> part of existing policy.
>>>>>>>>>>>>
>>>>>>>>>>>> -vlad
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Toshiaki Makita
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -vlad
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Toshiaki Makita
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Toshiaki Makita
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> -- To unsubscribe from this list: send the
>>>>>>>>>>>>>>>>>> line "unsubscribe netdev" in the body of a
>>>>>>>>>>>>>>>>>> message to majordomo@vger.kernel.org More
>>>>>>>>>>>>>>>>>> majordomo info at
>>>>>>>>>>>>>>>>>> http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -- To unsubscribe from this list: send the line
>>>>>>>>>>>>>>> "unsubscribe netdev" in the body of a message to
>>>>>>>>>>>>>>> majordomo@vger.kernel.org More majordomo info at
>>>>>>>>>>>>>>> http://vger.kernel.org/majordomo-info.html
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>

^ permalink raw reply

* Re: [PATCHv2 2/2] Don't compute checksum value for SCTP skb with, CHECKSUM_PARTIAL set
From: Vlad Yasevich @ 2013-10-11 14:25 UTC (permalink / raw)
  To: Fan Du, Neil Horman, steffen.klassert, davem; +Cc: netdev
In-Reply-To: <5257A3D6.3010002@windriver.com>

On 10/11/2013 03:08 AM, Fan Du wrote:
>
>
> On 2013年10月10日 21:11, Neil Horman wrote:
>> On Thu, Oct 10, 2013 at 01:51:36PM +0800, Fan Du wrote:
>>> igb/ixgbe have hardware sctp checksum support, when this feature is
>>> enabled
>>> and also IPsec is armed to protect sctp traffic, ugly things happened as
>>> xfrm_output checks CHECKSUM_PARTIAL to do check sum operation(sum
>>> every thing
>>> up and pack the 16bits result in the checksum field). The result is fail
>>> establishment of sctp communication.
>>>
>> Shouldn't this be fixed in the xfrm code then?  E.g. check the device
>> features
>> for SCTP checksum offloading and and skip the checksum during xfrm
>> output if its
>> available?
>>
>> Or am I missing something?
>> Neil
>>
>>
>
>
>  From cff25810910603ff991f0c56441d6de8dc33a822 Mon Sep 17 00:00:00 2001
> From: Fan Du <fan.du@windriver.com>
> Date: Fri, 11 Oct 2013 14:31:57 +0800
> Subject: [PATCH 2/2] Don't compute checksum value for SCTP skb with
>   CHECKSUM_PARTIAL set
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> IPsec is not enabled in this scenario:
>
> SCTP skb set CHECKSUM_PARTIAL to indicate hardware is capable of doing
> SCTP checksum(crc32-c) scoping the whole SCTP packet range. However when
> such kind of skb is delivered through IPv4/v6 output handler, IPv4/v6 stack
> interpret CHECKSUM_PARTIAL by calling skb_checksum_help to compute 16bits
> checksum value by summing everything up, the whole SCTP packet in software
> manner！ After this skb reach NIC, after hardware doing its SCTP checking
> business, a crc32-c value will overwrite the value IPv4/v6 stack computed
> before.
>
> This patch solves this by introducing skb_is_sctpv4/6 to optimize such
> case.
>
> Signed-off-by: Fan Du <fan.du@windriver.com>
> ---
> v2:
>     Rework this problem by introducing skb_is_scktv4/6
>
> ---
>   include/linux/ip.h     |    5 +++++
>   include/linux/ipv6.h   |    6 ++++++
>   include/linux/skbuff.h |    1 -
>   net/core/skbuff.c      |    1 +
>   net/ipv4/ip_output.c   |    4 +++-
>   net/ipv6/ip6_output.c  |    1 +
>   net/xfrm/xfrm_output.c |   20 +++++++++++++++-----
>   7 files changed, 31 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/ip.h b/include/linux/ip.h
> index 492bc65..f556292 100644
> --- a/include/linux/ip.h
> +++ b/include/linux/ip.h
> @@ -19,6 +19,7 @@
>
>   #include <linux/skbuff.h>
>   #include <uapi/linux/ip.h>
> +#include <uapi/linux/in.h>
>
>   static inline struct iphdr *ip_hdr(const struct sk_buff *skb)
>   {
> @@ -34,4 +35,8 @@ static inline struct iphdr *ipip_hdr(const struct
> sk_buff *skb)
>   {
>       return (struct iphdr *)skb_transport_header(skb);
>   }
> +static inline int skb_is_sctpv4(const struct sk_buff *skb)
> +{
> +    return ip_hdr(skb)->protocol == IPPROTO_SCTP;
> +}
>   #endif    /* _LINUX_IP_H */
> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> index 28ea384..6e17c04 100644
> --- a/include/linux/ipv6.h
> +++ b/include/linux/ipv6.h
> @@ -2,6 +2,7 @@
>   #define _IPV6_H
>
>   #include <uapi/linux/ipv6.h>
> +#include <uapi/linux/in.h>
>
>   #define ipv6_optlen(p)  (((p)->hdrlen+1) << 3)
>   /*
> @@ -387,4 +388,9 @@ static inline struct raw6_sock *raw6_sk(const struct
> sock *sk)
>         ((__sk)->sk_bound_dev_if == (__dif)))                && \
>        net_eq(sock_net(__sk), (__net)))
>
> +static inline int skb_is_sctpv6(const struct sk_buff *skb)
> +{
> +    return ipv6_hdr(skb)->nexthdr == IPPROTO_SCTP;
> +}
> +
>   #endif /* _IPV6_H */
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 2ddb48d..b36d0cc 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -2393,7 +2393,6 @@ extern void           skb_split(struct sk_buff *skb,
>   extern int           skb_shift(struct sk_buff *tgt, struct sk_buff *skb,
>                    int shiftlen);
>   extern void           skb_scrub_packet(struct sk_buff *skb, bool xnet);
> -
>   extern struct sk_buff *skb_segment(struct sk_buff *skb,
>                      netdev_features_t features);
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index d81cff1..54d6172 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3526,3 +3526,4 @@ void skb_scrub_packet(struct sk_buff *skb, bool xnet)
>       nf_reset_trace(skb);
>   }
>   EXPORT_SYMBOL_GPL(skb_scrub_packet);
> +
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index a04d872..8676b2c 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -587,7 +587,9 @@ slow_path_clean:
>
>   slow_path:
>       /* for offloaded checksums cleanup checksum before fragmentation */
> -    if ((skb->ip_summed == CHECKSUM_PARTIAL) && skb_checksum_help(skb))
> +    if ((skb->ip_summed == CHECKSUM_PARTIAL) &&
> +        !skb_is_sctpv4(skb) &&
> +        skb_checksum_help(skb))
>           goto fail;
>       iph = ip_hdr(skb);
>
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 3a692d5..9b27d95 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -671,6 +671,7 @@ slow_path_clean:
>
>   slow_path:
>       if ((skb->ip_summed == CHECKSUM_PARTIAL) &&
> +        !skb_is_sctpv6(skb) &&
>           skb_checksum_help(skb))
>           goto fail;
>

No, this isn't right.  This is a case where IP fragmentation is
required, and the above code will cause SCTP checksum to not be
computed.

Looks like SCTP needs to compute the checksum in the case where
skb will be fragmented.

An alternative, that will also allow us to get rid of patch 1
in the serices is to have a checksum handler offload function
that can be used to compute checksum in this case.

-vlad

> diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
> index 3bb2cdc..ddef94a 100644
> --- a/net/xfrm/xfrm_output.c
> +++ b/net/xfrm/xfrm_output.c
> @@ -180,6 +180,14 @@ static int xfrm_output_gso(struct sk_buff *skb)
>       return 0;
>   }
>
> +static int skb_is_sctp(const struct sk_buff *skb)
> +{
> +    if (skb->protocol == __constant_htons(ETH_P_IP))
> +        return skb_is_sctpv4(skb);
> +    else
> +        return skb_is_sctpv6(skb);
> +}
> +
>   int xfrm_output(struct sk_buff *skb)
>   {
>       struct net *net = dev_net(skb_dst(skb)->dev);
> @@ -189,11 +197,13 @@ int xfrm_output(struct sk_buff *skb)
>           return xfrm_output_gso(skb);
>
>       if (skb->ip_summed == CHECKSUM_PARTIAL) {
> -        err = skb_checksum_help(skb);
> -        if (err) {
> -            XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTERROR);
> -            kfree_skb(skb);
> -            return err;
> +        if (!skb_is_sctp(skb)) {
> +            err = skb_checksum_help(skb);
> +            if (err) {
> +                XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTERROR);
> +                kfree_skb(skb);
> +                return err;
> +            }
>           }
>       }
>

^ permalink raw reply

* Re: [PATCH] ipv6: Initialize ip6_tnl.hlen in gre tunnel even if no route is found
From: Hannes Frederic Sowa @ 2013-10-11 15:02 UTC (permalink / raw)
  To: Oussama Ghorbel
  Cc: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, linux-kernel
In-Reply-To: <1381427427-14880-1-git-send-email-ou.ghorbel@gmail.com>

On Thu, Oct 10, 2013 at 06:50:27PM +0100, Oussama Ghorbel wrote:
> The ip6_tnl.hlen (gre and ipv6 headers length) is independent from the
> outgoing interface, so it would be better to initialize it even when no
> route is found, otherwise its value will be zero.
> While I'm not sure if this could happen in real life, but doing that
> will avoid to call the skb_push function with a zero in ip6gre_header
> function.
> 
> Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Signed-off-by: Oussama Ghorbel <ou.ghorbel@gmail.com>

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Thanks,

  Hannes

^ permalink raw reply

* [PATCH net-next v4 1/5] xen-netback: add support for IPv6 checksum offload to guest
From: Paul Durrant @ 2013-10-11 15:06 UTC (permalink / raw)
  To: xen-devel, netdev; +Cc: Paul Durrant, Wei Liu, David Vrabel, Ian Campbell
In-Reply-To: <1381503982-1418-1-git-send-email-paul.durrant@citrix.com>

Check xenstore flag feature-ipv6-csum-offload to determine if a
guest is happy to accept IPv6 packets with only partial checksum.
Also check analogous feature-ip-csum-offload to determine if a
guest is happy to accept IPv4 packets with only partial checksum
as a replacement for a negated feature-no-csum-offload value and
add a comment to deprecate use of feature-no-csum-offload.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
---
 drivers/net/xen-netback/common.h    |    3 ++-
 drivers/net/xen-netback/interface.c |   10 +++++++---
 drivers/net/xen-netback/xenbus.c    |   24 +++++++++++++++++++++++-
 include/xen/interface/io/netif.h    |   10 ++++++++++
 4 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 5715318..b4a9a3c 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -153,7 +153,8 @@ struct xenvif {
 	u8 can_sg:1;
 	u8 gso:1;
 	u8 gso_prefix:1;
-	u8 csum:1;
+	u8 ip_csum:1;
+	u8 ipv6_csum:1;
 
 	/* Internal feature information. */
 	u8 can_queue:1;	    /* can queue packets for receiver? */
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 01bb854..8e92783 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -216,8 +216,10 @@ static netdev_features_t xenvif_fix_features(struct net_device *dev,
 		features &= ~NETIF_F_SG;
 	if (!vif->gso && !vif->gso_prefix)
 		features &= ~NETIF_F_TSO;
-	if (!vif->csum)
+	if (!vif->ip_csum)
 		features &= ~NETIF_F_IP_CSUM;
+	if (!vif->ipv6_csum)
+		features &= ~NETIF_F_IPV6_CSUM;
 
 	return features;
 }
@@ -306,7 +308,7 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	vif->domid  = domid;
 	vif->handle = handle;
 	vif->can_sg = 1;
-	vif->csum = 1;
+	vif->ip_csum = 1;
 	vif->dev = dev;
 
 	vif->credit_bytes = vif->remaining_credit = ~0UL;
@@ -316,7 +318,9 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	vif->credit_timeout.expires = jiffies;
 
 	dev->netdev_ops	= &xenvif_netdev_ops;
-	dev->hw_features = NETIF_F_SG | NETIF_F_IP_CSUM | NETIF_F_TSO;
+	dev->hw_features = NETIF_F_SG |
+		NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
+		NETIF_F_TSO;
 	dev->features = dev->hw_features;
 	SET_ETHTOOL_OPS(dev, &xenvif_ethtool_ops);
 
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 1b08d87..99343ca 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -571,10 +571,32 @@ static int connect_rings(struct backend_info *be)
 		val = 0;
 	vif->gso_prefix = !!val;
 
+	/* Before feature-ipv6-csum-offload was introduced, checksum offload
+	 * was turned on by a zero value in (or lack of)
+	 * feature-no-csum-offload. Therefore, for compatibility, assume
+	 * that if feature-ip-csum-offload is missing that IP (v4) offload
+	 * is turned on. If this is not the case then the flag will be
+	 * cleared when we subsequently sample feature-no-csum-offload.
+	 */
+	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-ip-csum-offload",
+			 "%d", &val) < 0)
+		val = 1;
+	vif->ip_csum = !!val;
+
+	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-ipv6-csum-offload",
+			 "%d", &val) < 0)
+		val = 0;
+	vif->ipv6_csum = !!val;
+
+	/* This is deprecated. Frontends should set IP v4 and v6 checksum
+	 * offload using feature-ip-csum-offload and
+	 * feature-ipv6-csum-offload respectively.
+	 */
 	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-no-csum-offload",
 			 "%d", &val) < 0)
 		val = 0;
-	vif->csum = !val;
+	if (val)
+		vif->ip_csum = 0;
 
 	/* Map the shared frame, irq etc. */
 	err = xenvif_connect(vif, tx_ring_ref, rx_ring_ref,
diff --git a/include/xen/interface/io/netif.h b/include/xen/interface/io/netif.h
index eb262e3..d9fb44739 100644
--- a/include/xen/interface/io/netif.h
+++ b/include/xen/interface/io/netif.h
@@ -51,6 +51,16 @@
  */
 
 /*
+ * "feature-no-csum-offload" was used to turn off IPv4 TCP/UDP checksum
+ * offload but is now deprecated. Two new feature flags should now be used
+ * to control checksum offload:
+ * "feature-ip-csum-offload" should be used to turn IPv4 TCP/UDP checksum
+ * offload on or off. If it is missing then the feature is assumed to be on.
+ * "feature-ipv6-csum-offload" should be used to turn IPv6 TCP/UDP checksum
+ * offload on or off. If it is missing then the feature is assumed to be off.
+ */
+
+/*
  * This is the 'wire' format for packets:
  *  Request 1: xen_netif_tx_request  -- XEN_NETTXF_* (any flags)
  * [Request 2: xen_netif_extra_info]    (only if request 1 has XEN_NETTXF_extra_info)
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH net-next v4 2/5] xen-netback: add support for IPv6 checksum offload from guest
From: Paul Durrant @ 2013-10-11 15:06 UTC (permalink / raw)
  To: xen-devel, netdev; +Cc: Paul Durrant, Wei Liu, David Vrabel, Ian Campbell
In-Reply-To: <1381503982-1418-1-git-send-email-paul.durrant@citrix.com>

For performance of VM to VM traffic on a single host it is better to avoid
calculation of TCP/UDP checksum in the sending frontend. To allow this this
patch adds the code necessary to set up partial checksum for IPv6 packets
and xenstore flag feature-ipv6-csum-offload to advertise that fact to
frontends.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
---
 drivers/net/xen-netback/netback.c |  233 ++++++++++++++++++++++++++++++-------
 drivers/net/xen-netback/xenbus.c  |    9 ++
 2 files changed, 203 insertions(+), 39 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index f3e591c..74c13b9 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -109,15 +109,10 @@ static inline unsigned long idx_to_kaddr(struct xenvif *vif,
 	return (unsigned long)pfn_to_kaddr(idx_to_pfn(vif, idx));
 }
 
-/*
- * This is the amount of packet we copy rather than map, so that the
- * guest can't fiddle with the contents of the headers while we do
- * packet processing on them (netfilter, routing, etc).
+/* This is a miniumum size for the linear area to avoid lots of
+ * calls to __pskb_pull_tail() as we set up checksum offsets.
  */
-#define PKT_PROT_LEN    (ETH_HLEN + \
-			 VLAN_HLEN + \
-			 sizeof(struct iphdr) + MAX_IPOPTLEN + \
-			 sizeof(struct tcphdr) + MAX_TCP_OPTION_SPACE)
+#define PKT_PROT_LEN 128
 
 static u16 frag_get_pending_idx(skb_frag_t *frag)
 {
@@ -1118,61 +1113,74 @@ static int xenvif_set_skb_gso(struct xenvif *vif,
 	return 0;
 }
 
-static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
+static inline void maybe_pull_tail(struct sk_buff *skb, unsigned int len)
+{
+	if (skb_is_nonlinear(skb) && skb_headlen(skb) < len) {
+		/* If we need to pullup then pullup to the max, so we
+		 * won't need to do it again.
+		 */
+		int target = min_t(int, skb->len, MAX_TCP_HEADER);
+		__pskb_pull_tail(skb, target - skb_headlen(skb));
+	}
+}
+
+static int checksum_setup_ip(struct xenvif *vif, struct sk_buff *skb,
+			     int recalculate_partial_csum)
 {
-	struct iphdr *iph;
+	struct iphdr *iph = (void *)skb->data;
+	unsigned int header_size;
+	unsigned int off;
 	int err = -EPROTO;
-	int recalculate_partial_csum = 0;
 
-	/*
-	 * A GSO SKB must be CHECKSUM_PARTIAL. However some buggy
-	 * peers can fail to set NETRXF_csum_blank when sending a GSO
-	 * frame. In this case force the SKB to CHECKSUM_PARTIAL and
-	 * recalculate the partial checksum.
-	 */
-	if (skb->ip_summed != CHECKSUM_PARTIAL && skb_is_gso(skb)) {
-		vif->rx_gso_checksum_fixup++;
-		skb->ip_summed = CHECKSUM_PARTIAL;
-		recalculate_partial_csum = 1;
-	}
+	off = sizeof(struct iphdr);
 
-	/* A non-CHECKSUM_PARTIAL SKB does not require setup. */
-	if (skb->ip_summed != CHECKSUM_PARTIAL)
-		return 0;
+	header_size = skb->network_header + off + MAX_IPOPTLEN;
+	maybe_pull_tail(skb, header_size);
 
-	if (skb->protocol != htons(ETH_P_IP))
-		goto out;
+	off = iph->ihl * 4;
 
-	iph = (void *)skb->data;
 	switch (iph->protocol) {
 	case IPPROTO_TCP:
-		if (!skb_partial_csum_set(skb, 4 * iph->ihl,
+		if (!skb_partial_csum_set(skb, off,
 					  offsetof(struct tcphdr, check)))
 			goto out;
 
 		if (recalculate_partial_csum) {
 			struct tcphdr *tcph = tcp_hdr(skb);
+
+			header_size = skb->network_header +
+				off +
+				sizeof(struct tcphdr);
+			maybe_pull_tail(skb, header_size);
+
 			tcph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
-							 skb->len - iph->ihl*4,
+							 skb->len - off,
 							 IPPROTO_TCP, 0);
 		}
 		break;
 	case IPPROTO_UDP:
-		if (!skb_partial_csum_set(skb, 4 * iph->ihl,
+		if (!skb_partial_csum_set(skb, off,
 					  offsetof(struct udphdr, check)))
 			goto out;
 
 		if (recalculate_partial_csum) {
 			struct udphdr *udph = udp_hdr(skb);
+
+			header_size = skb->network_header +
+				off +
+				sizeof(struct udphdr);
+			maybe_pull_tail(skb, header_size);
+
 			udph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
-							 skb->len - iph->ihl*4,
+							 skb->len - off,
 							 IPPROTO_UDP, 0);
 		}
 		break;
 	default:
 		if (net_ratelimit())
 			netdev_err(vif->dev,
-				   "Attempting to checksum a non-TCP/UDP packet, dropping a protocol %d packet\n",
+				   "Attempting to checksum a non-TCP/UDP packet, "
+				   "dropping a protocol %d packet\n",
 				   iph->protocol);
 		goto out;
 	}
@@ -1183,6 +1191,158 @@ out:
 	return err;
 }
 
+static int checksum_setup_ipv6(struct xenvif *vif, struct sk_buff *skb,
+			       int recalculate_partial_csum)
+{
+	int err = -EPROTO;
+	struct ipv6hdr *ipv6h = (void *)skb->data;
+	u8 nexthdr;
+	unsigned int header_size;
+	unsigned int off;
+	bool fragment;
+	bool done;
+
+	done = false;
+
+	off = sizeof(struct ipv6hdr);
+
+	header_size = skb->network_header + off;
+	maybe_pull_tail(skb, header_size);
+
+	nexthdr = ipv6h->nexthdr;
+
+	while ((off <= sizeof(struct ipv6hdr) + ntohs(ipv6h->payload_len)) &&
+	       !done) {
+		switch (nexthdr) {
+		case IPPROTO_DSTOPTS:
+		case IPPROTO_HOPOPTS:
+		case IPPROTO_ROUTING: {
+			struct ipv6_opt_hdr *hp = (void *)(skb->data + off);
+
+			header_size = skb->network_header +
+				off +
+				sizeof(struct ipv6_opt_hdr);
+			maybe_pull_tail(skb, header_size);
+
+			nexthdr = hp->nexthdr;
+			off += ipv6_optlen(hp);
+			break;
+		}
+		case IPPROTO_AH: {
+			struct ip_auth_hdr *hp = (void *)(skb->data + off);
+
+			header_size = skb->network_header +
+				off +
+				sizeof(struct ip_auth_hdr);
+			maybe_pull_tail(skb, header_size);
+
+			nexthdr = hp->nexthdr;
+			off += (hp->hdrlen+2)<<2;
+			break;
+		}
+		case IPPROTO_FRAGMENT:
+			fragment = true;
+			/* fall through */
+		default:
+			done = true;
+			break;
+		}
+	}
+
+	if (!done) {
+		if (net_ratelimit())
+			netdev_err(vif->dev, "Failed to parse packet header\n");
+		goto out;
+	}
+
+	if (fragment) {
+		if (net_ratelimit())
+			netdev_err(vif->dev, "Packet is a fragment!\n");
+		goto out;
+	}
+
+	switch (nexthdr) {
+	case IPPROTO_TCP:
+		if (!skb_partial_csum_set(skb, off,
+					  offsetof(struct tcphdr, check)))
+			goto out;
+
+		if (recalculate_partial_csum) {
+			struct tcphdr *tcph = tcp_hdr(skb);
+
+			header_size = skb->network_header +
+				off +
+				sizeof(struct tcphdr);
+			maybe_pull_tail(skb, header_size);
+
+			tcph->check = ~csum_ipv6_magic(&ipv6h->saddr,
+						       &ipv6h->daddr,
+						       skb->len - off,
+						       IPPROTO_TCP, 0);
+		}
+		break;
+	case IPPROTO_UDP:
+		if (!skb_partial_csum_set(skb, off,
+					  offsetof(struct udphdr, check)))
+			goto out;
+
+		if (recalculate_partial_csum) {
+			struct udphdr *udph = udp_hdr(skb);
+
+			header_size = skb->network_header +
+				off +
+				sizeof(struct udphdr);
+			maybe_pull_tail(skb, header_size);
+
+			udph->check = ~csum_ipv6_magic(&ipv6h->saddr,
+						       &ipv6h->daddr,
+						       skb->len - off,
+						       IPPROTO_UDP, 0);
+		}
+		break;
+	default:
+		if (net_ratelimit())
+			netdev_err(vif->dev,
+				   "Attempting to checksum a non-TCP/UDP packet, "
+				   "dropping a protocol %d packet\n",
+				   nexthdr);
+		goto out;
+	}
+
+	err = 0;
+
+out:
+	return err;
+}
+
+static int checksum_setup(struct xenvif *vif, struct sk_buff *skb)
+{
+	int err = -EPROTO;
+	int recalculate_partial_csum = 0;
+
+	/* A GSO SKB must be CHECKSUM_PARTIAL. However some buggy
+	 * peers can fail to set NETRXF_csum_blank when sending a GSO
+	 * frame. In this case force the SKB to CHECKSUM_PARTIAL and
+	 * recalculate the partial checksum.
+	 */
+	if (skb->ip_summed != CHECKSUM_PARTIAL && skb_is_gso(skb)) {
+		vif->rx_gso_checksum_fixup++;
+		skb->ip_summed = CHECKSUM_PARTIAL;
+		recalculate_partial_csum = 1;
+	}
+
+	/* A non-CHECKSUM_PARTIAL SKB does not require setup. */
+	if (skb->ip_summed != CHECKSUM_PARTIAL)
+		return 0;
+
+	if (skb->protocol == htons(ETH_P_IP))
+		err = checksum_setup_ip(vif, skb, recalculate_partial_csum);
+	else if (skb->protocol == htons(ETH_P_IPV6))
+		err = checksum_setup_ipv6(vif, skb, recalculate_partial_csum);
+
+	return err;
+}
+
 static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
 {
 	unsigned long now = jiffies;
@@ -1428,12 +1588,7 @@ static int xenvif_tx_submit(struct xenvif *vif, int budget)
 
 		xenvif_fill_frags(vif, skb);
 
-		/*
-		 * If the initial fragment was < PKT_PROT_LEN then
-		 * pull through some bytes from the other fragments to
-		 * increase the linear region to PKT_PROT_LEN bytes.
-		 */
-		if (skb_headlen(skb) < PKT_PROT_LEN && skb_is_nonlinear(skb)) {
+		if (skb_is_nonlinear(skb) && skb_headlen(skb) < PKT_PROT_LEN) {
 			int target = min_t(int, skb->len, PKT_PROT_LEN);
 			__pskb_pull_tail(skb, target - skb_headlen(skb));
 		}
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 99343ca..9c9b37d 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -105,6 +105,15 @@ static int netback_probe(struct xenbus_device *dev,
 			goto abort_transaction;
 		}
 
+		/* We support partial checksum setup for IPv6 packets */
+		err = xenbus_printf(xbt, dev->nodename,
+				    "feature-ipv6-csum-offload",
+				    "%d", 1);
+		if (err) {
+			message = "writing feature-ipv6-csum-offload";
+			goto abort_transaction;
+		}
+
 		/* We support rx-copy path. */
 		err = xenbus_printf(xbt, dev->nodename,
 				    "feature-rx-copy", "%d", 1);
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH net-next v4 3/5] xen-netback: Unconditionally set NETIF_F_RXCSUM
From: Paul Durrant @ 2013-10-11 15:06 UTC (permalink / raw)
  To: xen-devel, netdev; +Cc: Paul Durrant, Wei Liu, David Vrabel, Ian Campbell
In-Reply-To: <1381503982-1418-1-git-send-email-paul.durrant@citrix.com>

There is no mechanism to insist that a guest always generates a packet
with good checksum (at least for IPv4) so we must handle checksum
offloading from the guest and hence should set NETIF_F_RXCSUM.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
---
 drivers/net/xen-netback/interface.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 8e92783..cb0d8ea 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -321,7 +321,7 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	dev->hw_features = NETIF_F_SG |
 		NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
 		NETIF_F_TSO;
-	dev->features = dev->hw_features;
+	dev->features = dev->hw_features | NETIF_F_RXCSUM;
 	SET_ETHTOOL_OPS(dev, &xenvif_ethtool_ops);
 
 	dev->tx_queue_len = XENVIF_QUEUE_LENGTH;
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH net-next v4 4/5] xen-netback: handle IPv6 TCP GSO packets from the guest
From: Paul Durrant @ 2013-10-11 15:06 UTC (permalink / raw)
  To: xen-devel, netdev; +Cc: Paul Durrant, Wei Liu, David Vrabel, Ian Campbell
In-Reply-To: <1381503982-1418-1-git-send-email-paul.durrant@citrix.com>

This patch adds a xenstore feature flag, festure-gso-tcpv6, to advertise
that netback can handle IPv6 TCP GSO packets. It creates SKB_GSO_TCPV6 skbs
if the frontend passes an extra segment with the new type
XEN_NETIF_GSO_TYPE_TCPV6 added to netif.h.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
---
 drivers/net/xen-netback/netback.c |   11 ++++++++---
 drivers/net/xen-netback/xenbus.c  |    7 +++++++
 include/xen/interface/io/netif.h  |   10 +++++++++-
 3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 74c13b9..65f04e7 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1096,15 +1096,20 @@ static int xenvif_set_skb_gso(struct xenvif *vif,
 		return -EINVAL;
 	}
 
-	/* Currently only TCPv4 S.O. is supported. */
-	if (gso->u.gso.type != XEN_NETIF_GSO_TYPE_TCPV4) {
+	switch (gso->u.gso.type) {
+	case XEN_NETIF_GSO_TYPE_TCPV4:
+		skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
+		break;
+	case XEN_NETIF_GSO_TYPE_TCPV6:
+		skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
+		break;
+	default:
 		netdev_err(vif->dev, "Bad GSO type %d.\n", gso->u.gso.type);
 		xenvif_fatal_tx_err(vif);
 		return -EINVAL;
 	}
 
 	skb_shinfo(skb)->gso_size = gso->u.gso.size;
-	skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
 
 	/* Header must be checked, and gso_segs computed. */
 	skb_shinfo(skb)->gso_type |= SKB_GSO_DODGY;
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 9c9b37d..7e4dcc9 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -105,6 +105,13 @@ static int netback_probe(struct xenbus_device *dev,
 			goto abort_transaction;
 		}
 
+		err = xenbus_printf(xbt, dev->nodename, "feature-gso-tcpv6",
+				    "%d", sg);
+		if (err) {
+			message = "writing feature-gso-tcpv6";
+			goto abort_transaction;
+		}
+
 		/* We support partial checksum setup for IPv6 packets */
 		err = xenbus_printf(xbt, dev->nodename,
 				    "feature-ipv6-csum-offload",
diff --git a/include/xen/interface/io/netif.h b/include/xen/interface/io/netif.h
index d9fb44739..d7dd8d7 100644
--- a/include/xen/interface/io/netif.h
+++ b/include/xen/interface/io/netif.h
@@ -61,6 +61,13 @@
  */
 
 /*
+ * "feature-gso-tcpv4" and "feature-gso-tcpv6" advertise the capability to
+ * handle large TCP packets (in IPv4 or IPv6 form respectively). Neither
+ * frontends nor backends are assumed to be capable unless the flags are
+ * present.
+ */
+
+/*
  * This is the 'wire' format for packets:
  *  Request 1: xen_netif_tx_request  -- XEN_NETTXF_* (any flags)
  * [Request 2: xen_netif_extra_info]    (only if request 1 has XEN_NETTXF_extra_info)
@@ -105,8 +112,9 @@ struct xen_netif_tx_request {
 #define _XEN_NETIF_EXTRA_FLAG_MORE	(0)
 #define  XEN_NETIF_EXTRA_FLAG_MORE	(1U<<_XEN_NETIF_EXTRA_FLAG_MORE)
 
-/* GSO types - only TCPv4 currently supported. */
+/* GSO types */
 #define XEN_NETIF_GSO_TYPE_TCPV4	(1)
+#define XEN_NETIF_GSO_TYPE_TCPV6	(2)
 
 /*
  * This structure needs to fit within both netif_tx_request and
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH net-next v4 0/5] xen-netback: IPv6 offload support
From: Paul Durrant @ 2013-10-11 15:06 UTC (permalink / raw)
  To: xen-devel, netdev

This patch series adds support for checksum and large packet offloads into
xen-netback.
Testing has mainly been done using the Microsoft network hardware
certification suite running in Server 2008R2 VMs with Citrix PV frontends.

v2:
- Fixed Wei's email address in Cc lines

v3:
- Responded to Wei's comments:
 - netif.h now updated with comments and a definition of
   XEN_NETIF_GSO_TYPE_NONE.
 - limited number of pullups
- Responded to Annie's comments:
 - New GSO_BIT macro

v4:
- Responded to more of Wei's comments
- Remove parsing of IPv6 fragment header and added warning

^ permalink raw reply

* [PATCH net-next v4 5/5] xen-netback: enable IPv6 TCP GSO to the guest
From: Paul Durrant @ 2013-10-11 15:06 UTC (permalink / raw)
  To: xen-devel, netdev; +Cc: Paul Durrant, Wei Liu, David Vrabel, Ian Campbell
In-Reply-To: <1381503982-1418-1-git-send-email-paul.durrant@citrix.com>

This patch adds code to handle SKB_GSO_TCPV6 skbs and construct appropriate
extra or prefix segments to pass the large packet to the frontend. New
xenstore flags, feature-gso-tcpv6 and feature-gso-tcpv6-prefix, are sampled
to determine if the frontend is capable of handling such packets.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
---
 drivers/net/xen-netback/common.h    |    9 +++++--
 drivers/net/xen-netback/interface.c |    6 +++--
 drivers/net/xen-netback/netback.c   |   48 +++++++++++++++++++++++++++--------
 drivers/net/xen-netback/xenbus.c    |   29 +++++++++++++++++++--
 include/xen/interface/io/netif.h    |    1 +
 5 files changed, 77 insertions(+), 16 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index b4a9a3c..55b8dec 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -87,9 +87,13 @@ struct pending_tx_info {
 struct xenvif_rx_meta {
 	int id;
 	int size;
+	int gso_type;
 	int gso_size;
 };
 
+#define GSO_BIT(type) \
+	(1 << XEN_NETIF_GSO_TYPE_ ## type)
+
 /* Discriminate from any valid pending_idx value. */
 #define INVALID_PENDING_IDX 0xFFFF
 
@@ -150,9 +154,10 @@ struct xenvif {
 	u8               fe_dev_addr[6];
 
 	/* Frontend feature information. */
+	int gso_mask;
+	int gso_prefix_mask;
+
 	u8 can_sg:1;
-	u8 gso:1;
-	u8 gso_prefix:1;
 	u8 ip_csum:1;
 	u8 ipv6_csum:1;
 
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index cb0d8ea..e4aa267 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -214,8 +214,10 @@ static netdev_features_t xenvif_fix_features(struct net_device *dev,
 
 	if (!vif->can_sg)
 		features &= ~NETIF_F_SG;
-	if (!vif->gso && !vif->gso_prefix)
+	if (~(vif->gso_mask | vif->gso_prefix_mask) & GSO_BIT(TCPV4))
 		features &= ~NETIF_F_TSO;
+	if (~(vif->gso_mask | vif->gso_prefix_mask) & GSO_BIT(TCPV6))
+		features &= ~NETIF_F_TSO6;
 	if (!vif->ip_csum)
 		features &= ~NETIF_F_IP_CSUM;
 	if (!vif->ipv6_csum)
@@ -320,7 +322,7 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	dev->netdev_ops	= &xenvif_netdev_ops;
 	dev->hw_features = NETIF_F_SG |
 		NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
-		NETIF_F_TSO;
+		NETIF_F_TSO | NETIF_F_TSO6;
 	dev->features = dev->hw_features | NETIF_F_RXCSUM;
 	SET_ETHTOOL_OPS(dev, &xenvif_ethtool_ops);
 
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 65f04e7..b19d407 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -140,7 +140,7 @@ static int max_required_rx_slots(struct xenvif *vif)
 	int max = DIV_ROUND_UP(vif->dev->mtu, PAGE_SIZE);
 
 	/* XXX FIXME: RX path dependent on MAX_SKB_FRAGS */
-	if (vif->can_sg || vif->gso || vif->gso_prefix)
+	if (vif->can_sg || vif->gso_mask || vif->gso_prefix_mask)
 		max += MAX_SKB_FRAGS + 1; /* extra_info + frags */
 
 	return max;
@@ -312,6 +312,7 @@ static struct xenvif_rx_meta *get_next_rx_buffer(struct xenvif *vif,
 	req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
 
 	meta = npo->meta + npo->meta_prod++;
+	meta->gso_type = XEN_NETIF_GSO_TYPE_NONE;
 	meta->gso_size = 0;
 	meta->size = 0;
 	meta->id = req->id;
@@ -334,6 +335,7 @@ static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
 	struct gnttab_copy *copy_gop;
 	struct xenvif_rx_meta *meta;
 	unsigned long bytes;
+	int gso_type;
 
 	/* Data must not cross a page boundary. */
 	BUG_ON(size + offset > PAGE_SIZE<<compound_order(page));
@@ -392,7 +394,14 @@ static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
 		}
 
 		/* Leave a gap for the GSO descriptor. */
-		if (*head && skb_shinfo(skb)->gso_size && !vif->gso_prefix)
+		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
+			gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
+		else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
+			gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
+		else
+			gso_type = XEN_NETIF_GSO_TYPE_NONE;
+
+		if (*head && ((1 << gso_type) & vif->gso_mask))
 			vif->rx.req_cons++;
 
 		*head = 0; /* There must be something in this buffer now. */
@@ -423,14 +432,28 @@ static int xenvif_gop_skb(struct sk_buff *skb,
 	unsigned char *data;
 	int head = 1;
 	int old_meta_prod;
+	int gso_type;
+	int gso_size;
 
 	old_meta_prod = npo->meta_prod;
 
+	if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4) {
+		gso_type = XEN_NETIF_GSO_TYPE_TCPV4;
+		gso_size = skb_shinfo(skb)->gso_size;
+	} else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6) {
+		gso_type = XEN_NETIF_GSO_TYPE_TCPV6;
+		gso_size = skb_shinfo(skb)->gso_size;
+	} else {
+		gso_type = XEN_NETIF_GSO_TYPE_NONE;
+		gso_size = 0;
+	}
+
 	/* Set up a GSO prefix descriptor, if necessary */
-	if (skb_shinfo(skb)->gso_size && vif->gso_prefix) {
+	if ((1 << skb_shinfo(skb)->gso_type) & vif->gso_prefix_mask) {
 		req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
 		meta = npo->meta + npo->meta_prod++;
-		meta->gso_size = skb_shinfo(skb)->gso_size;
+		meta->gso_type = gso_type;
+		meta->gso_size = gso_size;
 		meta->size = 0;
 		meta->id = req->id;
 	}
@@ -438,10 +461,13 @@ static int xenvif_gop_skb(struct sk_buff *skb,
 	req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
 	meta = npo->meta + npo->meta_prod++;
 
-	if (!vif->gso_prefix)
-		meta->gso_size = skb_shinfo(skb)->gso_size;
-	else
+	if ((1 << gso_type) & vif->gso_mask) {
+		meta->gso_type = gso_type;
+		meta->gso_size = gso_size;
+	} else {
+		meta->gso_type = XEN_NETIF_GSO_TYPE_NONE;
 		meta->gso_size = 0;
+	}
 
 	meta->size = 0;
 	meta->id = req->id;
@@ -587,7 +613,8 @@ void xenvif_rx_action(struct xenvif *vif)
 
 		vif = netdev_priv(skb->dev);
 
-		if (vif->meta[npo.meta_cons].gso_size && vif->gso_prefix) {
+		if ((1 << vif->meta[npo.meta_cons].gso_type) &
+		    vif->gso_prefix_mask) {
 			resp = RING_GET_RESPONSE(&vif->rx,
 						 vif->rx.rsp_prod_pvt++);
 
@@ -624,7 +651,8 @@ void xenvif_rx_action(struct xenvif *vif)
 					vif->meta[npo.meta_cons].size,
 					flags);
 
-		if (vif->meta[npo.meta_cons].gso_size && !vif->gso_prefix) {
+		if ((1 << vif->meta[npo.meta_cons].gso_type) &
+		    vif->gso_mask) {
 			struct xen_netif_extra_info *gso =
 				(struct xen_netif_extra_info *)
 				RING_GET_RESPONSE(&vif->rx,
@@ -632,8 +660,8 @@ void xenvif_rx_action(struct xenvif *vif)
 
 			resp->flags |= XEN_NETRXF_extra_info;
 
+			gso->u.gso.type = vif->meta[npo.meta_cons].gso_type;
 			gso->u.gso.size = vif->meta[npo.meta_cons].gso_size;
-			gso->u.gso.type = XEN_NETIF_GSO_TYPE_TCPV4;
 			gso->u.gso.pad = 0;
 			gso->u.gso.features = 0;
 
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 7e4dcc9..6a005f1 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -577,15 +577,40 @@ static int connect_rings(struct backend_info *be)
 		val = 0;
 	vif->can_sg = !!val;
 
+	vif->gso_mask = 0;
+	vif->gso_prefix_mask = 0;
+
 	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv4",
 			 "%d", &val) < 0)
 		val = 0;
-	vif->gso = !!val;
+	if (val)
+		vif->gso_mask |= GSO_BIT(TCPV4);
 
 	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv4-prefix",
 			 "%d", &val) < 0)
 		val = 0;
-	vif->gso_prefix = !!val;
+	if (val)
+		vif->gso_prefix_mask |= GSO_BIT(TCPV4);
+
+	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv6",
+			 "%d", &val) < 0)
+		val = 0;
+	if (val)
+		vif->gso_mask |= GSO_BIT(TCPV6);
+
+	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv6-prefix",
+			 "%d", &val) < 0)
+		val = 0;
+	if (val)
+		vif->gso_prefix_mask |= GSO_BIT(TCPV6);
+
+	if (vif->gso_mask & vif->gso_prefix_mask) {
+		xenbus_dev_fatal(dev, err,
+				 "%s: gso and gso prefix flags are not "
+				 "mutually exclusive",
+				 dev->otherend);
+		return -EOPNOTSUPP;
+	}
 
 	/* Before feature-ipv6-csum-offload was introduced, checksum offload
 	 * was turned on by a zero value in (or lack of)
diff --git a/include/xen/interface/io/netif.h b/include/xen/interface/io/netif.h
index d7dd8d7..be341f9 100644
--- a/include/xen/interface/io/netif.h
+++ b/include/xen/interface/io/netif.h
@@ -113,6 +113,7 @@ struct xen_netif_tx_request {
 #define  XEN_NETIF_EXTRA_FLAG_MORE	(1U<<_XEN_NETIF_EXTRA_FLAG_MORE)
 
 /* GSO types */
+#define XEN_NETIF_GSO_TYPE_NONE		(0)
 #define XEN_NETIF_GSO_TYPE_TCPV4	(1)
 #define XEN_NETIF_GSO_TYPE_TCPV6	(2)
 
-- 
1.7.10.4

^ permalink raw reply related

* Re: [PATCH net] bridge: allow receiption on disabled port
From: Stephen Hemminger @ 2013-10-11 15:10 UTC (permalink / raw)
  To: Felix Fietkau; +Cc: netdev
In-Reply-To: <5257D067.2050607@openwrt.org>

On Fri, 11 Oct 2013 12:18:15 +0200
Felix Fietkau <nbd@openwrt.org> wrote:

> On 2013-10-11 4:35 AM, Stephen Hemminger wrote:
> > This is what I was thinking would be better.
> > 
> > Don't want these packets leaking into PRE_ROUTEING chain or have
> > any chance to get flooded out other ports.
> > 
> > Compile tested only...
> > 
> > I could use another goto instead but that becomes spaghetti and
> > never like to jump back into a block.
> [...]
> >  forward:
> >  	switch (p->state) {
> > +	case BR_STATE_DISABLED:
> > +		if (!ether_addr_equal(p->br->dev->dev_addr, dest))
> > +			goto drop;
> Checking against the bridge device address isn't enough, WPA EAPOL
> packets are addressed to the wifi device MAC address.

Correct, this should be skb->dev->dev_addr which matchs against
the MAC address that frame arrived on.

^ permalink raw reply

* [PATCH net-next] inet_diag: use sock_gen_put()
From: Eric Dumazet @ 2013-10-11 15:54 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

From: Eric Dumazet <edumazet@google.com>

TCP listener refactoring, part 6 :

Use sock_gen_put() from inet_diag_dump_one_icsk() for future
SYN_RECV support.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/inet_diag.c |    9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 41e1c3e..56a964a 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -336,12 +336,9 @@ int inet_diag_dump_one_icsk(struct inet_hashinfo *hashinfo, struct sk_buff *in_s
 		err = 0;
 
 out:
-	if (sk) {
-		if (sk->sk_state == TCP_TIME_WAIT)
-			inet_twsk_put((struct inet_timewait_sock *)sk);
-		else
-			sock_put(sk);
-	}
+	if (sk)
+		sock_gen_put(sk);
+
 out_nosk:
 	return err;
 }

^ permalink raw reply related

* [PATCH net-next] netfilter: xt_socket: use sock_gen_put()
From: Eric Dumazet @ 2013-10-11 16:03 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Pablo Neira Ayuso, netfilter-devel

From: Eric Dumazet <edumazet@google.com>

TCP listener refactoring, part 7 :

Use sock_gen_put() instead of xt_socket_put_sk() for future
SYN_RECV support.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/xt_socket.c |   13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/net/netfilter/xt_socket.c b/net/netfilter/xt_socket.c
index 3dd0e37..1ba6793 100644
--- a/net/netfilter/xt_socket.c
+++ b/net/netfilter/xt_socket.c
@@ -35,15 +35,6 @@
 #include <net/netfilter/nf_conntrack.h>
 #endif
 
-static void
-xt_socket_put_sk(struct sock *sk)
-{
-	if (sk->sk_state == TCP_TIME_WAIT)
-		inet_twsk_put(inet_twsk(sk));
-	else
-		sock_put(sk);
-}
-
 static int
 extract_icmp4_fields(const struct sk_buff *skb,
 		    u8 *protocol,
@@ -216,7 +207,7 @@ socket_match(const struct sk_buff *skb, struct xt_action_param *par,
 					inet_twsk(sk)->tw_transparent));
 
 		if (sk != skb->sk)
-			xt_socket_put_sk(sk);
+			sock_gen_put(sk);
 
 		if (wildcard || !transparent)
 			sk = NULL;
@@ -381,7 +372,7 @@ socket_mt6_v1_v2(const struct sk_buff *skb, struct xt_action_param *par)
 					inet_twsk(sk)->tw_transparent));
 
 		if (sk != skb->sk)
-			xt_socket_put_sk(sk);
+			sock_gen_put(sk);
 
 		if (wildcard || !transparent)
 			sk = NULL;

^ permalink raw reply related

* SIT Tunnel: Generate an ICMPV6 message and send them back to the original IPV6 node
From: Oussama Ghorbel @ 2013-10-11 16:36 UTC (permalink / raw)
  To: netdev, linux-kernel

In RFC 4213, section 3.4, page 11, it says:
If sufficient data bytes from the offending packet are available, the
encapsulator MAY extract the encapsulated IPv6 packet and use it to
generate an ICMPv6 message directed back to the originating IPv6  node.

Currently, linux does not do that.
More info on the RFC: http://tools.ietf.org/html/rfc4213
Issue described also in  https://bugzilla.kernel.org/show_bug.cgi?id=49761

I'm considering to implement this feature.
Will be interesting to do it?

Regards,
Oussama

^ permalink raw reply

* Re: [PATCHv2 1/2 ] {xfrm, sctp} Stick to software crc32 even if hardware is capable of that
From: Vlad Yasevich @ 2013-10-11 17:12 UTC (permalink / raw)
  To: Fan Du, Neil Horman, steffen.klassert, davem; +Cc: netdev
In-Reply-To: <5258055F.5060703@gmail.com>

On 10/11/2013 10:04 AM, Vlad Yasevich wrote:
> On 10/11/2013 03:05 AM, Fan Du wrote:
>>
>>
>> On 2013年10月10日 21:11, Neil Horman wrote:
>>> On Thu, Oct 10, 2013 at 01:51:36PM +0800, Fan Du wrote:
>>>> igb/ixgbe have hardware sctp checksum support, when this feature is
>>>> enabled
>>>> and also IPsec is armed to protect sctp traffic, ugly things
>>>> happened as
>>>> xfrm_output checks CHECKSUM_PARTIAL to do check sum operation(sum
>>>> every thing
>>>> up and pack the 16bits result in the checksum field). The result is
>>>> fail
>>>> establishment of sctp communication.
>>>>
>>> Shouldn't this be fixed in the xfrm code then?  E.g. check the device
>>> features
>>> for SCTP checksum offloading and and skip the checksum during xfrm
>>> output if its
>>> available?
>>>
>>> Or am I missing something?
>>> Neil
>>>
>>>
>>
>>
>>  From 014276de0877f11d46e1704114a7d91f19221a63 Mon Sep 17 00:00:00 2001
>> From: Fan Du <fan.du@windriver.com>
>> Date: Fri, 11 Oct 2013 14:24:33 +0800
>> Subject: [PATCH 1/2] {xfrm, sctp} Stick to software crc32 even if
>> hardware is
>>   capable of that
>>
>> igb/ixgbe have hardware sctp checksum support, when this feature is
>> enabled
>> and also IPsec is armed to protect sctp traffic, ugly things happened as
>> xfrm_output checks CHECKSUM_PARTIAL to do check sum operation(sum every
>> thing
>> up and pack the 16bits result in the checksum field). The result is fail
>> establishment of sctp communication.
>>
>> Signed-off-by: Fan Du <fan.du@windriver.com>
>> ---
>> v2:
>>    Leave ip_summed as CHECKSUM_PARTIAL as before, the second patch will
>> fix this.
>>
>> ---
>>   net/sctp/output.c |   14 +++++++++++++-
>>   1 file changed, 13 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/sctp/output.c b/net/sctp/output.c
>> index 0ac3a65..6de6402 100644
>> --- a/net/sctp/output.c
>> +++ b/net/sctp/output.c
>> @@ -372,6 +372,16 @@ static void sctp_packet_set_owner_w(struct sk_buff
>> *skb, struct sock *sk)
>>       atomic_inc(&sk->sk_wmem_alloc);
>>   }
>>
>> +static int is_xfrm_armed(struct dst_entry *dst)
>> +{
>> +#ifdef CONFIG_XFRM
>> +    /* If dst->xfrm is valid, this skb needs to be transformed */
>> +    return dst->xfrm != NULL;
>> +#else
>> +    return 0;
>> +#endif
>> +}
>> +
>
> I would really prefer to have an accessor function to dst->xfrm, but
> I see that everyone codes it inside the #ifdef.  Gack.
>
>>   /* All packets are sent to the network through this function from
>>    * sctp_outq_tail().
>>    *
>> @@ -536,7 +546,9 @@ int sctp_packet_transmit(struct sctp_packet *packet)
>>        * by CRC32-C as described in <draft-ietf-tsvwg-sctpcsum-02.txt>.
>>        */
>>       if (!sctp_checksum_disable) {
>> -        if (!(dst->dev->features & NETIF_F_SCTP_CSUM)) {
>> +        if ((!(dst->dev->features & NETIF_F_SCTP_CSUM)) ||
>> +            is_xfrm_armed(dst)) {
>> +
>>               __u32 crc32 = sctp_start_cksum((__u8 *)sh, cksum_buf_len);
>>
>>               /* 3) Put the resultant value into the checksum field in
>> the
>
> Acked-by: Vlad Yasevich <vyasevich@gmail.com>

This patch doesn't seem to apply to net.git.


-vlad

^ permalink raw reply

* Re: Peak TCP performance
From: Rick Jones @ 2013-10-11 17:46 UTC (permalink / raw)
  To: Kyle Hubert, Eric Dumazet; +Cc: netdev
In-Reply-To: <CAJoZ4U2CRaF+kvgXhbq3DNkMdYbouoKdU-xAUAkuhD5LbC7X8Q@mail.gmail.com>

On 10/10/2013 09:21 PM, Kyle Hubert wrote:
> On Thu, Oct 10, 2013 at 11:44 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>> Also, my copy of ethtool does not recognize tx-nocache-copy. However,
>>> I do have control over the net device. Is there something there I can
>>> set, or is tx-nocache-copy also a new feature? I'll start digging.
>>>
>>
>> nocache-copy was added in 3.0, but I do find its not a gain for current
>> cpus.
>>
>> You could get a fresh copy of ethtool sources :
>>
>> git clone git://git.kernel.org/pub/scm/network/ethtool/ethtool.git
>> cd ethtool
>> ./autogen.sh  ...
>
> That did the trick. Thanks for the help! Is there somewhere I can read
> up on this feature? A lot of the netdev features are opaque to me.
> Also, I can set NETIF_F_NOCACHE_COPY in the netdev->features to set
> this by default, yes?
>
> This at least mirrors the performance improvement that I see when
> forwarding, however I still see reserved CPU time. Is there a way to
> push it even farther?

Thought I would point-out that unless you do concrete steps to make it 
behave otherwise, netperf will constantly present the same set of 
cache-clean buffers to the transport.  The size of those buffers will be 
determined by some heuristics and will depend on the socket buffer size 
at the time the data socket is create, which itself will depend on 
whether or not you have used a test-specific -s option.  And the 
test-specific -m option will come into play.

If I am recalling correctly, the number of buffers will be one more than:

initialSO_SNDBUF / send_size

though you can control that with global -W option.

happy benchmarking,

rick jones

^ permalink raw reply

* Re: [PATCH 1/1] net: fix cipso packet validation when !NETLABEL
From: David Miller @ 2013-10-11 17:53 UTC (permalink / raw)
  To: seif; +Cc: paul, netdev, thomas.petazzoni, dima
In-Reply-To: <0DB595A2CB707F458400BE9663B6A72269C004777F@SC-VEXCH2.marvell.com>

Sorry, no HTML encoded email is allowed on the list.

Turn off all encodings and send plain ASCII text to this mailing
list.

Thank you.

^ permalink raw reply

* [PATCH 1/1] net: fix cipso packet validation when !NETLABEL
From: Seif Mazareeb @ 2013-10-11 17:58 UTC (permalink / raw)
  To: davem@davemloft.net, paul@paul-moore.com, netdev@vger.kernel.org
  Cc: thomas.petazzoni@free-electrons.com, Dmitri Epshtein

When CONFIG_NETLABEL is disabled, the cipso_v4_validate() function could loop
forever in the main loop if opt[opt_iter +1] == 0, this will causing a kernel
crash in an SMP system, since the CPU executing this function will
stall /not respond to IPIs.

This problem can be reproduced by running the IP Stack Integrity Checker
(http://isic.sourceforge.net) using the following command on a Linux machine
connected to DUT:

"icmpsic -s rand -d <DUT IP address> -r 123456"
wait (1-2 min)

Signed-off-by: Seif Mazareeb <seif@marvell.com>
---
 include/net/cipso_ipv4.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/net/cipso_ipv4.h b/include/net/cipso_ipv4.h
index a7a683e..047f1f6 100644
--- a/include/net/cipso_ipv4.h
+++ b/include/net/cipso_ipv4.h
@@ -306,6 +306,10 @@ static inline int cipso_v4_validate(const struct sk_buff *skb,
                        err_offset = opt_iter + 1;
                        goto out;
                }
+
+               if (opt[opt_iter + 1] == 0)
+                       break;
+
                opt_iter += opt[opt_iter + 1];
        }

--
1.8.1.2

^ permalink raw reply related

* Re: [PATCH net-next] openvswitch: fix vport-netdev unregister
From: Jesse Gross @ 2013-10-11 18:11 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Pravin Shelar, David S. Miller, Jiri Pirko, dev@openvswitch.org,
	netdev
In-Reply-To: <CAMEtUuzf5UbZ_87DU+vSMbKzxLQgmuEfrf3NXQSz7vSF8LQTzw@mail.gmail.com>

On Thu, Oct 10, 2013 at 9:48 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> On Thu, Oct 10, 2013 at 8:56 PM, Jesse Gross <jesse@nicira.com> wrote:
>> However, the check dev->reg_state in netdev_destroy() looks racy to
>> me, as it could already be in NETREG_UNREGISTERED even if we already
>> processed this device.
>
> you mean that netdev_destroy() will see reg_state == netreg_unregistered,
> while dp_device_event() didn't see reg_state == netreg_unregistering yet?
> or dp_device_event() saw it, proceeded to do unlink and
> netdev_destroy() ran in parallel?
> well, that's why reg_state == netreg_unregistering check in netdev_destroy()
> is done with rtnl_lock() held.
> reg_state cannot go into netreg_unregistered state skipping
> netreg_unregistering and notifier.
> therefore I don't think it's racy.
>
> In ovs_dp_notify_wq() you're checking for both unregistering and
> unregistered and that makes
> sense, since workq can run after unregistering notifier called and
> netdev_run_todo()
> already changed the state to unregistered.
> But here it's not the case.

ovs_dp_notify_wq() calls ovs_dp_detach_port(), which indirectly calls
netdev_destroy() so it seems like it actually is the same case to me.

^ permalink raw reply

* Re: IPv6 kernel warning
From: dormando @ 2013-10-11 18:15 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: Michele Baldessari, Russell King - ARM Linux, netdev,
	Neal Cardwell, Nandita Dukkipati
In-Reply-To: <alpine.DEB.2.02.1310091147340.5669@dtop>



On Wed, 9 Oct 2013, dormando wrote:

> > > >> >>
> > > >> >
> > > >> > Should I apply this and see if the warning stops?
> > Hi Dormando,
> >
> > Could you try this patch to make sure it fixes the warning (with
> > sysctl net.ipv4.early_retrans=3)?
>
> It's now running on one machine, with early_retrans=3. Will have to give
> it 24 hours to confirm.
>

Almost 48 hours, early_retrans=3, no warnings! (or crashes...)

Good catch :)

^ permalink raw reply

* Re: [PATCH net] netem: update backlog after drop
From: David Miller @ 2013-10-11 18:30 UTC (permalink / raw)
  To: stephen; +Cc: eric.dumazet, netdev
In-Reply-To: <20131006151533.52988624@nehalam.linuxnetplumber.net>

From: Stephen Hemminger <stephen@networkplumber.org>
Date: Sun, 6 Oct 2013 15:15:33 -0700

> When packet is dropped from rb-tree netem the backlog statistic should
> also be updated.
> 
> Reported-by: Сергеев Сергей <adron@yapic.net>
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> 
> ---
> Should be reviewed by Eric (he added the rb-tree stuff), and added to stable
> as well.

Eric please review this patch, thanks.

^ permalink raw reply

* Re: [PATCH net] netem: update backlog after drop
From: Eric Dumazet @ 2013-10-11 18:38 UTC (permalink / raw)
  To: David Miller; +Cc: stephen, netdev
In-Reply-To: <20131011.143018.1801127237009991334.davem@davemloft.net>

On Fri, 2013-10-11 at 14:30 -0400, David Miller wrote:
> From: Stephen Hemminger <stephen@networkplumber.org>
> Date: Sun, 6 Oct 2013 15:15:33 -0700
> 
> > When packet is dropped from rb-tree netem the backlog statistic should
> > also be updated.
> > 
> > Reported-by: Сергеев Сергей <adron@yapic.net>
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > 
> > ---
> > Should be reviewed by Eric (he added the rb-tree stuff), and added to stable
> > as well.
> 
> Eric please review this patch, thanks.

It somehow escaped from my radar ;)

Acked-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox