Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next 2/2] net: initialize hashrnd in flow_dissector with net_get_random_once
From: David Miller @ 2013-10-25 23:04 UTC (permalink / raw)
  To: hannes; +Cc: netdev, edumazet
In-Reply-To: <20131023111219.GA31531@order.stressinduktion.org>

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Wed, 23 Oct 2013 13:12:19 +0200

> We also can defer the initialization of hashrnd in flow_dissector
> to its first use. Since net_get_random_once is irqsave now we don't
> have to audit the call paths if one of this functions get called by an
> interrupt handler.
> 
> Cc: David S. Miller <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] net: add missing dev_put() in __netdev_adjacent_dev_insert
From: David Miller @ 2013-10-25 23:04 UTC (permalink / raw)
  To: nikolay; +Cc: netdev, vfalico
In-Reply-To: <1382534936-23080-1-git-send-email-nikolay@redhat.com>

From: Nikolay Aleksandrov <nikolay@redhat.com>
Date: Wed, 23 Oct 2013 15:28:56 +0200

> I think that a dev_put() is needed in the error path to preserve the
> proper dev refcount.
> 
> CC: Veaceslav Falico <vfalico@redhat.com>
> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>

Applied.

^ permalink raw reply

* Re: [PATCH v2 net-next] fix rtnl notification in atomic context
From: David Miller @ 2013-10-25 23:05 UTC (permalink / raw)
  To: ast; +Cc: nicolas.dichtel, amwang, vfalico, netdev
In-Reply-To: <1382569362-4820-1-git-send-email-ast@plumgrid.com>

From: Alexei Starovoitov <ast@plumgrid.com>
Date: Wed, 23 Oct 2013 16:02:42 -0700

> commit 991fb3f74c "dev: always advertise rx_flags changes via netlink"
> introduced rtnl notification from __dev_set_promiscuity(),
> which can be called in atomic context.
> 
> Steps to reproduce:
> ip tuntap add dev tap1 mode tap
> ifconfig tap1 up
> tcpdump -nei tap1 &
> ip tuntap del dev tap1 mode tap
 ...
> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>

Applied.

^ permalink raw reply

* Re: [patch net-next] ipv6: allow userspace to create address with IFLA_F_TEMPORARY flag
From: Vladislav Yasevich @ 2013-10-25 23:05 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev@vger.kernel.org, David Miller, kuznet, jmorris, yoshfuji,
	kaber, thaller, Stephen Hemminger
In-Reply-To: <1382622355-6500-1-git-send-email-jiri@resnulli.us>

On Thu, Oct 24, 2013 at 9:45 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> This is needed in order to implement userspace address configuration,
> namely ip6-privacy (rfc4941) in NetworkManager.
>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>  net/ipv6/addrconf.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index cd3fb30..962c7c9 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -3715,7 +3715,8 @@ inet6_rtm_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh)
>                 return -ENODEV;
>
>         /* We ignore other flags so far. */
> -       ifa_flags = ifm->ifa_flags & (IFA_F_NODAD | IFA_F_HOMEADDRESS);
> +       ifa_flags = ifm->ifa_flags & (IFA_F_NODAD | IFA_F_HOMEADDRESS |
> +                                     IFA_F_TEMPORARY);
>
>         ifa = ipv6_get_ifaddr(net, pfx, dev, 1);
>         if (ifa == NULL) {
> --
> 1.8.3.1
>

Jiri

So, is the idea behind this is that all of temp address management
would be done in user space?  If so, then you
may have to verify that no-one sets the lifetime values on the prefix
in your other patch.   I am still trying to figure out
why this would be needed.

Thanks
-vlad

^ permalink raw reply

* Re: [PATCH net] netconsole: fix NULL pointer dereference
From: Francois Romieu @ 2013-10-25 23:06 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: David Miller, David.Laight, vfalico, netdev
In-Reply-To: <526A7CE5.8040205@redhat.com>

Nikolay Aleksandrov <nikolay@redhat.com> :
[...]
> I thought this implies that the spinlock protects us against running with
> write_msg().
> It's fine by me either way (with or w/o the addition to the comment). It's
> up to you Dave, do you still want it explicitly there ?

There was some facetiousness in David's comment.

-- 
Ueimor

^ permalink raw reply

* Re: vxlan gso is broken by stackable gso_segment()
From: David Miller @ 2013-10-25 23:10 UTC (permalink / raw)
  To: ast; +Cc: eric.dumazet, edumazet, stephen, netdev
In-Reply-To: <CAMEtUuxwi05oX0TqNdMbjy04MT=ZLemLXJQTTcAMrnWuhXLKdQ@mail.gmail.com>

From: Alexei Starovoitov <ast@plumgrid.com>
Date: Fri, 25 Oct 2013 15:41:47 -0700

> 'bool tunnel' actually still used to indicate encap_level > 0

Good catch, I misread the code.

^ permalink raw reply

* Re: [PATCH] ibm emac: Don't call napi_complete if napi_reschedule failed
From: David Miller @ 2013-10-25 23:13 UTC (permalink / raw)
  To: alistair; +Cc: netdev, benh
In-Reply-To: <1382585780-13244-1-git-send-email-alistair@popple.id.au>

From: Alistair Popple <alistair@popple.id.au>
Date: Thu, 24 Oct 2013 14:36:20 +1100

> This patch fixes a bug which would trigger the BUG_ON() at
> net/core/dev.c:4156. It was found that this was due to napi_complete
> being called even when the corresponding call to napi_reschedule had
> failed.
> 
> This patch ensures that we only contine processing rotting packets in
> the current mal_poll call if we are not already on the polling list.
> It also adds locking calls to protect the read-write-modify of the
> MAL_CFG DCR.
> 
> Signed-off-by: Alistair Popple <alistair@popple.id.au>

The mistaken napi_complete() call and the locking change are two different
bugs, fix them in two different patches.

^ permalink raw reply

* Re: vxlan gso is broken by stackable gso_segment()
From: Eric Dumazet @ 2013-10-25 23:25 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: David Miller, Eric Dumazet, stephen, netdev
In-Reply-To: <CAMEtUuxwi05oX0TqNdMbjy04MT=ZLemLXJQTTcAMrnWuhXLKdQ@mail.gmail.com>

On Fri, 2013-10-25 at 15:41 -0700, Alexei Starovoitov wrote:

> 'bool tunnel' actually still used to indicate encap_level > 0
> 

Yes, I am studying if the setting of skb->encapsulation = 1 was really
needed in the :

if (tunnel) {
     skb_reset_inner_headers(skb);
     skb->encapsulation = 1;
}

And was planning to rename 'bool tunnel' by 'bool stacked' or
something... 

> Eric's fix brings back performance for vxlan and gre keeps working. Thx!

Please note the original performance is not that good, you mentioned 230
Mbps on lxc, while I get more than 5Gb/s on a 10G link.

This should be investigated ...

> 
> net/core/skbuff.c:3474 skb_try_coalesce() warning, I mentioned before,
> is unrelated.
> I still see it with this patch. Running either gre or vxlan tunnels.

I think this might be related to commit 6ff50cd55545 ("tcp: gso: do not
generate out of order packets")

I'll investigate this as well, thanks.

^ permalink raw reply

* Re: [PATCH] net: wan: sbni: remove assembly crc32 code
From: David Miller @ 2013-10-25 23:27 UTC (permalink / raw)
  To: sebastian; +Cc: andi, akpm, linux-kernel, ak, netdev
In-Reply-To: <20131022183625.GA4382@breakpoint.cc>

From: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Date: Tue, 22 Oct 2013 20:36:25 +0200

> There is also a C function doing the same thing. Unless the asm code is
> 110% faster we could stick to the C function.
> 
> Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>

Applied.

^ permalink raw reply

* Re: [PATCHv2 net] netpoll: fix rx_hook() interface by passing the skb
From: David Miller @ 2013-10-25 23:27 UTC (permalink / raw)
  To: antonio; +Cc: netdev, David.Laight
In-Reply-To: <1382564190-334-1-git-send-email-antonio@meshcoding.com>

From: Antonio Quartulli <antonio@meshcoding.com>
Date: Wed, 23 Oct 2013 23:36:30 +0200

> Right now skb->data is passed to rx_hook() even if the skb
> has not been linearised and without giving rx_hook() a way
> to linearise it.
> 
> Change the rx_hook() interface and make it accept the skb
> and the offset to the UDP payload as arguments. rx_hook() is
> also renamed to rx_skb_hook() to ensure that out of the tree
> users notice the API change.
> 
> In this way any rx_skb_hook() implementation can perform all
> the needed operations to properly (and safely) access the
> skb data.
> 
> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>

Applied.

^ permalink raw reply

* Re: [PATCH 1/1] ax88179_178a: Remove AX_MEDIUM_ALWAYS_ONE bit in AX_MEDIUM_STATUS_MODE register to avoid TX throttling
From: David Miller @ 2013-10-25 23:27 UTC (permalink / raw)
  To: freddy; +Cc: linux-usb, linux-kernel, netdev, allan, louis
In-Reply-To: <1382597905-2393-1-git-send-email-freddy@asix.com.tw>

From: freddy@asix.com.tw
Date: Thu, 24 Oct 2013 14:58:25 +0800

> From: Freddy Xin <freddy@asix.com.tw>
> 
> Remove AX_MEDIUM_ALWAYS_ONE in AX_MEDIUM_STATUS_MODE register.
> Setting this bit may cause TX throttling in Half-Duplex mode.
> 
> Signed-off-by: Freddy Xin <freddy@asix.com.tw>

Applied.

^ permalink raw reply

* Re: [PATCH net v2 1/2] ipv6: reset dst.expires value when clearing expire flag
From: David Miller @ 2013-10-25 23:27 UTC (permalink / raw)
  To: hannes; +Cc: netdev, sgunderson, valentyn, yoshfuji, edumazet
In-Reply-To: <20131024081427.GC15744@order.stressinduktion.org>

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Thu, 24 Oct 2013 10:14:27 +0200

> On receiving a packet too big icmp error we update the expire value by
> calling rt6_update_expires. This function uses dst_set_expires which is
> implemented that it can only reduce the expiration value of the dst entry.
> 
> If we insert new routing non-expiry information into the ipv6 fib where
> we already have a matching rt6_info we only clear the RTF_EXPIRES flag
> in rt6i_flags and leave the dst.expires value as is.
> 
> When new mtu information arrives for that cached dst_entry we again
> call dst_set_expires. This time it won't update the dst.expire value
> because we left the dst.expire value intact from the last update. So
> dst_set_expires won't touch dst.expires.
> 
> Fix this by resetting dst.expires when clearing the RTF_EXPIRE flag.
> dst_set_expires checks for a zero expiration and updates the
> dst.expires.
> 
> In the past this (not updating dst.expires) was necessary because
> dst.expire was placed in a union with the dst_entry *from reference
> and rt6_clean_expires did assign NULL to it. This split happend in
> ecd9883724b78cc72ed92c98bcb1a46c764fff21 ("ipv6: fix race condition
> regarding dst->expires and dst->from").
> 
> Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com>
> Reported-by: Valentijn Sessink <valentyn@blub.net>
> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
> Acked-by: Eric Dumazet <edumazet@google.com>
> Tested-by: Valentijn Sessink <valentyn@blub.net>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>

Applied.

^ permalink raw reply

* Re: [PATCH net 2/2] ipv6: ip6_dst_check needs to check for expired dst_entries
From: David Miller @ 2013-10-25 23:27 UTC (permalink / raw)
  To: hannes; +Cc: netdev, sgunderson, valentyn, yoshfuji
In-Reply-To: <20131024054824.GA15744@order.stressinduktion.org>

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Thu, 24 Oct 2013 07:48:24 +0200

> On receiving a packet too big icmp error we check if our current cached
> dst_entry in the socket is still valid. This validation check did not
> care about the expiration of the (cached) route.
> 
> The error path I traced down:
> The socket receives a packet too big mtu notification. It still has a
> valid dst_entry and thus issues the ip6_rt_pmtu_update on this dst_entry,
> setting RTF_EXPIRE and updates the dst.expiration value (which could
> fail because of not up-to-date expiration values, see previous patch).
> 
> In some seldom cases we race with a) the ip6_fib gc or b) another routing
> lookup which would result in a recreation of the cached rt6_info from its
> parent non-cached rt6_info. While copying the rt6_info we reinitialize the
> metrics store by copying it over from the parent thus invalidating the
> just installed pmtu update (both dsts use the same key to the inetpeer
> storage). The dst_entry with the just invalidated metrics data would
> just get its RTF_EXPIRES flag cleared and would continue to stay valid
> for the socket.
> 
> We should have not issued the pmtu update on the already expired dst_entry
> in the first placed. By checking the expiration on the dst entry and
> doing a relookup in case it is out of date we close the race because
> we would install a new rt6_info into the fib before we issue the pmtu
> update, thus closing this race.
> 
> Not reliably updating the dst.expire value was fixed by the patch "ipv6:
> reset dst.expires value when clearing expire flag".
> 
> Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com>
> Reported-by: Valentijn Sessink <valentyn@blub.net>
> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> ---
> I would propose this patch for -stable.

Applied and queued up for -stable.

^ permalink raw reply

* Re: [PATCH 1/4] sctp: merge two if statements to one
From: Vlad Yasevich @ 2013-10-25 23:40 UTC (permalink / raw)
  To: Wang Weidong, davem, nhorman; +Cc: dingtianhong, linux-sctp, netdev
In-Reply-To: <1382665805-13952-2-git-send-email-wangweidong1@huawei.com>

On 10/24/2013 09:50 PM, Wang Weidong wrote:
> Two if statements do the same work, maybe we can merge them to
> one. There is just code simplification, no functional changes.
>
> Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
> ---
>   net/sctp/auth.c | 12 ++++--------
>   1 file changed, 4 insertions(+), 8 deletions(-)
>
> diff --git a/net/sctp/auth.c b/net/sctp/auth.c
> index 8c4fa5d..19fb0ae 100644
> --- a/net/sctp/auth.c
> +++ b/net/sctp/auth.c
> @@ -539,18 +539,14 @@ struct sctp_hmac *sctp_auth_asoc_get_hmac(const struct sctp_association *asoc)
>   	for (i = 0; i < n_elt; i++) {
>   		id = ntohs(hmacs->hmac_ids[i]);
>
> -		/* Check the id is in the supported range */
> -		if (id > SCTP_AUTH_HMAC_ID_MAX) {
> -			id = 0;
> -			continue;
> -		}
> -
> -		/* See is we support the id.  Supported IDs have name and
> +		/* Check the id is in the supported range. And
> +		 * see is we support the id.  Supported IDs have name and
>   		 * length fields set, so that we can allocated and use
>   		 * them.  We can safely just check for name, for without the
>   		 * name, we can't allocate the TFM.
>   		 	*/
> -		if (!sctp_hmac_list[id].hmac_name) {
> +		if (id > SCTP_AUTH_HMAC_ID_MAX ||
> +			!sctp_hmac_list[id].hmac_name) {

Can you please make the 2 parts of the 'if' statement above line up
with each other instead of the code below.  I makes it easy to see what
the whole 'if conditional' is.

Thanks
-vlad

>   			id = 0;
>   			continue;
>   		}
>

^ permalink raw reply

* Re: [PATCH 2/4] sctp: remove the repeat initialize with 0
From: Vlad Yasevich @ 2013-10-25 23:42 UTC (permalink / raw)
  To: Wang Weidong, davem, nhorman; +Cc: dingtianhong, linux-sctp, netdev
In-Reply-To: <1382665805-13952-3-git-send-email-wangweidong1@huawei.com>

On 10/24/2013 09:50 PM, Wang Weidong wrote:
> kmem_cache_zalloc had set the allocated memory to zero. I think no need
> to initialize with 0. And move the comments to the function begin.
>
> Signed-off-by: Wang Weidong <wangweidong1@huawei.com>


Yes, thank you.

Acked-by: Vlad Yasevich <vyasevich@gmail.com>

-vlad

> ---
>   net/sctp/sm_make_chunk.c | 29 ++++++++---------------------
>   1 file changed, 8 insertions(+), 21 deletions(-)
>
> diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> index d244a23..fe69032 100644
> --- a/net/sctp/sm_make_chunk.c
> +++ b/net/sctp/sm_make_chunk.c
> @@ -1297,6 +1297,13 @@ struct sctp_chunk *sctp_make_auth(const struct sctp_association *asoc)
>
>   /* Turn an skb into a chunk.
>    * FIXME: Eventually move the structure directly inside the skb->cb[].
> + *
> + * sctpimpguide-05.txt Section 2.8.2
> + * M1) Each time a new DATA chunk is transmitted
> + * set the 'TSN.Missing.Report' count for that TSN to 0. The
> + * 'TSN.Missing.Report' count will be used to determine missing chunks
> + * and when to fast retransmit.
> + *
>    */
>   struct sctp_chunk *sctp_chunkify(struct sk_buff *skb,
>   			    const struct sctp_association *asoc,
> @@ -1314,29 +1321,9 @@ struct sctp_chunk *sctp_chunkify(struct sk_buff *skb,
>   	INIT_LIST_HEAD(&retval->list);
>   	retval->skb		= skb;
>   	retval->asoc		= (struct sctp_association *)asoc;
> -	retval->has_tsn		= 0;
> -	retval->has_ssn         = 0;
> -	retval->rtt_in_progress	= 0;
> -	retval->sent_at		= 0;
>   	retval->singleton	= 1;
> -	retval->end_of_packet	= 0;
> -	retval->ecn_ce_done	= 0;
> -	retval->pdiscard	= 0;
> -
> -	/* sctpimpguide-05.txt Section 2.8.2
> -	 * M1) Each time a new DATA chunk is transmitted
> -	 * set the 'TSN.Missing.Report' count for that TSN to 0. The
> -	 * 'TSN.Missing.Report' count will be used to determine missing chunks
> -	 * and when to fast retransmit.
> -	 */
> -	retval->tsn_missing_report = 0;
> -	retval->tsn_gap_acked = 0;
> -	retval->fast_retransmit = SCTP_CAN_FRTX;
>
> -	/* If this is a fragmented message, track all fragments
> -	 * of the message (for SEND_FAILED).
> -	 */
> -	retval->msg = NULL;
> +	retval->fast_retransmit = SCTP_CAN_FRTX;
>
>   	/* Polish the bead hole.  */
>   	INIT_LIST_HEAD(&retval->transmitted_list);
>

^ permalink raw reply

* Re: [PATCH 3/4] sctp: fix some comments in associola.c
From: Vlad Yasevich @ 2013-10-25 23:43 UTC (permalink / raw)
  To: Wang Weidong, davem, nhorman; +Cc: dingtianhong, linux-sctp, netdev
In-Reply-To: <1382665805-13952-4-git-send-email-wangweidong1@huawei.com>

On 10/24/2013 09:50 PM, Wang Weidong wrote:
> fix some spellings
>
> Signed-off-by: Wang Weidong <wangweidong1@huawei.com>

Acked-by: Vlad Yasevich <vyasevich@gmail.com>

-vlad

> ---
>   net/sctp/associola.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
> index cef5099..c9b91cb 100644
> --- a/net/sctp/associola.c
> +++ b/net/sctp/associola.c
> @@ -602,7 +602,7 @@ void sctp_assoc_rm_peer(struct sctp_association *asoc,
>
>   		/* Start a T3 timer here in case it wasn't running so
>   		 * that these migrated packets have a chance to get
> -		 * retrnasmitted.
> +		 * retransmitted.
>   		 */
>   		if (!timer_pending(&active->T3_rtx_timer))
>   			if (!mod_timer(&active->T3_rtx_timer,
> @@ -665,7 +665,7 @@ struct sctp_transport *sctp_assoc_add_peer(struct sctp_association *asoc,
>   	/* Set the path max_retrans.  */
>   	peer->pathmaxrxt = asoc->pathmaxrxt;
>
> -	/* And the partial failure retrnas threshold */
> +	/* And the partial failure retrans threshold */
>   	peer->pf_retrans = asoc->pf_retrans;
>
>   	/* Initialize the peer's SACK delay timeout based on the
>

^ permalink raw reply

* Re: [PATCH 4/4] sctp: fix comment in chunk.c
From: Vlad Yasevich @ 2013-10-25 23:45 UTC (permalink / raw)
  To: Wang Weidong, davem, nhorman; +Cc: dingtianhong, linux-sctp, netdev
In-Reply-To: <1382665805-13952-5-git-send-email-wangweidong1@huawei.com>

On 10/24/2013 09:50 PM, Wang Weidong wrote:
> fix a spelling
>
> Signed-off-by: Wang Weidong <wangweidong1@huawei.com>

You could have combined these spelling fixes into 1 patch.

Acked-by: Vlad Yasevich <vyasevich@gmail.com>

-vlad

> ---
>   net/sctp/chunk.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
> index 7bd5ed4..f2044fc 100644
> --- a/net/sctp/chunk.c
> +++ b/net/sctp/chunk.c
> @@ -201,7 +201,7 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
>
>   	max = asoc->frag_point;
>   	/* If the the peer requested that we authenticate DATA chunks
> -	 * we need to accound for bundling of the AUTH chunks along with
> +	 * we need to account for bundling of the AUTH chunks along with
>   	 * DATA.
>   	 */
>   	if (sctp_auth_send_cid(SCTP_CID_DATA, asoc)) {
>

^ permalink raw reply

* [PATCH net-next] tcp: gso: fix truesize tracking
From: Eric Dumazet @ 2013-10-26  0:26 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: David Miller, Eric Dumazet, stephen, netdev
In-Reply-To: <1382743502.4032.6.camel@edumazet-glaptop.roam.corp.google.com>

From: Eric Dumazet <edumazet@google.com>

commit 6ff50cd55545 ("tcp: gso: do not generate out of order packets")
had an heuristic that can trigger a warning in skb_try_coalesce(),
because skb->truesize of the gso segments were exactly set to mss.

This breaks the requirement that

skb->truesize >= skb->len + truesizeof(struct sk_buff);

It can trivially be reproduced by :

ifconfig lo mtu 1500
ethtool -K lo tso off
netperf

As the skbs are looped into the TCP networking stack, skb_try_coalesce()
warns us of these skb under-estimating their truesize.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Alexei Starovoitov <ast@plumgrid.com>
---
Based on net-next but applies as well on net tree with some fuzz.

David, we might backport this one to 3.12 and stable later, I let you
decide.

 net/ipv4/tcp_offload.c |   13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index a7a5583e..a2b68a1 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -18,6 +18,7 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 				netdev_features_t features)
 {
 	struct sk_buff *segs = ERR_PTR(-EINVAL);
+	unsigned int sum_truesize = 0;
 	struct tcphdr *th;
 	unsigned int thlen;
 	unsigned int seq;
@@ -104,13 +105,7 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 		if (copy_destructor) {
 			skb->destructor = gso_skb->destructor;
 			skb->sk = gso_skb->sk;
-			/* {tcp|sock}_wfree() use exact truesize accounting :
-			 * sum(skb->truesize) MUST be exactly be gso_skb->truesize
-			 * So we account mss bytes of 'true size' for each segment.
-			 * The last segment will contain the remaining.
-			 */
-			skb->truesize = mss;
-			gso_skb->truesize -= mss;
+			sum_truesize += skb->truesize;
 		}
 		skb = skb->next;
 		th = tcp_hdr(skb);
@@ -127,7 +122,9 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 	if (copy_destructor) {
 		swap(gso_skb->sk, skb->sk);
 		swap(gso_skb->destructor, skb->destructor);
-		swap(gso_skb->truesize, skb->truesize);
+		sum_truesize += skb->truesize;
+		atomic_add(sum_truesize - gso_skb->truesize,
+			   &skb->sk->sk_wmem_alloc);
 	}
 
 	delta = htonl(oldlen + (skb_tail_pointer(skb) -

^ permalink raw reply related

* Re: vxlan gso is broken by stackable gso_segment()
From: Eric Dumazet @ 2013-10-26  0:52 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: David Miller, Eric Dumazet, stephen, netdev
In-Reply-To: <1382743502.4032.6.camel@edumazet-glaptop.roam.corp.google.com>

On Fri, 2013-10-25 at 16:25 -0700, Eric Dumazet wrote:

> Please note the original performance is not that good, you mentioned 230
> Mbps on lxc, while I get more than 5Gb/s on a 10G link.
> 
> This should be investigated ...

This is probably trivial to increase performance :

veth currently do not support any kind of tunneling TSO :

tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]

I'll submit a patch for net-next

^ permalink raw reply

* [PATCH net-next] veth: extend features to support tunneling
From: Eric Dumazet @ 2013-10-26  1:25 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: David Miller, Eric Dumazet, stephen, netdev
In-Reply-To: <1382748742.4032.24.camel@edumazet-glaptop.roam.corp.google.com>

From: Eric Dumazet <edumazet@google.com>

While investigating on a recent vxlan regression, I found veth
was using a zero features set for vxlan tunnels.

We have to segment GSO frames, copy the payload, and do the checksum.

This patch brings a ~200% performance increase

We probably have to add hw_enc_features support
on other virtual devices.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
---
 drivers/net/veth.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index b2d0347..b24db7a 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -261,6 +261,8 @@ static const struct net_device_ops veth_netdev_ops = {
 
 #define VETH_FEATURES (NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_ALL_TSO |    \
 		       NETIF_F_HW_CSUM | NETIF_F_RXCSUM | NETIF_F_HIGHDMA | \
+		       NETIF_F_GSO_GRE | NETIF_F_GSO_UDP_TUNNEL |	    \
+		       NETIF_F_GSO_IPIP | NETIF_F_GSO_SIT | NETIF_F_UFO	|   \
 		       NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX | \
 		       NETIF_F_HW_VLAN_STAG_TX | NETIF_F_HW_VLAN_STAG_RX )
 
@@ -279,6 +281,7 @@ static void veth_setup(struct net_device *dev)
 	dev->destructor = veth_dev_free;
 
 	dev->hw_features = VETH_FEATURES;
+	dev->hw_enc_features = VETH_FEATURES;
 }
 
 /*

^ permalink raw reply related

* Re: [PATCH net-next] tcp: gso: fix truesize tracking
From: Alexei Starovoitov @ 2013-10-26  1:46 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Eric Dumazet, stephen, netdev
In-Reply-To: <1382747177.4032.21.camel@edumazet-glaptop.roam.corp.google.com>

On Fri, Oct 25, 2013 at 5:26 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> commit 6ff50cd55545 ("tcp: gso: do not generate out of order packets")
> had an heuristic that can trigger a warning in skb_try_coalesce(),
> because skb->truesize of the gso segments were exactly set to mss.
>
> This breaks the requirement that
>
> skb->truesize >= skb->len + truesizeof(struct sk_buff);
>
> It can trivially be reproduced by :
>
> ifconfig lo mtu 1500
> ethtool -K lo tso off
> netperf
>
> As the skbs are looped into the TCP networking stack, skb_try_coalesce()
> warns us of these skb under-estimating their truesize.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Alexei Starovoitov <ast@plumgrid.com>
> ---

it fixed it in my setup. Thanks!

^ permalink raw reply

* Re: [PATCH net-next] veth: extend features to support tunneling
From: Alexei Starovoitov @ 2013-10-26  2:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, Eric Dumazet, stephen, netdev
In-Reply-To: <1382750703.4032.32.camel@edumazet-glaptop.roam.corp.google.com>

On Fri, Oct 25, 2013 at 6:25 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> While investigating on a recent vxlan regression, I found veth
> was using a zero features set for vxlan tunnels.

oneliner can be better :)

> We have to segment GSO frames, copy the payload, and do the checksum.
>
> This patch brings a ~200% performance increase
>
> We probably have to add hw_enc_features support
> on other virtual devices.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Alexei Starovoitov <ast@plumgrid.com>
> ---

iperf over veth with gre/vxlan tunneling is now ~4Gbps. 20x gain as advertised.
Thanks!

^ permalink raw reply

* Re: [PATCH 1/4] sctp: merge two if statements to one
From: wangweidong @ 2013-10-26  2:55 UTC (permalink / raw)
  To: Sergei Shtylyov, davem, nhorman, vyasevich
  Cc: dingtianhong, linux-sctp, netdev
In-Reply-To: <526A6C45.3080200@cogentembedded.com>

On 2013/10/25 21:04, Sergei Shtylyov wrote:
> Hello.
> 
> On 25-10-2013 5:50, Wang Weidong wrote:
> 
>> Two if statements do the same work, maybe we can merge them to
>> one. There is just code simplification, no functional changes.
> 
>> Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
>> ---
>>   net/sctp/auth.c | 12 ++++--------
>>   1 file changed, 4 insertions(+), 8 deletions(-)
> 
>    I understand what I noticed below is not your typos but maybe it's time to fix them?

Yeah, I will fix them.
Thanks.

> 
>> diff --git a/net/sctp/auth.c b/net/sctp/auth.c
>> index 8c4fa5d..19fb0ae 100644
>> --- a/net/sctp/auth.c
>> +++ b/net/sctp/auth.c
>> @@ -539,18 +539,14 @@ struct sctp_hmac *sctp_auth_asoc_get_hmac(const struct sctp_association *asoc)
>>       for (i = 0; i < n_elt; i++) {
>>           id = ntohs(hmacs->hmac_ids[i]);
>>
>> -        /* Check the id is in the supported range */
>> -        if (id > SCTP_AUTH_HMAC_ID_MAX) {
>> -            id = 0;
>> -            continue;
>> -        }
>> -
>> -        /* See is we support the id.  Supported IDs have name and
>> +        /* Check the id is in the supported range. And
>> +         * see is we support the id.  Supported IDs have name and
> 
>    s/is/if/.
> 
>>            * length fields set, so that we can allocated and use
> 
>    s/allocated/allocate/.
> 
> WBR, Sergei
> 
> 
> 

^ permalink raw reply

* Re: [PATCH 1/4] sctp: merge two if statements to one
From: wangweidong @ 2013-10-26  2:59 UTC (permalink / raw)
  To: Vlad Yasevich, davem, nhorman; +Cc: dingtianhong, linux-sctp, netdev
In-Reply-To: <526B016D.6030301@gmail.com>

On 2013/10/26 7:40, Vlad Yasevich wrote:
> On 10/24/2013 09:50 PM, Wang Weidong wrote:
>> Two if statements do the same work, maybe we can merge them to
>> one. There is just code simplification, no functional changes.
>>
>> Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
>> ---
>>   net/sctp/auth.c | 12 ++++--------
>>   1 file changed, 4 insertions(+), 8 deletions(-)
>>
>> diff --git a/net/sctp/auth.c b/net/sctp/auth.c
>> index 8c4fa5d..19fb0ae 100644
>> --- a/net/sctp/auth.c
>> +++ b/net/sctp/auth.c
>> @@ -539,18 +539,14 @@ struct sctp_hmac *sctp_auth_asoc_get_hmac(const struct sctp_association *asoc)
>>       for (i = 0; i < n_elt; i++) {
>>           id = ntohs(hmacs->hmac_ids[i]);
>>
>> -        /* Check the id is in the supported range */
>> -        if (id > SCTP_AUTH_HMAC_ID_MAX) {
>> -            id = 0;
>> -            continue;
>> -        }
>> -
>> -        /* See is we support the id.  Supported IDs have name and
>> +        /* Check the id is in the supported range. And
>> +         * see is we support the id.  Supported IDs have name and
>>            * length fields set, so that we can allocated and use
>>            * them.  We can safely just check for name, for without the
>>            * name, we can't allocate the TFM.
>>                */
>> -        if (!sctp_hmac_list[id].hmac_name) {
>> +        if (id > SCTP_AUTH_HMAC_ID_MAX ||
>> +            !sctp_hmac_list[id].hmac_name) {
> 
> Can you please make the 2 parts of the 'if' statement above line up
> with each other instead of the code below.  I makes it easy to see what
> the whole 'if conditional' is.
> 
> Thanks
> -vlad
> 

Ok, I will resend a new version.
Thanks. 

>>               id = 0;
>>               continue;
>>           }
>>
> 
> 
> 

^ permalink raw reply

* Re: [PATCH net 0/2] qlcnic: Bug fixes
From: David Miller @ 2013-10-26  4:05 UTC (permalink / raw)
  To: shahed.shaikh; +Cc: netdev, Dept_NX_Linux_NIC_Driver
In-Reply-To: <1382711917-28501-1-git-send-email-shahed.shaikh@qlogic.com>

From: Shahed Shaikh <shahed.shaikh@qlogic.com>
Date: Fri, 25 Oct 2013 10:38:35 -0400

> From: Shahed Shaikh <shahed.shaikh@qlogic.com>
> 
> This patch series contains following fixes-
> * Performace drop because driver was forcing adapter not to check
>   destination IP for LRO.
> * driver was not issuing qlcnic_fw_cmd_set_drv_version() to 83xx adapter
>   becasue of improper handling of QLCNIC_FW_CAPABILITY_MORE_CAPS bit.
> 
> Please apply to net.

Applied, what exactly does that destination IP check do?

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox