Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net] ipv6: introduce tcp_v6_iif()
From: Cong Wang @ 2014-10-18  0:02 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1413562640.24953.26.camel@edumazet-glaptop2.roam.corp.google.com>

On Fri, Oct 17, 2014 at 9:17 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> Commit 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line
> misses") added a regression for SO_BINDTODEVICE on IPv6.
>
> This is because we still use inet6_iif() which expects that IP6 control
> block is still at the beginning of skb->cb[]
>
> This patch adds tcp_v6_iif() helper and uses it where necessary.
>
> Because __inet6_lookup_skb() is used by TCP and DCCP, we add an iif
> parameter to it.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses")

As I already explained in previous reply, this is probably the best solution
for net, so:

Acked-by: Cong Wang <cwang@twopensource.com>

^ permalink raw reply

* Re: [PATCH net] ipv6: introduce tcp_v6_iif()
From: Cong Wang @ 2014-10-18  0:00 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1413569996.25949.6.camel@edumazet-glaptop2.roam.corp.google.com>

On Fri, Oct 17, 2014 at 11:19 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2014-10-17 at 09:58 -0700, Cong Wang wrote:
>
>> I doubt we still need to store iif in IP6CB() since we have skb->skb_iif,
>> we can probably just make inet6_iif() be like inet_iif() so that IP6CB()->iif
>> can be just removed? Does this make any sense to you?
>>
>
> This makes sense and would gain 4 bytes in skb->cb[], so definitely
> worth it.
>


Hmm, after testing my patch, I found that some IPv6 code (scope_id) calls
inet6_iif() at socket layer too, where skb dst is already dropped. So we can't
simply use skb_dst() for all inet6_iif(). This is also why IPv4 is different.

Therefore, I think your patch is fine at least for net.

^ permalink raw reply

* Re: [PATCH] dsa: Fix conversion from host device to mii bus
From: Florian Fainelli @ 2014-10-17 23:40 UTC (permalink / raw)
  To: Guenter Roeck, David S. Miller; +Cc: Alexander Duyck, netdev, linux-kernel
In-Reply-To: <1413574258-21766-1-git-send-email-linux@roeck-us.net>

On 10/17/2014 12:30 PM, Guenter Roeck wrote:
> Commit b4d2394d01bc ("dsa: Replace mii_bus with a generic host device")
> replaces mii_bus with a generic host_dev, and introduces
> dsa_host_dev_to_mii_bus() to support conversion from host_dev to mii_bus.
> However, in some cases it uses to_mii_bus to perform that conversion.
> Since host_dev is not the phy bus device but typically a platform device,
> this fails and results in a crash with the affected drivers.
> 
> BUG: unable to handle kernel NULL pointer dereference at           (null)
> IP: [<ffffffff81781d35>] __mutex_lock_slowpath+0x75/0x100
> PGD 406783067 PUD 406784067 PMD 0
> Oops: 0002 [#1] SMP
> ...
> Call Trace:
> [<ffffffff810a538b>] ? pick_next_task_fair+0x61b/0x880
> [<ffffffff81781de3>] mutex_lock+0x23/0x37
> [<ffffffff81533244>] mdiobus_read+0x34/0x60
> [<ffffffff8153b95a>] __mv88e6xxx_reg_read+0x8a/0xa0
> [<ffffffff8153b9bc>] mv88e6xxx_reg_read+0x4c/0xa0
> 
> Fixes: b4d2394d01bc ("dsa: Replace mii_bus with a generic host device")
> Cc: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: Guenter Roeck <linux@roeck-us.net>

Acked-by: Florian Fainelli <f.fainelli@gmail.com>

> ---
>  drivers/net/dsa/mv88e6060.c | 16 ++++++++++++----
>  drivers/net/dsa/mv88e6xxx.c | 14 ++++++++++----
>  2 files changed, 22 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/dsa/mv88e6060.c b/drivers/net/dsa/mv88e6060.c
> index 776e965..05b0ca3 100644
> --- a/drivers/net/dsa/mv88e6060.c
> +++ b/drivers/net/dsa/mv88e6060.c
> @@ -21,8 +21,12 @@
>  
>  static int reg_read(struct dsa_switch *ds, int addr, int reg)
>  {
> -	return mdiobus_read(to_mii_bus(ds->master_dev),
> -			    ds->pd->sw_addr + addr, reg);
> +	struct mii_bus *bus = dsa_host_dev_to_mii_bus(ds->master_dev);
> +
> +	if (bus == NULL)
> +		return -EINVAL;
> +
> +	return mdiobus_read(bus, ds->pd->sw_addr + addr, reg);
>  }
>  
>  #define REG_READ(addr, reg)					\
> @@ -38,8 +42,12 @@ static int reg_read(struct dsa_switch *ds, int addr, int reg)
>  
>  static int reg_write(struct dsa_switch *ds, int addr, int reg, u16 val)
>  {
> -	return mdiobus_write(to_mii_bus(ds->master_dev),
> -			     ds->pd->sw_addr + addr, reg, val);
> +	struct mii_bus *bus = dsa_host_dev_to_mii_bus(ds->master_dev);
> +
> +	if (bus == NULL)
> +		return -EINVAL;
> +
> +	return mdiobus_write(bus, ds->pd->sw_addr + addr, reg, val);
>  }
>  
>  #define REG_WRITE(addr, reg, val)				\
> diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
> index d6f6428..a6c90cf 100644
> --- a/drivers/net/dsa/mv88e6xxx.c
> +++ b/drivers/net/dsa/mv88e6xxx.c
> @@ -75,11 +75,14 @@ int __mv88e6xxx_reg_read(struct mii_bus *bus, int sw_addr, int addr, int reg)
>  int mv88e6xxx_reg_read(struct dsa_switch *ds, int addr, int reg)
>  {
>  	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
> +	struct mii_bus *bus = dsa_host_dev_to_mii_bus(ds->master_dev);
>  	int ret;
>  
> +	if (bus == NULL)
> +		return -EINVAL;
> +
>  	mutex_lock(&ps->smi_mutex);
> -	ret = __mv88e6xxx_reg_read(to_mii_bus(ds->master_dev),
> -				   ds->pd->sw_addr, addr, reg);
> +	ret = __mv88e6xxx_reg_read(bus, ds->pd->sw_addr, addr, reg);
>  	mutex_unlock(&ps->smi_mutex);
>  
>  	return ret;
> @@ -119,11 +122,14 @@ int __mv88e6xxx_reg_write(struct mii_bus *bus, int sw_addr, int addr,
>  int mv88e6xxx_reg_write(struct dsa_switch *ds, int addr, int reg, u16 val)
>  {
>  	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
> +	struct mii_bus *bus = dsa_host_dev_to_mii_bus(ds->master_dev);
>  	int ret;
>  
> +	if (bus == NULL)
> +		return -EINVAL;
> +
>  	mutex_lock(&ps->smi_mutex);
> -	ret = __mv88e6xxx_reg_write(to_mii_bus(ds->master_dev),
> -				    ds->pd->sw_addr, addr, reg, val);
> +	ret = __mv88e6xxx_reg_write(bus, ds->pd->sw_addr, addr, reg, val);
>  	mutex_unlock(&ps->smi_mutex);
>  
>  	return ret;
> 

^ permalink raw reply

* [PATCH net] openvswitch: Set flow-key members.
From: Pravin B Shelar @ 2014-10-17 20:56 UTC (permalink / raw)
  To: davem; +Cc: netdev, Pravin B Shelar

This patch adds missing memset which are required to initialize
flow key member. For example for IP flow we need to initialize
ip.frag for all cases.

Found by inspection.

This bug is introduced by commit 0714812134d7dcadeb7ecfbfeb18788aa7e1eaac
("openvswitch: Eliminate memset() from flow_extract").

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 net/openvswitch/flow.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index c5cfc72..2b78789 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -274,6 +274,8 @@ static int parse_ipv6hdr(struct sk_buff *skb, struct sw_flow_key *key)
 			key->ip.frag = OVS_FRAG_TYPE_LATER;
 		else
 			key->ip.frag = OVS_FRAG_TYPE_FIRST;
+	} else {
+		key->ip.frag = OVS_FRAG_TYPE_NONE;
 	}
 
 	nh_len = payload_ofs - nh_ofs;
@@ -358,6 +360,7 @@ static int parse_icmpv6(struct sk_buff *skb, struct sw_flow_key *key,
 	 */
 	key->tp.src = htons(icmp->icmp6_type);
 	key->tp.dst = htons(icmp->icmp6_code);
+	memset(&key->ipv6.nd, 0, sizeof(key->ipv6.nd));
 
 	if (icmp->icmp6_code == 0 &&
 	    (icmp->icmp6_type == NDISC_NEIGHBOUR_SOLICITATION ||
@@ -674,9 +677,6 @@ int ovs_flow_key_extract(struct ovs_tunnel_info *tun_info,
 	key->ovs_flow_hash = 0;
 	key->recirc_id = 0;
 
-	/* Flags are always used as part of stats */
-	key->tp.flags = 0;
-
 	return key_extract(skb, key);
 }
 
-- 
1.7.1

^ permalink raw reply related

* [PATCH net v2] net: dsa: add includes for ethtool and phy_fixed definitions
From: Florian Fainelli @ 2014-10-17 23:02 UTC (permalink / raw)
  To: netdev; +Cc: davem, Florian Fainelli

net/dsa/slave.c uses functions and structures declared in phy_fixed.h
but does not explicitely include it, while dsa.h needs structure
declarations for 'struct ethtool_wolinfo' and 'struct ethtool_eee', fix
those by including the correct header files.

Fixes: ec9436baedb6 ("net: dsa: allow drivers to do link adjustment")
Fixes: ce31b31c68e7 ("net: dsa: allow updating fixed PHY link information")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
Changes in v2:
- actually include the headers instead of using forward declarations

 include/net/dsa.h | 1 +
 net/dsa/slave.c   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 58ad8c6492db..b76559293535 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -18,6 +18,7 @@
 #include <linux/of.h>
 #include <linux/phy.h>
 #include <linux/phy_fixed.h>
+#include <linux/ethtool.h>
 
 enum dsa_tag_protocol {
 	DSA_TAG_PROTO_NONE = 0,
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 8030489d9cbe..a851e9f14118 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -11,6 +11,7 @@
 #include <linux/list.h>
 #include <linux/etherdevice.h>
 #include <linux/phy.h>
+#include <linux/phy_fixed.h>
 #include <linux/of_net.h>
 #include <linux/of_mdio.h>
 #include "dsa_priv.h"
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH] dsa: Fix conversion from host device to mii bus
From: Alexander Duyck @ 2014-10-17 23:02 UTC (permalink / raw)
  To: Guenter Roeck, David S. Miller
  Cc: Florian Fainelli, Alexander Duyck, netdev, linux-kernel
In-Reply-To: <1413574258-21766-1-git-send-email-linux@roeck-us.net>

On 10/17/2014 12:30 PM, Guenter Roeck wrote:
> Commit b4d2394d01bc ("dsa: Replace mii_bus with a generic host device")
> replaces mii_bus with a generic host_dev, and introduces
> dsa_host_dev_to_mii_bus() to support conversion from host_dev to mii_bus.
> However, in some cases it uses to_mii_bus to perform that conversion.
> Since host_dev is not the phy bus device but typically a platform device,
> this fails and results in a crash with the affected drivers.
>
> BUG: unable to handle kernel NULL pointer dereference at           (null)
> IP: [<ffffffff81781d35>] __mutex_lock_slowpath+0x75/0x100
> PGD 406783067 PUD 406784067 PMD 0
> Oops: 0002 [#1] SMP
> ...
> Call Trace:
> [<ffffffff810a538b>] ? pick_next_task_fair+0x61b/0x880
> [<ffffffff81781de3>] mutex_lock+0x23/0x37
> [<ffffffff81533244>] mdiobus_read+0x34/0x60
> [<ffffffff8153b95a>] __mv88e6xxx_reg_read+0x8a/0xa0
> [<ffffffff8153b9bc>] mv88e6xxx_reg_read+0x4c/0xa0
>
> Fixes: b4d2394d01bc ("dsa: Replace mii_bus with a generic host device")
> Cc: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
> ---

The fix looks good to me.

Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com>

^ permalink raw reply

* Re: [PATCH] net: use hardware buffer pool to allocate skb
From: Alexander Duyck @ 2014-10-17 22:13 UTC (permalink / raw)
  To: Eric Dumazet, Alexander Duyck
  Cc: David Laight, Jiafei.Pan@freescale.com, David Miller,
	jkosina@suse.cz, netdev@vger.kernel.org, LeoLi@freescale.com,
	linux-doc@vger.kernel.org
In-Reply-To: <1413575515.27176.10.camel@edumazet-glaptop2.roam.corp.google.com>


On 10/17/2014 12:51 PM, Eric Dumazet wrote:
> On Fri, 2014-10-17 at 12:38 -0700, Alexander Duyck wrote:
>
>> That's not what I am saying, but there is a trade-off we always have to
>> take into account.  Cutting memory overhead will likely have an impact
>> on performance.  I would like to make the best informed trade-off in
>> that regard rather than just assuming worst case always for the driver.
> It seems you misunderstood me. You believe I suggested doing another
> allocation strategy in the drivers.
>
> This was not the case.
>
> This allocation strategy is wonderful. I repeat : This is wonderful.

No, I think I understand you.  I'm just not sure listing this as a 4K 
allocation in truesize makes sense.  The problem is the actual 
allocation can be either 2K or 4K, and my concern is that by setting it 
to 4K we are going to be hurting the case where the actual allocation to 
the socket is only 2K for the half page w/ reuse.

I was brining up the other allocation strategy to prove a point. From my 
perspective it wouldn't make any more sense to assign 32K to the 
truesize for an allocated fragment using __netdev_alloc_frag, but it can 
suffer the same type of issues only to a greater extent due to the use 
of the compound page.  Just because it is shared among many more uses 
doesn't mean it couldn't end up in a scenario where one socket somehow 
keeps queueing up the 32K pages and sitting on them.  I would think all 
it would take is 1 bad acting flow interleaved in ~20 active flows to 
suddenly gobble up a ton of memory without it being accounted for.

> We only have to make sure we do not fool memory management layers, when
> they do not understand where the memory is.
>
> Apparently you think it is hard, while it really is not.

I think you are over simplifying it.  By setting it to 4K there are 
situations where a socket will be double charged for getting two halves 
of the same page.  In these cases there will be a negative impact on 
performance as the number of frames that can be queued is reduced.  What 
you are proposing is possibly overreporting memory use by a factor of 2 
instead of possibly under-reporting it by a factor of 2.

I would be more moved by data than just conjecture on what the driver is 
or isn't doing.  My theory is that most of the time the page is reused 
so 2K is the correct value to report, and very seldom would 4K ever be 
the correct value.  This is what I have seen historically with igb/ixgbe 
using the page reuse.  If you have cases that show that the page isn't 
being reused then we can explore the 4K truesize change, but until then 
I think the page is likely being reused and we should probably just 
stick with the 2K value as we should be getting at least 2 uses per page.

Thanks,

Alex


^ permalink raw reply

* Re: Regarding tx-nocache-copy in the Sheevaplug
From: Benjamin Poirier @ 2014-10-17 20:55 UTC (permalink / raw)
  To: Lluís Batlle i Rossell
  Cc: Eric Dumazet, linux-kernel, netdev, Carles Pagès,
	linux-arm-kernel
In-Reply-To: <20141016174628.GL1972@vicerveza.homeunix.net>

On 2014/10/16 19:46, Lluís Batlle i Rossell wrote:
[...]
> 
> Hello all,
> 
> it seems I was a bit wrong - although enabling back tx-nocache-copy makes the
> tx-errors happen much less often (ssh complaining about HMAC), they still
> happen. It seems that something was introduced in some recent kernels that broke
> the tx offload.
> 
> I have no idea what it can be, but since 2.6 until at least 3.10 the network
> driver worked fine with tx offload in this sheevaplug board.

It's not the most pleasant alternative but if you can be sure enough
whether the problem is occurring or not, you could try bisecting,
possibly limiting the bisection to mv643xx

$ git bisect start v3.16.3 v3.10 -- drivers/net/ethernet/marvell/mv643xx_eth.c
Bisecting: 16 revisions left to test after this (roughly 4 steps)

The problem might be outside of the driver though.

^ permalink raw reply

* Re: [PATCH net-next] openvswitch: Create right mask with disabled megaflows
From: David Miller @ 2014-10-17 20:49 UTC (permalink / raw)
  To: pshelar; +Cc: netdev, ddiproietto, azhou
In-Reply-To: <CALnjE+qehC2StaqpznbDBAVivoJ+-ej7KFZSXZr5GzVQ+FKYgg@mail.gmail.com>

From: Pravin Shelar <pshelar@nicira.com>
Date: Fri, 17 Oct 2014 13:40:24 -0700

> I meant it for net tree. Can you take this patch for net?

Yep, done.

^ permalink raw reply

* Re: [PATCH] Move rlb table dump to procfs
From: David Miller @ 2014-10-17 20:48 UTC (permalink / raw)
  To: emunson; +Cc: netdev, andy, vfalico, j.vosburgh, linux-kernel
In-Reply-To: <1413571313-10137-1-git-send-email-emunson@akamai.com>

From: Eric B Munson <emunson@akamai.com>
Date: Fri, 17 Oct 2014 14:41:52 -0400

> Because debugfs is not net namespace aware, it is currently not
> possible to dump the RLB hash table when network namespaces are enabled.
> This patch creates a second proc file called IFNAME_rlb_table for
> bonding interfaces that use adaptive load balancing which will expose
> this table to userspace.  It also removes all the debugfs code from the
> bonding module as the RLB hash was the only entry in debugfs.
> 
> Signed-off-by: Eric B Munson <emunson@akamai.com>

It's meant to be a debugging facility, which has no stable API
and is not something the user can depend upon it.

Once you move it to procfs, it's basically over and we are
stuck with this forever.

I think the intention of this meaning to be a debug facility for
developers is very clear.

I'm not applying this, sorry.

^ permalink raw reply

* Re: [PATCH net-next] openvswitch: Create right mask with disabled megaflows
From: Pravin Shelar @ 2014-10-17 20:40 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Daniele Di Proietto, Andy Zhou
In-Reply-To: <20141017.162058.9617975607356013.davem@davemloft.net>

On Fri, Oct 17, 2014 at 1:20 PM, David Miller <davem@davemloft.net> wrote:
> From: Pravin B Shelar <pshelar@nicira.com>
> Date: Thu, 16 Oct 2014 21:55:45 -0700
>
>> If megaflows are disabled, the userspace does not send the netlink attribute
>> OVS_FLOW_ATTR_MASK, and the kernel must create an exact match mask.
>>
>> sw_flow_mask_set() sets every bytes (in 'range') of the mask to 0xff, even the
>> bytes that represent padding for struct sw_flow, or the bytes that represent
>> fields that may not be set during ovs_flow_extract().
>> This is a problem, because when we extract a flow from a packet,
>> we do not memset() anymore the struct sw_flow to 0.
>>
>> This commit gets rid of sw_flow_mask_set() and introduces mask_set_nlattr(),
>> which operates on the netlink attributes rather than on the mask key. Using
>> this approach we are sure that only the bytes that the user provided in the
>> flow are matched.
>>
>> Also, if the parse_flow_mask_nlattrs() for the mask ENCAP attribute fails, we
>> now return with an error.
>>
>> This bug is introduced by commit 0714812134d7dcadeb7ecfbfeb18788aa7e1eaac
>> ("openvswitch: Eliminate memset() from flow_extract").
>>
>> Reported-by: Alex Wang <alexw@nicira.com>
>> Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
>> Signed-off-by: Andy Zhou <azhou@nicira.com>
>> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
>
> Why are you targetting a bug fix like this to net-next?
>
I meant it for net tree. Can you take this patch for net?

Thanks.

^ permalink raw reply

* Re: [PATCH v2] vxlan: remove the dead codes
From: David Miller @ 2014-10-17 20:24 UTC (permalink / raw)
  To: cwang; +Cc: roy.qing.li, netdev, stephen
In-Reply-To: <CAHA+R7O7Yyd2XMkMnv5b_TLFekPCRRtOGG7bHgHs2Qd+YRu3Hg@mail.gmail.com>

From: Cong Wang <cwang@twopensource.com>
Date: Fri, 17 Oct 2014 10:33:08 -0700

> On Thu, Oct 16, 2014 at 11:04 PM,  <roy.qing.li@gmail.com> wrote:
>> From: Li RongQing <roy.qing.li@gmail.com>
>>
>> remove the dead codes, no user uses vxlan_salt
>>
>> Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
>> ---
>>  drivers/net/vxlan.c |    4 ----
>>  1 file changed, 4 deletions(-)
>>
>> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
>> index 77ab844..855a81d 100644
>> --- a/drivers/net/vxlan.c
>> +++ b/drivers/net/vxlan.c
>> @@ -152,8 +152,6 @@ struct vxlan_dev {
>>         struct hlist_head fdb_head[FDB_HASH_SIZE];
>>  };
>>
>> -/* salt for hash table */
>> -static u32 vxlan_salt __read_mostly;
> 
> Hmm, it has never been used since it was born. Maybe we should
> really use for vxlan hash table?

Yes, agreed, and I am very sure that this was Stephen Hemminger's
original intention.

^ permalink raw reply

* Re: [PATCH v2] vxlan: fix a free after use
From: David Miller @ 2014-10-17 20:24 UTC (permalink / raw)
  To: roy.qing.li; +Cc: netdev, edumazet
In-Reply-To: <1413525976-2624-1-git-send-email-roy.qing.li@gmail.com>

From: roy.qing.li@gmail.com
Date: Fri, 17 Oct 2014 14:06:16 +0800

> From: Li RongQing <roy.qing.li@gmail.com>
> 
> pskb_may_pull maybe change skb->data and make eth pointer oboslete,
> so eth needs to reload
> 
> Fixes: 91269e390d062 ("vxlan: using pskb_may_pull as early as possible")
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Li RongQing <roy.qing.li@gmail.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] openvswitch: fix a use after free
From: David Miller @ 2014-10-17 20:23 UTC (permalink / raw)
  To: roy.qing.li; +Cc: netdev, edumazet, jesse
In-Reply-To: <1413525788-2272-1-git-send-email-roy.qing.li@gmail.com>

From: roy.qing.li@gmail.com
Date: Fri, 17 Oct 2014 14:03:08 +0800

> From: Li RongQing <roy.qing.li@gmail.com>
> 
> pskb_may_pull() called by arphdr_ok can change skb->data, so put the arp
> setting after arphdr_ok to avoid the use the freed memory
> 
> Fixes: 0714812134d7d ("openvswitch: Eliminate memset() from flow_extract.")
> Cc: Jesse Gross <jesse@nicira.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Li RongQing <roy.qing.li@gmail.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next] openvswitch: Create right mask with disabled megaflows
From: David Miller @ 2014-10-17 20:20 UTC (permalink / raw)
  To: pshelar; +Cc: netdev, ddiproietto, azhou
In-Reply-To: <1413521745-1492-1-git-send-email-pshelar@nicira.com>

From: Pravin B Shelar <pshelar@nicira.com>
Date: Thu, 16 Oct 2014 21:55:45 -0700

> If megaflows are disabled, the userspace does not send the netlink attribute
> OVS_FLOW_ATTR_MASK, and the kernel must create an exact match mask.
> 
> sw_flow_mask_set() sets every bytes (in 'range') of the mask to 0xff, even the
> bytes that represent padding for struct sw_flow, or the bytes that represent
> fields that may not be set during ovs_flow_extract().
> This is a problem, because when we extract a flow from a packet,
> we do not memset() anymore the struct sw_flow to 0.
> 
> This commit gets rid of sw_flow_mask_set() and introduces mask_set_nlattr(),
> which operates on the netlink attributes rather than on the mask key. Using
> this approach we are sure that only the bytes that the user provided in the
> flow are matched.
> 
> Also, if the parse_flow_mask_nlattrs() for the mask ENCAP attribute fails, we
> now return with an error.
> 
> This bug is introduced by commit 0714812134d7dcadeb7ecfbfeb18788aa7e1eaac
> ("openvswitch: Eliminate memset() from flow_extract").
> 
> Reported-by: Alex Wang <alexw@nicira.com>
> Signed-off-by: Daniele Di Proietto <ddiproietto@vmware.com>
> Signed-off-by: Andy Zhou <azhou@nicira.com>
> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

Why are you targetting a bug fix like this to net-next?

That makes no sense at all.

^ permalink raw reply

* Re: [PATCH net] net: dsa: remove phy.h and phy_fixed.h inclusions
From: David Miller @ 2014-10-17 20:19 UTC (permalink / raw)
  To: f.fainelli; +Cc: netdev
In-Reply-To: <1413507355-28837-1-git-send-email-f.fainelli@gmail.com>

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Thu, 16 Oct 2014 17:55:55 -0700

> There is no need to include phy.h nor phy_fixed.h when we can use
> forward declarations instead to keep the include chain smaller.
> 
> Doing this unveiled that we were implicitely getting the definitions for
> struct ethtool_eee and struct ethtool_wolinfo, and that net/dsa/slave.c
> was missing an include of phy_fixed.h.
> 
> Fixes: ec9436baedb6 ("net: dsa: allow drivers to do link adjustment")
> Fixes: ce31b31c68e7 ("net: dsa: allow updating fixed PHY link information")
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>

Please, don't do this.

Get the definitions from the header file that provides them.

The only situation where forward declarations done by hand like this
make sense is when there is absolutely no way to avoid looping header
includes.

For example in A.h you want to provide a function that takes a pointer
to foo, but foo is defined B.h which includes A.h first.

Thanks.

^ permalink raw reply

* [PATCH 1/1 net-next] netrom: use linux/uaccess.h
From: Fabian Frederick @ 2014-10-17 20:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: Fabian Frederick, Ralf Baechle, David S. Miller, linux-hams,
	netdev

replace asm/uaccess.h by linux/uaccess.h

Signed-off-by: Fabian Frederick <fabf@skynet.be>
---
 net/netrom/af_netrom.c | 2 +-
 net/netrom/nr_dev.c    | 2 +-
 net/netrom/nr_in.c     | 2 +-
 net/netrom/nr_out.c    | 2 +-
 net/netrom/nr_route.c  | 2 +-
 net/netrom/nr_subr.c   | 2 +-
 net/netrom/nr_timer.c  | 2 +-
 7 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index 71cf1bf..1b06a1f 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -30,7 +30,7 @@
 #include <linux/skbuff.h>
 #include <net/net_namespace.h>
 #include <net/sock.h>
-#include <asm/uaccess.h>
+#include <linux/uaccess.h>
 #include <linux/fcntl.h>
 #include <linux/termios.h>	/* For TIOCINQ/OUTQ */
 #include <linux/mm.h>
diff --git a/net/netrom/nr_dev.c b/net/netrom/nr_dev.c
index 743262b..6ae063c 100644
--- a/net/netrom/nr_dev.c
+++ b/net/netrom/nr_dev.c
@@ -20,8 +20,8 @@
 #include <linux/in.h>
 #include <linux/if_ether.h>	/* For the statistics structure. */
 #include <linux/slab.h>
+#include <linux/uaccess.h>
 
-#include <asm/uaccess.h>
 #include <asm/io.h>
 
 #include <linux/inet.h>
diff --git a/net/netrom/nr_in.c b/net/netrom/nr_in.c
index c3073a2..80dbd0b 100644
--- a/net/netrom/nr_in.c
+++ b/net/netrom/nr_in.c
@@ -23,7 +23,7 @@
 #include <linux/skbuff.h>
 #include <net/sock.h>
 #include <net/tcp_states.h>
-#include <asm/uaccess.h>
+#include <linux/uaccess.h>
 #include <linux/fcntl.h>
 #include <linux/mm.h>
 #include <linux/interrupt.h>
diff --git a/net/netrom/nr_out.c b/net/netrom/nr_out.c
index 0b4bcb2..00fbf14 100644
--- a/net/netrom/nr_out.c
+++ b/net/netrom/nr_out.c
@@ -22,7 +22,7 @@
 #include <linux/netdevice.h>
 #include <linux/skbuff.h>
 #include <net/sock.h>
-#include <asm/uaccess.h>
+#include <linux/uaccess.h>
 #include <linux/fcntl.h>
 #include <linux/mm.h>
 #include <linux/interrupt.h>
diff --git a/net/netrom/nr_route.c b/net/netrom/nr_route.c
index b976d5e..96b64d2 100644
--- a/net/netrom/nr_route.c
+++ b/net/netrom/nr_route.c
@@ -25,7 +25,7 @@
 #include <linux/if_arp.h>
 #include <linux/skbuff.h>
 #include <net/sock.h>
-#include <asm/uaccess.h>
+#include <linux/uaccess.h>
 #include <linux/fcntl.h>
 #include <linux/termios.h>	/* For TIOCINQ/OUTQ */
 #include <linux/mm.h>
diff --git a/net/netrom/nr_subr.c b/net/netrom/nr_subr.c
index ca40e22..029c8bb 100644
--- a/net/netrom/nr_subr.c
+++ b/net/netrom/nr_subr.c
@@ -22,7 +22,7 @@
 #include <linux/skbuff.h>
 #include <net/sock.h>
 #include <net/tcp_states.h>
-#include <asm/uaccess.h>
+#include <linux/uaccess.h>
 #include <linux/fcntl.h>
 #include <linux/mm.h>
 #include <linux/interrupt.h>
diff --git a/net/netrom/nr_timer.c b/net/netrom/nr_timer.c
index ff2c1b1..94d0580 100644
--- a/net/netrom/nr_timer.c
+++ b/net/netrom/nr_timer.c
@@ -23,7 +23,7 @@
 #include <linux/skbuff.h>
 #include <net/sock.h>
 #include <net/tcp_states.h>
-#include <asm/uaccess.h>
+#include <linux/uaccess.h>
 #include <linux/fcntl.h>
 #include <linux/mm.h>
 #include <linux/interrupt.h>
-- 
1.9.3


^ permalink raw reply related

* Re: [PATCH] net: use hardware buffer pool to allocate skb
From: Eric Dumazet @ 2014-10-17 19:51 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Alexander Duyck, David Laight, Jiafei.Pan@freescale.com,
	David Miller, jkosina@suse.cz, netdev@vger.kernel.org,
	LeoLi@freescale.com, linux-doc@vger.kernel.org
In-Reply-To: <54417044.5090602@gmail.com>

On Fri, 2014-10-17 at 12:38 -0700, Alexander Duyck wrote:

> That's not what I am saying, but there is a trade-off we always have to
> take into account.  Cutting memory overhead will likely have an impact
> on performance.  I would like to make the best informed trade-off in
> that regard rather than just assuming worst case always for the driver.

It seems you misunderstood me. You believe I suggested doing another
allocation strategy in the drivers.

This was not the case.

This allocation strategy is wonderful. I repeat : This is wonderful.

We only have to make sure we do not fool memory management layers, when
they do not understand where the memory is.

Apparently you think it is hard, while it really is not.

^ permalink raw reply

* Re: something is wrong in commit 971f10eca1 - tcp: better TCP_SKB_CB layout to reduce cache line misses
From: Eric Dumazet @ 2014-10-17 19:48 UTC (permalink / raw)
  To: David Miller; +Cc: cwang, kkolasa, edumazet, netdev
In-Reply-To: <20141017.152923.678515861731659347.davem@davemloft.net>

On Fri, 2014-10-17 at 15:29 -0400, David Miller wrote:
> So I applied Cong's patches, but if I am to understand the testing feedback
> so far only the 64-bit case of the reporter's problems is fixed, and that
> the 32-bit graphics corruption still happens.
> 
> Can I get some clarification of where we stand after Cong's fixes
> which are already merged into 'net'?

I spent some time on the (awful) memmove() implementation on x86_32, and
could not find a bug in it.

It would be worth trying doing 2 memcpy() instead just in case...

^ permalink raw reply

* [PATCH net] bna: fix skb->truesize underestimation
From: Eric Dumazet @ 2014-10-17 19:45 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Rasesh Mody

From: Eric Dumazet <edumazet@google.com>

skb->truesize is not meant to be tracking amount of used bytes
in an skb, but amount of reserved/consumed bytes in memory.

For instance, if we use a single byte in last page fragment,
we have to account the full size of the fragment.

skb->truesize can be very different from skb->len, that has
a very specific safety purpose.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Rasesh Mody <rasesh.mody@qlogic.com>
---
 drivers/net/ethernet/brocade/bna/bnad.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/brocade/bna/bnad.c b/drivers/net/ethernet/brocade/bna/bnad.c
index 153cafac323c..c3861de9dc81 100644
--- a/drivers/net/ethernet/brocade/bna/bnad.c
+++ b/drivers/net/ethernet/brocade/bna/bnad.c
@@ -552,6 +552,7 @@ bnad_cq_setup_skb_frags(struct bna_rcb *rcb, struct sk_buff *skb,
 
 		len = (vec == nvecs) ?
 			last_fraglen : unmap->vector.len;
+		skb->truesize += unmap->vector.len;
 		totlen += len;
 
 		skb_fill_page_desc(skb, skb_shinfo(skb)->nr_frags,
@@ -563,7 +564,6 @@ bnad_cq_setup_skb_frags(struct bna_rcb *rcb, struct sk_buff *skb,
 
 	skb->len += totlen;
 	skb->data_len += totlen;
-	skb->truesize += totlen;
 }
 
 static inline void

^ permalink raw reply related

* Re: [PATCH] staging: octeon: Replace memcpy with ETH_ALEN
From: Joe Perches @ 2014-10-17 19:40 UTC (permalink / raw)
  To: Balavasu; +Cc: netdev, linux-kernel
In-Reply-To: <20141017184705.GA17944@vasu-Inspiron-3542>

On Sat, 2014-10-18 at 00:17 +0530, Balavasu wrote:
> This patch fixes the checkpatch.pl issue
> WARNING: memcpy(dev->dev_addr, mac, ETH_ALEN);

That is not the actual warning.

This is the warning:

Prefer ether_addr_copy() over memcpy() if the Ethernet addresses are __aligned(2)

How have you made sure that mac can not have odd alignment?

Make sure your commit message shows how you know
that mac is appropriately aligned.

> diff --git a/drivers/staging/octeon/ethernet.c b/drivers/staging/octeon/ethernet.c
[]
> @@ -452,7 +452,7 @@ int cvm_oct_common_init(struct net_device *dev)
>  		mac = of_get_mac_address(priv->of_node);
>  
>  	if (mac)
> -		memcpy(dev->dev_addr, mac, ETH_ALEN);
> +		ether_addr_copy(dev->dev_addr, mac);
>  	else
>  		eth_hw_addr_random(dev);
>  

^ permalink raw reply

* Re: [PATCH] net: use hardware buffer pool to allocate skb
From: Alexander Duyck @ 2014-10-17 19:38 UTC (permalink / raw)
  To: Eric Dumazet, Alexander Duyck
  Cc: David Laight, Jiafei.Pan@freescale.com, David Miller,
	jkosina@suse.cz, netdev@vger.kernel.org, LeoLi@freescale.com,
	linux-doc@vger.kernel.org
In-Reply-To: <1413572539.25949.17.camel@edumazet-glaptop2.roam.corp.google.com>

On 10/17/2014 12:02 PM, Eric Dumazet wrote:
> On Fri, 2014-10-17 at 11:28 -0700, Alexander Duyck wrote:
>
>> So long as it doesn't impact performance significantly I am fine with 
>> it.  My concern is that you are bringing up issues that none of the 
>> customers were brining up when I was at Intel, and the fixes you are 
>> proposing are likely to result in customers seeing things they will 
>> report as issues.
> We regularly hit these issues at Google.
>
> We have memory containers, and we detect quite easily if some layer is
> lying, because we cant afford having 20% of headroom on our servers.

Data is useful here.  If you can give enough data about the setup to
reproduce it then we can actually start looking at fixing it.  Otherwise
it is just your anecdotal data versus mine.

> I am not claiming IGB is the only offender.
>
> I am sorry if you believe it was an attack on IGB, or any network
> driver.

My concern is that igb is guilty of being off by at most a factor of 2. 
What about the drivers and implementations that are off by possibly much
larger values?  I'd be much more interested in this if there was data to
back up your position.  Right now it is mostly just conjecture and my
concern is that this type of change may cause more harm than good.

> truesize should really be the thing that protects us from OOM,
> and apparently driver authors hitting TCP performance problems
> thinks it is better to simply let TCP do no garbage collection.

One key point to keep in mind is that the igb performance should take a
pretty hard hit if pages aren't being freed.  The overhead difference
between the page reuse path vs non-page reuse is pretty significant.  If
this is a scenerio you are actually seeing this would be of interest.

> It seems that nobody cares or even understand what I am saying,
> so I should probably not care and suggest Google to buy PetaBytes of
> memory, right ?

That's not what I am saying, but there is a trade-off we always have to
take into account.  Cutting memory overhead will likely have an impact
on performance.  I would like to make the best informed trade-off in
that regard rather than just assuming worst case always for the driver.

Thanks,

Alex

^ permalink raw reply

* [PATCH] dsa: Fix conversion from host device to mii bus
From: Guenter Roeck @ 2014-10-17 19:30 UTC (permalink / raw)
  To: David S. Miller
  Cc: Florian Fainelli, Alexander Duyck, netdev, linux-kernel,
	Guenter Roeck

Commit b4d2394d01bc ("dsa: Replace mii_bus with a generic host device")
replaces mii_bus with a generic host_dev, and introduces
dsa_host_dev_to_mii_bus() to support conversion from host_dev to mii_bus.
However, in some cases it uses to_mii_bus to perform that conversion.
Since host_dev is not the phy bus device but typically a platform device,
this fails and results in a crash with the affected drivers.

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffff81781d35>] __mutex_lock_slowpath+0x75/0x100
PGD 406783067 PUD 406784067 PMD 0
Oops: 0002 [#1] SMP
...
Call Trace:
[<ffffffff810a538b>] ? pick_next_task_fair+0x61b/0x880
[<ffffffff81781de3>] mutex_lock+0x23/0x37
[<ffffffff81533244>] mdiobus_read+0x34/0x60
[<ffffffff8153b95a>] __mv88e6xxx_reg_read+0x8a/0xa0
[<ffffffff8153b9bc>] mv88e6xxx_reg_read+0x4c/0xa0

Fixes: b4d2394d01bc ("dsa: Replace mii_bus with a generic host device")
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
 drivers/net/dsa/mv88e6060.c | 16 ++++++++++++----
 drivers/net/dsa/mv88e6xxx.c | 14 ++++++++++----
 2 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/drivers/net/dsa/mv88e6060.c b/drivers/net/dsa/mv88e6060.c
index 776e965..05b0ca3 100644
--- a/drivers/net/dsa/mv88e6060.c
+++ b/drivers/net/dsa/mv88e6060.c
@@ -21,8 +21,12 @@
 
 static int reg_read(struct dsa_switch *ds, int addr, int reg)
 {
-	return mdiobus_read(to_mii_bus(ds->master_dev),
-			    ds->pd->sw_addr + addr, reg);
+	struct mii_bus *bus = dsa_host_dev_to_mii_bus(ds->master_dev);
+
+	if (bus == NULL)
+		return -EINVAL;
+
+	return mdiobus_read(bus, ds->pd->sw_addr + addr, reg);
 }
 
 #define REG_READ(addr, reg)					\
@@ -38,8 +42,12 @@ static int reg_read(struct dsa_switch *ds, int addr, int reg)
 
 static int reg_write(struct dsa_switch *ds, int addr, int reg, u16 val)
 {
-	return mdiobus_write(to_mii_bus(ds->master_dev),
-			     ds->pd->sw_addr + addr, reg, val);
+	struct mii_bus *bus = dsa_host_dev_to_mii_bus(ds->master_dev);
+
+	if (bus == NULL)
+		return -EINVAL;
+
+	return mdiobus_write(bus, ds->pd->sw_addr + addr, reg, val);
 }
 
 #define REG_WRITE(addr, reg, val)				\
diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index d6f6428..a6c90cf 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -75,11 +75,14 @@ int __mv88e6xxx_reg_read(struct mii_bus *bus, int sw_addr, int addr, int reg)
 int mv88e6xxx_reg_read(struct dsa_switch *ds, int addr, int reg)
 {
 	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
+	struct mii_bus *bus = dsa_host_dev_to_mii_bus(ds->master_dev);
 	int ret;
 
+	if (bus == NULL)
+		return -EINVAL;
+
 	mutex_lock(&ps->smi_mutex);
-	ret = __mv88e6xxx_reg_read(to_mii_bus(ds->master_dev),
-				   ds->pd->sw_addr, addr, reg);
+	ret = __mv88e6xxx_reg_read(bus, ds->pd->sw_addr, addr, reg);
 	mutex_unlock(&ps->smi_mutex);
 
 	return ret;
@@ -119,11 +122,14 @@ int __mv88e6xxx_reg_write(struct mii_bus *bus, int sw_addr, int addr,
 int mv88e6xxx_reg_write(struct dsa_switch *ds, int addr, int reg, u16 val)
 {
 	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
+	struct mii_bus *bus = dsa_host_dev_to_mii_bus(ds->master_dev);
 	int ret;
 
+	if (bus == NULL)
+		return -EINVAL;
+
 	mutex_lock(&ps->smi_mutex);
-	ret = __mv88e6xxx_reg_write(to_mii_bus(ds->master_dev),
-				    ds->pd->sw_addr, addr, reg, val);
+	ret = __mv88e6xxx_reg_write(bus, ds->pd->sw_addr, addr, reg, val);
 	mutex_unlock(&ps->smi_mutex);
 
 	return ret;
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH] ipv4: dst_entry leak in ip_send_unicast_reply()
From: David Miller @ 2014-10-17 19:30 UTC (permalink / raw)
  To: vvs; +Cc: eric.dumazet, netdev, kuznet, jmorris, yoshfuji, kaber
In-Reply-To: <543E6762.9090807@parallels.com>

From: Vasily Averin <vvs@parallels.com>
Date: Wed, 15 Oct 2014 16:24:02 +0400

> Fixes: 2e77d89b2fa8 ("net: avoid a pair of dst_hold()/dst_release() in ip_append_data()")
> 
> ip_setup_cork() called inside ip_append_data() steals dst entry from rt to cork
> and in case errors in __ip_append_data() nobody frees stolen dst entry
> 
> Signed-off-by: Vasily Averin <vvs@parallels.com>

Applied and queued up for -stable, thanks everyone!

^ permalink raw reply

* Re: something is wrong in commit 971f10eca1 - tcp: better TCP_SKB_CB layout to reduce cache line misses
From: David Miller @ 2014-10-17 19:29 UTC (permalink / raw)
  To: eric.dumazet; +Cc: cwang, kkolasa, edumazet, netdev
In-Reply-To: <1413409354.17186.2.camel@edumazet-glaptop2.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 15 Oct 2014 14:42:34 -0700

> On Wed, 2014-10-15 at 14:39 -0700, Eric Dumazet wrote:
>> On Wed, 2014-10-15 at 13:36 -0700, Cong Wang wrote:
>> > On Wed, Oct 15, 2014 at 3:35 AM, Krzysztof Kolasa <kkolasa@winsoft.pl> wrote:
>> > >
>> > > one-line patch not resolve problem, fix created by Cong Wang resolves
>> > > problem !!!
>> > >
>> > > tested on 64bit system and working OK
>> > >
>> > 
>> > So my patch fixes your problem on 64bit but not on 32bit,
>> > is this correct?
>> > 
>> > I will send out the patch. And as Eric said, the problem on 32bit
>> > might be a different issue, we can definitely fix it separately.
>> > 
>> > Thanks for testing!
>> 
>> I think more fixes are needed.
>> 
>> This is most probably an IPv6 issue.
>> 
> 
> Yes, ipv6 uses inet6_iif() a lot. We need to use a different accessor.

So I applied Cong's patches, but if I am to understand the testing feedback
so far only the 64-bit case of the reporter's problems is fixed, and that
the 32-bit graphics corruption still happens.

Can I get some clarification of where we stand after Cong's fixes
which are already merged into 'net'?

Thanks.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox