Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v2 2/2] net: skbuff: Coding: Use eth_type_vlan() instead of open coding it
From: Shmulik Ladkani @ 2016-09-19 16:23 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David S . Miller, Jiri Pirko, Daniel Borkmann, netdev
In-Reply-To: <1474302055.22679.89.camel@edumazet-glaptop3.roam.corp.google.com>

On Mon, 19 Sep 2016 09:20:55 -0700 Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Mon, 2016-09-19 at 18:53 +0300, Shmulik Ladkani wrote:
> > Fix 'skb_vlan_pop' to use eth_type_vlan instead of directly comparing
> > skb->protocol to ETH_P_8021Q or ETH_P_8021AD.
> > 
> > Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
> > ---  
> 
> You forgot to explicitly state this was targeting net-next tree.

Thanks, was an oversight.
Requires resubmission?

^ permalink raw reply

* Re: [PATCH v2 2/2] net: skbuff: Coding: Use eth_type_vlan() instead of open coding it
From: Eric Dumazet @ 2016-09-19 16:20 UTC (permalink / raw)
  To: Shmulik Ladkani; +Cc: David S . Miller, Jiri Pirko, Daniel Borkmann, netdev
In-Reply-To: <1474300400-32362-2-git-send-email-shmulik.ladkani@gmail.com>

On Mon, 2016-09-19 at 18:53 +0300, Shmulik Ladkani wrote:
> Fix 'skb_vlan_pop' to use eth_type_vlan instead of directly comparing
> skb->protocol to ETH_P_8021Q or ETH_P_8021AD.
> 
> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
> ---

You forgot to explicitly state this was targeting net-next tree.

This is detailed in Documentation/networking/netdev-FAQ.tx

Q: How do I indicate which tree (net vs. net-next) my patch should be
in?

^ permalink raw reply

* [PATCH net] i40e: fix call of ndo_dflt_bridge_getlink()
From: Nicolas Dichtel @ 2016-09-19 16:14 UTC (permalink / raw)
  To: davem, jeffrey.t.kirsher
  Cc: netdev, Huaibin Wang, Scott Feldman, Carolyn Wyborny,
	Catherine Sullivan, Nicolas Dichtel

From: Huaibin Wang <huaibin.wang@6wind.com>

Order of arguments is wrong.
The wrong code has been introduced by commit 7d4f8d871ab1, but is compiled
only since commit 9df70b66418e.

Note that this may break netlink dumps.

Fixes: 9df70b66418e ("i40e: Remove incorrect #ifdef's")
Fixes: 7d4f8d871ab1 ("switchdev; add VLAN support for port's bridge_getlink")
CC: Scott Feldman <sfeldma@gmail.com>
CC: Carolyn Wyborny <carolyn.wyborny@intel.com>
CC: Catherine Sullivan <catherine.sullivan@intel.com>
Signed-off-by: Huaibin Wang <huaibin.wang@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index d0b3a1bb82ca..2843f6cae97a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -9012,7 +9012,7 @@ static int i40e_ndo_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
 		return 0;
 
 	return ndo_dflt_bridge_getlink(skb, pid, seq, dev, veb->bridge_mode,
-				       nlflags, 0, 0, filter_mask, NULL);
+				       0, 0, nlflags, filter_mask, NULL);
 }
 
 /* Hardware supports L4 tunnel length of 128B (=2^7) which includes
-- 
2.8.1

^ permalink raw reply related

* [PATCH v2 2/2] net/sched: act_vlan: Introduce TCA_VLAN_ACT_MODIFY vlan action
From: Shmulik Ladkani @ 2016-09-19 16:11 UTC (permalink / raw)
  To: David S . Miller; +Cc: Jiri Pirko, Jamal Hadi Salim, netdev, Shmulik Ladkani
In-Reply-To: <1474301470-17965-1-git-send-email-shmulik.ladkani@gmail.com>

TCA_VLAN_ACT_MODIFY allows one to change an existing tag.

It accepts same attributes as TCA_VLAN_ACT_PUSH (protocol, id,
priority).
If packet is vlan tagged, then the tag gets overwritten according to
user specified attributes.

For example, this allows user to replace a tag's vid while preserving
its priority bits (as opposed to "action vlan pop pipe action vlan push").

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
 v2:
   - Removed superflous test 'mac_len < VLAN_ETH_HLEN' prior __skb_vlan_pop
   - Added some code comments

 include/uapi/linux/tc_act/tc_vlan.h |  1 +
 net/sched/act_vlan.c                | 29 ++++++++++++++++++++++++++++-
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/tc_act/tc_vlan.h b/include/uapi/linux/tc_act/tc_vlan.h
index be72b6e384..bddb272b84 100644
--- a/include/uapi/linux/tc_act/tc_vlan.h
+++ b/include/uapi/linux/tc_act/tc_vlan.h
@@ -16,6 +16,7 @@
 
 #define TCA_VLAN_ACT_POP	1
 #define TCA_VLAN_ACT_PUSH	2
+#define TCA_VLAN_ACT_MODIFY	3
 
 struct tc_vlan {
 	tc_gen;
diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index 59a8d3150a..a95c00b119 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -30,6 +30,7 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
 	struct tcf_vlan *v = to_vlan(a);
 	int action;
 	int err;
+	u16 tci;
 
 	spin_lock(&v->tcf_lock);
 	tcf_lastuse_update(&v->tcf_tm);
@@ -48,6 +49,30 @@ static int tcf_vlan(struct sk_buff *skb, const struct tc_action *a,
 		if (err)
 			goto drop;
 		break;
+	case TCA_VLAN_ACT_MODIFY:
+		/* No-op if no vlan tag (either hw-accel or in-payload) */
+		if (!skb_vlan_tagged(skb))
+			goto unlock;
+		/* extract existing tag (and guarantee no hw-accel tag) */
+		if (skb_vlan_tag_present(skb)) {
+			tci = skb_vlan_tag_get(skb);
+			skb->vlan_tci = 0;
+		} else {
+			/* in-payload vlan tag, pop it */
+			err = __skb_vlan_pop(skb, &tci);
+			if (err)
+				goto drop;
+		}
+		/* replace the vid */
+		tci = (tci & ~VLAN_VID_MASK) | v->tcfv_push_vid;
+		/* replace prio bits, if tcfv_push_prio specified */
+		if (v->tcfv_push_prio) {
+			tci &= ~VLAN_PRIO_MASK;
+			tci |= v->tcfv_push_prio << VLAN_PRIO_SHIFT;
+		}
+		/* put updated tci as hwaccel tag */
+		__vlan_hwaccel_put_tag(skb, v->tcfv_push_proto, tci);
+		break;
 	default:
 		BUG();
 	}
@@ -102,6 +127,7 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
 	case TCA_VLAN_ACT_POP:
 		break;
 	case TCA_VLAN_ACT_PUSH:
+	case TCA_VLAN_ACT_MODIFY:
 		if (!tb[TCA_VLAN_PUSH_VLAN_ID]) {
 			if (exists)
 				tcf_hash_release(*a, bind);
@@ -185,7 +211,8 @@ static int tcf_vlan_dump(struct sk_buff *skb, struct tc_action *a,
 	if (nla_put(skb, TCA_VLAN_PARMS, sizeof(opt), &opt))
 		goto nla_put_failure;
 
-	if (v->tcfv_action == TCA_VLAN_ACT_PUSH &&
+	if ((v->tcfv_action == TCA_VLAN_ACT_PUSH ||
+	     v->tcfv_action == TCA_VLAN_ACT_MODIFY) &&
 	    (nla_put_u16(skb, TCA_VLAN_PUSH_VLAN_ID, v->tcfv_push_vid) ||
 	     nla_put_be16(skb, TCA_VLAN_PUSH_VLAN_PROTOCOL,
 			  v->tcfv_push_proto) ||
-- 
2.7.4

^ permalink raw reply related

* [PATCH v2 1/2] net: skbuff: Export __skb_vlan_pop
From: Shmulik Ladkani @ 2016-09-19 16:11 UTC (permalink / raw)
  To: David S . Miller; +Cc: Jiri Pirko, Jamal Hadi Salim, netdev, Shmulik Ladkani
In-Reply-To: <1474301470-17965-1-git-send-email-shmulik.ladkani@gmail.com>

This exports the functionality of extracting the tag from the payload,
without moving next vlan tag into hw accel tag.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
---
 include/linux/skbuff.h | 1 +
 net/core/skbuff.c      | 7 +++++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4c5662f05b..000c5301b8 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3075,6 +3075,7 @@ bool skb_gso_validate_mtu(const struct sk_buff *skb, unsigned int mtu);
 struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features);
 struct sk_buff *skb_vlan_untag(struct sk_buff *skb);
 int skb_ensure_writable(struct sk_buff *skb, int write_len);
+int __skb_vlan_pop(struct sk_buff *skb, u16 *vlan_tci);
 int skb_vlan_pop(struct sk_buff *skb);
 int skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci);
 struct sk_buff *pskb_extract(struct sk_buff *skb, int off, int to_copy,
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 1369faa182..c927ed4727 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4493,8 +4493,10 @@ int skb_ensure_writable(struct sk_buff *skb, int write_len)
 }
 EXPORT_SYMBOL(skb_ensure_writable);
 
-/* remove VLAN header from packet and update csum accordingly. */
-static int __skb_vlan_pop(struct sk_buff *skb, u16 *vlan_tci)
+/* remove VLAN header from packet and update csum accordingly.
+ * expects a non skb_vlan_tag_present skb with a vlan tag payload
+ */
+int __skb_vlan_pop(struct sk_buff *skb, u16 *vlan_tci)
 {
 	struct vlan_hdr *vhdr;
 	unsigned int offset = skb->data - skb_mac_header(skb);
@@ -4525,6 +4527,7 @@ pull:
 
 	return err;
 }
+EXPORT_SYMBOL(__skb_vlan_pop);
 
 int skb_vlan_pop(struct sk_buff *skb)
 {
-- 
2.7.4

^ permalink raw reply related

* [PATCH v2 0/2] act_vlan: Introduce TCA_VLAN_ACT_MODIFY vlan action
From: Shmulik Ladkani @ 2016-09-19 16:11 UTC (permalink / raw)
  To: David S . Miller; +Cc: Jiri Pirko, Jamal Hadi Salim, netdev, Shmulik Ladkani

TCA_VLAN_ACT_MODIFY allows one to change an existing tag.

It accepts same attributes as TCA_VLAN_ACT_PUSH (protocol, id,
priority).
If packet is vlan tagged, then the tag gets overwritten according to
user specified attributes.

For example, this allows user to replace a tag's vid while preserving
its priority bits (as opposed to "action vlan pop pipe action vlan push").

Shmulik Ladkani (2):
  net: skbuff: Export __skb_vlan_pop
  net/sched: act_vlan: Introduce TCA_VLAN_ACT_MODIFY vlan action

 include/linux/skbuff.h              |  1 +
 include/uapi/linux/tc_act/tc_vlan.h |  1 +
 net/core/skbuff.c                   |  7 +++++--
 net/sched/act_vlan.c                | 29 ++++++++++++++++++++++++++++-
 4 files changed, 35 insertions(+), 3 deletions(-)

-- 
2.7.4

^ permalink raw reply

* Re: [net-next PATCH] mlx4: add missed recycle opportunity for XDP_TX on TX failure
From: Jesper Dangaard Brouer @ 2016-09-19 15:57 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, tariqt, tom, bblanco, rana.shahot,
	"David S. Miller\" <davem
In-Reply-To: <1474295315.22679.85.camel@edumazet-glaptop3.roam.corp.google.com>

On Mon, 19 Sep 2016 07:28:35 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> > +			xdp_drop:  
> 
> A label should start at the beginning of the line (ie : not indented)

Understood, send a V2

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* [PATCH v2 2/2] net: skbuff: Coding: Use eth_type_vlan() instead of open coding it
From: Shmulik Ladkani @ 2016-09-19 15:53 UTC (permalink / raw)
  To: David S . Miller; +Cc: Jiri Pirko, Daniel Borkmann, netdev, Shmulik Ladkani
In-Reply-To: <1474300400-32362-1-git-send-email-shmulik.ladkani@gmail.com>

Fix 'skb_vlan_pop' to use eth_type_vlan instead of directly comparing
skb->protocol to ETH_P_8021Q or ETH_P_8021AD.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
---
 net/core/skbuff.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 4dbaedb745..1369faa182 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4535,8 +4535,7 @@ int skb_vlan_pop(struct sk_buff *skb)
 	if (likely(skb_vlan_tag_present(skb))) {
 		skb->vlan_tci = 0;
 	} else {
-		if (unlikely(skb->protocol != htons(ETH_P_8021Q) &&
-			     skb->protocol != htons(ETH_P_8021AD)))
+		if (unlikely(!eth_type_vlan(skb->protocol)))
 			return 0;
 
 		err = __skb_vlan_pop(skb, &vlan_tci);
@@ -4544,8 +4543,7 @@ int skb_vlan_pop(struct sk_buff *skb)
 			return err;
 	}
 	/* move next vlan tag to hw accel tag */
-	if (likely(skb->protocol != htons(ETH_P_8021Q) &&
-		   skb->protocol != htons(ETH_P_8021AD)))
+	if (likely(!eth_type_vlan(skb->protocol)))
 		return 0;
 
 	vlan_proto = skb->protocol;
-- 
2.7.4

^ permalink raw reply related

* [PATCH v2 1/2] net: skbuff: Remove errornous length validation in skb_vlan_pop()
From: Shmulik Ladkani @ 2016-09-19 15:53 UTC (permalink / raw)
  To: David S . Miller
  Cc: Jiri Pirko, Daniel Borkmann, netdev, Shmulik Ladkani,
	Pravin Shelar

In 93515d53b1
  "net: move vlan pop/push functions into common code"
skb_vlan_pop was moved from its private location in openvswitch to
skbuff common code.

In case !vlan_tx_tag_present, the original 'pop_vlan()' assured
that skb->len is sufficient (if skb->len < VLAN_ETH_HLEN then pop was a
no-op).

This validation was moved as is into the new common 'skb_vlan_pop'.

Alas, in its original location (openvswitch), there was a guarantee that
'data' points to the mac_header, therefore the 'skb->len < VLAN_ETH_HLEN'
condition made sense.
However there's no such guarantee in the generic 'skb_vlan_pop'.

For short packets received in rx path going through 'skb_vlan_pop',
this causes 'skb_vlan_pop' to fail pop-ing a valid vlan hdr (in the non
hw-accel case) or to fail moving next tag into hw-accel tag.

Remove the 'skb->len < VLAN_ETH_HLEN' condition entirely:
It is superfluous since inner '__skb_vlan_pop' already verifies there
are VLAN_ETH_HLEN writable bytes at the mac_header.

Fixes: 93515d53b1 ("net: move vlan pop/push functions into common code")
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Cc: Pravin Shelar <pshelar@ovn.org>
---
 net/core/skbuff.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 1e329d4112..4dbaedb745 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4535,9 +4535,8 @@ int skb_vlan_pop(struct sk_buff *skb)
 	if (likely(skb_vlan_tag_present(skb))) {
 		skb->vlan_tci = 0;
 	} else {
-		if (unlikely((skb->protocol != htons(ETH_P_8021Q) &&
-			      skb->protocol != htons(ETH_P_8021AD)) ||
-			     skb->len < VLAN_ETH_HLEN))
+		if (unlikely(skb->protocol != htons(ETH_P_8021Q) &&
+			     skb->protocol != htons(ETH_P_8021AD)))
 			return 0;

 		err = __skb_vlan_pop(skb, &vlan_tci);
@@ -4545,9 +4544,8 @@ int skb_vlan_pop(struct sk_buff *skb)
 			return err;
 	}
 	/* move next vlan tag to hw accel tag */
-	if (likely((skb->protocol != htons(ETH_P_8021Q) &&
-		    skb->protocol != htons(ETH_P_8021AD)) ||
-		   skb->len < VLAN_ETH_HLEN))
+	if (likely(skb->protocol != htons(ETH_P_8021Q) &&
+		   skb->protocol != htons(ETH_P_8021AD)))
 		return 0;

 	vlan_proto = skb->protocol;
-- 
2.7.4

^ permalink raw reply related

* Re: [RFC 3/3] phy,leds: add support for led triggers on phy link state change
From: Zach Brown @ 2016-09-19 15:48 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: mlindner, stephen, netdev, linux-kernel, devel,
	florian.c.schilhabel, Larry.Finger
In-Reply-To: <de5b8b94-7bf8-0814-3f60-a268adb4dfde@gmail.com>

On Wed, Sep 14, 2016 at 04:16:15PM -0700, Florian Fainelli wrote:
> On 09/14/2016 02:55 PM, Zach Brown wrote:
> > From: Josh Cartwright <josh.cartwright@ni.com>
> > 
> > Create an option CONFIG_LED_TRIGGER_PHY (default n), which will
> > create a set of led triggers for each instantiated PHY device.  There is
> > one LED trigger per link-speed, per-phy.
> > 
> > This allows for a user to configure their system to allow a set of LEDs
> > to represent link state changes on the phy.
> 
> The part that seems to be missing from this changeset (not an issue) is
> how you can "accelerate" the triggers, or rather make sure that the
> trigger configuration potentially calls back into the PHY driver when
> the requested trigger is actually supported by the on-PHY LEDs.
> 
> Other than that, just minor nits here and there.
> 
> > 
> > Signed-off-by: Josh Cartwright <josh.cartwright@ni.com>
> > Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com>
> > Signed-off-by: Zach Brown <zach.brown@ni.com>
> > ---
> 
> > +config LED_TRIGGER_PHY
> > +	bool "Support LED triggers for tracking link state"
> > +	depends on LEDS_TRIGGERS
> > +	---help---
> > +	  Adds support for a set of LED trigger events per-PHY.  Link
> > +	  state change will trigger the events, for consumption by an
> > +	  LED class driver.  There are triggers for each link speed,
> > +	  and are of the form:
> > +	       <mii bus id>:<phy>:<speed>
> > +
> > +	  Where speed is one of: 10Mb, 100Mb, Gb, 2.5Gb, or 10GbE.
> 
> I would do s/10GbE/10Gb/ just to be consistent with the other speeds, or
> maybe, to avoid too much duplicationg of how we represent speeds, use
> the same set of strings that drivers/net/phy/phy.c::phy_speed_to_str uses.
> 
> 
> > diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> > index c6f6683..3345767 100644
> > --- a/drivers/net/phy/phy.c
> > +++ b/drivers/net/phy/phy.c
> > @@ -936,6 +936,7 @@ void phy_state_machine(struct work_struct *work)
> >  			phydev->state = PHY_NOLINK;
> >  			netif_carrier_off(phydev->attached_dev);
> >  			phydev->adjust_link(phydev->attached_dev);
> > +			phy_led_trigger_change_speed(phydev);
> 
> Since we have essentially two actions to take when performing link state
> changes:
> 
> - call the adjust_link callback
> - call into the LED trigger
> 
> we might consider encapsulating this into a function that could be named
> phy_adjust_link() and does both of these. That would be a preliminary
> patch to this this one.
> 
> >  
> > @@ -345,6 +347,8 @@ struct phy_device *phy_device_create(struct mii_bus *bus, int addr, int phy_id,
> >  
> >  	dev->state = PHY_DOWN;
> >  
> > +	phy_led_triggers_register(dev);
> > +
> >  	mutex_init(&dev->lock);
> >  	INIT_DELAYED_WORK(&dev->state_queue, phy_state_machine);
> >  	INIT_WORK(&dev->phy_queue, phy_change);
> 
> Humm, should it be before the PHY state machine workqueue creation or
> after? Just wondering if there is a remote chance we could get an user
> to immediately program a trigger and this could create a problem, maybe
> not so much.
> 
I'm not sure, but I don't think there would be an issue since the interaction
between the triggers and the phy system only goes in one direction. The
triggers don't influence the phy system to my understanding.

> > diff --git a/drivers/net/phy/phy_led_triggers.c b/drivers/net/phy/phy_led_triggers.c
> > new file mode 100644
> > index 0000000..6c40414
> > --- /dev/null
> > +++ b/drivers/net/phy/phy_led_triggers.c
> > @@ -0,0 +1,109 @@
> > +/* Copyright (C) 2016 National Instruments Corp.
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + */
> > +#include <linux/leds.h>
> > +#include <linux/phy.h>
> > +
> > +void phy_led_trigger_change_speed(struct phy_device *phy)
> > +{
> > +	struct phy_led_trigger *plt;
> > +
> > +	if (!phy->link) {
> > +		if (phy->last_triggered) {
> > +			led_trigger_event(&phy->last_triggered->trigger,
> > +					  LED_OFF);
> > +			phy->last_triggered = NULL;
> > +		}
> > +		return;
> > +	}
> > +
> > +	switch (phy->speed) {
> > +	case SPEED_10:
> > +		plt = &phy->phy_led_trigger[0];
> > +		break;
> > +	case SPEED_100:
> > +		plt = &phy->phy_led_trigger[1];
> > +		break;
> > +	case SPEED_1000:
> > +		plt = &phy->phy_led_trigger[2];
> > +		break;
> > +	case SPEED_2500:
> > +		plt = &phy->phy_led_trigger[3];
> > +		break;
> > +	case SPEED_10000:
> > +		plt = &phy->phy_led_trigger[4];
> > +		break;
> 
> We could use a table here indexed by the speed, or have a function that
> does phy_speed_to_led_trigger(unsigned int speed) for instance, which
> would be more robust to adding other speeds in the future.
> 
> > +	default:
> > +		plt = NULL;
> > +		break;
> > +	}
> > +
> > +	if (plt != phy->last_triggered) {
> > +		led_trigger_event(&phy->last_triggered->trigger, LED_OFF);
> > +		led_trigger_event(&plt->trigger, LED_FULL);
> > +		phy->last_triggered = plt;
> > +	}
> > +}
> > +EXPORT_SYMBOL_GPL(phy_led_trigger_change_speed);
> > +
> > +static int phy_led_trigger_register(struct phy_device *phy,
> > +				    struct phy_led_trigger *plt, int i)
> > +{
> > +	static const char * const name_suffix[] = {
> > +		"10Mb",
> > +		"100Mb",
> > +		"Gb",
> > +		"2.5Gb",
> > +		"10GbE",
> 
> Same comment as initially, the 10GbE is slightly inconsistent here wrt
> the other speed encodings.
> 
> 
> >  
> > @@ -402,6 +403,14 @@ struct phy_device {
> >  
> >  	int link_timeout;
> >  
> > +#ifdef CONFIG_LED_TRIGGER_PHY
> > +	/*
> > +	 * A led_trigger per SPEED_*
> > +	 */
> > +	struct phy_led_trigger phy_led_trigger[5];
> 
> Same comment, we would probably want to have a define/enum value for the
> maximum number of speeds supported.
> 
> 
> > +#include <linux/leds.h>
> > +
> > +struct phy_led_trigger {
> > +	struct led_trigger trigger;
> > +	char name[64];
> 
> Can we size this buffer based on MII_BUS_ID_SIZE, the amount of
> characters needed to represent a PHY device, and the maximum trigger
> name size?
> 

The MII_BUS_ID_SIZE constant is defined in phy.h and phy_led_trigger.h is
included in phy.h, which makes it not straight forward to use the
MII_BUS_ID_SIZE constant here. Do you have any suggestions for getting around
this?

> > +#else
> > +
> > +static inline int phy_led_triggers_register(struct phy_device *phy)
> > +{
> > +	return 0;
> > +}
> > +static inline void phy_led_triggers_unregister(struct phy_device *phy) { }
> > +static inline void phy_led_trigger_change_speed(struct phy_device *phy) { }
> 
> Kudos for adding stubs!
> -- 
> Florian

^ permalink raw reply

* [net-next PATCH V2] mlx4: add missed recycle opportunity for XDP_TX on TX failure
From: Jesper Dangaard Brouer @ 2016-09-19 15:46 UTC (permalink / raw)
  To: netdev, tariqt
  Cc: Eric Dumazet, tom, bblanco, Jesper Dangaard Brouer, rana.shahot,
	David S. Miller
In-Reply-To: <1474295315.22679.85.camel@edumazet-glaptop3.roam.corp.google.com>

Correct drop handling for XDP_TX on TX failure, were recently added in
commit 95357907ae73 ("mlx4: fix XDP_TX is acting like XDP_PASS on TX
ring full").

The change missed an opportunity for recycling the RX page, instead of
going through the page allocator, like the regular XDP_DROP action does.
This patch cease the opportunity, by going through the XDP_DROP case.

Fixes: 95357907ae73 ("mlx4: fix XDP_TX is acting like XDP_PASS on TX ring full")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
---
V2: adjust indention according to Eric

 drivers/net/ethernet/mellanox/mlx4/en_rx.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index c80073e4947f..c46355bce613 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -906,11 +906,12 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 							length, tx_index,
 							&doorbell_pending))
 					goto consumed;
-				goto next; /* Drop on xmit failure */
+				goto xdp_drop; /* Drop on xmit failure */
 			default:
 				bpf_warn_invalid_xdp_action(act);
 			case XDP_ABORTED:
 			case XDP_DROP:
+xdp_drop:
 				if (mlx4_en_rx_recycle(ring, frags))
 					goto consumed;
 				goto next;

^ permalink raw reply related

* Re: [PATCH v3 net-next] net: phy: Add MAC-IF driver for Microsemi PHYs.
From: Andrew Lunn @ 2016-09-19 15:46 UTC (permalink / raw)
  To: Raju Lakkaraju; +Cc: netdev, f.fainelli, Allan.Nielsen
In-Reply-To: <1474279434-14583-1-git-send-email-Raju.Lakkaraju@microsemi.com>

On Mon, Sep 19, 2016 at 03:33:54PM +0530, Raju Lakkaraju wrote:
> From: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>
> 
> All the review comments updated and resending for review.

Hi Raju

Please put comments like the above below the --- line. They then don't
get placed into the commit log in git.

Also, please include a brief summary of what changed for each version
of a patch. That makes it easier for the reviewer. Again, this should
be below the ---.

> 
> This is MAC interface feature.
> Microsemi PHY can support RGMII, RMII or GMII/MII interface between MAC and PHY.
> MAC-IF function program the right value based on Device tree configuration.
> 
> Tested on Beaglebone Black with VSC 8531 PHY.
> 
> Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [PATCH 2/3] cgroup: duplicate cgroup reference when cloning sockets
From: Vladimir Davydov @ 2016-09-19 15:43 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Tejun Heo, David S. Miller, Michal Hocko, linux-mm,
	cgroups, netdev, linux-kernel, kernel-team
In-Reply-To: <20160914194846.11153-2-hannes@cmpxchg.org>

On Wed, Sep 14, 2016 at 03:48:45PM -0400, Johannes Weiner wrote:
> From: Johannes Weiner <jweiner@fb.com>
> 
> When a socket is cloned, the associated sock_cgroup_data is duplicated
> but not its reference on the cgroup. As a result, the cgroup reference
> count will underflow when both sockets are destroyed later on.
> 
> Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup")
> Cc: <stable@vger.kernel.org> # 4.5+
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Reviewed-by: Vladimir Davydov <vdavydov.dev@gmail.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 1/3] mm: memcontrol: make per-cpu charge cache IRQ-safe for socket accounting
From: Vladimir Davydov @ 2016-09-19 15:35 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Tejun Heo, David S. Miller, Michal Hocko, linux-mm,
	cgroups, netdev, linux-kernel, kernel-team
In-Reply-To: <20160914194846.11153-1-hannes@cmpxchg.org>

On Wed, Sep 14, 2016 at 03:48:44PM -0400, Johannes Weiner wrote:
> From: Johannes Weiner <jweiner@fb.com>
> 
> During cgroup2 rollout into production, we started encountering css
> refcount underflows and css access crashes in the memory controller.
> Splitting the heavily shared css reference counter into logical users
> narrowed the imbalance down to the cgroup2 socket memory accounting.
> 
> The problem turns out to be the per-cpu charge cache. Cgroup1 had a
> separate socket counter, but the new cgroup2 socket accounting goes
> through the common charge path that uses a shared per-cpu cache for
> all memory that is being tracked. Those caches are safe against
> scheduling preemption, but not against interrupts - such as the newly
> added packet receive path. When cache draining is interrupted by
> network RX taking pages out of the cache, the resuming drain operation
> will put references of in-use pages, thus causing the imbalance.
> 
> Disable IRQs during all per-cpu charge cache operations.
> 
> Fixes: f7e1cb6ec51b ("mm: memcontrol: account socket memory in unified hierarchy memory controller")
> Cc: <stable@vger.kernel.org> # 4.5+
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH] net: skbuff: Fix length validation in skb_vlan_pop()
From: Shmulik Ladkani @ 2016-09-19 15:20 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: pravin shelar, Jiri Pirko, David S . Miller,
	Linux Kernel Network Developers
In-Reply-To: <20160919160517.6ee28bde@pixies>

On Mon, 19 Sep 2016 16:05:17 +0300 Shmulik Ladkani <shmulik.ladkani@gmail.com> wrote:
> Also, I'm okay with removing the excess 'skb->mac_len < VLAN_ETH_HLEN'
> condition if it is agreed that the "tag exists but insufficient to pop"
> semantic is no longer wanted.

Thinking this over, the condition is indeed superflous, will send a v2.

^ permalink raw reply

* Re: [patch net-next RFC 0/2] fib4 offload: notifier to let hw to be aware of all prefixes
From: Jiri Pirko @ 2016-09-19 15:15 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: Florian Fainelli, netdev, davem, idosch, eladr, yotamg, nogahf,
	ogerlitz, nikolay, linville, tgraf, gospo, sfeldma, ast, edumazet,
	hannes, dsa, jhs, vivien.didelot, john.fastabend, andrew, ivecera
In-Reply-To: <57DFFD4A.6070403@cumulusnetworks.com>

Mon, Sep 19, 2016 at 04:59:22PM CEST, roopa@cumulusnetworks.com wrote:
>On 9/18/16, 11:14 PM, Jiri Pirko wrote:
>> Mon, Sep 19, 2016 at 01:16:17AM CEST, roopa@cumulusnetworks.com wrote:
>>> On 9/18/16, 1:00 PM, Florian Fainelli wrote:
>>>> Le 06/09/2016 à 05:01, Jiri Pirko a écrit :
>>>>> From: Jiri Pirko <jiri@mellanox.com>
>>>>>
>>>>> This is RFC, unfinished. I came across some issues in the process so I would
>>>>> like to share those and restart the fib offload discussion in order to make it
>>>>> really usable.
>>>>>
>>>>> So the goal of this patchset is to allow driver to propagate all prefixes
>>>>> configured in kernel down HW. This is necessary for routing to work
>>>>> as expected. If we don't do that HW might forward prefixes known to kernel
>>>>> incorrectly. Take an example when default route is set in switch HW and there
>>>>> is an IP address set on a management (non-switch) port.
>>>>>
>>>>> Currently, only fibs related to the switch port netdev are offloaded using
>>>>> switchdev ops. This model is not extendable so the first patch introduces
>>>>> a replacement: notifier to propagate fib additions and removals to whoever
>>>>> interested. The second patch makes mlxsw to adopt this new way, registering
>>>>> one notifier block for each mlxsw (asic) instance.
>>>> Instead of introducing another specialization of a notifier_block
>>>> implementation, could we somehow have a kernel-based netlink listener
>>>> which receives the same kind of event information from rtmsg_fib()?
>>>>
>>>> The reason is that having such a facility would hook directly onto
>>>> existing rtmsg_* calls that exist throughout the stack, and that seems
>>>> to scale better.
>>> I was thinking along the same lines. Instead of proliferating notifier blocks
>>> through-out the stack for switchdev offload, putting existing events to use would be nice.
>>>
>>> But the problem though is drivers having to parse the netlink msg again. also, the intent
>>> here is to do the offload first ..before the route is added to the kernel (though i don't see that in
>>> the current series). existing netlink rmsg_fib events are generated after the route is added to the kernel.
>>>
>>>
>>> Jiri, instead of the notifier, do you see a problem with always calling the existing switchdev
>>> offload api for every route  for every asic instance ?. the first device where the route fits wins.
>> There is not list of asic instances. Therefore the notifier fits much better here.
>>
>>
>>
>>> it seems similar to driver registering for notifier and looking at every route ...
>>> am i missing something ?
>>> and the policies you mention could help around selecting the asic instance (FCFS or mirror).
>>> you will need to abstract out the asic instance for switchdev api to call on, but I thought you
>>> already have that in some form in your devlink infrastructure.
>> switchdev asic instances and devlink instances are orthogonal.
>
>maybe it is not today...but the requirement for devlink was to provide a way to communicate
>to the switch driver
>- global switch attributes or
>- things that cannot go via switch ports (exactly the problem you are trying to solve for routes here)

Devlink is a general beast, not switch specific one. I see no need to
use fib->devlink->driver route inside kernel. Devlink is for userspace
facing.


>
>so,  maybe an instance of switch asic modeled via devlink will help here and possibly all/other switchdev
>offload hooks ?

Maybe, but in case of fibs, the notifier just fits great. I see no need
for anything else.

^ permalink raw reply

* Re: [PATCH net-next v6] gso: Support partial splitting at the frag_list pointer
From: Alexander Duyck @ 2016-09-19 15:12 UTC (permalink / raw)
  To: Steffen Klassert
  Cc: Netdev, Alexander Duyck, Eric Dumazet, Marcelo Ricardo Leitner
In-Reply-To: <20160919105847.GG31137@gauss.secunet.com>

On Mon, Sep 19, 2016 at 3:58 AM, Steffen Klassert
<steffen.klassert@secunet.com> wrote:
> Since commit 8a29111c7 ("net: gro: allow to build full sized skb")
> gro may build buffers with a frag_list. This can hurt forwarding
> because most NICs can't offload such packets, they need to be
> segmented in software. This patch splits buffers with a frag_list
> at the frag_list pointer into buffers that can be TSO offloaded.
>
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> ---
>
> Changes since v1:
>
> - Use the assumption that all buffers in the chain excluding the last
>   containing the same amount of data.
>
> - Simplify some checks against gso partial.
>
> - Fix the generation of IP IDs.
>
> Changes since v2:
>
> - Merge common code of gso partial and frag_list pointer splitting.
>
> Changes since v3:
>
> - Fix the checks for doing frag_list pointer splitting.
>
> Changes since v4:
>
> - Whitespace fix.
> - Fix size calculations of the tail packet.
>
> Changes since v5:
>
> - Fix another size calculations of the tail packet.
>
>
>  net/core/skbuff.c      | 51 +++++++++++++++++++++++++++++++++++++++-----------
>  net/ipv4/af_inet.c     | 14 ++++++++++----
>  net/ipv4/gre_offload.c |  6 ++++--
>  net/ipv4/tcp_offload.c | 13 +++++++------
>  net/ipv4/udp_offload.c |  6 ++++--
>  net/ipv6/ip6_offload.c |  5 ++++-
>  6 files changed, 69 insertions(+), 26 deletions(-)

Looks good.

Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>

^ permalink raw reply

* Re: [patch net-next RFC 1/2] fib: introduce fib notification infrastructure
From: Jiri Pirko @ 2016-09-19 15:08 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: netdev, davem, idosch, eladr, yotamg, nogahf, ogerlitz, nikolay,
	linville, tgraf, gospo, sfeldma, ast, edumazet, hannes,
	f.fainelli, dsa, jhs, vivien.didelot, john.fastabend, andrew,
	ivecera
In-Reply-To: <57DFFBD3.9040906@cumulusnetworks.com>

Mon, Sep 19, 2016 at 04:53:07PM CEST, roopa@cumulusnetworks.com wrote:
>On 9/18/16, 11:06 PM, Jiri Pirko wrote:
>> Mon, Sep 19, 2016 at 01:23:47AM CEST, roopa@cumulusnetworks.com wrote:
>>> On 9/6/16, 5:01 AM, Jiri Pirko wrote:
>>>> From: Jiri Pirko <jiri@mellanox.com>
>>>>
>>>> This allows to pass information about added/deleted fib entries to
>>>> whoever is interested. This is done in a very similar way as devinet
>>>> notifies address additions/removals.
>>>>
>>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>>> ---
>
>[snip]
>>>>  
>>>>  #define KEYLENGTH	(8*sizeof(t_key))
>>>> @@ -1190,6 +1221,10 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>>>>  			fib_release_info(fi_drop);
>>>>  			if (state & FA_S_ACCESSED)
>>>>  				rt_cache_flush(cfg->fc_nlinfo.nl_net);
>>>> +
>>>> +			call_fib_notifiers(FIB_EVENT_TYPE_ADD, key, plen, fi,
>>>> +					   new_fa->fa_tos, cfg->fc_type,
>>>> +					   tb->tb_id, cfg->fc_nlflags);
>>>>  			rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen,
>>>>  				tb->tb_id, &cfg->fc_nlinfo, NLM_F_REPLACE);
>>>>  
>>>> @@ -1241,6 +1276,8 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>>>>  		tb->tb_num_default++;
>>>>  
>>>>  	rt_cache_flush(cfg->fc_nlinfo.nl_net);
>>>> +	call_fib_notifiers(FIB_EVENT_TYPE_ADD, key, plen, fi, tos,
>>>> +			   cfg->fc_type, tb->tb_id, cfg->fc_nlflags);
>>>
>>> It appears that this is in addition to the existing switchdev_fib_ipv4_add call right above this.
>>> Is the intent to do both ?. i don't see a need to do both.
>> I already have patchset improved that it removes the switchdev fib code.
>> Have to do some more testing, will send it soon.
>
>ok, ack.
>>
>>
>>> and switchdev_fib_ipv4_add  offloads before the route is added to the kernel.
>>> But the notifier seems to fire after the route is added to the kernel.
>> Yeah, I wanted to align it with rtmsg_fib calls. Also I think it makes
>> sense to have slowpath ready before offload.
>>
>
>ok, ..but..that changes existing behavior though. and if the hw route add fails...,
>you may have inconsistent state between hw and sw.

If the hw route add fails, driver will be responsible for abort,
cleanining up hw and set appropriate traps to get packets to cpu
processing.

^ permalink raw reply

* Re: [patch net-next RFC 0/2] fib4 offload: notifier to let hw to be aware of all prefixes
From: Roopa Prabhu @ 2016-09-19 14:59 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Florian Fainelli, netdev, davem, idosch, eladr, yotamg, nogahf,
	ogerlitz, nikolay, linville, tgraf, gospo, sfeldma, ast, edumazet,
	hannes, dsa, jhs, vivien.didelot, john.fastabend, andrew, ivecera
In-Reply-To: <20160919061454.GC1846@nanopsycho.orion>

On 9/18/16, 11:14 PM, Jiri Pirko wrote:
> Mon, Sep 19, 2016 at 01:16:17AM CEST, roopa@cumulusnetworks.com wrote:
>> On 9/18/16, 1:00 PM, Florian Fainelli wrote:
>>> Le 06/09/2016 à 05:01, Jiri Pirko a écrit :
>>>> From: Jiri Pirko <jiri@mellanox.com>
>>>>
>>>> This is RFC, unfinished. I came across some issues in the process so I would
>>>> like to share those and restart the fib offload discussion in order to make it
>>>> really usable.
>>>>
>>>> So the goal of this patchset is to allow driver to propagate all prefixes
>>>> configured in kernel down HW. This is necessary for routing to work
>>>> as expected. If we don't do that HW might forward prefixes known to kernel
>>>> incorrectly. Take an example when default route is set in switch HW and there
>>>> is an IP address set on a management (non-switch) port.
>>>>
>>>> Currently, only fibs related to the switch port netdev are offloaded using
>>>> switchdev ops. This model is not extendable so the first patch introduces
>>>> a replacement: notifier to propagate fib additions and removals to whoever
>>>> interested. The second patch makes mlxsw to adopt this new way, registering
>>>> one notifier block for each mlxsw (asic) instance.
>>> Instead of introducing another specialization of a notifier_block
>>> implementation, could we somehow have a kernel-based netlink listener
>>> which receives the same kind of event information from rtmsg_fib()?
>>>
>>> The reason is that having such a facility would hook directly onto
>>> existing rtmsg_* calls that exist throughout the stack, and that seems
>>> to scale better.
>> I was thinking along the same lines. Instead of proliferating notifier blocks
>> through-out the stack for switchdev offload, putting existing events to use would be nice.
>>
>> But the problem though is drivers having to parse the netlink msg again. also, the intent
>> here is to do the offload first ..before the route is added to the kernel (though i don't see that in
>> the current series). existing netlink rmsg_fib events are generated after the route is added to the kernel.
>>
>>
>> Jiri, instead of the notifier, do you see a problem with always calling the existing switchdev
>> offload api for every route  for every asic instance ?. the first device where the route fits wins.
> There is not list of asic instances. Therefore the notifier fits much better here.
>
>
>
>> it seems similar to driver registering for notifier and looking at every route ...
>> am i missing something ?
>> and the policies you mention could help around selecting the asic instance (FCFS or mirror).
>> you will need to abstract out the asic instance for switchdev api to call on, but I thought you
>> already have that in some form in your devlink infrastructure.
> switchdev asic instances and devlink instances are orthogonal.

maybe it is not today...but the requirement for devlink was to provide a way to communicate
to the switch driver
- global switch attributes or
- things that cannot go via switch ports (exactly the problem you are trying to solve for routes here)

so,  maybe an instance of switch asic modeled via devlink will help here and possibly all/other switchdev
offload hooks ?

^ permalink raw reply

* Re: [patch net-next RFC 1/2] fib: introduce fib notification infrastructure
From: Roopa Prabhu @ 2016-09-19 14:53 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, idosch, eladr, yotamg, nogahf, ogerlitz, nikolay,
	linville, tgraf, gospo, sfeldma, ast, edumazet, hannes,
	f.fainelli, dsa, jhs, vivien.didelot, john.fastabend, andrew,
	ivecera
In-Reply-To: <20160919060610.GA1846@nanopsycho.orion>

On 9/18/16, 11:06 PM, Jiri Pirko wrote:
> Mon, Sep 19, 2016 at 01:23:47AM CEST, roopa@cumulusnetworks.com wrote:
>> On 9/6/16, 5:01 AM, Jiri Pirko wrote:
>>> From: Jiri Pirko <jiri@mellanox.com>
>>>
>>> This allows to pass information about added/deleted fib entries to
>>> whoever is interested. This is done in a very similar way as devinet
>>> notifies address additions/removals.
>>>
>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>> ---

[snip]
>>>  
>>>  #define KEYLENGTH	(8*sizeof(t_key))
>>> @@ -1190,6 +1221,10 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>>>  			fib_release_info(fi_drop);
>>>  			if (state & FA_S_ACCESSED)
>>>  				rt_cache_flush(cfg->fc_nlinfo.nl_net);
>>> +
>>> +			call_fib_notifiers(FIB_EVENT_TYPE_ADD, key, plen, fi,
>>> +					   new_fa->fa_tos, cfg->fc_type,
>>> +					   tb->tb_id, cfg->fc_nlflags);
>>>  			rtmsg_fib(RTM_NEWROUTE, htonl(key), new_fa, plen,
>>>  				tb->tb_id, &cfg->fc_nlinfo, NLM_F_REPLACE);
>>>  
>>> @@ -1241,6 +1276,8 @@ int fib_table_insert(struct fib_table *tb, struct fib_config *cfg)
>>>  		tb->tb_num_default++;
>>>  
>>>  	rt_cache_flush(cfg->fc_nlinfo.nl_net);
>>> +	call_fib_notifiers(FIB_EVENT_TYPE_ADD, key, plen, fi, tos,
>>> +			   cfg->fc_type, tb->tb_id, cfg->fc_nlflags);
>>
>> It appears that this is in addition to the existing switchdev_fib_ipv4_add call right above this.
>> Is the intent to do both ?. i don't see a need to do both.
> I already have patchset improved that it removes the switchdev fib code.
> Have to do some more testing, will send it soon.

ok, ack.
>
>
>> and switchdev_fib_ipv4_add  offloads before the route is added to the kernel.
>> But the notifier seems to fire after the route is added to the kernel.
> Yeah, I wanted to align it with rtmsg_fib calls. Also I think it makes
> sense to have slowpath ready before offload.
>

ok, ..but..that changes existing behavior though. and if the hw route add fails...,
you may have inconsistent state between hw and sw.

^ permalink raw reply

* Re: [net-next PATCH] net: netlink messages for HW addr programming
From: Patrick Ruddy @ 2016-09-19 14:46 UTC (permalink / raw)
  To: roopa@cumulusnetworks.com
  Cc: stephen@networkplumber.org, netdev@vger.kernel.org,
	davem@davemloft.net, Luca Boccassi, alexander.h.duyck@intel.com,
	jiri@resnulli.us, Sven-Thorsten Dietrich
In-Reply-To: <57DEA9E2.3000703@cumulusnetworks.com>

On Sun, 2016-09-18 at 07:51 -0700, Roopa Prabhu wrote:
> On 9/15/16, 9:48 AM, Patrick Ruddy wrote:
> > Add RTM_NEWADDR and RTM_DELADDR netlink messages with family
> > AF_UNSPEC to indicate interest in specific unicast and multicast
> > hardware addresses. These messages are sent when addresses are
> > added or deleted from the appropriate interface driver.
> > Added AF_UNSPEC GETADDR function to allow the netlink notifications
> > to be replayed to avoid loss of state due to application start
> > ordering or restart.
> >
> > Signed-off-by: Patrick Ruddy <pruddy@brocade.com>
> > ---
> 
> RTM_NEWADDR and RTM_DELADDR are not used to add these entries to the kernel.
> so, it seems a bit wrong to use RTM_NEWADDR and RTM_DELADDR to notify them to
> userspace and also to request a special dump of these addresses.
> 
> This could just be a new nested netlink attribute in the existing link dump ?

Hi Roopa

Thanks for the review. I did initially code this using NEW/DEL/GET_LINK
messages but was asked to change to to ADDR messages by Stephen
Hemminger (cc'd). 

However I agree that these addresses fall between the LINK and ADDR
areas so I'm happy to change this if we can reach some consensus on the
format.

thanks

-pr

^ permalink raw reply

* [PATCH net v2] qed: Fix stack corruption on probe
From: Yuval Mintz @ 2016-09-19 14:47 UTC (permalink / raw)
  To: davem, netdev; +Cc: David.Laight, Yuval Mintz, Yuval Mintz

Commit fe56b9e6a8d95 ("qed: Add module with basic common support")
has introduced a stack corruption during probe, where filling a
local struct with data to be sent to management firmware is incorrectly
filled; The data is written outside of the struct and corrupts
the stack.

Changes from v1:
----------------
 - Correct the value written [Caught by David Laight]

Fixes: fe56b9e6a8d95 ("qed: Add module with basic common support")
Signed-off-by: Yuval Mintz <Yuval.Mintz@caviumnetworks.com>
---
Hi Dave,

Please consider applying this to `net'.

Thanks,
Yuval
---
 drivers/net/ethernet/qlogic/qed/qed_mcp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_mcp.c b/drivers/net/ethernet/qlogic/qed/qed_mcp.c
index a240f26..59c81ce 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_mcp.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_mcp.c
@@ -1153,8 +1153,8 @@ qed_mcp_send_drv_version(struct qed_hwfn *p_hwfn,
 	p_drv_version = &union_data.drv_version;
 	p_drv_version->version = p_ver->version;
 
-	for (i = 0; i < MCP_DRV_VER_STR_SIZE - 1; i += 4) {
-		val = cpu_to_be32(p_ver->name[i]);
+	for (i = 0; i < (MCP_DRV_VER_STR_SIZE - 4) / sizeof(u32); i++) {
+		val = cpu_to_be32(*((u32 *)&p_ver->name[i * sizeof(u32)]));
 		*(__be32 *)&p_drv_version->name[i * sizeof(u32)] = val;
 	}
 
-- 
1.9.3

^ permalink raw reply related

* Re: [net-next PATCH] mlx4: add missed recycle opportunity for XDP_TX on TX failure
From: Eric Dumazet @ 2016-09-19 14:28 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, tariqt, tom, bblanco, rana.shahot, David S. Miller
In-Reply-To: <20160919073957.32059.59156.stgit@firesoul>

On Mon, 2016-09-19 at 09:40 +0200, Jesper Dangaard Brouer wrote:
> Correct drop handling for XDP_TX on TX failure, were recently added in
> commit 95357907ae73 ("mlx4: fix XDP_TX is acting like XDP_PASS on TX
> ring full").
> 
> The change missed an opportunity for recycling the RX page, instead of
> going through the page allocator, like the regular XDP_DROP action does.
> This patch cease the opportunity, by going through the XDP_DROP case.
> 
> Fixes: 95357907ae73 ("mlx4: fix XDP_TX is acting like XDP_PASS on TX ring full")
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  drivers/net/ethernet/mellanox/mlx4/en_rx.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index c80073e4947f..919f0604d04a 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -906,11 +906,12 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
>  							length, tx_index,
>  							&doorbell_pending))
>  					goto consumed;
> -				goto next; /* Drop on xmit failure */
> +				goto xdp_drop; /* Drop on xmit failure */
>  			default:
>  				bpf_warn_invalid_xdp_action(act);
>  			case XDP_ABORTED:
>  			case XDP_DROP:
> +			xdp_drop:

A label should start at the beginning of the line (ie : not indented)

>  				if (mlx4_en_rx_recycle(ring, frags))
>  					goto consumed;
>  				goto next;
> 

^ permalink raw reply

* Re: [PATCH v3 net-next 05/16] tcp: switch back to proper tcp_skb_cb size check in tcp_init()
From: Eric Dumazet @ 2016-09-19 14:41 UTC (permalink / raw)
  To: Lance Richardson
  Cc: Neal Cardwell, David Miller, netdev, Soheil Hassas Yeganeh,
	Yuchung Cheng
In-Reply-To: <270694554.494524.1474295826754.JavaMail.zimbra@redhat.com>

On Mon, Sep 19, 2016 at 7:37 AM, Lance Richardson <lrichard@redhat.com> wrote:

> The skb local variable could be avoided via:
>
>         BUILD_BUG_ON(sizeof(struct tcp_skb_cb) > FIELD_SIZEOF(struct sk_buff, cb));
>

Right you are, thanks !

^ permalink raw reply

* Re: [PATCH v3 net-next 05/16] tcp: switch back to proper tcp_skb_cb size check in tcp_init()
From: Lance Richardson @ 2016-09-19 14:37 UTC (permalink / raw)
  To: Neal Cardwell
  Cc: David Miller, netdev, Eric Dumazet, Soheil Hassas Yeganeh,
	Yuchung Cheng
In-Reply-To: <1474236233-28511-6-git-send-email-ncardwell@google.com>

> From: "Neal Cardwell" <ncardwell@google.com>
> To: "David Miller" <davem@davemloft.net>
> Cc: netdev@vger.kernel.org, "Eric Dumazet" <edumazet@google.com>, "Soheil Hassas Yeganeh" <soheil@google.com>, "Neal
> Cardwell" <ncardwell@google.com>, "Yuchung Cheng" <ycheng@google.com>
> Sent: Sunday, September 18, 2016 6:03:42 PM
> Subject: [PATCH v3 net-next 05/16] tcp: switch back to proper tcp_skb_cb size check in tcp_init()
> 
> From: Eric Dumazet <edumazet@google.com>
> 
> Revert to the tcp_skb_cb size check that tcp_init() had before commit
> b4772ef879a8 ("net: use common macro for assering skb->cb[] available
> size in protocol families"). As related commit 744d5a3e9fe2 ("net:
> move skb->dropcount to skb->cb[]") explains, the
> sock_skb_cb_check_size() mechanism was added to ensure that there is
> space for dropcount, "for protocol families using it". But TCP is not
> a protocol using dropcount, so tcp_init() doesn't need to provision
> space for dropcount in the skb->cb[], and thus we can revert to the
> older form of the tcp_skb_cb size check. Doing so allows TCP to use 4
> more bytes of the skb->cb[] space.
> 
> Fixes: b4772ef879a8 ("net: use common macro for assering skb->cb[] available
> size in protocol families")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> ---
>  net/ipv4/tcp.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 5b0b49c..53798e1 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -3244,11 +3244,12 @@ static void __init tcp_init_mem(void)
>  
>  void __init tcp_init(void)
>  {
> -	unsigned long limit;
>  	int max_rshare, max_wshare, cnt;
> +	unsigned long limit;
> +	struct sk_buff *skb;
>  	unsigned int i;
>  
> -	sock_skb_cb_check_size(sizeof(struct tcp_skb_cb));
> +	BUILD_BUG_ON(sizeof(struct tcp_skb_cb) > sizeof(skb->cb));
>  
The skb local variable could be avoided via:

        BUILD_BUG_ON(sizeof(struct tcp_skb_cb) > FIELD_SIZEOF(struct sk_buff, cb));

>  	percpu_counter_init(&tcp_sockets_allocated, 0, GFP_KERNEL);
>  	percpu_counter_init(&tcp_orphan_count, 0, GFP_KERNEL);
> --
> 2.8.0.rc3.226.g39d4020
> 
> 

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox