Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] net/udp: do not touch skb->peeked unless really needed
From: Eric Dumazet @ 2016-12-06 14:35 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: David Miller, netdev, Willem de Bruijn
In-Reply-To: <1481026212.6225.43.camel@redhat.com>

On Tue, 2016-12-06 at 13:10 +0100, Paolo Abeni wrote:

> Please ignore the above dumb comment. I misread the 'skip condition'.
> 
> I'm fine with the patch in its current form.
> 
> Acked-by: Paolo Abeni <pabeni@redhat.com>

No worries, I prefer having multiple eyes on this stuff before doing the
next step ;)

Thanks !

^ permalink raw reply

* Re: [PATCH] net/udp: do not touch skb->peeked unless really needed
From: Eric Dumazet @ 2016-12-06 14:34 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: David Miller, netdev, Willem de Bruijn
In-Reply-To: <1481017989.6225.21.camel@redhat.com>

On Tue, 2016-12-06 at 10:53 +0100, Paolo Abeni wrote:
> Hi Eric,

> I don't understand why we can avoid setting skb->peek if len > 0. I
> think that will change the kernel behavior if:
> - peek with offset is set
> - 3 skbs with len > 0 are enqueued
> - the u/s peek (with offset) the second one
> - the u/s disable peeking with offset and peeks 2 more skbs.
> 
> With the current code in the last step the u/s is going to peek the 1#
> and the 3# skbs, after this patch will peek the 1# and the 2#. Am I
> missing something ? Probably the new behavior is more correct, but still
> is a change. 
> 
> I gave this a run in my test bed on top of your udp-related patches I
> see additional ~3 improvement in the udp flood scenario, and a bit more
> in the un-contended scenario.
> 
> Thank you,

MSG_PEEK always grab the first skb in queue, regardless of its
skb->peeked status.

Unless an offset (given in bytes) is given. Then we skip X bytes in the
queue, again regardless of skb->peeked bit.

skb->peeked is _needed_ to avoid peeking the same 0-byte skb over and
over, since user land could not skip it, as the offset to skip skbs is
computed by sum ( lengthes). An infinite loop of recvmsg() would happen,
stuck on the same skb.

For regular non 0-bytes payload, we can skip over them without even
looking at skb->peeked.

Initially, skb->peeked was only a (bad) way to make sure UDP would
increment its stats only for non peeked messages. We can implement that
using at MSG_PEEK flag.

^ permalink raw reply

* Re: [PATCH]net:switchdev:add judgement and proper return vlaue to make switchdev_port_vlan_fill() more robust
From: David Miller @ 2016-12-06 14:32 UTC (permalink / raw)
  To: jiri; +Cc: cxdx2006, netdev, linux-kernel, feng.deng
In-Reply-To: <20161206123225.GK1984@nanopsycho>

From: Jiri Pirko <jiri@resnulli.us>
Date: Tue, 6 Dec 2016 13:32:25 +0100

> Tue, Dec 06, 2016 at 05:46:36AM CET, cxdx2006@gmail.com wrote:
>>From: Feng Deng<cxdx2006@gmail.com>
>>
>>1.add judgement to make sure switchdev_port_vlan_fill() could
>>  return right,  even when switchdev_port_vlan_dump_put() failed  ,
>>  which will make switchdev_port_vlan_fill()  more robust
>>2.add proper return vlaue at the end of the switchdev_port_vlan_fill()
>>
>>Signed-off-by: Feng Deng <feng.deng@cortina-access.com>
> 
> Please don't send patches like this. You are wasting everyone's time.

+1

^ permalink raw reply

* Re: [PATCH] dsa:mv88e6xxx: allow address 0x1 in smi_init
From: Andrew Lunn @ 2016-12-06 14:27 UTC (permalink / raw)
  To: Volodymyr Bendiuga; +Cc: vivien.didelot, f.fainelli, netdev, volodymyr.bendiuga
In-Reply-To: <1481029592-19496-1-git-send-email-volodymyr.bendiuga@westermo.se>

On Tue, Dec 06, 2016 at 02:06:32PM +0100, Volodymyr Bendiuga wrote:
> Westermo devices use this address.

I think we can have a better change log entry than that. How about:

Some devices, such as the mv88e6097 do have ADDR[0] external and so it
is possible to configure the device to use SMI address 0x1. Remove the
restriction, as there are boards using this address.

Thanks
	Andrew

> 
> Signed-off-by: Volodymyr Bendiuga <volodymyr.bendiuga@westermo.se>
> ---
>  drivers/net/dsa/mv88e6xxx/chip.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> index 4212fb6..05942e3 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> @@ -4253,10 +4253,6 @@ static void mv88e6xxx_phy_destroy(struct mv88e6xxx_chip *chip)
>  static int mv88e6xxx_smi_init(struct mv88e6xxx_chip *chip,
>  			      struct mii_bus *bus, int sw_addr)
>  {
> -	/* ADDR[0] pin is unavailable externally and considered zero */
> -	if (sw_addr & 0x1)
> -		return -EINVAL;
> -
>  	if (sw_addr == 0)
>  		chip->smi_ops = &mv88e6xxx_smi_single_chip_ops;
>  	else if (mv88e6xxx_has(chip, MV88E6XXX_FLAGS_MULTI_CHIP))
> -- 
> 2.7.4
> 

^ permalink raw reply

* Re: [PATCH v2 net-next 1/2] flow dissector: ICMP support
From: Simon Horman @ 2016-12-06 14:23 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Tom Herbert, David Miller, Linux Kernel Network Developers,
	Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, Jamal Hadi Salim,
	Jiri Pirko
In-Reply-To: <20161206122634.GH1984@nanopsycho>

On Tue, Dec 06, 2016 at 01:26:34PM +0100, Jiri Pirko wrote:
> Tue, Dec 06, 2016 at 11:51:46AM CET, simon.horman@netronome.com wrote:
> >On Mon, Dec 05, 2016 at 09:29:45AM -0800, Tom Herbert wrote:
> >> On Mon, Dec 5, 2016 at 6:23 AM, Simon Horman <simon.horman@netronome.com> wrote:
> >> > On Sat, Dec 03, 2016 at 10:52:32AM -0800, Tom Herbert wrote:
> >> >> On Sat, Dec 3, 2016 at 2:49 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> >> >> > Fri, Dec 02, 2016 at 09:31:41PM CET, simon.horman@netronome.com wrote:
> >> >> >>Allow dissection of ICMP(V6) type and code. This re-uses transport layer
> >> >> >>port dissection code as although ICMP is not a transport protocol and their
> >> >> >>type and code are not ports this allows sharing of both code and storage.
> >> >> >>
> >> >> >>Signed-off-by: Simon Horman <simon.horman@netronome.com>
> >> >
> >> > ...
> >> >
> >> >> > Digging into this a bit more. I think it would be much nice not to mix
> >> >> > up l4 ports and icmp stuff.
> >> >> >
> >> >> > How about to have FLOW_DISSECTOR_KEY_ICMP
> >> >> > and
> >> >> > struct flow_dissector_key_icmp {
> >> >> >         u8 type;
> >> >> >         u8 code;
> >> >> > };
> >> >> >
> >> >> > The you can make this structure and struct flow_dissector_key_ports into
> >> >> > an union in struct flow_keys.
> >> >> >
> >> >> > Looks much cleaner to me.
> >> >
> >> > Hi Jiri,
> >> >
> >> > I wondered about that too. The main reason that I didn't do this the first
> >> > time around is that I wasn't sure what to do to differentiate between ports
> >> > and icmp in fl_init_dissector() given that using a union would could to
> >> > mask bits being set in both if they are set in either.
> >> >
> >> > I've taken a stab at addressing that below along with implementing your
> >> > suggestion. If the treatment in fl_init_dissector() is acceptable
> >> > perhaps something similar should be done for FLOW_DISSECTOR_KEY_IPV[46]_ADDRS
> >> > too?
> >> >
> >> >> I agree, this patch adds to many conditionals into the fast path for
> >> >> ICMP handling. Neither is there much point in using type and code as
> >> >> input to the packet hash.
> >> >
> >> > Hi Tom,
> >> >
> >> > I'm not sure that I follow this. A packet may be ICMP or not regardless of
> >> > if FLOW_DISSECTOR_KEY_PORT (and now FLOW_DISSECTOR_KEY_ICMP) port is set or
> >> > not. I'd appreciate some guidance here.
> >> >
> >> > First-pass at implementing Jiri's suggestion follows:
> >> >
> >
> >...
> >
> >> > diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> >> > index 0584b4bb4390..33e5fa2b3e87 100644
> >> > --- a/net/core/flow_dissector.c
> >> > +++ b/net/core/flow_dissector.c
> >
> >...
> >
> >> > @@ -975,6 +983,10 @@ static const struct flow_dissector_key flow_keys_dissector_keys[] = {
> >> >                 .offset = offsetof(struct flow_keys, ports),
> >> >         },
> >> >         {
> >> > +               .key_id = FLOW_DISSECTOR_KEY_ICMP,
> >> > +               .offset = offsetof(struct flow_keys, icmp),
> >> > +       },
> >> 
> >> This is the problem I was referring to. This adds ICMP to the keys for
> >> computing the skb hash and the ICMP type and code are in a union with
> >> port numbers so they are in the range over which the hash is computed.
> >> This means that we are treating type and code as some sort of flow and
> >> and so different type/code between the same src/dst could follow
> >> different paths in ECMP. For the purposes of computing a packet hash I
> >> don't think we want ICMP information included. This might be a matter
> >> of not putting FLOW_DISSECTOR_KEY_ICMP in flow_keys_dissector_keys,
> >> where we need this information we could define another structure that
> >> has all the same fields as in flow_keys_dissector_keys and include
> >> FLOW_DISSECTOR_KEY_ICMP. Alternatively, the flow_dissector_key_icmp
> >> could also be outside of the area that is used in the hash: that is no
> >> in flow_keys_hash_start(keys) to flow_keys_hash_length(keys), keyval);
> >
> >...
> >
> >> > diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
> >> > index c96b2a349779..f4ba69bd362f 100644
> >> > --- a/net/sched/cls_flower.c
> >> > +++ b/net/sched/cls_flower.c
> >> > @@ -38,7 +38,10 @@ struct fl_flow_key {
> >> >                 struct flow_dissector_key_ipv4_addrs ipv4;
> >> >                 struct flow_dissector_key_ipv6_addrs ipv6;
> >> >         };
> >> > -       struct flow_dissector_key_ports tp;
> >> > +       union {
> >> > +               struct flow_dissector_key_ports tp;
> >> > +               struct flow_dissector_key_icmp icmp;
> >> > +       };
> >> 
> >> When an ICMP packet is encapsulated within UDP then icmp overwrites
> >> valid port information with is a stronger signal of the flow (unless
> >> FLOW_DISSECTOR_F_STOP_AT_ENCAP is set). This is another reason not to
> >> use a union with ports.
> >
> >...
> >
> >Hi Tom,
> >
> >thanks for your explanation. I think I have a clearer picture now.
> >
> >I have reworked things to try to address your concerns.
> >In particular:
> >
> >* FLOW_DISSECTOR_KEY_ICMP is no longer added to flow_keys_dissector_keys.
> >  I don't think it is needed at all for the use-case I currently have in
> >  mind, which is classifying packets using tc_flower. And adding it there
> >  was an error on my part.
> 
> I was just about to ask why you are adding it there. Good.
> 
> 
> >
> >* I stopped using a union for ports and icmp. At the very least this seems
> >  to make it easier to reason about things as we no longer need to consider
> >  that a port value may in fact be an ICMP type or code.
> 
> This should be within csl_flower code. You can easily have it as a union
> in struct fl_flow_key. 

Ok, will do.

> >  This seems to allow us avoid adding a number of is_icmp checks (as I believe
> >  you pointed out earlier). And should also address the problem you state
> >  immediately above.
> 
> I don't understand the check you are talking about. The union just allow
> to share the mem. I don't see any checks needed.

I meant the checks that the patchset adds were in bond_main.c and
other places before accessing the ports fields. I now see we can avoid them
while still using a union.

> >* I have also placed icmp outside of the range flow_keys_hash_start(keys)
> >  to flow_keys_hash_length(keys), keyval). This is because I now see no
> >  value of having it inside that range and it both avoids any chance of
> >  contaminating the hash with ICMP values and hashing over unwanted
> >  (hopefully zero) values.
> 
> Okay, now I'm confused again. You don't want to add this to
> flow_keys_dissector_keys. Why would you?

Sorry, I think it was me that was being confused. I'll drop that change.
I agree it should not be necessary.

^ permalink raw reply

* Re: [PATCH] net:mv88e6xxx: dispose irq mapping
From: Andrew Lunn @ 2016-12-06 14:19 UTC (permalink / raw)
  To: Volodymyr Bendiuga; +Cc: vivien.didelot, f.fainelli, netdev
In-Reply-To: <1481018743-27208-1-git-send-email-volodymyr.bendiuga@westermo.se>

On Tue, Dec 06, 2016 at 11:05:43AM +0100, Volodymyr Bendiuga wrote:
> If this is not done, then it is not possible to map
> irq next time, after EPROBE_DEFER.
> 
> Signed-off-by: Volodymyr Bendiuga <volodymyr.bendiuga@westermo.se>

Hi Volodymyr

Thanks for your patches.

Please could you include in the subject line which tree it is for. See

Documentation/networking/netdev-FAQ.txt


> ---
>  drivers/net/dsa/mv88e6xxx/chip.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> index 05942e3..12e7d38 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> @@ -420,6 +420,7 @@ static void mv88e6xxx_g1_irq_free(struct mv88e6xxx_chip *chip)
>  	mv88e6xxx_g1_write(chip, GLOBAL_CONTROL, mask);
>  
>  	free_irq(chip->irq, chip);
> +	irq_dispose_mapping(chip->irq);

This seems like the wrong place to do this.

The mapping is created by chip->irq = of_irq_get(np, 0); in
mv88e6xxx_probe(). So the correct place to dispose of this mapping would be in
mv88e6xxx_remove() and the error path of mv88e6xxx_probe().

Thanks
	Andrew

^ permalink raw reply

* Re: [PATCH net-next 2/3] net/sched: cls_flower: Add support for matching on flags
From: Or Gerlitz @ 2016-12-06 14:02 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: David S. Miller, netdev, Jiri Pirko, Roi Dayan, Hadar Har-Zion
In-Reply-To: <20161206124006.GL1984@nanopsycho>

On 12/6/2016 2:40 PM, Jiri Pirko wrote:
> Tue, Dec 06, 2016 at 01:32:58PM CET, ogerlitz@mellanox.com wrote:
>> Add UAPI to provide set of flags for matching, where the flags
>> provided from user-space are mapped to flow-dissector flags.
>>
>> The 1st flag allows to match on whether the packet is an
>> IP fragment and corresponds to the FLOW_DIS_IS_FRAGMENT flag.
>>
>> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
>> Reviewed-by: Paul Blakey <paulb@mellanox.com>
>> ---
>> include/uapi/linux/pkt_cls.h |  7 ++++
>> net/sched/cls_flower.c       | 83 ++++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 90 insertions(+)
>>
>> diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
>> index 86786d4..f114277 100644
>> --- a/include/uapi/linux/pkt_cls.h
>> +++ b/include/uapi/linux/pkt_cls.h
>> @@ -457,11 +457,18 @@ enum {
>> 	TCA_FLOWER_KEY_ENC_UDP_SRC_PORT_MASK,	/* be16 */
>> 	TCA_FLOWER_KEY_ENC_UDP_DST_PORT,	/* be16 */
>> 	TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK,	/* be16 */
>> +
>> +	TCA_FLOWER_KEY_FLAGS,		/* be32 */
>> +	TCA_FLOWER_KEY_FLAGS_MASK,	/* be32 */
>> 	__TCA_FLOWER_MAX,
>> };
>>
>> #define TCA_FLOWER_MAX (__TCA_FLOWER_MAX - 1)
>>
>> +enum {
>> +	TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT = BIT(0),
>> +};
>> +
>> /* Match-all classifier */
>>
>> enum {
>> diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>> index c5cea78..d9f4124 100644
>> --- a/net/sched/cls_flower.c
>> +++ b/net/sched/cls_flower.c
>> @@ -383,6 +383,8 @@ static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 1] = {
>> 	[TCA_FLOWER_KEY_ENC_UDP_SRC_PORT_MASK]	= { .type = NLA_U16 },
>> 	[TCA_FLOWER_KEY_ENC_UDP_DST_PORT]	= { .type = NLA_U16 },
>> 	[TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK]	= { .type = NLA_U16 },
>> +	[TCA_FLOWER_KEY_FLAGS]		= { .type = NLA_U32 },
>> +	[TCA_FLOWER_KEY_FLAGS_MASK]	= { .type = NLA_U32 },
>> };
>>
>> static void fl_set_key_val(struct nlattr **tb,
>> @@ -417,6 +419,40 @@ static void fl_set_key_vlan(struct nlattr **tb,
>> 	}
>> }
>>
>> +static void set_flags(u32 flower_key, u32 flower_mask,
> Use some prefix for helpers like this, or at least __

okay, will use fl_set_key_flags here

>
>
>> +		      u32 *dissector_key, u32 *dissector_mask,
>> +		      u32 flower_flag_bit, u32 dissector_flag_bit)
>> +{
>> +	if (flower_mask & flower_flag_bit) {
>> +		*dissector_mask |= dissector_flag_bit;
>> +		if (flower_key & flower_flag_bit)
>> +			*dissector_key |= dissector_flag_bit;
>> +	}
>> +}
>> +
>> +static void fl_set_key_flags(struct nlattr **tb,
>> +			     u32 *flags_key,
>> +			     u32 *flags_mask)
>> +{
>> +	u32 key, mask;
>> +
>> +	if (!tb[TCA_FLOWER_KEY_FLAGS])
>> +		return;
>> +
>> +	key = be32_to_cpu(nla_get_u32(tb[TCA_FLOWER_KEY_FLAGS]));
>> +
>> +	if (!tb[TCA_FLOWER_KEY_FLAGS_MASK])
>> +		mask = ~0;
>> +	else
>> +		mask = be32_to_cpu(nla_get_u32(tb[TCA_FLOWER_KEY_FLAGS_MASK]));
>> +
>> +	*flags_key  = 0;
>> +	*flags_mask = 0;
>> +
>> +	set_flags(key, mask, flags_key, flags_mask,
>> +		  TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT, FLOW_DIS_IS_FRAGMENT);
>> +}
>> +
>> static int fl_set_key(struct net *net, struct nlattr **tb,
>> 		      struct fl_flow_key *key, struct fl_flow_key *mask)
>> {
>> @@ -543,6 +579,8 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
>> 		       &mask->enc_tp.dst, TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK,
>> 		       sizeof(key->enc_tp.dst));
>>
>> +	fl_set_key_flags(tb, &key->control.flags, &mask->control.flags);
>> +
>> 	return 0;
>> }
>>
>> @@ -877,6 +915,48 @@ static int fl_dump_key_vlan(struct sk_buff *skb,
>> 	return 0;
>> }
>>
>> +static void get_flags(u32 dissector_key, u32 dissector_mask,
> Same here.

and fl_get_key_flags here

>
>> +		      u32 *flower_key, u32 *flower_mask,
>> +		      u32 flower_flag_bit, u32 dissector_flag_bit)
>> +{
>> +	if (dissector_mask & dissector_flag_bit) {
>> +		*flower_mask |= flower_flag_bit;
>> +		if (dissector_key & dissector_flag_bit)
>> +			*flower_key |= flower_flag_bit;
>> +	}
>> +}
>> +
>> +static int fl_dump_key_flags(struct sk_buff *skb,
>> +			     u32 flags_key,
>> +			     u32 flags_mask)
> over-wrapped :)

okay,

>
>> +{
>> +	u32 key, mask;
>> +	__be32 _key, _mask;
>> +	int err;
>> +
>> +	if (!memchr_inv(&flags_mask, 0, sizeof(flags_mask)))
>> +		return 0;
>> +
>> +	key  = 0;
>              ^ remove the extra space

okay

>
>> +	mask = 0;
>> +
>> +	get_flags(flags_key, flags_mask, &key, &mask,
>> +		  TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT, FLOW_DIS_IS_FRAGMENT);
>> +
>> +	_key  = cpu_to_be32(key);
>               ^ remove the extra space

NP

>
>> +	_mask = cpu_to_be32(mask);
>> +
>> +	err = nla_put(skb, TCA_FLOWER_KEY_FLAGS, 4, &_key);
>> +	if (err)
>> +		return err;
>> +
>> +	err = nla_put(skb, TCA_FLOWER_KEY_FLAGS_MASK, 4, &_mask);
> Just return nla_put...

sure

^ permalink raw reply

* [PATCH net-next v2 1/1] driver: ipvlan: Free ipvl_port directly with kfree instead of kfree_rcu
From: fgao @ 2016-12-06 13:54 UTC (permalink / raw)
  To: davem, maheshb, edumazet, netdev, gfree.wind

From: Gao Feng <gfree.wind@gmail.com>

There is no one which may reference the ipvlan port when free it in
ipvlan_port_create and ipvlan_port_destroy. So it is unnecessary to
use kfree_rcu.

Signed-off-by: Gao Feng <gfree.wind@gmail.com>
---
 v2: Remove the rcu of ipvl_port directly
 v1: Initial version

 drivers/net/ipvlan/ipvlan.h      | 1 -
 drivers/net/ipvlan/ipvlan_main.c | 4 ++--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
index 05a62d2..031093e 100644
--- a/drivers/net/ipvlan/ipvlan.h
+++ b/drivers/net/ipvlan/ipvlan.h
@@ -97,7 +97,6 @@ struct ipvl_port {
 	struct work_struct	wq;
 	struct sk_buff_head	backlog;
 	int			count;
-	struct rcu_head		rcu;
 };
 
 static inline struct ipvl_port *ipvlan_port_get_rcu(const struct net_device *d)
diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index c6aa667..44ceebc 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -128,7 +128,7 @@ static int ipvlan_port_create(struct net_device *dev)
 	return 0;
 
 err:
-	kfree_rcu(port, rcu);
+	kfree(port);
 	return err;
 }
 
@@ -145,7 +145,7 @@ static void ipvlan_port_destroy(struct net_device *dev)
 	netdev_rx_handler_unregister(dev);
 	cancel_work_sync(&port->wq);
 	__skb_queue_purge(&port->backlog);
-	kfree_rcu(port, rcu);
+	kfree(port);
 }
 
 #define IPVLAN_FEATURES \
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH]net:sched:release lock before tcf_dump_walker() normal return to avoid deadlock
From: Jamal Hadi Salim @ 2016-12-06 13:50 UTC (permalink / raw)
  To: Feng Deng, David S. Miller
  Cc: netdev, linux-kernel, feng.deng, Roman Mashak, Cong Wang
In-Reply-To: <CAEX1niWim7BSki6uRtihwbJ3Adw4Qwt8MR1aki8UA4_x7V9UTA@mail.gmail.com>

On 16-12-06 12:36 AM, Feng Deng wrote:
> From: Feng Deng<cxdx2006@gmail.com>
>
> release lock before tcf_dump_walker() normal return to avoid deadlock
>

/Scratching my head.

I am probably missing something obvious.
What are the condition under which this deadlock will happen?
Do you have a testcase we can try?

cheers,
jamal

^ permalink raw reply

* [PATCH] net: stmmac: do not call phy_ethtool_ksettings_set from atomic context
From: Niklas Cassel @ 2016-12-06 13:47 UTC (permalink / raw)
  To: Giuseppe Cavallaro, Alexandre Torgue; +Cc: Niklas Cassel, netdev, linux-kernel

From: Niklas Cassel <niklas.cassel@axis.com>

>From what I can tell, spin_lock(&priv->lock) is not needed, since the
phy_ethtool_ksettings_set call is not given the priv struct.

phy_start_aneg takes the phydev->lock. Calls to phy_adjust_link
from phy_state_machine also takes the phydev->lock.

[   13.718319] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:97
[   13.726717] in_atomic(): 1, irqs_disabled(): 0, pid: 1307, name: ethtool
[   13.742115] Hardware name: Axis ARTPEC-6 Platform
[   13.746829] [<80110568>] (unwind_backtrace) from [<8010c2bc>] (show_stack+0x18/0x1c)
[   13.754575] [<8010c2bc>] (show_stack) from [<80433484>] (dump_stack+0x80/0xa0)
[   13.761801] [<80433484>] (dump_stack) from [<80145428>] (___might_sleep+0x108/0x170)
[   13.769554] [<80145428>] (___might_sleep) from [<806c9b50>] (mutex_lock+0x24/0x44)
[   13.777128] [<806c9b50>] (mutex_lock) from [<8050cbc0>] (phy_start_aneg+0x1c/0x13c)
[   13.784783] [<8050cbc0>] (phy_start_aneg) from [<8050d338>] (phy_ethtool_ksettings_set+0x98/0xd0)
[   13.793656] [<8050d338>] (phy_ethtool_ksettings_set) from [<80517adc>] (stmmac_ethtool_set_link_ksettings+0xa0/0xb4)
[   13.804184] [<80517adc>] (stmmac_ethtool_set_link_ksettings) from [<805c5138>] (ethtool_set_settings+0xd4/0x13c)
[   13.814358] [<805c5138>] (ethtool_set_settings) from [<805c9718>] (dev_ethtool+0x13c4/0x211c)
[   13.822882] [<805c9718>] (dev_ethtool) from [<805dc7c0>] (dev_ioctl+0x480/0x8e0)
[   13.830291] [<805dc7c0>] (dev_ioctl) from [<80260e34>] (do_vfs_ioctl+0x94/0xa00)
[   13.837699] [<80260e34>] (do_vfs_ioctl) from [<802617dc>] (SyS_ioctl+0x3c/0x60)
[   13.845011] [<802617dc>] (SyS_ioctl) from [<801088bc>] (__sys_trace_return+0x0/0x10)

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
index d5a8122b6033..6ab7e2bdcadd 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
@@ -406,9 +406,7 @@ stmmac_ethtool_set_link_ksettings(struct net_device *dev,
 		return 0;
 	}
 
-	spin_lock(&priv->lock);
 	rc = phy_ethtool_ksettings_set(phy, cmd);
-	spin_unlock(&priv->lock);
 
 	return rc;
 }
-- 
2.1.4

^ permalink raw reply related

* Re: [PATCH v4 09/13] net: ethernet: ti: cpts: rework initialization/deinitialization
From: Richard Cochran @ 2016-12-06 13:40 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: David S. Miller, netdev-u79uwXL29TY76Z2rM5mHXA, Mugunthan V N,
	Sekhar Nori, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-omap-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA, Murali Karicheri, Wingman Kwok,
	Thomas Gleixner
In-Reply-To: <20161205200525.16664-10-grygorii.strashko-l0cyMroinI0@public.gmane.org>

On Mon, Dec 05, 2016 at 02:05:21PM -0600, Grygorii Strashko wrote:
> @@ -372,34 +354,27 @@ void cpts_tx_timestamp(struct cpts *cpts, struct sk_buff *skb)
>  }
>  EXPORT_SYMBOL_GPL(cpts_tx_timestamp);
>  
> -int cpts_register(struct device *dev, struct cpts *cpts,
> -		  u32 mult, u32 shift)
> +int cpts_register(struct cpts *cpts)
>  {
>  	int err, i;
>  
> -	cpts->info = cpts_info;
> -	spin_lock_init(&cpts->lock);
> -
> -	cpts->cc.read = cpts_systim_read;
> -	cpts->cc.mask = CLOCKSOURCE_MASK(32);
> -	cpts->cc_mult = mult;
> -	cpts->cc.mult = mult;
> -	cpts->cc.shift = shift;
> -
>  	INIT_LIST_HEAD(&cpts->events);
>  	INIT_LIST_HEAD(&cpts->pool);
>  	for (i = 0; i < CPTS_MAX_EVENTS; i++)
>  		list_add(&cpts->pool_data[i].list, &cpts->pool);
>  
> -	cpts_clk_init(dev, cpts);
> +	clk_enable(cpts->refclk);
> +
>  	cpts_write32(cpts, CPTS_EN, control);
>  	cpts_write32(cpts, TS_PEND_EN, int_enable);
>  
> +	/* reinitialize cc.mult to original value as it can be modified
> +	 * by cpts_ptp_adjfreq().
> +	 */
> +	cpts->cc.mult = cpts->cc_mult;

This still isn't quite right.  First of all, you shouldn't clobber the
learned cc.mult value in cpts_register().  Presumably, if PTP had been
run on this port before, then the learned frequency is approximately
correct, and it should be left alone.

[ BTW, resetting the timecounter here makes no sense either.  Why
  reset the clock just because the interface goes down?  ]

Secondly, you have made the initialization order of these fields hard
to follow.  With the whole series applied:

probe()
	cpts_create()
		cpts_of_parse()
		{
			/* Set cc_mult but not cc.mult! */
			set cc_mult
			set cc.shift
		}
		cpts_calc_mult_shift()
		{
			/* Set them both. */
			cpts->cc_mult = mult;
			cpts->cc.mult = mult;
			cpts->cc.shift = shift;
		}
/* later on */
cpts_register()
	cpts->cc.mult = cpts->cc_mult;

There is no need for such complexity.  Simply set cc.mult in
cpts_create() _once_, immediately after the call to
cpts_calc_mult_shift().

You can remove the assignment from cpts_calc_mult_shift() and
cpts_register().

Thanks,
Richard
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 0/5] cpsw: add per channel shaper configuration
From: Ivan Khoronzhuk @ 2016-12-06 13:35 UTC (permalink / raw)
  To: Grygorii Strashko; +Cc: netdev, linux-kernel, mugunthanvnm, linux-omap
In-Reply-To: <4640ce88-75b0-8508-05f0-849e8fd4927f@ti.com>

On Mon, Dec 05, 2016 at 02:33:40PM -0600, Grygorii Strashko wrote:
> Hi Ivan,

Hi, Grygorii

I've sent patch that fixes issue in question.
https://lkml.org/lkml/2016/12/5/811
Could you please review it. Also I'm preparing several patches
to sophisticate cases with unexpected speed value.

> 
> On 11/29/2016 09:00 AM, Ivan Khoronzhuk wrote:
> > This series is intended to allow user to set rate for per channel
> > shapers at cpdma level. This patchset doesn't have impact on performance.
> > The rate can be set with:
> > 
> > echo 100 > /sys/class/net/ethX/queues/tx-0/tx_maxrate
> > 
> > Tested on am572xx
> > Based on net-next/master
> > 
> > Ivan Khoronzhuk (5):
> >   net: ethernet: ti: davinci_cpdma: add weight function for channels
> >   net: ethernet: ti: davinci_cpdma: add set rate for a channel
> >   net: ethernet: ti: cpsw: add .ndo to set per-queue rate
> >   net: ethernet: ti: cpsw: optimize end of poll cycle
> >   net: ethernet: ti: cpsw: split tx budget according between channels
> > 
> >  drivers/net/ethernet/ti/cpsw.c          | 264 +++++++++++++++----
> >  drivers/net/ethernet/ti/davinci_cpdma.c | 453 ++++++++++++++++++++++++++++----
> >  drivers/net/ethernet/ti/davinci_cpdma.h |   6 +
> >  3 files changed, 624 insertions(+), 99 deletions(-)
> > 
> 
> 
> I've just tried net-next on BBB and got below back-trace:
> INIT: Entering runlevel: 5
> Configuring network interfaces... [   15.018356] net eth0: initializing cpsw version 1.12 (0)
> [   15.120153] SMSC LAN8710/LAN8720 4a101000.mdio:00: attached PHY driver [SMSC LAN8710/LAN8720] (mii_bus:phy_addr=4a101000.mdio:00, irq=-1)
> [   15.138578] Division by zero in kernel.
> [   15.142667] CPU: 0 PID: 755 Comm: ifconfig Not tainted 4.9.0-rc7-01617-g6ea3f00 #5
> [   15.150277] Hardware name: Generic AM33XX (Flattened Device Tree)
> [   15.156399] Backtrace: 
> [   15.158898] [<c010bbf8>] (dump_backtrace) from [<c010beb4>] (show_stack+0x18/0x1c)
> [   15.166508]  r7:00000000 r6:600f0013 r5:00000000 r4:c0d395d0
> [   15.172200] [<c010be9c>] (show_stack) from [<c03ce6cc>] (dump_stack+0x8c/0xa0)
> [   15.179460] [<c03ce640>] (dump_stack) from [<c010bd60>] (__div0+0x1c/0x20)
> [   15.186368]  r7:00000000 r6:ddf1c010 r5:00000001 r4:00000001
> [   15.192068] [<c010bd44>] (__div0) from [<c03cd17c>] (Ldiv0+0x8/0x14)
> [   15.198503] [<bf07ba38>] (cpsw_split_budget [ti_cpsw]) from [<bf07ee8c>] (cpsw_ndo_open+0x4b8/0x5e4 [ti_cpsw])
> [   15.208554]  r10:ddf1c010 r9:00000000 r8:00000000 r7:ddf1c010 r6:dcc88000 r5:dcc88500
> [   15.216418]  r4:ddf1c0b0
> [   15.218985] [<bf07e9d4>] (cpsw_ndo_open [ti_cpsw]) from [<c0731c68>] (__dev_open+0xb0/0x114)
> [   15.227466]  r10:00000000 r9:00000000 r8:00000000 r7:dcc88030 r6:bf080364 r5:00000000
> [   15.235330]  r4:dcc88000
> [   15.237880] [<c0731bb8>] (__dev_open) from [<c0731f24>] (__dev_change_flags+0x9c/0x14c)
> [   15.245923]  r7:00001002 r6:00000001 r5:00001043 r4:dcc88000
> [   15.251613] [<c0731e88>] (__dev_change_flags) from [<c0731ff4>] (dev_change_flags+0x20/0x50)
> [   15.260093]  r9:00000000 r8:00000000 r7:dcbabf0c r6:00001002 r5:dcc8813c r4:dcc88000
> [   15.267886] [<c0731fd4>] (dev_change_flags) from [<c07a3168>] (devinet_ioctl+0x6d4/0x794)
> [   15.276105]  r9:00000000 r8:bee8cc64 r7:dcbabf0c r6:00000000 r5:dc921e80 r4:00000000
> [   15.283891] [<c07a2a94>] (devinet_ioctl) from [<c07a54bc>] (inet_ioctl+0x19c/0x1c8)
> [   15.291587]  r10:00000000 r9:dc920000 r8:bee8cc64 r7:c0d64bc0 r6:bee8cc64 r5:dd2728e0
> [   15.299450]  r4:00008914
> [   15.302002] [<c07a5320>] (inet_ioctl) from [<c07112dc>] (sock_ioctl+0x14c/0x300)
> [   15.309443] [<c0711190>] (sock_ioctl) from [<c0241868>] (do_vfs_ioctl+0xa8/0x98c)
> [   15.316962]  r7:00000003 r6:ddf0a780 r5:dd2728e0 r4:bee8cc64
> [   15.322653] [<c02417c0>] (do_vfs_ioctl) from [<c0242188>] (SyS_ioctl+0x3c/0x64)
> [   15.330000]  r10:00000000 r9:dc920000 r8:bee8cc64 r7:00008914 r6:ddf0a780 r5:00000003
> [   15.337864]  r4:ddf0a780
> [   15.340416] [<c024214c>] (SyS_ioctl) from [<c0107d60>] (ret_fast_syscall+0x0/0x3c)
> [   15.348024]  r9:dc920000 r8:c0107f24 r7:00000036 r6:bee8cf4d r5:bee8ce4c r4:000949f0
> [   15.361174] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
> udhcpc (v1.23.1) started
> 
> 
> -- 
> regards,
> -grygorii

^ permalink raw reply

* Re: [PATCH net-next 1/3] net/flow_dissector: Enable matching on flags for TC filter consumers
From: Jiri Pirko @ 2016-12-06 12:43 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: David S. Miller, netdev, Jiri Pirko, Roi Dayan, Hadar Har-Zion,
	Tom Herbert
In-Reply-To: <1481027579-23195-2-git-send-email-ogerlitz@mellanox.com>

Tue, Dec 06, 2016 at 01:32:57PM CET, ogerlitz@mellanox.com wrote:
>This is a pre-step to allow TC classifiers that makes use of the
>flow-dissector (e.g Flower) to match on flow-dissector flags which
>are located in the control struct.
>
>Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
>---
> include/net/flow_dissector.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
>index c4f3166..a679e6c 100644
>--- a/include/net/flow_dissector.h
>+++ b/include/net/flow_dissector.h
>@@ -154,8 +154,8 @@ struct flow_dissector {
> };
> 
> struct flow_keys {
>+#define FLOW_KEYS_HASH_START_FIELD control
> 	struct flow_dissector_key_control control;
>-#define FLOW_KEYS_HASH_START_FIELD basic

How is this hashing related to cls_flower dissections? What am I
missing?

^ permalink raw reply

* [PATCH RFC v1 2/2] ethtool: add support to specify ring_cookie as VF/queue
From: Jacob Keller @ 2016-12-06 12:40 UTC (permalink / raw)
  To: John W . Linville; +Cc: netdev, Jacob Keller
In-Reply-To: <20161206124031.32346-1-jacob.e.keller@intel.com>

Add support to allow users to specify the ring_cookie as a VF/queue
value, ie: '0/25' means direct to the physical function queue 25, while
'3/10' means to direct to VF id 3, queue 10. It is driver dependent to
determine what the VF id is, excepting that 0 indicates the physical
function. This allows the user to more easily specify the matching
notation as it is understood by the kernel and some drivers. Since no
driver exists today with over 2,147,483,647 queues, this notation will
simply fail on drivers which don't recognize the new partitioning.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 ethtool.8.in |  3 ++-
 rxclass.c    | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 53 insertions(+), 8 deletions(-)

diff --git a/ethtool.8.in b/ethtool.8.in
index 5c36c06385f6..cb165fa4c77a 100644
--- a/ethtool.8.in
+++ b/ethtool.8.in
@@ -826,7 +826,8 @@ Specifies the Rx queue to send packets to, or some other action.
 nokeep;
 lB	l.
 -1	Drop the matched flow
-0 or higher	Rx queue to route the flow
+0 or higher	Route flow to specific Rx queue on the physical function
+N/M	Route flow to Rx queue M of specific virtual function N (0 indicates the physical function)
 .TE
 .TP
 .BI loc \ N
diff --git a/rxclass.c b/rxclass.c
index 7f0e765d3b47..de25550529b4 100644
--- a/rxclass.c
+++ b/rxclass.c
@@ -614,6 +614,7 @@ typedef enum {
 	OPT_IP4,
 	OPT_IP6,
 	OPT_MAC,
+	OPT_ACTION,
 } rule_opt_type_t;
 
 #define NFC_FLAG_RING		0x001
@@ -654,7 +655,7 @@ static const struct rule_opts rule_nfc_tcp_ip4[] = {
 	{ "dst-port", OPT_BE16, NFC_FLAG_DPORT,
 	  offsetof(struct ethtool_rx_flow_spec, h_u.tcp_ip4_spec.pdst),
 	  offsetof(struct ethtool_rx_flow_spec, m_u.tcp_ip4_spec.pdst) },
-	{ "action", OPT_U64, NFC_FLAG_RING,
+	{ "action", OPT_ACTION, NFC_FLAG_RING,
 	  offsetof(struct ethtool_rx_flow_spec, ring_cookie), -1 },
 	{ "loc", OPT_U32, NFC_FLAG_LOC,
 	  offsetof(struct ethtool_rx_flow_spec, location), -1 },
@@ -685,7 +686,7 @@ static const struct rule_opts rule_nfc_esp_ip4[] = {
 	{ "spi", OPT_BE32, NFC_FLAG_SPI,
 	  offsetof(struct ethtool_rx_flow_spec, h_u.esp_ip4_spec.spi),
 	  offsetof(struct ethtool_rx_flow_spec, m_u.esp_ip4_spec.spi) },
-	{ "action", OPT_U64, NFC_FLAG_RING,
+	{ "action", OPT_ACTION, NFC_FLAG_RING,
 	  offsetof(struct ethtool_rx_flow_spec, ring_cookie), -1 },
 	{ "loc", OPT_U32, NFC_FLAG_LOC,
 	  offsetof(struct ethtool_rx_flow_spec, location), -1 },
@@ -728,7 +729,7 @@ static const struct rule_opts rule_nfc_usr_ip4[] = {
 	{ "dst-port", OPT_BE16, NFC_FLAG_DPORT,
 	  offsetof(struct ethtool_rx_flow_spec, h_u.usr_ip4_spec.l4_4_bytes) + 2,
 	  offsetof(struct ethtool_rx_flow_spec, m_u.usr_ip4_spec.l4_4_bytes) + 2 },
-	{ "action", OPT_U64, NFC_FLAG_RING,
+	{ "action", OPT_ACTION, NFC_FLAG_RING,
 	  offsetof(struct ethtool_rx_flow_spec, ring_cookie), -1 },
 	{ "loc", OPT_U32, NFC_FLAG_LOC,
 	  offsetof(struct ethtool_rx_flow_spec, location), -1 },
@@ -762,7 +763,7 @@ static const struct rule_opts rule_nfc_tcp_ip6[] = {
 	{ "dst-port", OPT_BE16, NFC_FLAG_DPORT,
 	  offsetof(struct ethtool_rx_flow_spec, h_u.tcp_ip6_spec.pdst),
 	  offsetof(struct ethtool_rx_flow_spec, m_u.tcp_ip6_spec.pdst) },
-	{ "action", OPT_U64, NFC_FLAG_RING,
+	{ "action", OPT_ACTION, NFC_FLAG_RING,
 	  offsetof(struct ethtool_rx_flow_spec, ring_cookie), -1 },
 	{ "loc", OPT_U32, NFC_FLAG_LOC,
 	  offsetof(struct ethtool_rx_flow_spec, location), -1 },
@@ -793,7 +794,7 @@ static const struct rule_opts rule_nfc_esp_ip6[] = {
 	{ "spi", OPT_BE32, NFC_FLAG_SPI,
 	  offsetof(struct ethtool_rx_flow_spec, h_u.esp_ip6_spec.spi),
 	  offsetof(struct ethtool_rx_flow_spec, m_u.esp_ip6_spec.spi) },
-	{ "action", OPT_U64, NFC_FLAG_RING,
+	{ "action", OPT_ACTION, NFC_FLAG_RING,
 	  offsetof(struct ethtool_rx_flow_spec, ring_cookie), -1 },
 	{ "loc", OPT_U32, NFC_FLAG_LOC,
 	  offsetof(struct ethtool_rx_flow_spec, location), -1 },
@@ -836,7 +837,7 @@ static const struct rule_opts rule_nfc_usr_ip6[] = {
 	{ "dst-port", OPT_BE16, NFC_FLAG_DPORT,
 	  offsetof(struct ethtool_rx_flow_spec, h_u.usr_ip6_spec.l4_4_bytes) + 2,
 	  offsetof(struct ethtool_rx_flow_spec, m_u.usr_ip6_spec.l4_4_bytes) + 2 },
-	{ "action", OPT_U64, NFC_FLAG_RING,
+	{ "action", OPT_ACTION, NFC_FLAG_RING,
 	  offsetof(struct ethtool_rx_flow_spec, ring_cookie), -1 },
 	{ "loc", OPT_U32, NFC_FLAG_LOC,
 	  offsetof(struct ethtool_rx_flow_spec, location), -1 },
@@ -864,7 +865,7 @@ static const struct rule_opts rule_nfc_ether[] = {
 	{ "proto", OPT_BE16, NFC_FLAG_PROTO,
 	  offsetof(struct ethtool_rx_flow_spec, h_u.ether_spec.h_proto),
 	  offsetof(struct ethtool_rx_flow_spec, m_u.ether_spec.h_proto) },
-	{ "action", OPT_U64, NFC_FLAG_RING,
+	{ "action", OPT_ACTION, NFC_FLAG_RING,
 	  offsetof(struct ethtool_rx_flow_spec, ring_cookie), -1 },
 	{ "loc", OPT_U32, NFC_FLAG_LOC,
 	  offsetof(struct ethtool_rx_flow_spec, location), -1 },
@@ -948,6 +949,40 @@ static int rxclass_get_ether(char *str, unsigned char *val)
 	return 0;
 }
 
+static int rxclass_get_action(char *str, unsigned long long *val)
+{
+	const long long max_queue = ~0ULL >> 32;
+	const long long max_vf = ~0ULL >> 8;
+	char *end_vf, *end_queue;
+	unsigned long long queue, vf;
+
+	/* If there is no '/' just read the full value like a u64 */
+	if (!strchr(str, '/'))
+		return rxclass_get_ulong(str, val, 64);
+
+	errno = 0;
+
+	vf = strtoull(str, &end_vf, 0);
+
+	/*
+	 * Ensure that we consume everything up to the slash, and that there
+	 * is more than the empty string following the slash.
+	 */
+	if ((*end_vf != '/') || !(*(end_vf + 1)) || errno || (vf > max_vf))
+		return -1;
+
+	errno = 0;
+
+	queue = strtoull(end_vf + 1, &end_queue, 0);
+
+	if (*end_queue || errno || (queue > max_queue))
+		return -1;
+
+	*val = (vf << ETHTOOL_RX_FLOW_SPEC_RING_VF_OFF) | queue;
+
+	return 0;
+}
+
 static int rxclass_get_val(char *str, unsigned char *p, u32 *flags,
 			   const struct rule_opts *opt)
 {
@@ -1070,6 +1105,14 @@ static int rxclass_get_val(char *str, unsigned char *p, u32 *flags,
 			memcpy(&p[opt->moffset], &mask, ETH_ALEN);
 		break;
 	}
+	case OPT_ACTION: {
+		unsigned long long val;
+		err = rxclass_get_action(str, &val);
+		if (err)
+			return -1;
+		*(u64 *)&p[opt->offset] = (u64)val;
+		break;
+	}
 	case OPT_NONE:
 	default:
 		return -1;
@@ -1183,6 +1226,7 @@ static int rxclass_get_mask(char *str, unsigned char *p,
 		memcpy(&p[opt->moffset], val, ETH_ALEN);
 		break;
 	}
+	case OPT_ACTION:
 	case OPT_NONE:
 	default:
 		return -1;
-- 
2.11.0.rc2.152.g4d04e67

^ permalink raw reply related

* [PATCH RFC v1 1/2] rxclass: use ethtool_get_flow_spec_ring to display queue and VF
From: Jacob Keller @ 2016-12-06 12:40 UTC (permalink / raw)
  To: John W . Linville; +Cc: netdev, Jacob Keller
In-Reply-To: <20161206124031.32346-1-jacob.e.keller@intel.com>

Recent kernels have made the ring_cookie store both the Queue index as
well as a VF identifier. This allows for drivers to direct traffic to
specific VF queues, without having the user have to understand the
physical queue layout. Add support to display this notation so that
users do not have to manually parse the value.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 rxclass.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/rxclass.c b/rxclass.c
index c7bfebaf6e22..7f0e765d3b47 100644
--- a/rxclass.c
+++ b/rxclass.c
@@ -247,11 +247,19 @@ static void rxclass_print_nfc_rule(struct ethtool_rx_flow_spec *fsp)
 
 	rxclass_print_nfc_spec_ext(fsp);
 
-	if (fsp->ring_cookie != RX_CLS_FLOW_DISC)
-		fprintf(stdout, "\tAction: Direct to queue %llu\n",
-			fsp->ring_cookie);
-	else
+	if (fsp->ring_cookie != RX_CLS_FLOW_DISC) {
+		u64 vf = ethtool_get_flow_spec_ring_vf(fsp->ring_cookie);
+		u64 queue = ethtool_get_flow_spec_ring(fsp->ring_cookie);
+
+		if (vf)
+			fprintf(stdout, "\tAction: Direct to queue %llu\n",
+				queue);
+		else
+			fprintf(stdout, "\tAction: Direct to VF %llu queue %llu\n",
+				vf, queue);
+	} else {
 		fprintf(stdout, "\tAction: Drop\n");
+	}
 
 	fprintf(stdout, "\n");
 }
-- 
2.11.0.rc2.152.g4d04e67

^ permalink raw reply related

* [PATCH RFC v1 0/2] ethtool: add support for VF in ring_cookie
From: Jacob Keller @ 2016-12-06 12:40 UTC (permalink / raw)
  To: John W . Linville; +Cc: netdev, Jacob Keller

The ring_cookie value for an ntuple filter is used by the driver to
determine what queue to direct the traffic towards. Since the main
function has cnotrol over all queues it is possible to use this to
direct traffic to a queue on a virtual function rather than a queue on
the physical function. However, this was cumbersome to do if you did not
know exactly how the device laid out virtual function queues.

To alleviate this, the kernel side of the ethtool API gained a couple of
functions used to partition the ring_cookie value into a queue index,
and a virtual function index. It was assumed that no device was likely
to have more than 2^32 queues, and currently no device has more than 2^8
virtual functions, so the split was chosen such that the lower 32 bits
of the ring_cookie represent the queue, and the next 8 bits would
represent a virtual function id.

This is useful, but lacked support on the ethtool command line side. It
is possible to simply use hex notation, and to "know" that the driver
works in this manner. However, this makes use of and knowledge of the
feature not widespread.

This series attempts to fix that with two patches. First, we make use of
the access functions when displaying the queue. Since no device
currently supports queues beyond 2^32 it is unlikely that any driver
which does not honor this partitioning of the ring_cookie would have its
index incorrectly interpreted. This makes it so that the user will see
the queue index and virtual function index split apart instead of being
displayed as a combined base-10 number.

Additionally, the second patch adds support for a new notation for the
action. Instead of just using a number, we add support for the special
syntax "X/Y" where X would be a number from 0-256 which is the virtual
function index, while Y would be a number from 0 to 2^32 which is the
queue index. This makes it much easier to write the encoded ring_cookie
value.

This series is sent as an RFC so that others may comment on the design
of the new syntax, and offer alternative suggestions. Some of the ones I
have considered:

a) just leave the second part alone, and display the VF+queue split, but
do not provide an easier syntax for writing the split value. This is
easy to do, but leaves the feature as a sort of hidden gem.
Additionally, the user would most easily do this by writing it as hex
notation such as

  action 0x100000005

However, this is cumbersome. It can be partially alleviated by using
shell scripting on top of ethtool, but again is somewhat of a pain.

b) actually adding a new key work so that you would do something like
"queue x" and "vf y" or similar. However, this is much more difficult to
correctly implement. The current implementation of handling the values
assumes that each type argument is followed by a single argument which
fits into a well defined section of the structure. Because the structure
is not specified as le64 or be64, we can't use a union or similar to
make it correct (since placement in the union ordering would depend on
byte ordering of the field!) so we would have to re-architect the
handling of values to allow multiple keywords to combine into a single
value. This seemed incredibly cumbersome.

The disadvantage of the new syntax is that it *is* new syntax that we
have to parse ourselves, and that may not be obvious to users. However,
we can document it, and the current syntax will work for all drivers,
and the new syntax will generally only work on drivers which use the
ring_cookie partitioning in this manner (as no current driver has queues
beyond 2^32, as was decided when the partitioning scheme was added to
the ethtool header file).

I like the use of / but am open to alternative suggestions for a simple
sequence. Note that we cannot use spaces, as this would move into the
argv solution of (b) which I do not have time to implement, and would be
a much more massive project.

Jacob Keller (2):
  rxclass: use ethtool_get_flow_spec_ring to display queue and VF
  ethtool: add support to specify ring_cookie as VF/queue

 ethtool.8.in |  3 ++-
 rxclass.c    | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 65 insertions(+), 12 deletions(-)

-- 
2.11.0.rc2.152.g4d04e67

^ permalink raw reply

* Re: [PATCH net-next 2/3] net/sched: cls_flower: Add support for matching on flags
From: Jiri Pirko @ 2016-12-06 12:40 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: David S. Miller, netdev, Jiri Pirko, Roi Dayan, Hadar Har-Zion
In-Reply-To: <1481027579-23195-3-git-send-email-ogerlitz@mellanox.com>

Tue, Dec 06, 2016 at 01:32:58PM CET, ogerlitz@mellanox.com wrote:
>Add UAPI to provide set of flags for matching, where the flags
>provided from user-space are mapped to flow-dissector flags.
>
>The 1st flag allows to match on whether the packet is an
>IP fragment and corresponds to the FLOW_DIS_IS_FRAGMENT flag.
>
>Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
>Reviewed-by: Paul Blakey <paulb@mellanox.com>
>---
> include/uapi/linux/pkt_cls.h |  7 ++++
> net/sched/cls_flower.c       | 83 ++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 90 insertions(+)
>
>diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
>index 86786d4..f114277 100644
>--- a/include/uapi/linux/pkt_cls.h
>+++ b/include/uapi/linux/pkt_cls.h
>@@ -457,11 +457,18 @@ enum {
> 	TCA_FLOWER_KEY_ENC_UDP_SRC_PORT_MASK,	/* be16 */
> 	TCA_FLOWER_KEY_ENC_UDP_DST_PORT,	/* be16 */
> 	TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK,	/* be16 */
>+
>+	TCA_FLOWER_KEY_FLAGS,		/* be32 */
>+	TCA_FLOWER_KEY_FLAGS_MASK,	/* be32 */
> 	__TCA_FLOWER_MAX,
> };
> 
> #define TCA_FLOWER_MAX (__TCA_FLOWER_MAX - 1)
> 
>+enum {
>+	TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT = BIT(0),
>+};
>+
> /* Match-all classifier */
> 
> enum {
>diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>index c5cea78..d9f4124 100644
>--- a/net/sched/cls_flower.c
>+++ b/net/sched/cls_flower.c
>@@ -383,6 +383,8 @@ static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 1] = {
> 	[TCA_FLOWER_KEY_ENC_UDP_SRC_PORT_MASK]	= { .type = NLA_U16 },
> 	[TCA_FLOWER_KEY_ENC_UDP_DST_PORT]	= { .type = NLA_U16 },
> 	[TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK]	= { .type = NLA_U16 },
>+	[TCA_FLOWER_KEY_FLAGS]		= { .type = NLA_U32 },
>+	[TCA_FLOWER_KEY_FLAGS_MASK]	= { .type = NLA_U32 },
> };
> 
> static void fl_set_key_val(struct nlattr **tb,
>@@ -417,6 +419,40 @@ static void fl_set_key_vlan(struct nlattr **tb,
> 	}
> }
> 
>+static void set_flags(u32 flower_key, u32 flower_mask,

Use some prefix for helpers like this, or at least __



>+		      u32 *dissector_key, u32 *dissector_mask,
>+		      u32 flower_flag_bit, u32 dissector_flag_bit)
>+{
>+	if (flower_mask & flower_flag_bit) {
>+		*dissector_mask |= dissector_flag_bit;
>+		if (flower_key & flower_flag_bit)
>+			*dissector_key |= dissector_flag_bit;
>+	}
>+}
>+
>+static void fl_set_key_flags(struct nlattr **tb,
>+			     u32 *flags_key,
>+			     u32 *flags_mask)
>+{
>+	u32 key, mask;
>+
>+	if (!tb[TCA_FLOWER_KEY_FLAGS])
>+		return;
>+
>+	key = be32_to_cpu(nla_get_u32(tb[TCA_FLOWER_KEY_FLAGS]));
>+
>+	if (!tb[TCA_FLOWER_KEY_FLAGS_MASK])
>+		mask = ~0;
>+	else
>+		mask = be32_to_cpu(nla_get_u32(tb[TCA_FLOWER_KEY_FLAGS_MASK]));
>+
>+	*flags_key  = 0;
>+	*flags_mask = 0;
>+
>+	set_flags(key, mask, flags_key, flags_mask,
>+		  TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT, FLOW_DIS_IS_FRAGMENT);
>+}
>+
> static int fl_set_key(struct net *net, struct nlattr **tb,
> 		      struct fl_flow_key *key, struct fl_flow_key *mask)
> {
>@@ -543,6 +579,8 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
> 		       &mask->enc_tp.dst, TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK,
> 		       sizeof(key->enc_tp.dst));
> 
>+	fl_set_key_flags(tb, &key->control.flags, &mask->control.flags);
>+
> 	return 0;
> }
> 
>@@ -877,6 +915,48 @@ static int fl_dump_key_vlan(struct sk_buff *skb,
> 	return 0;
> }
> 
>+static void get_flags(u32 dissector_key, u32 dissector_mask,

Same here.


>+		      u32 *flower_key, u32 *flower_mask,
>+		      u32 flower_flag_bit, u32 dissector_flag_bit)
>+{
>+	if (dissector_mask & dissector_flag_bit) {
>+		*flower_mask |= flower_flag_bit;
>+		if (dissector_key & dissector_flag_bit)
>+			*flower_key |= flower_flag_bit;
>+	}
>+}
>+
>+static int fl_dump_key_flags(struct sk_buff *skb,
>+			     u32 flags_key,
>+			     u32 flags_mask)

over-wrapped :)


>+{
>+	u32 key, mask;
>+	__be32 _key, _mask;
>+	int err;
>+
>+	if (!memchr_inv(&flags_mask, 0, sizeof(flags_mask)))
>+		return 0;
>+
>+	key  = 0;
            ^ remove the extra space


>+	mask = 0;
>+
>+	get_flags(flags_key, flags_mask, &key, &mask,
>+		  TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT, FLOW_DIS_IS_FRAGMENT);
>+
>+	_key  = cpu_to_be32(key);
             ^ remove the extra space


>+	_mask = cpu_to_be32(mask);
>+
>+	err = nla_put(skb, TCA_FLOWER_KEY_FLAGS, 4, &_key);
>+	if (err)
>+		return err;
>+
>+	err = nla_put(skb, TCA_FLOWER_KEY_FLAGS_MASK, 4, &_mask);

Just return nla_put...


>+	if (err)
>+		return err;
>+
>+	return 0;
>+}
>+
> static int fl_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
> 		   struct sk_buff *skb, struct tcmsg *t)
> {
>@@ -1012,6 +1092,9 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
> 			    sizeof(key->enc_tp.dst)))
> 		goto nla_put_failure;
> 
>+	if (fl_dump_key_flags(skb, key->control.flags, mask->control.flags))
>+		goto nla_put_failure;
>+
> 	nla_put_u32(skb, TCA_FLOWER_FLAGS, f->flags);
> 
> 	if (tcf_exts_dump(skb, &f->exts))
>-- 
>2.3.7
>

^ permalink raw reply

* [PATCH net-next 0/3] net/sched: cls_flower: Add support for matching on dissection flags
From: Or Gerlitz @ 2016-12-06 12:32 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Jiri Pirko, Roi Dayan, Hadar Har-Zion, Or Gerlitz

This series add the UAPI to provide set of flags for matching, where the 
flags provided from user-space are mapped to flow-dissector flags.

The 1st flag allows to match on whether the packet is an
IP fragment and corresponds to the FLOW_DIS_IS_FRAGMENT flag.

Or

Or Gerlitz (3):
  net/flow_dissector: Enable matching on flags for TC filter consumers
  net/sched: cls_flower: Add support for matching on flags
  net/mlx5e: Offload TC matching on packets being IP fragments

 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 12 ++++
 include/net/flow_dissector.h                    |  2 +-
 include/uapi/linux/pkt_cls.h                    |  7 +++
 net/sched/cls_flower.c                          | 83 +++++++++++++++++++++++++
 4 files changed, 103 insertions(+), 1 deletion(-)

-- 
2.3.7

^ permalink raw reply

* [PATCH net-next 3/3] net/mlx5e: Offload TC matching on packets being IP fragments
From: Or Gerlitz @ 2016-12-06 12:32 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Jiri Pirko, Roi Dayan, Hadar Har-Zion, Or Gerlitz
In-Reply-To: <1481027579-23195-1-git-send-email-ogerlitz@mellanox.com>

Enable offloading of matching on packets being fragments.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index f07ef8c..f8829b5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -31,6 +31,7 @@
  */
 
 #include <net/flow_dissector.h>
+#include <net/sch_generic.h>
 #include <net/pkt_cls.h>
 #include <net/tc_act/tc_gact.h>
 #include <net/tc_act/tc_skbedit.h>
@@ -363,7 +364,18 @@ static int __parse_cls_flower(struct mlx5e_priv *priv,
 			skb_flow_dissector_target(f->dissector,
 						  FLOW_DISSECTOR_KEY_CONTROL,
 						  f->key);
+
+		struct flow_dissector_key_control *mask =
+			skb_flow_dissector_target(f->dissector,
+						  FLOW_DISSECTOR_KEY_CONTROL,
+						  f->mask);
 		addr_type = key->addr_type;
+
+		if (mask->flags & FLOW_DIS_IS_FRAGMENT) {
+			MLX5_SET(fte_match_set_lyr_2_4, headers_c, frag, 1);
+			MLX5_SET(fte_match_set_lyr_2_4, headers_v, frag,
+				 key->flags & FLOW_DIS_IS_FRAGMENT);
+		}
 	}
 
 	if (dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_BASIC)) {
-- 
2.3.7

^ permalink raw reply related

* [PATCH net-next 1/3] net/flow_dissector: Enable matching on flags for TC filter consumers
From: Or Gerlitz @ 2016-12-06 12:32 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Jiri Pirko, Roi Dayan, Hadar Har-Zion, Or Gerlitz,
	Tom Herbert
In-Reply-To: <1481027579-23195-1-git-send-email-ogerlitz@mellanox.com>

This is a pre-step to allow TC classifiers that makes use of the
flow-dissector (e.g Flower) to match on flow-dissector flags which
are located in the control struct.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 include/net/flow_dissector.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index c4f3166..a679e6c 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -154,8 +154,8 @@ struct flow_dissector {
 };
 
 struct flow_keys {
+#define FLOW_KEYS_HASH_START_FIELD control
 	struct flow_dissector_key_control control;
-#define FLOW_KEYS_HASH_START_FIELD basic
 	struct flow_dissector_key_basic basic;
 	struct flow_dissector_key_tags tags;
 	struct flow_dissector_key_vlan vlan;
-- 
2.3.7
Cc: Tom Herbert <tom@herbertland.com>

^ permalink raw reply related

* [PATCH net-next 2/3] net/sched: cls_flower: Add support for matching on flags
From: Or Gerlitz @ 2016-12-06 12:32 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Jiri Pirko, Roi Dayan, Hadar Har-Zion, Or Gerlitz
In-Reply-To: <1481027579-23195-1-git-send-email-ogerlitz@mellanox.com>

Add UAPI to provide set of flags for matching, where the flags
provided from user-space are mapped to flow-dissector flags.

The 1st flag allows to match on whether the packet is an
IP fragment and corresponds to the FLOW_DIS_IS_FRAGMENT flag.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
---
 include/uapi/linux/pkt_cls.h |  7 ++++
 net/sched/cls_flower.c       | 83 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 90 insertions(+)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 86786d4..f114277 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -457,11 +457,18 @@ enum {
 	TCA_FLOWER_KEY_ENC_UDP_SRC_PORT_MASK,	/* be16 */
 	TCA_FLOWER_KEY_ENC_UDP_DST_PORT,	/* be16 */
 	TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK,	/* be16 */
+
+	TCA_FLOWER_KEY_FLAGS,		/* be32 */
+	TCA_FLOWER_KEY_FLAGS_MASK,	/* be32 */
 	__TCA_FLOWER_MAX,
 };
 
 #define TCA_FLOWER_MAX (__TCA_FLOWER_MAX - 1)
 
+enum {
+	TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT = BIT(0),
+};
+
 /* Match-all classifier */
 
 enum {
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index c5cea78..d9f4124 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -383,6 +383,8 @@ static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 1] = {
 	[TCA_FLOWER_KEY_ENC_UDP_SRC_PORT_MASK]	= { .type = NLA_U16 },
 	[TCA_FLOWER_KEY_ENC_UDP_DST_PORT]	= { .type = NLA_U16 },
 	[TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK]	= { .type = NLA_U16 },
+	[TCA_FLOWER_KEY_FLAGS]		= { .type = NLA_U32 },
+	[TCA_FLOWER_KEY_FLAGS_MASK]	= { .type = NLA_U32 },
 };
 
 static void fl_set_key_val(struct nlattr **tb,
@@ -417,6 +419,40 @@ static void fl_set_key_vlan(struct nlattr **tb,
 	}
 }
 
+static void set_flags(u32 flower_key, u32 flower_mask,
+		      u32 *dissector_key, u32 *dissector_mask,
+		      u32 flower_flag_bit, u32 dissector_flag_bit)
+{
+	if (flower_mask & flower_flag_bit) {
+		*dissector_mask |= dissector_flag_bit;
+		if (flower_key & flower_flag_bit)
+			*dissector_key |= dissector_flag_bit;
+	}
+}
+
+static void fl_set_key_flags(struct nlattr **tb,
+			     u32 *flags_key,
+			     u32 *flags_mask)
+{
+	u32 key, mask;
+
+	if (!tb[TCA_FLOWER_KEY_FLAGS])
+		return;
+
+	key = be32_to_cpu(nla_get_u32(tb[TCA_FLOWER_KEY_FLAGS]));
+
+	if (!tb[TCA_FLOWER_KEY_FLAGS_MASK])
+		mask = ~0;
+	else
+		mask = be32_to_cpu(nla_get_u32(tb[TCA_FLOWER_KEY_FLAGS_MASK]));
+
+	*flags_key  = 0;
+	*flags_mask = 0;
+
+	set_flags(key, mask, flags_key, flags_mask,
+		  TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT, FLOW_DIS_IS_FRAGMENT);
+}
+
 static int fl_set_key(struct net *net, struct nlattr **tb,
 		      struct fl_flow_key *key, struct fl_flow_key *mask)
 {
@@ -543,6 +579,8 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
 		       &mask->enc_tp.dst, TCA_FLOWER_KEY_ENC_UDP_DST_PORT_MASK,
 		       sizeof(key->enc_tp.dst));
 
+	fl_set_key_flags(tb, &key->control.flags, &mask->control.flags);
+
 	return 0;
 }
 
@@ -877,6 +915,48 @@ static int fl_dump_key_vlan(struct sk_buff *skb,
 	return 0;
 }
 
+static void get_flags(u32 dissector_key, u32 dissector_mask,
+		      u32 *flower_key, u32 *flower_mask,
+		      u32 flower_flag_bit, u32 dissector_flag_bit)
+{
+	if (dissector_mask & dissector_flag_bit) {
+		*flower_mask |= flower_flag_bit;
+		if (dissector_key & dissector_flag_bit)
+			*flower_key |= flower_flag_bit;
+	}
+}
+
+static int fl_dump_key_flags(struct sk_buff *skb,
+			     u32 flags_key,
+			     u32 flags_mask)
+{
+	u32 key, mask;
+	__be32 _key, _mask;
+	int err;
+
+	if (!memchr_inv(&flags_mask, 0, sizeof(flags_mask)))
+		return 0;
+
+	key  = 0;
+	mask = 0;
+
+	get_flags(flags_key, flags_mask, &key, &mask,
+		  TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT, FLOW_DIS_IS_FRAGMENT);
+
+	_key  = cpu_to_be32(key);
+	_mask = cpu_to_be32(mask);
+
+	err = nla_put(skb, TCA_FLOWER_KEY_FLAGS, 4, &_key);
+	if (err)
+		return err;
+
+	err = nla_put(skb, TCA_FLOWER_KEY_FLAGS_MASK, 4, &_mask);
+	if (err)
+		return err;
+
+	return 0;
+}
+
 static int fl_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
 		   struct sk_buff *skb, struct tcmsg *t)
 {
@@ -1012,6 +1092,9 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
 			    sizeof(key->enc_tp.dst)))
 		goto nla_put_failure;
 
+	if (fl_dump_key_flags(skb, key->control.flags, mask->control.flags))
+		goto nla_put_failure;
+
 	nla_put_u32(skb, TCA_FLOWER_FLAGS, f->flags);
 
 	if (tcf_exts_dump(skb, &f->exts))
-- 
2.3.7

^ permalink raw reply related

* Re: [PATCH]net:switchdev:add judgement and proper return vlaue to make switchdev_port_vlan_fill() more robust
From: Jiri Pirko @ 2016-12-06 12:32 UTC (permalink / raw)
  To: Feng Deng; +Cc: David S. Miller, netdev, linux-kernel, feng.deng
In-Reply-To: <CAEX1niU24TLaRff+TMPu4ek4vT=R2mv9yaE6j1W1ZsFYh99QGg@mail.gmail.com>

Tue, Dec 06, 2016 at 05:46:36AM CET, cxdx2006@gmail.com wrote:
>From: Feng Deng<cxdx2006@gmail.com>
>
>1.add judgement to make sure switchdev_port_vlan_fill() could
>  return right,  even when switchdev_port_vlan_dump_put() failed  ,
>  which will make switchdev_port_vlan_fill()  more robust
>2.add proper return vlaue at the end of the switchdev_port_vlan_fill()
>
>Signed-off-by: Feng Deng <feng.deng@cortina-access.com>

Please don't send patches like this. You are wasting everyone's time.

Thanks!

^ permalink raw reply

* Re: [PATCH] net:switchdev: fix to release the lock before function return to avoid deadlock
From: Jiri Pirko @ 2016-12-06 12:29 UTC (permalink / raw)
  To: Feng Deng; +Cc: davem, netdev, linux-kernel, cxdx2006
In-Reply-To: <1481018964-47554-1-git-send-email-feng.deng@cortina-access.com>

Tue, Dec 06, 2016 at 11:09:24AM CET, feng.deng@cortina-access.com wrote:
>before switchdev_deferred_dequeue() normal return,show release the lock,
>if not,maybe there will be deadlock sometimes
>
>Signed-off-by: Feng Deng <feng.deng@cortina-access.com>
>---
> net/switchdev/switchdev.c | 1 +
> 1 file changed, 1 insertion(+)
>
>diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>index 3b95fe9..c0a1ad4 100644
>--- a/net/switchdev/switchdev.c
>+++ b/net/switchdev/switchdev.c
>@@ -120,6 +120,7 @@ static struct switchdev_deferred_item *switchdev_deferred_dequeue(void)
> 	dfitem = list_first_entry(&deferred,
> 				  struct switchdev_deferred_item, list);
> 	list_del(&dfitem->list);
>+	spin_unlock_bh(&deferred_lock);
> unlock:
> 	spin_unlock_bh(&deferred_lock);

You are joking right?

^ permalink raw reply

* Re: [PATCH v2 net-next 1/2] flow dissector: ICMP support
From: Jiri Pirko @ 2016-12-06 12:26 UTC (permalink / raw)
  To: Simon Horman
  Cc: Tom Herbert, David Miller, Linux Kernel Network Developers,
	Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, Jamal Hadi Salim,
	Jiri Pirko
In-Reply-To: <20161206105145.GA27020@penelope.horms.nl>

Tue, Dec 06, 2016 at 11:51:46AM CET, simon.horman@netronome.com wrote:
>On Mon, Dec 05, 2016 at 09:29:45AM -0800, Tom Herbert wrote:
>> On Mon, Dec 5, 2016 at 6:23 AM, Simon Horman <simon.horman@netronome.com> wrote:
>> > On Sat, Dec 03, 2016 at 10:52:32AM -0800, Tom Herbert wrote:
>> >> On Sat, Dec 3, 2016 at 2:49 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> >> > Fri, Dec 02, 2016 at 09:31:41PM CET, simon.horman@netronome.com wrote:
>> >> >>Allow dissection of ICMP(V6) type and code. This re-uses transport layer
>> >> >>port dissection code as although ICMP is not a transport protocol and their
>> >> >>type and code are not ports this allows sharing of both code and storage.
>> >> >>
>> >> >>Signed-off-by: Simon Horman <simon.horman@netronome.com>
>> >
>> > ...
>> >
>> >> > Digging into this a bit more. I think it would be much nice not to mix
>> >> > up l4 ports and icmp stuff.
>> >> >
>> >> > How about to have FLOW_DISSECTOR_KEY_ICMP
>> >> > and
>> >> > struct flow_dissector_key_icmp {
>> >> >         u8 type;
>> >> >         u8 code;
>> >> > };
>> >> >
>> >> > The you can make this structure and struct flow_dissector_key_ports into
>> >> > an union in struct flow_keys.
>> >> >
>> >> > Looks much cleaner to me.
>> >
>> > Hi Jiri,
>> >
>> > I wondered about that too. The main reason that I didn't do this the first
>> > time around is that I wasn't sure what to do to differentiate between ports
>> > and icmp in fl_init_dissector() given that using a union would could to
>> > mask bits being set in both if they are set in either.
>> >
>> > I've taken a stab at addressing that below along with implementing your
>> > suggestion. If the treatment in fl_init_dissector() is acceptable
>> > perhaps something similar should be done for FLOW_DISSECTOR_KEY_IPV[46]_ADDRS
>> > too?
>> >
>> >> I agree, this patch adds to many conditionals into the fast path for
>> >> ICMP handling. Neither is there much point in using type and code as
>> >> input to the packet hash.
>> >
>> > Hi Tom,
>> >
>> > I'm not sure that I follow this. A packet may be ICMP or not regardless of
>> > if FLOW_DISSECTOR_KEY_PORT (and now FLOW_DISSECTOR_KEY_ICMP) port is set or
>> > not. I'd appreciate some guidance here.
>> >
>> > First-pass at implementing Jiri's suggestion follows:
>> >
>
>...
>
>> > diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
>> > index 0584b4bb4390..33e5fa2b3e87 100644
>> > --- a/net/core/flow_dissector.c
>> > +++ b/net/core/flow_dissector.c
>
>...
>
>> > @@ -975,6 +983,10 @@ static const struct flow_dissector_key flow_keys_dissector_keys[] = {
>> >                 .offset = offsetof(struct flow_keys, ports),
>> >         },
>> >         {
>> > +               .key_id = FLOW_DISSECTOR_KEY_ICMP,
>> > +               .offset = offsetof(struct flow_keys, icmp),
>> > +       },
>> 
>> This is the problem I was referring to. This adds ICMP to the keys for
>> computing the skb hash and the ICMP type and code are in a union with
>> port numbers so they are in the range over which the hash is computed.
>> This means that we are treating type and code as some sort of flow and
>> and so different type/code between the same src/dst could follow
>> different paths in ECMP. For the purposes of computing a packet hash I
>> don't think we want ICMP information included. This might be a matter
>> of not putting FLOW_DISSECTOR_KEY_ICMP in flow_keys_dissector_keys,
>> where we need this information we could define another structure that
>> has all the same fields as in flow_keys_dissector_keys and include
>> FLOW_DISSECTOR_KEY_ICMP. Alternatively, the flow_dissector_key_icmp
>> could also be outside of the area that is used in the hash: that is no
>> in flow_keys_hash_start(keys) to flow_keys_hash_length(keys), keyval);
>
>...
>
>> > diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>> > index c96b2a349779..f4ba69bd362f 100644
>> > --- a/net/sched/cls_flower.c
>> > +++ b/net/sched/cls_flower.c
>> > @@ -38,7 +38,10 @@ struct fl_flow_key {
>> >                 struct flow_dissector_key_ipv4_addrs ipv4;
>> >                 struct flow_dissector_key_ipv6_addrs ipv6;
>> >         };
>> > -       struct flow_dissector_key_ports tp;
>> > +       union {
>> > +               struct flow_dissector_key_ports tp;
>> > +               struct flow_dissector_key_icmp icmp;
>> > +       };
>> 
>> When an ICMP packet is encapsulated within UDP then icmp overwrites
>> valid port information with is a stronger signal of the flow (unless
>> FLOW_DISSECTOR_F_STOP_AT_ENCAP is set). This is another reason not to
>> use a union with ports.
>
>...
>
>Hi Tom,
>
>thanks for your explanation. I think I have a clearer picture now.
>
>I have reworked things to try to address your concerns.
>In particular:
>
>* FLOW_DISSECTOR_KEY_ICMP is no longer added to flow_keys_dissector_keys.
>  I don't think it is needed at all for the use-case I currently have in
>  mind, which is classifying packets using tc_flower. And adding it there
>  was an error on my part.

I was just about to ask why you are adding it there. Good.


>
>* I stopped using a union for ports and icmp. At the very least this seems
>  to make it easier to reason about things as we no longer need to consider
>  that a port value may in fact be an ICMP type or code.

This should be within csl_flower code. You can easily have it as a union
in struct fl_flow_key. 


>
>  This seems to allow us avoid adding a number of is_icmp checks (as I believe
>  you pointed out earlier). And should also address the problem you state
>  immediately above.

I don't understand the check you are talking about. The union just allow
to share the mem. I don't see any checks needed.


>
>* I have also placed icmp outside of the range flow_keys_hash_start(keys)
>  to flow_keys_hash_length(keys), keyval). This is because I now see no
>  value of having it inside that range and it both avoids any chance of
>  contaminating the hash with ICMP values and hashing over unwanted
>  (hopefully zero) values.

Okay, now I'm confused again. You don't want to add this to
flow_keys_dissector_keys. Why would you?


>
>  This required an update to flow_keys_hash_length() as the bound
>  of the end of fields the hash is no longer the end of struct flow_keys.
>  It seems clean but I wonder if there are similar assumptions lurking
>  elsewhere.
>
>I have lightly tested this for the tc_flower case (only).
>
>Incremental patch on top of this series:
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index a6f75cfb2bf7..8029dd4912b6 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -3181,8 +3181,7 @@ static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
> 	} else {
> 		return false;
> 	}
>-	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34 &&
>-	    proto >= 0 && !skb_flow_is_icmp_any(skb, proto))
>+	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34 && proto >= 0)
> 		fk->ports.ports = skb_flow_get_ports(skb, noff, proto);
> 
> 	return true;
>@@ -3210,8 +3209,7 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
> 		return bond_eth_hash(skb);
> 
> 	if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER23 ||
>-	    bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP23 ||
>-	    flow_keys_are_icmp_any(&flow))
>+	    bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP23)
> 		hash = bond_eth_hash(skb);
> 	else
> 		hash = (__force u32)flow.ports.ports;
>diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>index 44a8f69a9198..9c535fbccf2c 100644
>--- a/include/linux/skbuff.h
>+++ b/include/linux/skbuff.h
>@@ -1094,11 +1094,6 @@ u32 __skb_get_poff(const struct sk_buff *skb, void *data,
> __be32 __skb_flow_get_ports(const struct sk_buff *skb, int thoff, u8 ip_proto,
> 			    void *data, int hlen_proto);
> 
>-static inline bool skb_flow_is_icmp_any(const struct sk_buff *skb, u8 ip_proto)
>-{
>-	return flow_protos_are_icmp_any(skb->protocol, ip_proto);
>-}
>-
> static inline __be32 skb_flow_get_ports(const struct sk_buff *skb,
> 					int thoff, u8 ip_proto)
> {
>diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
>index 5540dfa18872..058c9df8a4d8 100644
>--- a/include/net/flow_dissector.h
>+++ b/include/net/flow_dissector.h
>@@ -91,14 +91,10 @@ struct flow_dissector_key_addrs {
> 
> /**
>  * flow_dissector_key_ports:
>- *	@ports: port numbers of Transport header or
>- *		type and code of ICMP header
>+ *	@ports: port numbers of Transport header
>  *		ports: source (high) and destination (low) port numbers
>  *		src: source port number
>  *		dst: destination port number
>- *		icmp: ICMP type (high) and code (low)
>- *		type: ICMP type
>- *		type: ICMP code
>  */
> struct flow_dissector_key_ports {
> 	union {
>@@ -107,6 +103,18 @@ struct flow_dissector_key_ports {
> 			__be16 src;
> 			__be16 dst;
> 		};
>+	};
>+};
>+
>+/**
>+ * flow_dissector_key_icmp:
>+ *	@ports: type and code of ICMP header
>+ *		icmp: ICMP type (high) and code (low)
>+ *		type: ICMP type
>+ *		code: ICMP code
>+ */
>+struct flow_dissector_key_icmp {
>+	union {
> 		__be16 icmp;
> 		struct {
> 			u8 type;
>@@ -115,7 +123,6 @@ struct flow_dissector_key_ports {
> 	};
> };
> 
>-
> /**
>  * struct flow_dissector_key_eth_addrs:
>  * @src: source Ethernet address
>@@ -133,6 +140,7 @@ enum flow_dissector_key_id {
> 	FLOW_DISSECTOR_KEY_IPV4_ADDRS, /* struct flow_dissector_key_ipv4_addrs */
> 	FLOW_DISSECTOR_KEY_IPV6_ADDRS, /* struct flow_dissector_key_ipv6_addrs */
> 	FLOW_DISSECTOR_KEY_PORTS, /* struct flow_dissector_key_ports */
>+	FLOW_DISSECTOR_KEY_ICMP, /* struct flow_dissector_key_icmp */
> 	FLOW_DISSECTOR_KEY_ETH_ADDRS, /* struct flow_dissector_key_eth_addrs */
> 	FLOW_DISSECTOR_KEY_TIPC_ADDRS, /* struct flow_dissector_key_tipc_addrs */
> 	FLOW_DISSECTOR_KEY_VLAN, /* struct flow_dissector_key_flow_vlan */
>@@ -173,11 +181,16 @@ struct flow_keys {
> 	struct flow_dissector_key_keyid keyid;
> 	struct flow_dissector_key_ports ports;
> 	struct flow_dissector_key_addrs addrs;
>+#define FLOW_KEYS_HASH_END_FIELD addrs
>+	struct flow_dissector_key_icmp icmp;
> };
> 
> #define FLOW_KEYS_HASH_OFFSET		\
> 	offsetof(struct flow_keys, FLOW_KEYS_HASH_START_FIELD)
> 
>+#define FLOW_KEYS_HASH_END		\
>+	offsetofend(struct flow_keys, FLOW_KEYS_HASH_END_FIELD)
>+
> __be32 flow_get_u32_src(const struct flow_keys *flow);
> __be32 flow_get_u32_dst(const struct flow_keys *flow);
> 
>@@ -233,8 +246,7 @@ static inline bool flow_keys_are_icmp_any(const struct flow_keys *keys)
> 
> static inline bool flow_keys_have_l4(const struct flow_keys *keys)
> {
>-	return (!flow_keys_are_icmp_any(keys) && keys->ports.ports) ||
>-		keys->tags.flow_label;
>+	return (keys->ports.ports || keys->tags.flow_label);
> }
> 
> u32 flow_hash_from_keys(struct flow_keys *keys);
>diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
>index 0584b4bb4390..ed6d46475343 100644
>--- a/net/core/flow_dissector.c
>+++ b/net/core/flow_dissector.c
>@@ -139,6 +139,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
> 	struct flow_dissector_key_basic *key_basic;
> 	struct flow_dissector_key_addrs *key_addrs;
> 	struct flow_dissector_key_ports *key_ports;
>+	struct flow_dissector_key_icmp *key_icmp;
> 	struct flow_dissector_key_tags *key_tags;
> 	struct flow_dissector_key_vlan *key_vlan;
> 	struct flow_dissector_key_keyid *key_keyid;
>@@ -559,18 +560,25 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
> 		break;
> 	}
> 
>-	if (dissector_uses_key(flow_dissector,
>-			       FLOW_DISSECTOR_KEY_PORTS)) {
>-		key_ports = skb_flow_dissector_target(flow_dissector,
>-						      FLOW_DISSECTOR_KEY_PORTS,
>-						      target_container);
>-		if (flow_protos_are_icmp_any(proto, ip_proto))
>-			key_ports->icmp = skb_flow_get_be16(skb, nhoff, data,
>-							    hlen);
>-		else
>+	if (flow_protos_are_icmp_any(proto, ip_proto)) {
>+		if (dissector_uses_key(flow_dissector,
>+				       FLOW_DISSECTOR_KEY_ICMP)) {
>+			key_icmp = skb_flow_dissector_target(flow_dissector,
>+							     FLOW_DISSECTOR_KEY_ICMP,
>+							     target_container);
>+			key_icmp->icmp = skb_flow_get_be16(skb, nhoff, data,
>+							   hlen);
>+		}
>+	} else {
>+		if (dissector_uses_key(flow_dissector,
>+				       FLOW_DISSECTOR_KEY_PORTS)) {
>+			key_ports = skb_flow_dissector_target(flow_dissector,
>+							      FLOW_DISSECTOR_KEY_PORTS,
>+							      target_container);
> 			key_ports->ports = __skb_flow_get_ports(skb, nhoff,
> 								ip_proto, data,
> 								hlen);
>+		}
> 	}
> 
> out_good:
>@@ -613,9 +621,10 @@ static inline const u32 *flow_keys_hash_start(const struct flow_keys *flow)
> static inline size_t flow_keys_hash_length(const struct flow_keys *flow)
> {
> 	size_t diff = FLOW_KEYS_HASH_OFFSET + sizeof(flow->addrs);
>-	BUILD_BUG_ON((sizeof(*flow) - FLOW_KEYS_HASH_OFFSET) % sizeof(u32));
>+	BUILD_BUG_ON((FLOW_KEYS_HASH_END - FLOW_KEYS_HASH_OFFSET) %
>+		     sizeof(u32));
> 	BUILD_BUG_ON(offsetof(typeof(*flow), addrs) !=
>-		     sizeof(*flow) - sizeof(flow->addrs));
>+		     FLOW_KEYS_HASH_END - sizeof(flow->addrs));
> 
> 	switch (flow->control.addr_type) {
> 	case FLOW_DISSECTOR_KEY_IPV4_ADDRS:
>@@ -628,7 +637,7 @@ static inline size_t flow_keys_hash_length(const struct flow_keys *flow)
> 		diff -= sizeof(flow->addrs.tipcaddrs);
> 		break;
> 	}
>-	return (sizeof(*flow) - diff) / sizeof(u32);
>+	return (FLOW_KEYS_HASH_END - diff) / sizeof(u32);
> }
> 
> __be32 flow_get_u32_src(const struct flow_keys *flow)
>@@ -745,8 +754,7 @@ void make_flow_keys_digest(struct flow_keys_digest *digest,
> 
> 	data->n_proto = flow->basic.n_proto;
> 	data->ip_proto = flow->basic.ip_proto;
>-	if (flow_keys_have_l4(flow))
>-		data->ports = flow->ports.ports;
>+	data->ports = flow->ports.ports;
> 	data->src = flow->addrs.v4addrs.src;
> 	data->dst = flow->addrs.v4addrs.dst;
> }
>diff --git a/net/sched/cls_flow.c b/net/sched/cls_flow.c
>index 408313e33bbe..6575aba87630 100644
>--- a/net/sched/cls_flow.c
>+++ b/net/sched/cls_flow.c
>@@ -96,7 +96,7 @@ static u32 flow_get_proto(const struct sk_buff *skb,
> static u32 flow_get_proto_src(const struct sk_buff *skb,
> 			      const struct flow_keys *flow)
> {
>-	if (!flow_keys_are_icmp_any(flow) && flow->ports.ports)
>+	if (flow->ports.ports)
> 		return ntohs(flow->ports.src);
> 
> 	return addr_fold(skb->sk);
>@@ -105,7 +105,7 @@ static u32 flow_get_proto_src(const struct sk_buff *skb,
> static u32 flow_get_proto_dst(const struct sk_buff *skb,
> 			      const struct flow_keys *flow)
> {
>-	if (!flow_keys_are_icmp_any(flow) && flow->ports.ports)
>+	if (flow->ports.ports)
> 		return ntohs(flow->ports.dst);
> 
> 	return addr_fold(skb_dst(skb)) ^ (__force u16) tc_skb_protocol(skb);
>diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>index c96b2a349779..56df0368125a 100644
>--- a/net/sched/cls_flower.c
>+++ b/net/sched/cls_flower.c
>@@ -39,6 +39,7 @@ struct fl_flow_key {
> 		struct flow_dissector_key_ipv6_addrs ipv6;
> 	};
> 	struct flow_dissector_key_ports tp;
>+	struct flow_dissector_key_icmp icmp;
> 	struct flow_dissector_key_keyid enc_key_id;
> 	union {
> 		struct flow_dissector_key_ipv4_addrs enc_ipv4;
>@@ -511,19 +512,23 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
> 			       &mask->tp.dst, TCA_FLOWER_KEY_SCTP_DST_MASK,
> 			       sizeof(key->tp.dst));
> 	} else if (flow_basic_key_is_icmpv4(&key->basic)) {
>-		fl_set_key_val(tb, &key->tp.type, TCA_FLOWER_KEY_ICMPV4_TYPE,
>-			       &mask->tp.type, TCA_FLOWER_KEY_ICMPV4_TYPE_MASK,
>-			       sizeof(key->tp.type));
>-		fl_set_key_val(tb, &key->tp.code, TCA_FLOWER_KEY_ICMPV4_CODE,
>-			       &mask->tp.code, TCA_FLOWER_KEY_ICMPV4_CODE_MASK,
>-			       sizeof(key->tp.code));
>+		fl_set_key_val(tb, &key->icmp.type, TCA_FLOWER_KEY_ICMPV4_TYPE,
>+			       &mask->icmp.type,
>+			       TCA_FLOWER_KEY_ICMPV4_TYPE_MASK,
>+			       sizeof(key->icmp.type));
>+		fl_set_key_val(tb, &key->icmp.code, TCA_FLOWER_KEY_ICMPV4_CODE,
>+			       &mask->icmp.code,
>+			       TCA_FLOWER_KEY_ICMPV4_CODE_MASK,
>+			       sizeof(key->icmp.code));
> 	} else if (flow_basic_key_is_icmpv6(&key->basic)) {
>-		fl_set_key_val(tb, &key->tp.type, TCA_FLOWER_KEY_ICMPV6_TYPE,
>-			       &mask->tp.type, TCA_FLOWER_KEY_ICMPV6_TYPE_MASK,
>-			       sizeof(key->tp.type));
>-		fl_set_key_val(tb, &key->tp.code, TCA_FLOWER_KEY_ICMPV4_CODE,
>-			       &mask->tp.code, TCA_FLOWER_KEY_ICMPV4_CODE_MASK,
>-			       sizeof(key->tp.code));
>+		fl_set_key_val(tb, &key->icmp.type, TCA_FLOWER_KEY_ICMPV6_TYPE,
>+			       &mask->icmp.type,
>+			       TCA_FLOWER_KEY_ICMPV6_TYPE_MASK,
>+			       sizeof(key->icmp.type));
>+		fl_set_key_val(tb, &key->icmp.code, TCA_FLOWER_KEY_ICMPV4_CODE,
>+			       &mask->icmp.code,
>+			       TCA_FLOWER_KEY_ICMPV4_CODE_MASK,
>+			       sizeof(key->icmp.code));
> 	}
> 
> 	if (tb[TCA_FLOWER_KEY_ENC_IPV4_SRC] ||
>@@ -634,6 +639,8 @@ static void fl_init_dissector(struct cls_fl_head *head,
> 	FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
> 			     FLOW_DISSECTOR_KEY_PORTS, tp);
> 	FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
>+			     FLOW_DISSECTOR_KEY_ICMP, icmp);
>+	FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
> 			     FLOW_DISSECTOR_KEY_VLAN, vlan);
> 	FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
> 			     FLOW_DISSECTOR_KEY_ENC_KEYID, enc_key_id);
>@@ -1000,24 +1007,24 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
> 				  sizeof(key->tp.dst))))
> 		goto nla_put_failure;
> 	else if (flow_basic_key_is_icmpv4(&key->basic) &&
>-		 (fl_dump_key_val(skb, &key->tp.type,
>-				  TCA_FLOWER_KEY_ICMPV4_TYPE, &mask->tp.type,
>+		 (fl_dump_key_val(skb, &key->icmp.type,
>+				  TCA_FLOWER_KEY_ICMPV4_TYPE, &mask->icmp.type,
> 				  TCA_FLOWER_KEY_ICMPV4_TYPE_MASK,
>-				  sizeof(key->tp.type)) ||
>-		  fl_dump_key_val(skb, &key->tp.code,
>-				  TCA_FLOWER_KEY_ICMPV4_CODE, &mask->tp.code,
>+				  sizeof(key->icmp.type)) ||
>+		  fl_dump_key_val(skb, &key->icmp.code,
>+				  TCA_FLOWER_KEY_ICMPV4_CODE, &mask->icmp.code,
> 				  TCA_FLOWER_KEY_ICMPV4_CODE_MASK,
>-				  sizeof(key->tp.code))))
>+				  sizeof(key->icmp.code))))
> 		goto nla_put_failure;
> 	else if (flow_basic_key_is_icmpv6(&key->basic) &&
>-		 (fl_dump_key_val(skb, &key->tp.type,
>-				  TCA_FLOWER_KEY_ICMPV6_TYPE, &mask->tp.type,
>+		 (fl_dump_key_val(skb, &key->icmp.type,
>+				  TCA_FLOWER_KEY_ICMPV6_TYPE, &mask->icmp.type,
> 				  TCA_FLOWER_KEY_ICMPV6_TYPE_MASK,
>-				  sizeof(key->tp.type)) ||
>-		  fl_dump_key_val(skb, &key->tp.code,
>-				  TCA_FLOWER_KEY_ICMPV6_CODE, &mask->tp.code,
>+				  sizeof(key->icmp.type)) ||
>+		  fl_dump_key_val(skb, &key->icmp.code,
>+				  TCA_FLOWER_KEY_ICMPV6_CODE, &mask->icmp.code,
> 				  TCA_FLOWER_KEY_ICMPV6_CODE_MASK,
>-				  sizeof(key->tp.code))))
>+				  sizeof(key->icmp.code))))
> 		goto nla_put_failure;
> 
> 	if (key->enc_control.addr_type == FLOW_DISSECTOR_KEY_IPV4_ADDRS &&

^ permalink raw reply

* Re: [PATCHv2 net] team: team_port_add should check link_up before enable port
From: Jiri Pirko @ 2016-12-06 12:27 UTC (permalink / raw)
  To: Xin Long; +Cc: network dev, davem, Marcelo Ricardo Leitner
In-Reply-To: <CADvbK_e01bz-PGmDrdxztyzLqmR5od3rahO960Oe2OhYQ6fp9w@mail.gmail.com>

Tue, Dec 06, 2016 at 11:38:53AM CET, lucien.xin@gmail.com wrote:
>On Tue, Dec 6, 2016 at 4:22 PM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Tue, Dec 06, 2016 at 08:29:08AM CET, lucien.xin@gmail.com wrote:
>>>Now when users add a nic to team dev, the option 'enable' of the port
>>>is true by default, as team_port_enable enables it after dev_open in
>>>team_port_add.
>>>
>>>But even if the port_dev has no carrier, like it's cable was unpluged,
>>>the port is still enabled. It leads to that team dev couldn't work well
>>>if this port was chosen to connect, and has no chance to change to use
>>>other ports if link_watch is ethtool.
>>>
>>>This patch is to enable the port only when the port_dev has carrier in
>>>team_port_add.
>>>
>>>v1 -> v2:
>>>  use netif_carrier_ok() instead of !!netif_carrier_ok(), as it returns
>>>  bool now.
>>>
>>>Signed-off-by: Xin Long <lucien.xin@gmail.com>
>>>---
>>> drivers/net/team/team.c | 8 ++++++--
>>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>>diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
>>>index a380649..4bc0103 100644
>>>--- a/drivers/net/team/team.c
>>>+++ b/drivers/net/team/team.c
>>>@@ -1140,6 +1140,7 @@ static int team_port_add(struct team *team, struct net_device *port_dev)
>>>       struct net_device *dev = team->dev;
>>>       struct team_port *port;
>>>       char *portname = port_dev->name;
>>>+      bool linkup;
>>>       int err;
>>>
>>>       if (port_dev->flags & IFF_LOOPBACK) {
>>>@@ -1249,9 +1250,12 @@ static int team_port_add(struct team *team, struct net_device *port_dev)
>>>
>>>       port->index = -1;
>>>       list_add_tail_rcu(&port->list, &team->port_list);
>>>-      team_port_enable(team, port);
>>>+      linkup = netif_carrier_ok(port_dev);
>>>+      if (linkup)
>>>+              team_port_enable(team, port);
>>
>> This is obviously wrong change. It you use a simple setup without
>> userspace part (e.g. round robin), The port gets never enabled.
>> team_port_enabl is called from here and team_port_en_option_set.
>yes, without userspace part, that would be a problem.
>
>>
>> By default, all ports should be enabled. Only in case the userspace
>> daemon decides to disable, it does so.
>>
>> Could you send me the exact configuration where you see the issue?
>attachment is the scripts,
>
># ./setup.sh
># ip netns exec ns1 ./team1.sh
># ./team0.sh
># ping 192.168.11.3
>
>the issue is team0 cannot switch to veth2, even if the veth0 has no carrier.
>
>> This should be definitelly fixed in user part.
>now it can disable/enable it in lw_ethtool_event_watch_port_changed()
>when receive event from kernel, but at the beginning, the first event from
>adding port will not going to team_port_en_option_set, as new_link_up
>== common_ppriv->link_up in lw_ethtool_event_watch_port_changed().
>
>maybe it should be fixed there ?

Yes please.

>
>>
>> Thanks.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox