Netdev List


* Re: [PATCH] netfilter: xt_connlimit: use rb_entry()
From: Pablo Neira Ayuso @ 2017-01-05 12:26 UTC (permalink / raw)
  To: Geliang Tang
  Cc: Patrick McHardy, Jozsef Kadlecsik, David S. Miller,
	netfilter-devel, coreteam, netdev, linux-kernel
In-Reply-To: <5fab578543b7f15cb416438800e24e70af675835.1482204114.git.geliangtang@gmail.com>

On Tue, Dec 20, 2016 at 10:02:13PM +0800, Geliang Tang wrote:
> To make the code clearer, use rb_entry() instead of container_of() to
> deal with rbtree.

Applied this one to nf-next, thanks.


* [PATCH ipsec-next] xfrm: state: do not acquire lock in get_mtu helpers
From: Florian Westphal @ 2017-01-05 12:23 UTC (permalink / raw)
  To: netdev; +Cc: Florian Westphal

Once flow cache gets removed the mtu initialisation happens for every skb
that gets an xfrm attached, so this lock starts to show up in perf.

It is not obvious why this lock is required -- the caller holds
reference on the state struct, type->destructor is only called from the
state gc worker (all state structs on gc list must have refcount 0).

xfrm_init_state already has been called (else private data accessed
by type->get_mtu() would not be set up).

So just remove the lock -- the race on the state (DEAD?) doesn't
matter (could change right after dropping the lock too).

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 net/xfrm/xfrm_state.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 64e3c82eedf6..53877fea9316 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -2000,16 +2000,13 @@ EXPORT_SYMBOL(xfrm_state_delete_tunnel);

 int xfrm_state_mtu(struct xfrm_state *x, int mtu)
 {
-	int res;
+	const struct xfrm_type *type = READ_ONCE(x->type);

-	spin_lock_bh(&x->lock);
 	if (x->km.state == XFRM_STATE_VALID &&
-	    x->type && x->type->get_mtu)
-		res = x->type->get_mtu(x, mtu);
-	else
-		res = mtu - x->props.header_len;
-	spin_unlock_bh(&x->lock);
-	return res;
+	    type && type->get_mtu)
+		return type->get_mtu(x, mtu);
+
+	return mtu - x->props.header_len;
 }

 int __xfrm_init_state(struct xfrm_state *x, bool init_replay)
-- 
2.7.3
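
For illustration only, the lockless pattern in the patch above can be
sketched in userspace C. Everything here is a hypothetical stand-in
(struct state, struct type_ops, state_mtu(), sketch_get_mtu()), not the
kernel's definitions, and the volatile load only approximates what
READ_ONCE() does: the pointer is loaded once, so the NULL check and the
call are guaranteed to use the same value.

```c
#include <stddef.h>

struct state;

/* Hypothetical per-type callbacks, loosely modelled on struct xfrm_type. */
struct type_ops {
	int (*get_mtu)(const struct state *x, int mtu);
};

/* Hypothetical state object; not the kernel's struct xfrm_state. */
struct state {
	int valid;
	int header_len;
	const struct type_ops *type;
};

/* Example callback (assumption): reserves 8 extra bytes for a trailer. */
static int sketch_get_mtu(const struct state *x, int mtu)
{
	return mtu - x->header_len - 8;
}

static int state_mtu(const struct state *x, int mtu)
{
	/* Single volatile load, approximating READ_ONCE(x->type). */
	const struct type_ops *type =
		*(const struct type_ops * const volatile *)&x->type;

	if (x->valid && type && type->get_mtu)
		return type->get_mtu(x, mtu);

	/* No type-specific helper: subtract only the header length. */
	return mtu - x->header_len;
}
```

With header_len = 20, a state that has the callback yields 1472 for an
MTU of 1500, and a state without a type falls back to 1480.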


* Re: [PATCH net-next] cxgb4: Synchronize access to mailbox
From: kbuild test robot @ 2017-01-05 12:22 UTC (permalink / raw)
  To: Hariprasad Shenai
  Cc: kbuild-all, netdev, davem, leedom, nirranjan, ganeshgr,
	Hariprasad Shenai
In-Reply-To: <1483595590-12932-1-git-send-email-hariprasad@chelsio.com>


Hi Hariprasad,

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Hariprasad-Shenai/cxgb4-Synchronize-access-to-mailbox/20170105-193032
config: xtensa-allmodconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 4.9.0
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=xtensa 

All errors (new ones prefixed by >>):

   drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c: In function 'init_one':
>> drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:4712:25: error: 'struct adapter' has no member named 'mbox_list'
     INIT_LIST_HEAD(&adapter->mbox_list.list);
                            ^

vim +4712 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c

  4706	
  4707		spin_lock_init(&adapter->stats_lock);
  4708		spin_lock_init(&adapter->tid_release_lock);
  4709		spin_lock_init(&adapter->win0_lock);
  4710		spin_lock_init(&adapter->mbox_lock);
  4711	
> 4712		INIT_LIST_HEAD(&adapter->mbox_list.list);
  4713	
  4714		INIT_WORK(&adapter->tid_release_task, process_tid_release_list);
  4715		INIT_WORK(&adapter->db_full_task, process_db_full);

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 48144 bytes --]


* Re: [net-next PATCH v2 5/6] i40e: Add TX and RX support in switchdev mode.
From: Or Gerlitz @ 2017-01-05 12:08 UTC (permalink / raw)
  To: Sridhar Samudrala, jakub.kicinski, John Fastabend, Jiri Pirko
  Cc: Alexander Duyck, Anjali Singhai Jain, David Miller,
	intel-wired-lan, Linux Netdev List
In-Reply-To: <1483466874-2962-6-git-send-email-sridhar.samudrala@intel.com>

On Tue, Jan 3, 2017 at 8:07 PM, Sridhar Samudrala
<sridhar.samudrala@intel.com> wrote:
> A host based switching entity like a linux bridge or OVS redirects these frames
> to the right VFs via VFPR netdevs. Any frames sent via VFPR netdevs are sent as
> directed transmits to the corresponding VFs. To enable directed transmit, skb
> metadata dst is used to pass the VF id and the frame is requeued to call the PFs
> transmit routine.

Jakub/John, patch #4, which didn't appear on the list, had a long discussion [1]
ending with "let's talk on it @ netdev"; did we?

Or.

[1] http://marc.info/?t=147457252900002&r=1&w=2


* Re: [PATCH net-next V2 0/3] net/sched: act_pedit: Use offset relative to conventional network headers
From: Jiri Benc @ 2017-01-05 11:54 UTC (permalink / raw)
  To: Amir Vadai
  Cc: David S. Miller, netdev, Jiri Pirko, Or Gerlitz, Hadar Har-Zion
In-Reply-To: <20170105095454.32644-1-amir@vadai.me>

On Thu,  5 Jan 2017 11:54:51 +0200, Amir Vadai wrote:
> You asked me [1] why did I use specific header names instead of layers (L2, L3...),
> and I explained that it is on purpose, this extra information is planned to be used
> by hardware drivers to offload the action.
> 
> Some FW/HW parser APIs are such that they need to get the specific header type (e.g
> IPV4 or IPV6, TCP or UDP) and not only the networking level (e.g network or transport).

Don't we need better API specification (and enforcement) then, though?
See below.

> Usage example:
> $ tc filter add dev enp0s9 protocol ip parent ffff: \
>    flower \
>      ip_proto tcp \
>     dst_port 80 \
>    action \
>        pedit munge ip ttl add 0xff \
>        pedit munge tcp dport set 8080 \
>      pipe action mirred egress redirect dev veth0

What happens when one does:

tc filter add ... flower ip_proto udp action pedit munge tcp ...

?

 Jiri


* Re: [net-next PATCH 5/6] i40e: Add TX and RX support in switchdev mode.
From: Or Gerlitz @ 2017-01-05 11:50 UTC (permalink / raw)
  To: Samudrala, Sridhar
  Cc: Alexander Duyck, John Fastabend, Anjali Singhai Jain,
	jakub.kicinski, intel-wired-lan, Linux Netdev List
In-Reply-To: <586D7B4E.2010308@intel.com>

On Thu, Jan 5, 2017 at 12:46 AM, Samudrala, Sridhar
<sridhar.samudrala@intel.com> wrote:
>
>
> On 1/3/2017 3:03 PM, Or Gerlitz wrote:
>>
>> On Fri, Dec 30, 2016 at 7:04 PM, Samudrala, Sridhar
>> <sridhar.samudrala@intel.com> wrote:
>>>
>>> On 12/30/2016 7:31 AM, Or Gerlitz wrote:
>>>>
>>>> Are you exposing switchdev ops for the representators? didn't see that
>>>> or maybe it's in the 4th patch which didn't make it to the list?
>>>
>>> Not at this time. In the future patches when we offload fdb/vlan
>>> functionality, we could use switchdev ops.
>>
>> but wait, this is the switchdev mode... even before doing any
>> offloading, you want (need) your representor netdevices to have the
>> same HW ID marking they are all ports of the same ASIC, this you can
>> do with the switchdev parent ID attribute.
>
> OK. I will add switchdev_port_attr_get() with PORT_PARENT_ID support in v3.

Good. I made this comment because we want a well-defined user experience
that upper virtualization layers can also take into account.

Another piece to add is to have your VF reps implement the
get_phys_port_name ndo, which, as we explain in commit
cb67b832921cfa20ad79bafdc51f1745339d0557, is used as follows:

    Port phys name (ndo_get_phys_port_name) is implemented to allow exporting
    to user-space the VF vport number and along with the switchdev port parent
    id (phys_switch_id) enable a udev base consistent naming scheme:

    SUBSYSTEM=="net", ACTION=="add", ATTR{phys_switch_id}=="<phys_switch_id>", \
            ATTR{phys_port_name}!="", NAME="$PF_NIC$attr{phys_port_name}"

    where phys_switch_id is exposed by the PF (and VF reps) and $PF_NIC is
    the name of the PF netdevice.

Or.


* Re: [PATCH net-next] cxgb4: Synchronize access to mailbox
From: kbuild test robot @ 2017-01-05 11:46 UTC (permalink / raw)
  To: Hariprasad Shenai
  Cc: kbuild-all, netdev, davem, leedom, nirranjan, ganeshgr,
	Hariprasad Shenai
In-Reply-To: <1483595590-12932-1-git-send-email-hariprasad@chelsio.com>


Hi Hariprasad,

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Hariprasad-Shenai/cxgb4-Synchronize-access-to-mailbox/20170105-193032
config: i386-randconfig-x005-201701 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c: In function 'init_one':
>> drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c:4712:25: error: 'struct adapter' has no member named 'mbox_list'; did you mean 'mbox_lock'?
     INIT_LIST_HEAD(&adapter->mbox_list.list);
                            ^~

vim +4712 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c

  4706	
  4707		spin_lock_init(&adapter->stats_lock);
  4708		spin_lock_init(&adapter->tid_release_lock);
  4709		spin_lock_init(&adapter->win0_lock);
  4710		spin_lock_init(&adapter->mbox_lock);
  4711	
> 4712		INIT_LIST_HEAD(&adapter->mbox_list.list);
  4713	
  4714		INIT_WORK(&adapter->tid_release_task, process_tid_release_list);
  4715		INIT_WORK(&adapter->db_full_task, process_db_full);

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 37536 bytes --]


* Re: SIOCSIWFREQ while in NL80211_IFTYPE_STATION
From: Johannes Berg @ 2017-01-05 11:38 UTC (permalink / raw)
  To: Jorge Ramirez, netdev-u79uwXL29TY76Z2rM5mHXA, Daniel Lezcano
  Cc: linux-wireless
In-Reply-To: <685811c3-6247-77fd-8c70-617951886451-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>

+linux-wireless, where this should've gone

> I am running a single wlan0 interface in managed mode (no aliases,
> no  other wireless interfaces).
> The association with the AP still hasn't happened.
> 
> I noticed that if trying to change the frequency to one of the valid 
> values, the driver returns EBUSY.
> 
> The call stack is
> cfg80211_wext_siwfreq
> -->cfg80211_mgd_wext_siwfreq
> --->cfg80211_set_monitor_channel (notice call to set 'monitor'
> channel 
> in managed mode)
> ----> fails with EBUSY
> 
> Is therefore the expected behavior to fail under the above
> circumstances 
> (managed mode && single wlan0 interface && no association)?
> And if it is, please could you clarify when would it be valid to
> change the frequency in managed mode?

Frankly, I don't remember - all of this is plastered all over with
backward compatibility hooks etc.

How are you running into this? Why are you even trying to do this? You
really shouldn't use wireless extensions any more.

Also, there shouldn't be much reason to be setting the channel anyway,
unless you want to trigger a connection specifically on that channel,
but then when you use nl80211 you get that included in the CONNECT
command there.

Finally, I suspect that this particular backward compatibility hook
can't really work anyway and could be removed, but I'm not sure that
would have the effect you want either.

johannes


* [PATCH 5/6] netfilter: ipt_CLUSTERIP: check duplicate config when initializing
From: Pablo Neira Ayuso @ 2017-01-05 11:19 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1483615193-2931-1-git-send-email-pablo@netfilter.org>

From: Xin Long <lucien.xin@gmail.com>

Currently, adding an ipt_CLUSTERIP rule only checks for a duplicate config in
clusterip_config_find_get(). After that check, another thread may still
insert a config with the same IP, which leaves the duplicate check to
proc_create_data.

It's more reasonable for ipt_CLUSTERIP to check for duplicate configs itself
instead of relying on the proc fs duplicate file check. Previously, when proc
fs allowed files with duplicate names in a directory, this could even crash
the kernel with a use-after-free.

This patch checks for a duplicate config under the protection of the
clusterip_net lock when initializing a new config, and corrects the returned
error.

Note that it also moves proc file node creation to after the new config is
added: proc_create_data may sleep, so it can't be called under the
clusterip_net lock. clusterip_config_find_get returns NULL if c->pde is NULL,
so the config can't be used until the proc file node has been created.

Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/ipv4/netfilter/ipt_CLUSTERIP.c | 34 +++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c
index 21db00d0362b..a6b8c1a4102b 100644
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -144,7 +144,7 @@ clusterip_config_find_get(struct net *net, __be32 clusterip, int entry)
 	rcu_read_lock_bh();
 	c = __clusterip_config_find(net, clusterip);
 	if (c) {
-		if (unlikely(!atomic_inc_not_zero(&c->refcount)))
+		if (!c->pde || unlikely(!atomic_inc_not_zero(&c->refcount)))
 			c = NULL;
 		else if (entry)
 			atomic_inc(&c->entries);
@@ -166,14 +166,15 @@ clusterip_config_init_nodelist(struct clusterip_config *c,
 
 static struct clusterip_config *
 clusterip_config_init(const struct ipt_clusterip_tgt_info *i, __be32 ip,
-			struct net_device *dev)
+		      struct net_device *dev)
 {
+	struct net *net = dev_net(dev);
 	struct clusterip_config *c;
-	struct clusterip_net *cn = net_generic(dev_net(dev), clusterip_net_id);
+	struct clusterip_net *cn = net_generic(net, clusterip_net_id);
 
 	c = kzalloc(sizeof(*c), GFP_ATOMIC);
 	if (!c)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 
 	c->dev = dev;
 	c->clusterip = ip;
@@ -185,6 +186,17 @@ clusterip_config_init(const struct ipt_clusterip_tgt_info *i, __be32 ip,
 	atomic_set(&c->refcount, 1);
 	atomic_set(&c->entries, 1);
 
+	spin_lock_bh(&cn->lock);
+	if (__clusterip_config_find(net, ip)) {
+		spin_unlock_bh(&cn->lock);
+		kfree(c);
+
+		return ERR_PTR(-EBUSY);
+	}
+
+	list_add_rcu(&c->list, &cn->configs);
+	spin_unlock_bh(&cn->lock);
+
 #ifdef CONFIG_PROC_FS
 	{
 		char buffer[16];
@@ -195,16 +207,16 @@ clusterip_config_init(const struct ipt_clusterip_tgt_info *i, __be32 ip,
 					  cn->procdir,
 					  &clusterip_proc_fops, c);
 		if (!c->pde) {
+			spin_lock_bh(&cn->lock);
+			list_del_rcu(&c->list);
+			spin_unlock_bh(&cn->lock);
 			kfree(c);
-			return NULL;
+
+			return ERR_PTR(-ENOMEM);
 		}
 	}
 #endif
 
-	spin_lock_bh(&cn->lock);
-	list_add_rcu(&c->list, &cn->configs);
-	spin_unlock_bh(&cn->lock);
-
 	return c;
 }
 
@@ -410,9 +422,9 @@ static int clusterip_tg_check(const struct xt_tgchk_param *par)
 
 			config = clusterip_config_init(cipinfo,
 							e->ip.dst.s_addr, dev);
-			if (!config) {
+			if (IS_ERR(config)) {
 				dev_put(dev);
-				return -ENOMEM;
+				return PTR_ERR(config);
 			}
 			dev_mc_add(config->dev, config->clustermac);
 		}
-- 
2.1.4
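
As a rough userspace sketch of the idea in this patch: do the duplicate
lookup and the list insertion under one lock, and report a distinct
error for "already exists" versus "out of memory". The names below
(struct config, config_find_locked(), config_init()) are hypothetical,
a pthread mutex stands in for the clusterip_net spinlock, and a plain
negative errno return stands in for ERR_PTR().

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical config object, keyed by IPv4 address. */
struct config {
	uint32_t ip;
	struct config *next;
};

static pthread_mutex_t configs_lock = PTHREAD_MUTEX_INITIALIZER;
static struct config *configs;

/* Caller must hold configs_lock. */
static struct config *config_find_locked(uint32_t ip)
{
	struct config *c;

	for (c = configs; c; c = c->next)
		if (c->ip == ip)
			return c;
	return NULL;
}

/* Returns 0 and stores the new config in *out, or a negative errno. */
static int config_init(uint32_t ip, struct config **out)
{
	struct config *c = calloc(1, sizeof(*c));

	if (!c)
		return -ENOMEM;
	c->ip = ip;

	/* Lookup and insert happen in one critical section, so two
	 * threads cannot both pass the duplicate check. */
	pthread_mutex_lock(&configs_lock);
	if (config_find_locked(ip)) {
		pthread_mutex_unlock(&configs_lock);
		free(c);
		return -EBUSY;
	}
	c->next = configs;
	configs = c;
	pthread_mutex_unlock(&configs_lock);

	*out = c;
	return 0;
}
```

Anything that may sleep (the proc file creation in the patch) would go
after the unlock, with a removal from the list on failure.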


* [PATCH 3/6] netfilter: nf_tables: fix oob access
From: Pablo Neira Ayuso @ 2017-01-05 11:19 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1483615193-2931-1-git-send-email-pablo@netfilter.org>

From: Florian Westphal <fw@strlen.de>

BUG: KASAN: slab-out-of-bounds in nf_tables_rule_destroy+0xf1/0x130 at addr ffff88006a4c35c8
Read of size 8 by task nft/1607

When we've destroyed last valid expr, nft_expr_next() returns an invalid expr.
We must not dereference it unless it passes != nft_expr_last() check.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_tables_api.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index a019a87e58ee..0db5f9782265 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -2115,7 +2115,7 @@ static void nf_tables_rule_destroy(const struct nft_ctx *ctx,
 	 * is called on error from nf_tables_newrule().
 	 */
 	expr = nft_expr_first(rule);
-	while (expr->ops && expr != nft_expr_last(rule)) {
+	while (expr != nft_expr_last(rule) && expr->ops) {
 		nf_tables_expr_destroy(ctx, expr);
 		expr = nft_expr_next(expr);
 	}
-- 
2.1.4
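
The fix relies on && short-circuiting: once the end-of-range test comes
first, the element is only dereferenced when it is known to be inside
the range. A minimal sketch with hypothetical names (struct expr and
count_valid_exprs() are not the nf_tables types):

```c
#include <stddef.h>

/* Hypothetical expression slot; a NULL ops marks an unused slot. */
struct expr {
	const char *ops;
};

/* Count the leading initialised expressions in [first, last). */
static size_t count_valid_exprs(const struct expr *first,
				const struct expr *last)
{
	const struct expr *e = first;
	size_t n = 0;

	/* Range check first: e->ops is never read when e == last. */
	while (e != last && e->ops) {
		n++;
		e++;
	}
	return n;
}
```

With the operands in the opposite order, the last iteration would read
e->ops one element past the end before noticing that e == last.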


* [PATCH 4/6] netfilter: nft_payload: mangle ckecksum if NFT_PAYLOAD_L4CSUM_PSEUDOHDR is set
From: Pablo Neira Ayuso @ 2017-01-05 11:19 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1483615193-2931-1-git-send-email-pablo@netfilter.org>

If the NFT_PAYLOAD_L4CSUM_PSEUDOHDR flag is set, then mangle the layer 4
checksum. This should not depend on csum_type NFT_PAYLOAD_CSUM_INET,
since the IPv6 header has no checksum field, yet an update of any of
the pseudoheader fields may still require a layer 4 checksum update.

Fixes: 1814096980bb ("netfilter: nft_payload: layer 4 checksum adjustment for pseudoheader fields")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_payload.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/net/netfilter/nft_payload.c b/net/netfilter/nft_payload.c
index 36d2b1096546..7d699bbd45b0 100644
--- a/net/netfilter/nft_payload.c
+++ b/net/netfilter/nft_payload.c
@@ -250,6 +250,22 @@ static int nft_payload_l4csum_update(const struct nft_pktinfo *pkt,
 	return 0;
 }
 
+static int nft_payload_csum_inet(struct sk_buff *skb, const u32 *src,
+				 __wsum fsum, __wsum tsum, int csum_offset)
+{
+	__sum16 sum;
+
+	if (skb_copy_bits(skb, csum_offset, &sum, sizeof(sum)) < 0)
+		return -1;
+
+	nft_csum_replace(&sum, fsum, tsum);
+	if (!skb_make_writable(skb, csum_offset + sizeof(sum)) ||
+	    skb_store_bits(skb, csum_offset, &sum, sizeof(sum)) < 0)
+		return -1;
+
+	return 0;
+}
+
 static void nft_payload_set_eval(const struct nft_expr *expr,
 				 struct nft_regs *regs,
 				 const struct nft_pktinfo *pkt)
@@ -259,7 +275,6 @@ static void nft_payload_set_eval(const struct nft_expr *expr,
 	const u32 *src = &regs->data[priv->sreg];
 	int offset, csum_offset;
 	__wsum fsum, tsum;
-	__sum16 sum;
 
 	switch (priv->base) {
 	case NFT_PAYLOAD_LL_HEADER:
@@ -282,18 +297,14 @@ static void nft_payload_set_eval(const struct nft_expr *expr,
 	csum_offset = offset + priv->csum_offset;
 	offset += priv->offset;
 
-	if (priv->csum_type == NFT_PAYLOAD_CSUM_INET &&
+	if ((priv->csum_type == NFT_PAYLOAD_CSUM_INET || priv->csum_flags) &&
 	    (priv->base != NFT_PAYLOAD_TRANSPORT_HEADER ||
 	     skb->ip_summed != CHECKSUM_PARTIAL)) {
-		if (skb_copy_bits(skb, csum_offset, &sum, sizeof(sum)) < 0)
-			goto err;
-
 		fsum = skb_checksum(skb, offset, priv->len, 0);
 		tsum = csum_partial(src, priv->len, 0);
-		nft_csum_replace(&sum, fsum, tsum);
 
-		if (!skb_make_writable(skb, csum_offset + sizeof(sum)) ||
-		    skb_store_bits(skb, csum_offset, &sum, sizeof(sum)) < 0)
+		if (priv->csum_type == NFT_PAYLOAD_CSUM_INET &&
+		    nft_payload_csum_inet(skb, src, fsum, tsum, csum_offset))
 			goto err;
 
 		if (priv->csum_flags &&
-- 
2.1.4
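
For readers unfamiliar with incremental checksum mangling, here is a
userspace sketch of the arithmetic. It follows RFC 1624 (eqn. 3) on
16-bit words and does not use the kernel helpers; fold16(), csum_full()
and csum_update() are names made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint16_t fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full Internet checksum over n 16-bit words. */
static uint16_t csum_full(const uint16_t *words, size_t n)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += words[i];
	return (uint16_t)~fold16(sum);
}

/* RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m'), in one's complement. */
static uint16_t csum_update(uint16_t csum, uint16_t old_word,
			    uint16_t new_word)
{
	uint32_t sum = (uint16_t)~csum;

	sum += (uint16_t)~old_word;
	sum += new_word;
	return (uint16_t)~fold16(sum);
}
```

Changing one word from 0x0050 to 0x1f90 and calling csum_update() gives
the same result as recomputing the checksum over the whole buffer. The
same holds when the changed word belongs to the pseudoheader, which is
why a layer 4 checksum may need an update even though the IPv6 header
itself carries no checksum.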


* [PATCH 0/6] Netfilter fixes for net
From: Pablo Neira Ayuso @ 2017-01-05 11:19 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

Hi David,

The following patchset contains accumulated Netfilter fixes for your
net tree:

1) Ensure quota dump and reset happens iff we can deliver numbers to
   userspace.

2) Silence splat on incorrect use of smp_processor_id() from nft_queue.

3) Fix an out-of-bound access reported by KASAN in
   nf_tables_rule_destroy(), patch from Florian Westphal.

4) Fix layer 4 checksum mangling in the nf_tables payload expression
   with IPv6.

5) Fix a race in the CLUSTERIP target from control plane path when two
   threads run to add a new configuration object. Serialize invocations
   of clusterip_config_init() using spin_lock. From Xin Long.

6) Call br_nf_pre_routing_finish_bridge_finish() once we are done with
   the br_nf_pre_routing_finish() hook. From Artur Molchanov.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Wish you a nice new year btw, thanks!

----------------------------------------------------------------

The following changes since commit a220871be66f99d8957c693cf22ec67ecbd9c23a:

  virtio-net: correctly enable multiqueue (2016-12-13 10:37:38 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git HEAD

for you to fetch changes up to 14221cc45caad2fcab3a8543234bb7eda9b540d5:

  bridge: netfilter: Fix dropping packets that moving through bridge interface (2016-12-30 18:22:50 +0100)

----------------------------------------------------------------
Artur Molchanov (1):
      bridge: netfilter: Fix dropping packets that moving through bridge interface

Florian Westphal (1):
      netfilter: nf_tables: fix oob access

Pablo Neira Ayuso (3):
      netfilter: nft_quota: reset quota after dump
      netfilter: nft_queue: use raw_smp_processor_id()
      netfilter: nft_payload: mangle ckecksum if NFT_PAYLOAD_L4CSUM_PSEUDOHDR is set

Xin Long (1):
      netfilter: ipt_CLUSTERIP: check duplicate config when initializing

 net/bridge/br_netfilter_hooks.c    |  2 +-
 net/ipv4/netfilter/ipt_CLUSTERIP.c | 34 +++++++++++++++++++++++-----------
 net/netfilter/nf_tables_api.c      |  2 +-
 net/netfilter/nft_payload.c        | 27 +++++++++++++++++++--------
 net/netfilter/nft_queue.c          |  2 +-
 net/netfilter/nft_quota.c          | 26 ++++++++++++++------------
 6 files changed, 59 insertions(+), 34 deletions(-)


* [PATCH 1/6] netfilter: nft_quota: reset quota after dump
From: Pablo Neira Ayuso @ 2017-01-05 11:19 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1483615193-2931-1-git-send-email-pablo@netfilter.org>

Dumping of netlink attributes may fail due to insufficient room in the
skbuff, so reset the consumed quota only after the netlink attributes
have been put into the skbuff successfully.

Fixes: 43da04a593d8 ("netfilter: nf_tables: atomic dump and reset for stateful objects")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_quota.c | 26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/net/netfilter/nft_quota.c b/net/netfilter/nft_quota.c
index bd6efc53f26d..2d6fe3559912 100644
--- a/net/netfilter/nft_quota.c
+++ b/net/netfilter/nft_quota.c
@@ -110,30 +110,32 @@ static int nft_quota_obj_init(const struct nlattr * const tb[],
 static int nft_quota_do_dump(struct sk_buff *skb, struct nft_quota *priv,
 			     bool reset)
 {
+	u64 consumed, consumed_cap;
 	u32 flags = priv->flags;
-	u64 consumed;
-
-	if (reset) {
-		consumed = atomic64_xchg(&priv->consumed, 0);
-		if (test_and_clear_bit(NFT_QUOTA_DEPLETED_BIT, &priv->flags))
-			flags |= NFT_QUOTA_F_DEPLETED;
-	} else {
-		consumed = atomic64_read(&priv->consumed);
-	}
 
 	/* Since we inconditionally increment consumed quota for each packet
 	 * that we see, don't go over the quota boundary in what we send to
 	 * userspace.
 	 */
-	if (consumed > priv->quota)
-		consumed = priv->quota;
+	consumed = atomic64_read(&priv->consumed);
+	if (consumed >= priv->quota) {
+		consumed_cap = priv->quota;
+		flags |= NFT_QUOTA_F_DEPLETED;
+	} else {
+		consumed_cap = consumed;
+	}
 
 	if (nla_put_be64(skb, NFTA_QUOTA_BYTES, cpu_to_be64(priv->quota),
 			 NFTA_QUOTA_PAD) ||
-	    nla_put_be64(skb, NFTA_QUOTA_CONSUMED, cpu_to_be64(consumed),
+	    nla_put_be64(skb, NFTA_QUOTA_CONSUMED, cpu_to_be64(consumed_cap),
 			 NFTA_QUOTA_PAD) ||
 	    nla_put_be32(skb, NFTA_QUOTA_FLAGS, htonl(flags)))
 		goto nla_put_failure;
+
+	if (reset) {
+		atomic64_sub(consumed, &priv->consumed);
+		clear_bit(NFT_QUOTA_DEPLETED_BIT, &priv->flags);
+	}
 	return 0;
 
 nla_put_failure:
-- 
2.1.4
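
The ordering in this patch (read, report, then reset only on success)
can be sketched with C11 atomics. This is an illustration under
assumptions: struct quota and quota_dump() are hypothetical, and the
can_emit flag stands in for the nla_put_*() calls succeeding.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical quota object; not the kernel's struct nft_quota. */
struct quota {
	uint64_t limit;
	_Atomic uint64_t consumed;
};

/* Returns false if the report could not be emitted; in that case the
 * counter is left untouched. */
static bool quota_dump(struct quota *q, bool can_emit, bool reset,
		       uint64_t *reported)
{
	uint64_t consumed = atomic_load(&q->consumed);
	/* Never report more than the configured limit. */
	uint64_t capped = consumed >= q->limit ? q->limit : consumed;

	if (!can_emit)
		return false;
	*reported = capped;

	/* Subtract what was read instead of storing 0, so that bytes
	 * counted concurrently after the load are not lost. */
	if (reset)
		atomic_fetch_sub(&q->consumed, consumed);
	return true;
}
```

A failed dump therefore leaves the counter at its previous value, and a
later successful dump with reset still reports the full amount.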


* [PATCH 6/6] bridge: netfilter: Fix dropping packets that moving through bridge interface
From: Pablo Neira Ayuso @ 2017-01-05 11:19 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1483615193-2931-1-git-send-email-pablo@netfilter.org>

From: Artur Molchanov <arturmolchanov@gmail.com>

Problem:
br_nf_pre_routing_finish() calls itself instead of
br_nf_pre_routing_finish_bridge(). Due to this bug, the reverse path filter
drops packets that go through a bridge interface.

User impact:
Local docker containers with a bridge network cannot communicate with each
other.

Fixes: c5136b15ea36 ("netfilter: bridge: add and use br_nf_hook_thresh")
Signed-off-by: Artur Molchanov <artur.molchanov@synesis.ru>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/bridge/br_netfilter_hooks.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index b12501a77f18..135cc8ab813c 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -399,7 +399,7 @@ static int br_nf_pre_routing_finish(struct net *net, struct sock *sk, struct sk_
 				br_nf_hook_thresh(NF_BR_PRE_ROUTING,
 						  net, sk, skb, skb->dev,
 						  NULL,
-						  br_nf_pre_routing_finish);
+						  br_nf_pre_routing_finish_bridge);
 				return 0;
 			}
 			ether_addr_copy(eth_hdr(skb)->h_dest, dev->dev_addr);
-- 
2.1.4



* [PATCH 2/6] netfilter: nft_queue: use raw_smp_processor_id()
From: Pablo Neira Ayuso @ 2017-01-05 11:19 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1483615193-2931-1-git-send-email-pablo@netfilter.org>

Using smp_processor_id() causes splats with PREEMPT_RCU:

[19379.552780] BUG: using smp_processor_id() in preemptible [00000000] code: ping/32389
[19379.552793] caller is debug_smp_processor_id+0x17/0x19
[...]
[19379.552823] Call Trace:
[19379.552832]  [<ffffffff81274e9e>] dump_stack+0x67/0x90
[19379.552837]  [<ffffffff8129a4d4>] check_preemption_disabled+0xe5/0xf5
[19379.552842]  [<ffffffff8129a4fb>] debug_smp_processor_id+0x17/0x19
[19379.552849]  [<ffffffffa07c42dd>] nft_queue_eval+0x35/0x20c [nft_queue]

No need to disable preemption since we only fetch the numeric value, so
let's use raw_smp_processor_id() instead.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_queue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nft_queue.c b/net/netfilter/nft_queue.c
index 3e19fa1230dc..dbb6aaff67ec 100644
--- a/net/netfilter/nft_queue.c
+++ b/net/netfilter/nft_queue.c
@@ -38,7 +38,7 @@ static void nft_queue_eval(const struct nft_expr *expr,
 
 	if (priv->queues_total > 1) {
 		if (priv->flags & NFT_QUEUE_FLAG_CPU_FANOUT) {
-			int cpu = smp_processor_id();
+			int cpu = raw_smp_processor_id();
 
 			queue = priv->queuenum + cpu % priv->queues_total;
 		} else {
-- 
2.1.4
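
To show why only the numeric value matters here, the fanout arithmetic
from the hunk above can be written as a small standalone function.
pick_queue() is a hypothetical name; the expression itself is the one
used in nft_queue_eval(). The CPU number is just a hash input, so a
value that is stale by the time it is used still selects a valid queue.

```c
#include <stdint.h>

/* Map a CPU number onto one of queues_total consecutive queues
 * starting at queuenum. */
static uint16_t pick_queue(uint16_t queuenum, uint16_t queues_total,
			   unsigned int cpu)
{
	if (queues_total <= 1)
		return queuenum;

	return (uint16_t)(queuenum + cpu % queues_total);
}
```

For queuenum = 4 and queues_total = 3, every CPU number maps to queue
4, 5 or 6.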



* SIOCSIWFREQ while in NL80211_IFTYPE_STATION
From: Jorge Ramirez @ 2017-01-05 11:02 UTC (permalink / raw)
  To: netdev, Daniel Lezcano

Hello all,

I am running a single wlan0 interface in managed mode (no aliases, no 
other wireless interfaces).
The association with the AP still hasn't happened.

I noticed that if trying to change the frequency to one of the valid 
values, the driver returns EBUSY.

The call stack is
cfg80211_wext_siwfreq
-->cfg80211_mgd_wext_siwfreq
--->cfg80211_set_monitor_channel (notice call to set 'monitor' channel 
in managed mode)
----> fails with EBUSY

Is therefore the expected behavior to fail under the above circumstances 
(managed mode && single wlan0 interface && no association)?
And if it is, please could you clarify when would it be valid to change 
the frequency in managed mode?

many thanks in advance for the help,
Jorge


* [PATCH] net: stmmac: fix maxmtu assignment to be within valid range
From: Kweh, Hock Leong @ 2017-01-05 10:47 UTC (permalink / raw)
  To: David S. Miller, Joao Pinto, Giuseppe CAVALLARO,
	seraphin.bonnaffe, Jarod Wilson
  Cc: Alexandre TORGUE, Joachim Eastwood, Niklas Cassel, Johan Hovold,
	pavel, Kweh, Hock Leong, lars.persson, netdev, LKML

From: "Kweh, Hock Leong" <hock.leong.kweh@intel.com>

The maxmtu value read from the devicetree is not checked for validity.
Add a check so that it is only assigned when it falls within the valid
range.

Signed-off-by: Kweh, Hock Leong <hock.leong.kweh@intel.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 39eb7a6..683d59f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3319,7 +3319,8 @@ int stmmac_dvr_probe(struct device *device,
 		ndev->max_mtu = JUMBO_LEN;
 	else
 		ndev->max_mtu = SKB_MAX_HEAD(NET_SKB_PAD + NET_IP_ALIGN);
-	if (priv->plat->maxmtu < ndev->max_mtu)
+	if ((priv->plat->maxmtu < ndev->max_mtu) &&
+	    (priv->plat->maxmtu >= ndev->min_mtu))
 		ndev->max_mtu = priv->plat->maxmtu;
 
 	if (flow_ctrl)
-- 
1.7.9.5
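
The condition added by this patch amounts to the following range check,
shown here as a standalone sketch. pick_max_mtu() and its parameter
names are made up for illustration; hw_max corresponds to the value
already chosen for ndev->max_mtu.

```c
/* Use the platform-provided maximum only if it lies in
 * [min_mtu, hw_max); otherwise keep the hardware default. */
static int pick_max_mtu(int hw_max, int min_mtu, int plat_maxmtu)
{
	if (plat_maxmtu < hw_max && plat_maxmtu >= min_mtu)
		return plat_maxmtu;

	return hw_max;
}
```

A platform value of 0 (for example, when nothing was specified) or one
below min_mtu no longer overrides the default.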


* [PATCH/RFC v2 net-next] ravb: unmap descriptors when freeing rings
From: Simon Horman @ 2017-01-05 10:43 UTC (permalink / raw)
  To: David Miller, Sergei Shtylyov; +Cc: Magnus Damm, netdev, linux-renesas-soc

From: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>

"swiotlb buffer is full" errors occur after repeated initialisation of a
device, e.g. suspend/resume or ip link set up/down. This is because memory
mapped using dma_map_single() in ravb_ring_format() and ravb_start_xmit()
is not released.  Resolve this problem by unmapping descriptors when
freeing rings.

Note, ravb_tx_free() is moved but not otherwise modified by this patch.

Signed-off-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
[simon: reworked]
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
--
v1 [Kazuya Mizuguchi]

v2 [Simon Horman]
* As suggested by Sergei Shtylyov
  - Use dma_mapping_error() and rx_desc->ds_cc when unmapping RX descriptors;
    this is consistent with the way that they are mapped
  - Use ravb_tx_free() to clear TX descriptors
* Reduce scope of new local variable
---
 drivers/net/ethernet/renesas/ravb_main.c | 89 ++++++++++++++++++--------------
 1 file changed, 51 insertions(+), 38 deletions(-)

diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 92d7692c840d..1797c48e3176 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -179,6 +179,44 @@ static struct mdiobb_ops bb_ops = {
 	.get_mdio_data = ravb_get_mdio_data,
 };
 
+/* Free TX skb function for AVB-IP */
+static int ravb_tx_free(struct net_device *ndev, int q)
+{
+	struct ravb_private *priv = netdev_priv(ndev);
+	struct net_device_stats *stats = &priv->stats[q];
+	struct ravb_tx_desc *desc;
+	int free_num = 0;
+	int entry;
+	u32 size;
+
+	for (; priv->cur_tx[q] - priv->dirty_tx[q] > 0; priv->dirty_tx[q]++) {
+		entry = priv->dirty_tx[q] % (priv->num_tx_ring[q] *
+					     NUM_TX_DESC);
+		desc = &priv->tx_ring[q][entry];
+		if (desc->die_dt != DT_FEMPTY)
+			break;
+		/* Descriptor type must be checked before all other reads */
+		dma_rmb();
+		size = le16_to_cpu(desc->ds_tagl) & TX_DS;
+		/* Free the original skb. */
+		if (priv->tx_skb[q][entry / NUM_TX_DESC]) {
+			dma_unmap_single(ndev->dev.parent, le32_to_cpu(desc->dptr),
+					 size, DMA_TO_DEVICE);
+			/* Last packet descriptor? */
+			if (entry % NUM_TX_DESC == NUM_TX_DESC - 1) {
+				entry /= NUM_TX_DESC;
+				dev_kfree_skb_any(priv->tx_skb[q][entry]);
+				priv->tx_skb[q][entry] = NULL;
+				stats->tx_packets++;
+			}
+			free_num++;
+		}
+		stats->tx_bytes += size;
+		desc->die_dt = DT_EEMPTY;
+	}
+	return free_num;
+}
+
 /* Free skb's and DMA buffers for Ethernet AVB */
 static void ravb_ring_free(struct net_device *ndev, int q)
 {
@@ -207,6 +245,18 @@ static void ravb_ring_free(struct net_device *ndev, int q)
 	priv->tx_align[q] = NULL;
 
 	if (priv->rx_ring[q]) {
+		for (i = 0; i < priv->num_rx_ring[q]; i++) {
+			struct ravb_ex_rx_desc *rx_desc = &priv->rx_ring[q][i];
+
+			if (!dma_mapping_error(ndev->dev.parent,
+					       rx_desc->dptr)) {
+				dma_unmap_single(ndev->dev.parent,
+						 le32_to_cpu(rx_desc->dptr),
+						 PKT_BUF_SZ,
+						 DMA_FROM_DEVICE);
+				rx_desc->ds_cc = cpu_to_le16(0);
+			}
+		}
 		ring_size = sizeof(struct ravb_ex_rx_desc) *
 			    (priv->num_rx_ring[q] + 1);
 		dma_free_coherent(ndev->dev.parent, ring_size, priv->rx_ring[q],
@@ -215,6 +265,7 @@ static void ravb_ring_free(struct net_device *ndev, int q)
 	}
 
 	if (priv->tx_ring[q]) {
+		ravb_tx_free(ndev, q);
 		ring_size = sizeof(struct ravb_tx_desc) *
 			    (priv->num_tx_ring[q] * NUM_TX_DESC + 1);
 		dma_free_coherent(ndev->dev.parent, ring_size, priv->tx_ring[q],
@@ -431,44 +482,6 @@ static int ravb_dmac_init(struct net_device *ndev)
 	return 0;
 }
 
-/* Free TX skb function for AVB-IP */
-static int ravb_tx_free(struct net_device *ndev, int q)
-{
-	struct ravb_private *priv = netdev_priv(ndev);
-	struct net_device_stats *stats = &priv->stats[q];
-	struct ravb_tx_desc *desc;
-	int free_num = 0;
-	int entry;
-	u32 size;
-
-	for (; priv->cur_tx[q] - priv->dirty_tx[q] > 0; priv->dirty_tx[q]++) {
-		entry = priv->dirty_tx[q] % (priv->num_tx_ring[q] *
-					     NUM_TX_DESC);
-		desc = &priv->tx_ring[q][entry];
-		if (desc->die_dt != DT_FEMPTY)
-			break;
-		/* Descriptor type must be checked before all other reads */
-		dma_rmb();
-		size = le16_to_cpu(desc->ds_tagl) & TX_DS;
-		/* Free the original skb. */
-		if (priv->tx_skb[q][entry / NUM_TX_DESC]) {
-			dma_unmap_single(ndev->dev.parent, le32_to_cpu(desc->dptr),
-					 size, DMA_TO_DEVICE);
-			/* Last packet descriptor? */
-			if (entry % NUM_TX_DESC == NUM_TX_DESC - 1) {
-				entry /= NUM_TX_DESC;
-				dev_kfree_skb_any(priv->tx_skb[q][entry]);
-				priv->tx_skb[q][entry] = NULL;
-				stats->tx_packets++;
-			}
-			free_num++;
-		}
-		stats->tx_bytes += size;
-		desc->die_dt = DT_EEMPTY;
-	}
-	return free_num;
-}
-
 static void ravb_get_tx_tstamp(struct net_device *ndev)
 {
 	struct ravb_private *priv = netdev_priv(ndev);
-- 
2.7.0.rc3.207.g0ac5344

^ permalink raw reply related

* [PATCH v4] net: ethernet: faraday: To support device tree usage.
From: Greentime Hu @ 2017-01-05 10:23 UTC (permalink / raw)
  To: f.fainelli, netdev, devicetree, andrew, linux-kernel, jiri,
	jonas.jensen, davem, arnd

Signed-off-by: Greentime Hu <green.hu@gmail.com>
---
Changes in v4:
  - Use the same binding document to describe the same faraday ethernet controller and add faraday to vendor-prefixes.txt.
Changes in v3:
  - Nothing changed in this patch but I have committed andestech to vendor-prefixes.txt.
Changes in v2:
  - Change atmac100_of_ids to ftmac100_of_ids
      
---
 .../net/{moxa,moxart-mac.txt => faraday,ftmac.txt} |    7 +++++--
 .../devicetree/bindings/vendor-prefixes.txt        |    1 +
 drivers/net/ethernet/faraday/ftmac100.c            |    7 +++++++
 3 files changed, 13 insertions(+), 2 deletions(-)
 rename Documentation/devicetree/bindings/net/{moxa,moxart-mac.txt => faraday,ftmac.txt} (68%)

diff --git a/Documentation/devicetree/bindings/net/moxa,moxart-mac.txt b/Documentation/devicetree/bindings/net/faraday,ftmac.txt
similarity index 68%
rename from Documentation/devicetree/bindings/net/moxa,moxart-mac.txt
rename to Documentation/devicetree/bindings/net/faraday,ftmac.txt
index 583418b..be4f55e 100644
--- a/Documentation/devicetree/bindings/net/moxa,moxart-mac.txt
+++ b/Documentation/devicetree/bindings/net/faraday,ftmac.txt
@@ -1,8 +1,11 @@
-MOXA ART Ethernet Controller
+Faraday Ethernet Controller
 
 Required properties:
 
-- compatible : Must be "moxa,moxart-mac"
+- compatible : Must contain "faraday,ftmac", as well as one of
+		the SoC specific identifiers:
+		"andestech,atmac100"
+		"moxa,moxart-mac"
 - reg : Should contain register location and length
 - interrupts : Should contain the mac interrupt number
 
diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt b/Documentation/devicetree/bindings/vendor-prefixes.txt
index 16d3b5e..489c336 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.txt
+++ b/Documentation/devicetree/bindings/vendor-prefixes.txt
@@ -102,6 +102,7 @@ everest	Everest Semiconductor Co. Ltd.
 everspin	Everspin Technologies, Inc.
 excito	Excito
 ezchip	EZchip Semiconductor
+faraday	Faraday Technology Corporation
 fcs	Fairchild Semiconductor
 firefly	Firefly
 focaltech	FocalTech Systems Co.,Ltd
diff --git a/drivers/net/ethernet/faraday/ftmac100.c b/drivers/net/ethernet/faraday/ftmac100.c
index dce5f7b..5d70ee9 100644
--- a/drivers/net/ethernet/faraday/ftmac100.c
+++ b/drivers/net/ethernet/faraday/ftmac100.c
@@ -1172,11 +1172,17 @@ static int __exit ftmac100_remove(struct platform_device *pdev)
 	return 0;
 }
 
+static const struct of_device_id ftmac100_of_ids[] = {
+	{ .compatible = "andestech,atmac100" },
+	{ }
+};
+
 static struct platform_driver ftmac100_driver = {
 	.probe		= ftmac100_probe,
 	.remove		= __exit_p(ftmac100_remove),
 	.driver		= {
 		.name	= DRV_NAME,
+		.of_match_table = ftmac100_of_ids
 	},
 };
 
@@ -1200,3 +1206,4 @@ static void __exit ftmac100_exit(void)
 MODULE_AUTHOR("Po-Yu Chuang <ratbert@faraday-tech.com>");
 MODULE_DESCRIPTION("FTMAC100 driver");
 MODULE_LICENSE("GPL");
+MODULE_DEVICE_TABLE(of, ftmac100_of_ids);
-- 
1.7.9.5

^ permalink raw reply related

* Re: [PATCH] stmmac: Enable Clause 45 PHYs in GMAC4 (eQOS)
From: Joao Pinto @ 2017-01-05 10:15 UTC (permalink / raw)
  To: Kweh, Hock Leong, Joao Pinto, davem@davemloft.net; +Cc: netdev@vger.kernel.org
In-Reply-To: <F54AEECA5E2B9541821D670476DAE19C5A91819D@PGSMSX102.gar.corp.intel.com>

On 1/5/2017 at 1:37 AM, Kweh, Hock Leong wrote:
>> -----Original Message-----
>> From: Joao Pinto [mailto:Joao.Pinto@synopsys.com]
>> Sent: Wednesday, January 04, 2017 10:36 PM
>> To: davem@davemloft.net
>> Cc: Kweh, Hock Leong <hock.leong.kweh@intel.com>; netdev@vger.kernel.org;
>> Joao Pinto <Joao.Pinto@synopsys.com>
>> Subject: [PATCH] stmmac: Enable Clause 45 PHYs in GMAC4 (eQOS)
>>
>> The eQOS IP Core (best known in stmmac as gmac4) has a register that must be
>> set if using a Clause 45 PHY. If this register is not set, the PHY won't work.
>> This patch will have no impact in setups using Clause 22 PHYs.
>>
>> Signed-off-by: Joao Pinto <jpinto@synopsys.com>
> 
> Hi Joao,
> 
> This is not working in our environment. We are using the 4-ETH-4-MGB-101 plugin card.
> 
> Regards,
> Wilson

Hi Wilson and David,
I am using a different PHY, and it only detects the link with that bit
set. Thanks for your feedback; I am going to dig a bit more!

Joao


> 
>> ---
>>  drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
>> index b0344c2..676ae3c 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c
>> @@ -41,6 +41,7 @@
>>  #define MII_GMAC4_GOC_SHIFT		2
>>  #define MII_GMAC4_WRITE			(1 << MII_GMAC4_GOC_SHIFT)
>>  #define MII_GMAC4_READ			(3 << MII_GMAC4_GOC_SHIFT)
>> +#define MII_CLAUSE45_PHY		(1 << 1)
>>
>>  static int stmmac_mdio_busy_wait(void __iomem *ioaddr, unsigned int mii_addr)
>>  {
>> @@ -125,7 +126,7 @@ static int stmmac_mdio_write(struct mii_bus *bus, int phyaddr, int phyreg,
>>  	value |= (priv->clk_csr << priv->hw->mii.clk_csr_shift)
>>  		& priv->hw->mii.clk_csr_mask;
>>  	if (priv->plat->has_gmac4)
>> -		value |= MII_GMAC4_WRITE;
>> +		value |= MII_GMAC4_WRITE | MII_CLAUSE45_PHY;
>>  	else
>>  		value |= MII_WRITE;
>>
>> --
>> 2.9.3
> 

^ permalink raw reply

* [PATCH v2 net-next] net:dsa: check for EPROBE_DEFER from dsa_dst_parse()
From: Volodymyr Bendiuga @ 2017-01-05 10:10 UTC (permalink / raw)
  To: andrew, vivien.didelot, f.fainelli, davem, netdev,
	volodymyr.bendiuga
  Cc: Volodymyr Bendiuga

Since multiple DSA switches can be stacked together but not all of
their devicetree nodes may be available when dsa_dst_parse() is called,
it can return EPROBE_DEFER. When this happens, only the last DSA switch
has to be deleted by dsa_dst_del_ds(), not the whole list, because the
next time Linux comes back to this function it will try to add only the
last DSA switch, the one which returned EPROBE_DEFER.
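
The intended behaviour can be sketched with a toy userspace model. This
only mirrors the description in this commit message; toy_tree,
toy_register_switch() and MAX_SWITCHES are hypothetical names, and
parse_err stands in for the return value of dsa_dst_parse().
EPROBE_DEFER is defined locally because it is a kernel-internal errno
that userspace <errno.h> does not provide.

```c
#include <string.h>

#define EPROBE_DEFER	517	/* value used inside the kernel */
#define MAX_SWITCHES	4	/* hypothetical tree capacity */

/* Toy stand-in for a switch tree: one slot per switch index. */
struct toy_tree {
	int present[MAX_SWITCHES];
};

/* Number of switches currently held by the tree. */
static int toy_count(const struct toy_tree *t)
{
	int i, n = 0;

	for (i = 0; i < MAX_SWITCHES; i++)
		n += t->present[i];
	return n;
}

/*
 * Add switch 'index' to the tree, then "parse" it.
 * parse_err models what dsa_dst_parse() would have returned.
 */
static int toy_register_switch(struct toy_tree *t, int index, int parse_err)
{
	t->present[index] = 1;

	if (parse_err) {
		if (parse_err == -EPROBE_DEFER) {
			/* Deferred: drop only the switch just added. */
			t->present[index] = 0;
			return parse_err;
		}
		/* Any other error: give up on the whole list. */
		memset(t->present, 0, sizeof(t->present));
		return parse_err;
	}
	return 0;
}
```

In this model a deferred switch leaves the switches registered earlier
in place, so a later retry only has to add the deferred one again.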

Signed-off-by: Volodymyr Bendiuga <volodymyr.bendiuga@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
---
 net/dsa/dsa2.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 7924c92..a799718 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -673,8 +673,14 @@ static int _dsa_register_switch(struct dsa_switch *ds, struct device_node *np)
 	}

 	err = dsa_dst_parse(dst);
-	if (err)
+	if (err) {
+		if (err == -EPROBE_DEFER) {
+			dsa_dst_del_ds(dst, ds, ds->index);
+			return err;
+		}
+
 		goto out_del_dst;
+	}

 	err = dsa_dst_apply(dst);
 	if (err) {
-- 
2.7.4

^ permalink raw reply related

* [PATCH net-next V2 3/3] net/act_pedit: Introduce 'add' operation
From: Amir Vadai @ 2017-01-05  9:54 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Jiri Pirko, Or Gerlitz, Hadar Har-Zion, Amir Vadai
In-Reply-To: <20170105095454.32644-1-amir@vadai.me>

This command can be used to increment or decrement fields.
The command type is embedded in unused bits of the existing shift
field, so UAPI backward compatibility is preserved.

For example, to forward any TCP packet and decrease its TTL:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
    flower ip_proto tcp \
    action pedit munge ip ttl add 0xff pipe \
    action mirred egress redirect dev veth0

In the example above, adding 0xff to this u8 field actually decreases
it by one, since the result is masked to the width of the field.
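
The arithmetic can be checked in isolation with a small userspace
sketch. It uses the same two expressions as the hunk in
net/sched/act_pedit.c below, but toy_pedit_set() and toy_pedit_add()
are hypothetical names, and the 32-bit word is treated as a plain
host-order value with the u8 field in its top byte, which is an
assumption made only for illustration.

```c
#include <stdint.h>

/*
 * 'mask' has 1s in the bits to preserve, like tkey->mask.
 * SET: clear the field, then XOR in the new value.
 */
static uint32_t toy_pedit_set(uint32_t word, uint32_t val, uint32_t mask)
{
	return (word & mask) ^ val;
}

/*
 * ADD: add over the whole word, keep only the field bits of the sum,
 * then merge them back. Any carry out of the field is discarded.
 */
static uint32_t toy_pedit_add(uint32_t word, uint32_t val, uint32_t mask)
{
	uint32_t sum = (word + val) & ~mask;

	return (word & mask) ^ sum;
}
```

With the field in the top byte (mask 0x00ffffff), adding 0xff000000 to
0x40061234 gives 0x3f061234: the field went from 0x40 to 0x3f and the
other bytes are untouched. A field that is already 0 wraps to 0xff.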

Signed-off-by: Amir Vadai <amir@vadai.me>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 include/uapi/linux/tc_act/tc_pedit.h | 10 ++++++++++
 net/sched/act_pedit.c                | 16 +++++++++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/tc_act/tc_pedit.h b/include/uapi/linux/tc_act/tc_pedit.h
index 604e6729ad38..80028cd0bb1b 100644
--- a/include/uapi/linux/tc_act/tc_pedit.h
+++ b/include/uapi/linux/tc_act/tc_pedit.h
@@ -35,8 +35,13 @@ struct tc_pedit_sel {
 #define PEDIT_TYPE_SHIFT 24
 #define PEDIT_TYPE_MASK 0xff
 
+#define PEDIT_CMD_SHIFT 16
+#define PEDIT_CMD_MASK 0xff
+
 #define PEDIT_TYPE_GET(_val) \
 	(((_val) >> PEDIT_TYPE_SHIFT) & PEDIT_TYPE_MASK)
+#define PEDIT_CMD_GET(_val) \
+	(((_val) >> PEDIT_CMD_SHIFT) & PEDIT_CMD_MASK)
 #define PEDIT_SHIFT_GET(_val) ((_val) & 0xff)
 
 enum pedit_header_type {
@@ -49,4 +54,9 @@ enum pedit_header_type {
 	PEDIT_HDR_TYPE_UDP = 5,
 };
 
+enum pedit_cmd {
+	PEDIT_CMD_SET = 0,
+	PEDIT_CMD_ADD = 1,
+};
+
 #endif
diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
index 4b9c7184c752..aa137d51bf7f 100644
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -169,6 +169,7 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
 			u32 *ptr, _data;
 			int offset = tkey->off;
 			int hoffset;
+			u32 val;
 			int rc;
 			enum pedit_header_type htype =
 				PEDIT_TYPE_GET(tkey->shift);
@@ -214,7 +215,20 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
 			if (!ptr)
 				goto bad;
 			/* just do it, baby */
-			*ptr = ((*ptr & tkey->mask) ^ tkey->val);
+			switch (PEDIT_CMD_GET(tkey->shift)) {
+			case PEDIT_CMD_SET:
+				val = tkey->val;
+				break;
+			case PEDIT_CMD_ADD:
+				val = (*ptr + tkey->val) & ~tkey->mask;
+				break;
+			default:
+				pr_info("tc filter pedit bad command (%d)\n",
+					PEDIT_CMD_GET(tkey->shift));
+				goto bad;
+			}
+
+			*ptr = ((*ptr & tkey->mask) ^ val);
 			if (ptr == &_data)
 				skb_store_bits(skb, hoffset + offset, ptr, 4);
 		}
-- 
2.11.0

^ permalink raw reply related

* [PATCH net-next V2 2/3] net/act_pedit: Support using offset relative to the conventional network headers
From: Amir Vadai @ 2017-01-05  9:54 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Jiri Pirko, Or Gerlitz, Hadar Har-Zion, Amir Vadai
In-Reply-To: <20170105095454.32644-1-amir@vadai.me>

Extend pedit to let the user set offsets relative to network headers.
This change makes it possible to work with more complex header schemes
(vs the simple IPv4 case) where setting a fixed offset relative to the
network header is not enough. It is also forward looking, to enable
hardware offloading of pedit.

The header type is embedded in the 8 MSBs of the u32 key->shift, which
were never used until now. Therefore backward compatibility is
preserved.
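
A short userspace sketch of the encoding follows. The macros and enum
values are copied from the tc_pedit.h hunk below; toy_pack_shift() is a
hypothetical helper added only to build a value, since this patch does
not define a packing macro.

```c
#include <stdint.h>

/* Macros and enum values as introduced by this patch. */
#define PEDIT_TYPE_SHIFT 24
#define PEDIT_TYPE_MASK 0xff

#define PEDIT_TYPE_GET(_val) \
	(((_val) >> PEDIT_TYPE_SHIFT) & PEDIT_TYPE_MASK)
#define PEDIT_SHIFT_GET(_val) ((_val) & 0xff)

enum pedit_header_type {
	PEDIT_HDR_TYPE_RAW = 0,

	PEDIT_HDR_TYPE_ETH = 1,
	PEDIT_HDR_TYPE_IP4 = 2,
	PEDIT_HDR_TYPE_IP6 = 3,
	PEDIT_HDR_TYPE_TCP = 4,
	PEDIT_HDR_TYPE_UDP = 5,
};

/* Hypothetical helper: header type in bits 31..24, shift in bits 7..0. */
static uint32_t toy_pack_shift(enum pedit_header_type htype, uint8_t shift)
{
	return ((uint32_t)htype << PEDIT_TYPE_SHIFT) | shift;
}
```

A value written by an older userspace has zeros in the top byte, so
PEDIT_TYPE_GET() yields PEDIT_HDR_TYPE_RAW and the offset is still
taken relative to the network header, as before.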

Usage example:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
  flower \
    ip_proto tcp \
    dst_port 80 \
  action pedit munge tcp dport set 8080 pipe \
  action mirred egress redirect dev veth0

This will forward TCP packets whose original destination port is 80,
while rewriting the destination port to 8080.

Signed-off-by: Amir Vadai <amir@vadai.me>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 include/uapi/linux/tc_act/tc_pedit.h | 17 ++++++++++
 net/sched/act_pedit.c                | 65 +++++++++++++++++++++++++++++-------
 2 files changed, 70 insertions(+), 12 deletions(-)

diff --git a/include/uapi/linux/tc_act/tc_pedit.h b/include/uapi/linux/tc_act/tc_pedit.h
index 6389959a5157..604e6729ad38 100644
--- a/include/uapi/linux/tc_act/tc_pedit.h
+++ b/include/uapi/linux/tc_act/tc_pedit.h
@@ -32,4 +32,21 @@ struct tc_pedit_sel {
 };
 #define tc_pedit tc_pedit_sel
 
+#define PEDIT_TYPE_SHIFT 24
+#define PEDIT_TYPE_MASK 0xff
+
+#define PEDIT_TYPE_GET(_val) \
+	(((_val) >> PEDIT_TYPE_SHIFT) & PEDIT_TYPE_MASK)
+#define PEDIT_SHIFT_GET(_val) ((_val) & 0xff)
+
+enum pedit_header_type {
+	PEDIT_HDR_TYPE_RAW = 0,
+
+	PEDIT_HDR_TYPE_ETH = 1,
+	PEDIT_HDR_TYPE_IP4 = 2,
+	PEDIT_HDR_TYPE_IP6 = 3,
+	PEDIT_HDR_TYPE_TCP = 4,
+	PEDIT_HDR_TYPE_UDP = 5,
+};
+
 #endif
diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
index b27c4daec88f..4b9c7184c752 100644
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -119,18 +119,45 @@ static bool offset_valid(struct sk_buff *skb, int offset)
 	return true;
 }
 
+static int pedit_skb_hdr_offset(struct sk_buff *skb,
+				enum pedit_header_type htype, int *hoffset)
+{
+	int ret = -1;
+
+	switch (htype) {
+	case PEDIT_HDR_TYPE_ETH:
+		if (skb_mac_header_was_set(skb)) {
+			*hoffset = skb_mac_offset(skb);
+			ret = 0;
+		}
+		break;
+	case PEDIT_HDR_TYPE_RAW:
+	case PEDIT_HDR_TYPE_IP4:
+	case PEDIT_HDR_TYPE_IP6:
+		*hoffset = skb_network_offset(skb);
+		ret = 0;
+		break;
+	case PEDIT_HDR_TYPE_TCP:
+	case PEDIT_HDR_TYPE_UDP:
+		if (skb_transport_header_was_set(skb)) {
+			*hoffset = skb_transport_offset(skb);
+			ret = 0;
+		}
+		break;
+	};
+
+	return ret;
+}
+
 static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
 		     struct tcf_result *res)
 {
 	struct tcf_pedit *p = to_pedit(a);
 	int i;
-	unsigned int off;
 
 	if (skb_unclone(skb, GFP_ATOMIC))
 		return p->tcf_action;
 
-	off = skb_network_offset(skb);
-
 	spin_lock(&p->tcf_lock);
 
 	tcf_lastuse_update(&p->tcf_tm);
@@ -141,20 +168,32 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
 		for (i = p->tcfp_nkeys; i > 0; i--, tkey++) {
 			u32 *ptr, _data;
 			int offset = tkey->off;
+			int hoffset;
+			int rc;
+			enum pedit_header_type htype =
+				PEDIT_TYPE_GET(tkey->shift);
+
+			rc = pedit_skb_hdr_offset(skb, htype, &hoffset);
+			if (rc) {
+				pr_info("tc filter pedit bad header type specified (0x%x)\n",
+					htype);
+				goto bad;
+			}
 
 			if (tkey->offmask) {
 				char *d, _d;
 
-				if (!offset_valid(skb, off + tkey->at)) {
+				if (!offset_valid(skb, hoffset + tkey->at)) {
 					pr_info("tc filter pedit 'at' offset %d out of bounds\n",
-						off + tkey->at);
+						hoffset + tkey->at);
 					goto bad;
 				}
-				d = skb_header_pointer(skb, off + tkey->at, 1,
-						       &_d);
+				d = skb_header_pointer(skb,
+						       hoffset + tkey->at,
+						       1, &_d);
 				if (!d)
 					goto bad;
-				offset += (*d & tkey->offmask) >> tkey->shift;
+				offset += (*d & tkey->offmask) >> PEDIT_SHIFT_GET(tkey->shift);
 			}
 
 			if (offset % 4) {
@@ -163,19 +202,21 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
 				goto bad;
 			}
 
-			if (!offset_valid(skb, off + offset)) {
+			if (!offset_valid(skb, hoffset + offset)) {
 				pr_info("tc filter pedit offset %d out of bounds\n",
-					offset);
+					hoffset + offset);
 				goto bad;
 			}
 
-			ptr = skb_header_pointer(skb, off + offset, 4, &_data);
+			ptr = skb_header_pointer(skb,
+						 hoffset + offset,
+						 4, &_data);
 			if (!ptr)
 				goto bad;
 			/* just do it, baby */
 			*ptr = ((*ptr & tkey->mask) ^ tkey->val);
 			if (ptr == &_data)
-				skb_store_bits(skb, off + offset, ptr, 4);
+				skb_store_bits(skb, hoffset + offset, ptr, 4);
 		}
 
 		goto done;
-- 
2.11.0

^ permalink raw reply related

* [PATCH net-next V2 0/3] net/sched: act_pedit: Use offset relative to conventional network headers
From: Amir Vadai @ 2017-01-05  9:54 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Jiri Pirko, Or Gerlitz, Hadar Har-Zion, Amir Vadai

Hi Dave,

This is a respin of the patchset. V1 was sent and didn't make it for 4.10.

You asked me [1] why I used specific header names instead of layers (L2, L3...),
and I explained that this is on purpose: the extra information is planned to be used
by hardware drivers to offload the action.

Some FW/HW parser APIs need the specific header type (e.g.
IPv4 or IPv6, TCP or UDP) and not only the networking level (e.g. network or transport).

Enhancing the UAPI to carry this information allows the same flows to be
set in both SW and HW.

This patchset also makes pedit more robust. Currently, field offsets are
specified relative to the IP header, with negative offsets used for
MAC layer fields.

This series enables the user to set offset relative to the relevant header.

This patchset reuses existing fields in a way that preserves backward
UAPI compatibility.

Usage example:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
   flower \
     ip_proto tcp \
    dst_port 80 \
   action \
       pedit munge ip ttl add 0xff \
       pedit munge tcp dport set 8080 \
     pipe action mirred egress redirect dev veth0

This will forward traffic destined to TCP port 80, while rewriting the
destination port to 8080 and decreasing the TTL by one.

I've uploaded a draft for the userspace [2] to make it easier to review and
test the patchset.

[1] - http://patchwork.ozlabs.org/patch/700909/
[2] - git: https://bitbucket.org/av42/iproute2.git
      branch: pedit

Patchset was tested and applied on top of upstream commit 57ea884b0dcf
("packet: fix panic in __packet_set_timestamp on tpacket_v3 in tx mode")

Thanks,
Amir

Amir Vadai (3):
  net/skbuff: Introduce skb_mac_offset()
  net/act_pedit: Support using offset relative to the conventional
    network headers
  net/act_pedit: Introduce 'add' operation

 include/linux/skbuff.h               |  5 +++
 include/uapi/linux/tc_act/tc_pedit.h | 27 ++++++++++++
 net/sched/act_pedit.c                | 81 ++++++++++++++++++++++++++++++------
 3 files changed, 100 insertions(+), 13 deletions(-)

-- 
2.11.0

^ permalink raw reply

* [PATCH net-next V2 1/3] net/skbuff: Introduce skb_mac_offset()
From: Amir Vadai @ 2017-01-05  9:54 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Jiri Pirko, Or Gerlitz, Hadar Har-Zion, Amir Vadai
In-Reply-To: <20170105095454.32644-1-amir@vadai.me>

Introduce skb_mac_offset(), which returns the offset of the MAC header
relative to skb->data.
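
For illustration, here is a userspace model of the pointer arithmetic.
struct toy_skb and the toy_*() helpers are hypothetical and carry only
the fields needed; the sketch assumes skb->data currently points at the
network header, in which case the MAC header lies before it and the
offset is negative.

```c
#include <stddef.h>

/* Minimal stand-in for the sk_buff fields involved. */
struct toy_skb {
	unsigned char *head;		/* start of the buffer */
	unsigned char *data;		/* current start of packet data */
	unsigned short mac_header;	/* offset from head */
	unsigned short network_header;	/* offset from head */
};

/* Same arithmetic as skb_mac_header(): head plus stored offset. */
static unsigned char *toy_mac_header(const struct toy_skb *skb)
{
	return skb->head + skb->mac_header;
}

/* Same arithmetic as the new helper: MAC header relative to data. */
static int toy_mac_offset(const struct toy_skb *skb)
{
	return (int)(toy_mac_header(skb) - skb->data);
}

/* Network header relative to data, for comparison. */
static int toy_network_offset(const struct toy_skb *skb)
{
	return (int)(skb->head + skb->network_header - skb->data);
}
```

With 64 bytes of headroom and a 14-byte Ethernet header, the MAC header
sits at head + 64 and data at head + 78, so toy_mac_offset() returns
-14 while toy_network_offset() returns 0.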

Signed-off-by: Amir Vadai <amir@vadai.me>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 include/linux/skbuff.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index b53c0cfd417e..3d8f81f39c2b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2178,6 +2178,11 @@ static inline unsigned char *skb_mac_header(const struct sk_buff *skb)
 	return skb->head + skb->mac_header;
 }

+static inline int skb_mac_offset(const struct sk_buff *skb)
+{
+	return skb_mac_header(skb) - skb->data;
+}
+
 static inline int skb_mac_header_was_set(const struct sk_buff *skb)
 {
 	return skb->mac_header != (typeof(skb->mac_header))~0U;
-- 
2.11.0

^ permalink raw reply related
