Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 1/6] net: ethernet: ti: netcp: add support of cpts
From: Richard Cochran @ 2016-11-30  9:44 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: David S. Miller, netdev, Mugunthan V N, Sekhar Nori, linux-kernel,
	linux-omap, Rob Herring, devicetree, Murali Karicheri,
	Wingman Kwok
In-Reply-To: <20161128230428.6872-2-grygorii.strashko@ti.com>

On Mon, Nov 28, 2016 at 05:04:23PM -0600, Grygorii Strashko wrote:
> @@ -678,6 +744,9 @@ struct gbe_priv {
>  	int				num_et_stats;
>  	/*  Lock for updating the hwstats */
>  	spinlock_t			hw_stats_lock;
> +
> +	int                             cpts_registered;

The usage of this counter is racy.

> +	struct cpts                     *cpts;
>  };

This ++ and -- business ...

> +static void gbe_register_cpts(struct gbe_priv *gbe_dev)
> +{
> +	if (!gbe_dev->cpts)
> +		return;
> +
> +	if (gbe_dev->cpts_registered > 0)
> +		goto done;
> +
> +	if (cpts_register(gbe_dev->cpts)) {
> +		dev_err(gbe_dev->dev, "error registering cpts device\n");
> +		return;
> +	}
> +
> +done:
> +	++gbe_dev->cpts_registered;
> +}
> +
> +static void gbe_unregister_cpts(struct gbe_priv *gbe_dev)
> +{
> +	if (!gbe_dev->cpts || (gbe_dev->cpts_registered <= 0))
> +		return;
> +
> +	if (--gbe_dev->cpts_registered)
> +		return;
> +
> +	cpts_unregister(gbe_dev->cpts);
> +}

is invoked from your open() and close() methods, but those methods
are not serialized among multiple ports.

Thanks,
Richard

^ permalink raw reply

* Re: [PATCH net 04/16] net: ethernet: aurora: nb8800: fix fixed-link phydev leaks
From: Mason @ 2016-11-30  9:36 UTC (permalink / raw)
  To: Johan Hovold, Mans Rullgard, Sebastian Frias
  Cc: netdev, LKML, David S. Miller, Joe Perches, Brian Norris
In-Reply-To: <1480357509-28074-5-git-send-email-johan@kernel.org>

On 28/11/2016 19:24, Johan Hovold wrote:

> Make sure to deregister and free any fixed-link PHY registered using
> of_phy_register_fixed_link() on probe errors and on driver unbind.
> 
> Fixes: c7dfe3abf40e ("net: ethernet: nb8800: support fixed-link DT node")
> Signed-off-by: Johan Hovold <johan@kernel.org>
> ---
>  drivers/net/ethernet/aurora/nb8800.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)

Did you use scripts/get_maintainer.pl ?

Neither the author of the driver (Mans) nor the author of
the code in question (Sebastian) were CCed on this patch.

It looks like the CC list was truncated, the last entry being

  Vivien Didelot <

Regards.


> diff --git a/drivers/net/ethernet/aurora/nb8800.c b/drivers/net/ethernet/aurora/nb8800.c
> index 00c38bf151e6..e078d8da978c 100644
> --- a/drivers/net/ethernet/aurora/nb8800.c
> +++ b/drivers/net/ethernet/aurora/nb8800.c
> @@ -1466,12 +1466,12 @@ static int nb8800_probe(struct platform_device *pdev)
>  
>  	ret = nb8800_hw_init(dev);
>  	if (ret)
> -		goto err_free_bus;
> +		goto err_deregister_fixed_link;
>  
>  	if (ops && ops->init) {
>  		ret = ops->init(dev);
>  		if (ret)
> -			goto err_free_bus;
> +			goto err_deregister_fixed_link;
>  	}
>  
>  	dev->netdev_ops = &nb8800_netdev_ops;
> @@ -1504,6 +1504,9 @@ static int nb8800_probe(struct platform_device *pdev)
>  
>  err_free_dma:
>  	nb8800_dma_free(dev);
> +err_deregister_fixed_link:
> +	if (of_phy_is_fixed_link(pdev->dev.of_node))
> +		of_phy_deregister_fixed_link(pdev->dev.of_node);
>  err_free_bus:
>  	of_node_put(priv->phy_node);
>  	mdiobus_unregister(bus);
> @@ -1521,6 +1524,8 @@ static int nb8800_remove(struct platform_device *pdev)
>  	struct nb8800_priv *priv = netdev_priv(ndev);
>  
>  	unregister_netdev(ndev);
> +	if (of_phy_is_fixed_link(pdev->dev.of_node))
> +		of_phy_deregister_fixed_link(pdev->dev.of_node);
>  	of_node_put(priv->phy_node);
>  
>  	mdiobus_unregister(priv->mii_bus);

^ permalink raw reply

* Re: [PATCH] cpsw: ethtool: add support for nway reset
From: Yegor Yefremov @ 2016-11-30  9:31 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, linux-omap@vger.kernel.org, Grygorii Strashko,
	N, Mugunthan V
In-Reply-To: <20161129.194158.1482957574907674651.davem@davemloft.net>

Hi David,

On Wed, Nov 30, 2016 at 1:41 AM, David Miller <davem@davemloft.net> wrote:
> From: yegorslists@googlemail.com
> Date: Mon, 28 Nov 2016 10:47:52 +0100
>
>> From: Yegor Yefremov <yegorslists@googlemail.com>
>>
>> This patch adds support for ethtool's '-r' command. Restarting
>> N-WAY negotiation can be useful to activate newly changed EEE
>> settings etc.
>>
>> Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com>
>
> This doesn't apply cleanly to net-next.

My previous patch [1] doesn't show up in net-next. This could explain,
why nway patch doesn't apply.
Should I resend them both as series?

[1] http://marc.info/?l=linux-omap&m=148036822211869&w=2

Yegor

^ permalink raw reply

* Re: [Patch net-next] audit: remove useless synchronize_net()
From: Richard Guy Briggs @ 2016-11-30  9:16 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, linux-audit, pmoore
In-Reply-To: <1480439696-21818-1-git-send-email-xiyou.wangcong@gmail.com>

On 2016-11-29 09:14, Cong Wang wrote:
> netlink kernel socket is protected by refcount, not RCU.
> Its rcv path is neither protected by RCU. So the synchronize_net()
> is just pointless.

If I understand correctly, xfrm_user_net_exit() usage of
RCU_INIT_POINTER() and synchronize_net() is similarly pointless?  Also
net/phonet/socket.c?  I probably modelled things based on the former...

> Cc: Richard Guy Briggs <rgb@redhat.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
>  kernel/audit.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 92c463d..67b9fbd8 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -1172,9 +1172,8 @@ static void __net_exit audit_net_exit(struct net *net)
>  		audit_sock = NULL;
>  	}
>  
> -	RCU_INIT_POINTER(aunet->nlsk, NULL);
> -	synchronize_net();
>  	netlink_kernel_release(sock);
> +	aunet->nlsk = NULL;
>  }
>  
>  static struct pernet_operations audit_net_ops __net_initdata = {
> -- 
> 2.1.0
> 

- RGB

^ permalink raw reply

* Re: [PATCH  v2 13/13] net: ethernet: ti: cpts: fix overflow check period
From: Richard Cochran @ 2016-11-30  9:12 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: David S. Miller, netdev, Mugunthan V N, Sekhar Nori, linux-kernel,
	linux-omap, Rob Herring, devicetree, Murali Karicheri,
	Wingman Kwok, John Stultz, Thomas Gleixner
In-Reply-To: <20161128230337.6731-14-grygorii.strashko@ti.com>

On Mon, Nov 28, 2016 at 05:03:37PM -0600, Grygorii Strashko wrote:
> The CPTS drivers uses 8sec period for overflow checking with
> assumption that CPTS retclk will not exceed 500MHz. But that's not
> true on some TI platforms (Kesytone 2). As result, it is possible that
> CPTS counter will overflow more than once between two readings.
> 
> Hence, fix it by selecting overflow check period dynamically as
> max_sec_before_overflow/2, where
>  max_sec_before_overflow = max_counter_val / rftclk_freq.
> 
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>

Acked-by: Richard Cochran <richardcochran@gmail.com>

^ permalink raw reply

* [PATCH net-next 3/3] net/act_pedit: Introduce 'add' operation
From: Amir Vadai @ 2016-11-30  9:09 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Jamal Hadi Salim, Or Gerlitz, Hadar Har-Zion, Amir Vadai
In-Reply-To: <20161130090928.14816-1-amir@vadai.me>

This command could be useful to inc/dec fields.

For example, to forward any TCP packet and decrease its TTL:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
    flower ip_proto tcp \
    action pedit munge ip ttl add 0xff pipe \
    action mirred egress redirect dev veth0

In the example above, adding 0xff to this u8 field is actually
decreasing it by one, since the operation is masked.

Signed-off-by: Amir Vadai <amir@vadai.me>
---
 include/uapi/linux/tc_act/tc_pedit.h | 10 ++++++++++
 net/sched/act_pedit.c                | 16 +++++++++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/tc_act/tc_pedit.h b/include/uapi/linux/tc_act/tc_pedit.h
index 604e6729ad38..80028cd0bb1b 100644
--- a/include/uapi/linux/tc_act/tc_pedit.h
+++ b/include/uapi/linux/tc_act/tc_pedit.h
@@ -35,8 +35,13 @@ struct tc_pedit_sel {
 #define PEDIT_TYPE_SHIFT 24
 #define PEDIT_TYPE_MASK 0xff
 
+#define PEDIT_CMD_SHIFT 16
+#define PEDIT_CMD_MASK 0xff
+
 #define PEDIT_TYPE_GET(_val) \
 	(((_val) >> PEDIT_TYPE_SHIFT) & PEDIT_TYPE_MASK)
+#define PEDIT_CMD_GET(_val) \
+	(((_val) >> PEDIT_CMD_SHIFT) & PEDIT_CMD_MASK)
 #define PEDIT_SHIFT_GET(_val) ((_val) & 0xff)
 
 enum pedit_header_type {
@@ -49,4 +54,9 @@ enum pedit_header_type {
 	PEDIT_HDR_TYPE_UDP = 5,
 };
 
+enum pedit_cmd {
+	PEDIT_CMD_SET = 0,
+	PEDIT_CMD_ADD = 1,
+};
+
 #endif
diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
index 4b9c7184c752..aa137d51bf7f 100644
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -169,6 +169,7 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
 			u32 *ptr, _data;
 			int offset = tkey->off;
 			int hoffset;
+			u32 val;
 			int rc;
 			enum pedit_header_type htype =
 				PEDIT_TYPE_GET(tkey->shift);
@@ -214,7 +215,20 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
 			if (!ptr)
 				goto bad;
 			/* just do it, baby */
-			*ptr = ((*ptr & tkey->mask) ^ tkey->val);
+			switch (PEDIT_CMD_GET(tkey->shift)) {
+			case PEDIT_CMD_SET:
+				val = tkey->val;
+				break;
+			case PEDIT_CMD_ADD:
+				val = (*ptr + tkey->val) & ~tkey->mask;
+				break;
+			default:
+				pr_info("tc filter pedit bad command (%d)\n",
+					PEDIT_CMD_GET(tkey->shift));
+				goto bad;
+			}
+
+			*ptr = ((*ptr & tkey->mask) ^ val);
 			if (ptr == &_data)
 				skb_store_bits(skb, hoffset + offset, ptr, 4);
 		}
-- 
2.10.2

^ permalink raw reply related

* [PATCH net-next 2/3] net/act_pedit: Support using offset relative to the conventional network headers
From: Amir Vadai @ 2016-11-30  9:09 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Jamal Hadi Salim, Or Gerlitz, Hadar Har-Zion, Amir Vadai
In-Reply-To: <20161130090928.14816-1-amir@vadai.me>

Extend pedit to enable the user using offset relative to network
headers.  This change would enable to work with more complex header
schemes (vs the simple IPv4 case) where setting a fixed offset relative
to the network header is not enough. It is also forward looking to
enable hardware offloading of pedit more easier.

The header type is embedded in the 8 MSB of the u32 key->shift which
were never used till now. Therefore backward compatibility is being
kept.

Usage example:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
  flower \
    ip_proto tcp \
    src_port 80 \
  action pedit munge tcp dport set 8080 pipe \
  action mirred egress redirect dev veth0

Will forward traffic to tcp port 80, and modify the destination port to
8080.

hange-Id: Ibd7bbbe0b8c2f6adae0591868bb6892c55e75732
Signed-off-by: Amir Vadai <amir@vadai.me>
---
 include/uapi/linux/tc_act/tc_pedit.h | 17 ++++++++++
 net/sched/act_pedit.c                | 65 +++++++++++++++++++++++++++++-------
 2 files changed, 70 insertions(+), 12 deletions(-)

diff --git a/include/uapi/linux/tc_act/tc_pedit.h b/include/uapi/linux/tc_act/tc_pedit.h
index 6389959a5157..604e6729ad38 100644
--- a/include/uapi/linux/tc_act/tc_pedit.h
+++ b/include/uapi/linux/tc_act/tc_pedit.h
@@ -32,4 +32,21 @@ struct tc_pedit_sel {
 };
 #define tc_pedit tc_pedit_sel
 
+#define PEDIT_TYPE_SHIFT 24
+#define PEDIT_TYPE_MASK 0xff
+
+#define PEDIT_TYPE_GET(_val) \
+	(((_val) >> PEDIT_TYPE_SHIFT) & PEDIT_TYPE_MASK)
+#define PEDIT_SHIFT_GET(_val) ((_val) & 0xff)
+
+enum pedit_header_type {
+	PEDIT_HDR_TYPE_RAW = 0,
+
+	PEDIT_HDR_TYPE_ETH = 1,
+	PEDIT_HDR_TYPE_IP4 = 2,
+	PEDIT_HDR_TYPE_IP6 = 3,
+	PEDIT_HDR_TYPE_TCP = 4,
+	PEDIT_HDR_TYPE_UDP = 5,
+};
+
 #endif
diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
index b27c4daec88f..4b9c7184c752 100644
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -119,18 +119,45 @@ static bool offset_valid(struct sk_buff *skb, int offset)
 	return true;
 }
 
+static int pedit_skb_hdr_offset(struct sk_buff *skb,
+				enum pedit_header_type htype, int *hoffset)
+{
+	int ret = -1;
+
+	switch (htype) {
+	case PEDIT_HDR_TYPE_ETH:
+		if (skb_mac_header_was_set(skb)) {
+			*hoffset = skb_mac_offset(skb);
+			ret = 0;
+		}
+		break;
+	case PEDIT_HDR_TYPE_RAW:
+	case PEDIT_HDR_TYPE_IP4:
+	case PEDIT_HDR_TYPE_IP6:
+		*hoffset = skb_network_offset(skb);
+		ret = 0;
+		break;
+	case PEDIT_HDR_TYPE_TCP:
+	case PEDIT_HDR_TYPE_UDP:
+		if (skb_transport_header_was_set(skb)) {
+			*hoffset = skb_transport_offset(skb);
+			ret = 0;
+		}
+		break;
+	};
+
+	return ret;
+}
+
 static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
 		     struct tcf_result *res)
 {
 	struct tcf_pedit *p = to_pedit(a);
 	int i;
-	unsigned int off;
 
 	if (skb_unclone(skb, GFP_ATOMIC))
 		return p->tcf_action;
 
-	off = skb_network_offset(skb);
-
 	spin_lock(&p->tcf_lock);
 
 	tcf_lastuse_update(&p->tcf_tm);
@@ -141,20 +168,32 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
 		for (i = p->tcfp_nkeys; i > 0; i--, tkey++) {
 			u32 *ptr, _data;
 			int offset = tkey->off;
+			int hoffset;
+			int rc;
+			enum pedit_header_type htype =
+				PEDIT_TYPE_GET(tkey->shift);
+
+			rc = pedit_skb_hdr_offset(skb, htype, &hoffset);
+			if (rc) {
+				pr_info("tc filter pedit bad header type specified (0x%x)\n",
+					htype);
+				goto bad;
+			}
 
 			if (tkey->offmask) {
 				char *d, _d;
 
-				if (!offset_valid(skb, off + tkey->at)) {
+				if (!offset_valid(skb, hoffset + tkey->at)) {
 					pr_info("tc filter pedit 'at' offset %d out of bounds\n",
-						off + tkey->at);
+						hoffset + tkey->at);
 					goto bad;
 				}
-				d = skb_header_pointer(skb, off + tkey->at, 1,
-						       &_d);
+				d = skb_header_pointer(skb,
+						       hoffset + tkey->at,
+						       1, &_d);
 				if (!d)
 					goto bad;
-				offset += (*d & tkey->offmask) >> tkey->shift;
+				offset += (*d & tkey->offmask) >> PEDIT_SHIFT_GET(tkey->shift);
 			}
 
 			if (offset % 4) {
@@ -163,19 +202,21 @@ static int tcf_pedit(struct sk_buff *skb, const struct tc_action *a,
 				goto bad;
 			}
 
-			if (!offset_valid(skb, off + offset)) {
+			if (!offset_valid(skb, hoffset + offset)) {
 				pr_info("tc filter pedit offset %d out of bounds\n",
-					offset);
+					hoffset + offset);
 				goto bad;
 			}
 
-			ptr = skb_header_pointer(skb, off + offset, 4, &_data);
+			ptr = skb_header_pointer(skb,
+						 hoffset + offset,
+						 4, &_data);
 			if (!ptr)
 				goto bad;
 			/* just do it, baby */
 			*ptr = ((*ptr & tkey->mask) ^ tkey->val);
 			if (ptr == &_data)
-				skb_store_bits(skb, off + offset, ptr, 4);
+				skb_store_bits(skb, hoffset + offset, ptr, 4);
 		}
 
 		goto done;
-- 
2.10.2

^ permalink raw reply related

* [PATCH net-next 1/3] net/skbuff: Introduce skb_mac_offset()
From: Amir Vadai @ 2016-11-30  9:09 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Jamal Hadi Salim, Or Gerlitz, Hadar Har-Zion, Amir Vadai
In-Reply-To: <20161130090928.14816-1-amir@vadai.me>

Introduce skb_mac_offset() that could be used to get mac header offset.

Signed-off-by: Amir Vadai <amir@vadai.me>
---
 include/linux/skbuff.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 9c535fbccf2c..395eb5111df0 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2169,6 +2169,11 @@ static inline unsigned char *skb_mac_header(const struct sk_buff *skb)
 	return skb->head + skb->mac_header;
 }
 
+static inline int skb_mac_offset(const struct sk_buff *skb)
+{
+	return skb_mac_header(skb) - skb->data;
+}
+
 static inline int skb_mac_header_was_set(const struct sk_buff *skb)
 {
 	return skb->mac_header != (typeof(skb->mac_header))~0U;
-- 
2.10.2

^ permalink raw reply related

* [PATCH net-next 0/3] net/sched: act_pedit: Support using offset relative to the conventional network headers
From: Amir Vadai @ 2016-11-30  9:09 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Jamal Hadi Salim, Or Gerlitz, Hadar Har-Zion, Amir Vadai

Hi,

Patch 1/3 ("net/skbuff: Introduce skb_mac_offset()") adds a utility function to
get mac header offset.

Patch 2/3 ("net/act_pedit: Support using offset relative to the conventional
network headers") extends pedit to enable the user to set offset relative to
MAC/IPv4/IPv6/TCP network headers.
This would enable to work with more complex header schemes (vs the simple IPv4
case) where setting a fixed offset relative to the network header is not
enough. It is also forward looking to enable hardware offloading of pedit more
easier.

The header type is embedded in the 8 MSB of the u32 key->shift which
were never used till now. Therefore backward compatibility is being
kept.

Patch 3/3 ("net/act_pedit: Introduce 'add' operation") add a new operation to
increase the value of a header field. The operation is passed on another free
8bit in the key->shift.

Usage example:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
  flower \
    ip_proto tcp \
    src_port 80 \
  action \
	  pedit munge ip ttl add 0xff \
	  pedit munge tcp dport set 8080 \
	pipe action mirred egress redirect dev veth0

Will forward traffic with tcp dport 80, and modify the destination port to
8080, and decrease the ttl by 1.

I've uploaded a draft for the userspace [2] to make it easier to review and
test the patchset.

The patchset will conflict if already accepted patch [1] from net is missing.
It was applied and tested with [1] on top of commit 93ba22225504 ("hv_netvsc:
remove excessive logging on MTU change").

[1] - 95c2027bfeda ("net/sched: pedit: make sure that offset is valid")
[2] - git: https://bitbucket.org/av42/iproute2.git
      branch: pedit

Thanks,
Amir

Amir Vadai (3):
  net/skbuff: Introduce skb_mac_offset()
  net/act_pedit: Support using offset relative to the conventional
    network headers
  net/act_pedit: Introduce 'add' operation

 include/linux/skbuff.h               |  5 +++
 include/uapi/linux/tc_act/tc_pedit.h | 27 ++++++++++++
 net/sched/act_pedit.c                | 81 ++++++++++++++++++++++++++++++------
 3 files changed, 100 insertions(+), 13 deletions(-)

-- 
2.10.2

^ permalink raw reply

* Re: [PATCH net-next v3 3/4] bpf: BPF for lightweight tunnel infrastructure
From: Thomas Graf @ 2016-11-30  8:57 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: davem, netdev, daniel, tom, roopa, hannes
In-Reply-To: <20161130070150.GA33397@ast-mbp.thefacebook.com>

On 11/29/16 at 11:01pm, Alexei Starovoitov wrote:
> On Wed, Nov 30, 2016 at 07:48:51AM +0100, Thomas Graf wrote:
> > Should we check in __bpf_redirect_common() whether mac_header <
> > nework_header then or add it to lwt-bpf conditional on
> > dev_is_mac_header_xmit()?
> 
> may be only extra 'if' in lwt-bpf is all we need?

Agreed, I will add a mac_header < network_header check to lwt-bpf if we
redirect to an l2 device.

> I'm still missing what will happen if we 'forget' to do
> bpf_skb_push() inside the lwt-bpf program, but still do redirect
> in lwt_xmit stage to l2 netdev...

The same as for a AF_PACKET socket not providing an actual L2 header.
I will add a test case to cover this scenario as well.

^ permalink raw reply

* Re: [PATCH v3] ethernet :mellanox :mlx4: Replace pci_pool_alloc by pci_pool_zalloc
From: Tariq Toukan @ 2016-11-30  8:44 UTC (permalink / raw)
  To: Souptick Joarder, sergei.shtylyov, yishaih
  Cc: netdev, linux-rdma, sahu.rameshwar73
In-Reply-To: <20161129194611.GA4088@jordon-HP-15-Notebook-PC>

Hi Souptic,

Thanks for your patch.

On 29/11/2016 9:46 PM, Souptick Joarder wrote:
> In mlx4_alloc_cmd_mailbox(), pci_pool_alloc() followed by memset will be
> replaced by pci_pool_zalloc()
>
> Signed-off-by: Souptick joarder <jrdr.linux@gmail.com>
> ---
> v3:
>    - Fixed alignment issues
As mentioned already, you mean 'Remove empty line'.
>
> v2:
>    - Address comment from sergei
>      Alignment was not proper
>
>   drivers/net/ethernet/mellanox/mlx4/cmd.c | 6 ++----
>   1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
> index e36bebc..a49072b4 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
> @@ -2679,15 +2679,13 @@ struct mlx4_cmd_mailbox *mlx4_alloc_cmd_mailbox(struct mlx4_dev *dev)
>   	if (!mailbox)
>   		return ERR_PTR(-ENOMEM);
>
> -	mailbox->buf = pci_pool_alloc(mlx4_priv(dev)->cmd.pool, GFP_KERNEL,
> -				      &mailbox->dma);
> +	mailbox->buf = pci_pool_zalloc(mlx4_priv(dev)->cmd.pool, GFP_KERNEL,
> +				       &mailbox->dma);
>   	if (!mailbox->buf) {
>   		kfree(mailbox);
>   		return ERR_PTR(-ENOMEM);
>   	}
>
> -	memset(mailbox->buf, 0, MLX4_MAILBOX_SIZE);
> -
>   	return mailbox;
>   }
>   EXPORT_SYMBOL_GPL(mlx4_alloc_cmd_mailbox);
> --
> 1.9.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>

Thanks,
Tariq

^ permalink raw reply

* [net-next] rtnetlink: return the correct error code
From: Zhang Shengju @ 2016-11-30  8:37 UTC (permalink / raw)
  To: netdev

Before this patch, function ndo_dflt_fdb_dump() will always return code
from uc fdb dump. The reture code of mc fdb dump is lost.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
---
 net/core/rtnetlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 4e60525..061415f 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -3165,7 +3165,7 @@ int ndo_dflt_fdb_dump(struct sk_buff *skb,
 	err = nlmsg_populate_fdb(skb, cb, dev, idx, &dev->uc);
 	if (err)
 		goto out;
-	nlmsg_populate_fdb(skb, cb, dev, idx, &dev->mc);
+	err = nlmsg_populate_fdb(skb, cb, dev, idx, &dev->mc);
 out:
 	netif_addr_unlock_bh(dev);
 	return err;
-- 
1.8.3.1

^ permalink raw reply related

* [iproute PATCH] man: ip-route.8: Add notes about dropped IPv4 route cache
From: Phil Sutter @ 2016-11-30  8:29 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 man/man8/ip-route.8.in | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/man/man8/ip-route.8.in b/man/man8/ip-route.8.in
index d4fae3cc783ba..c0acaa0020ef7 100644
--- a/man/man8/ip-route.8.in
+++ b/man/man8/ip-route.8.in
@@ -924,6 +924,12 @@ routes are left unchanged. Any routes specified in the data stream that
 already exist in the table will be ignored.
 .RE
 
+.SH NOTES
+Starting with Linux kernel version 3.6, there is no routing cache for IPv4
+anymore. Hence
+.B "ip route show cached"
+will never print any entries on systems with this or newer kernel versions.
+
 .SH EXAMPLES
 .PP
 ip ro
-- 
2.10.0

^ permalink raw reply related

* [PATCH] net: asix: Fix AX88772_suspend() USB vendor commands failure issues
From: ASIX_Allan [Office] @ 2016-11-30  8:29 UTC (permalink / raw)
  To: 'Jon Hunter', freddy-knRN6Y/kmf1NUHwG+Fw1Kw,
	Dean_Jenkins-nmGgyN9QBj3QT0dZR+AlfA,
	Mark_Craske-nmGgyN9QBj3QT0dZR+AlfA, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	robert.foss-ZGY8ohtN/8qB+jHODAdFcQ,
	ivecera-H+wXaHxf7aLQT0dZR+AlfA,
	john.stultz-QSEj5FYQhm4dnm+yROfE0A,
	vpalatin-F7+t8E8rja9g9hUCZPvPmw,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	grundler-F7+t8E8rja9g9hUCZPvPmw,
	changchias-Re5JQEeQqe8AvxtiuMwx3w, andrew-g2DYL2Zd6BY,
	tremyfr-Re5JQEeQqe8AvxtiuMwx3w, colin.king-Z7WLFzj8eWMS+FvcfC7Uqw,
	linux-usb-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	vpalatin-hpIqsD4AKlfQT0dZR+AlfA


The change fixes AX88772_suspend() USB vendor commands failure issues.

Signed-off-by: Allan Chou <allan-knRN6Y/kmf1NUHwG+Fw1Kw@public.gmane.org>
Tested-by: Allan Chou <allan-knRN6Y/kmf1NUHwG+Fw1Kw@public.gmane.org>
Tested-by: Jon Hunter <jonathanh-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>

---
--- a/drivers/net/usb/asix_devices.c	2016-11-28 05:08:04.000000000 +0800
+++ b/drivers/net/usb/asix_devices.c	2016-11-30 09:31:54.000000000 +0800
@@ -603,12 +603,12 @@ static void ax88772_suspend(struct usbne
 	u16 medium;
 
 	/* Stop MAC operation */
-	medium = asix_read_medium_status(dev, 0);
+	medium = asix_read_medium_status(dev, 1);
 	medium &= ~AX_MEDIUM_RE;
-	asix_write_medium_mode(dev, medium, 0);
+	asix_write_medium_mode(dev, medium, 1);
 
 	netdev_dbg(dev->net, "ax88772_suspend: medium=0x%04x\n",
-		   asix_read_medium_status(dev, 0));
+		   asix_read_medium_status(dev, 1));
 
 	/* Preserve BMCR for restoring */
 	priv->presvd_phy_bmcr =


--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* RE: [PATCH v4 3/5] net: asix: Fix AX88772x resume failures
From: ASIX_Allan [Office] @ 2016-11-30  8:28 UTC (permalink / raw)
  To: 'Jon Hunter', freddy-knRN6Y/kmf1NUHwG+Fw1Kw,
	Dean_Jenkins-nmGgyN9QBj3QT0dZR+AlfA,
	Mark_Craske-nmGgyN9QBj3QT0dZR+AlfA, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	robert.foss-ZGY8ohtN/8qB+jHODAdFcQ,
	ivecera-H+wXaHxf7aLQT0dZR+AlfA,
	john.stultz-QSEj5FYQhm4dnm+yROfE0A,
	vpalatin-F7+t8E8rja9g9hUCZPvPmw,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	grundler-F7+t8E8rja9g9hUCZPvPmw,
	changchias-Re5JQEeQqe8AvxtiuMwx3w, andrew-g2DYL2Zd6BY,
	tremyfr-Re5JQEeQqe8AvxtiuMwx3w, colin.king-Z7WLFzj8eWMS+FvcfC7Uqw,
	linux-usb-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	vpalatin-hpIqsD4AKlfQT0dZR+AlfA
In-Reply-To: <b24b2fa4-88d8-5772-b79c-aa2cadd7aa1e-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>

Dear Jon,

Thanks a lot for your reminding. I will submit a new driver patch soon.

---
Best regards,
Allan Chou


-----Original Message-----
From: Jon Hunter [mailto:jonathanh-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org] 
Sent: Wednesday, November 30, 2016 4:08 PM
To: allan-knRN6Y/kmf1NUHwG+Fw1Kw@public.gmane.org; freddy-knRN6Y/kmf1NUHwG+Fw1Kw@public.gmane.org; Dean_Jenkins-nmGgyN9QBj3QT0dZR+AlfA@public.gmane.org;
Mark_Craske-nmGgyN9QBj3QT0dZR+AlfA@public.gmane.org; davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org; robert.foss-ZGY8ohtN/8qB+jHODAdFcQ@public.gmane.org;
ivecera-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org; john.stultz-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org; vpalatin-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org;
stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ@public.gmane.org; grundler-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org; changchias-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org;
andrew-g2DYL2Zd6BY@public.gmane.org; tremyfr-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org; colin.king-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org;
linux-usb-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org;
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; vpalatin-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org
Subject: Re: [PATCH v4 3/5] net: asix: Fix AX88772x resume failures

Hi Allan,

On 30/11/16 03:03, ASIX_Allan [Office] wrote:
> The change fixes AX88772x resume failure by
> - Restore incorrect AX88772A PHY registers when resetting
> - Need to stop MAC operation when suspending
> - Need to restart MII when restoring PHY
> 
> Signed-off-by: Allan Chou <allan-knRN6Y/kmf1NUHwG+Fw1Kw@public.gmane.org>
> Signed-off-by: Robert Foss <robert.foss-ZGY8ohtN/8qB+jHODAdFcQ@public.gmane.org>
> Tested-by: Robert Foss <robert.foss-ZGY8ohtN/8qB+jHODAdFcQ@public.gmane.org>
> Tested-by: Jon Hunter <jonathanh-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> Tested-by: Allan Chou <allan-knRN6Y/kmf1NUHwG+Fw1Kw@public.gmane.org>

V3 of this patch is already in the current mainline branch. So you need to
send a patch on top of V3 (or v4.9-rc7) to get this fixed. Also you should
highlight the fact that this is a fix needed for v4.9.

Cheers
Jon

--
nvpublic

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH iproute2 V2 1/2] tc/cls_flower: Classify packet in ip tunnels
From: Jiri Pirko @ 2016-11-30  8:13 UTC (permalink / raw)
  To: Amir Vadai
  Cc: Stephen Hemminger, netdev, David S. Miller, Jiri Benc, Or Gerlitz,
	Hadar Har-Zion, Roi Dayan
In-Reply-To: <CAML6475YBZvreU4PEPWUWrEkW5EoGDCw1KS2+ndac7gMY2BQAQ@mail.gmail.com>

Wed, Nov 30, 2016 at 08:38:24AM CET, amirva@gmail.com wrote:
>On Wed, Nov 30, 2016 at 9:17 AM Amir Vadai <amir@vadai.me> wrote:
>
>> On Tue, Nov 29, 2016 at 07:26:58PM -0800, Stephen Hemminger wrote:
>> > The overall design is fine, just a couple nits with the code.
>> >
>> > > diff --git a/tc/f_flower.c b/tc/f_flower.c
>> > > index 2d31d1aa832d..1cf0750b5b83 100644
>> > > --- a/tc/f_flower.c
>> > > +++ b/tc/f_flower.c
>> >
>> > >
>> > > +static int flower_parse_key_id(char *str, int type, struct nlmsghdr
>> *n)
>> >
>> > str is not modified, therefore use: const char *str
>> ack
>>
>> >
>> > > +{
>> > > +   int ret;
>> > > +   __be32 key_id;
>> > > +
>> > > +   ret = get_be32(&key_id, str, 10);
>> > > +   if (ret)
>> > > +           return -1;
>> >
>> > Traditionally netlink attributes are in host order, why was flower
>> > chosen to be different?
>> I don't know, maybe Jiri (cc'ed) can explain, but it is all over the
>> flower code.
>>
>Now the right Jiri (Pirko) is CC'ed

There is a bunch of helpers inside the cls_flower.c to put and set
values, they work with generic char arrays of len.


>
>
>> >
>> > > +
>> > > +   addattr32(n, MAX_MSG, type, key_id);
>> > > +
>> > > +   return 0;
>> >
>> >
>> > Why lose the return value here?  Why not:
>> >
>> >       ret = get_be32(&key_id, str, 10);
>> >       if (!ret)
>> >               addattr32(n, MAX_MSG, type, key_id);
>> The get_*() function can return only -1 or 0. But you are right, and it
>> is better the way you suggested.  Changing accordingly in V3.
>>
>> >
>> > > +}
>> > > +
>> > >  static int flower_parse_opt(struct filter_util *qu, char *handle,
>> > >                         int argc, char **argv, struct nlmsghdr *n)
>> > >  {
>> > > @@ -339,6 +359,38 @@ static int flower_parse_opt(struct filter_util
>> *qu, char *handle,
>> > >                             fprintf(stderr, "Illegal \"src_port\"\n");
>> > >                             return -1;
>> > >                     }
>> > > +           } else if (matches(*argv, "enc_dst_ip") == 0) {
>> > > +                   NEXT_ARG();
>> > > +                   ret = flower_parse_ip_addr(*argv, 0,
>> > > +
>> TCA_FLOWER_KEY_ENC_IPV4_DST,
>> > > +
>> TCA_FLOWER_KEY_ENC_IPV4_DST_MASK,
>> > > +
>> TCA_FLOWER_KEY_ENC_IPV6_DST,
>> > > +
>> TCA_FLOWER_KEY_ENC_IPV6_DST_MASK,
>> > > +                                              n);
>> > > +                   if (ret < 0) {
>> > > +                           fprintf(stderr, "Illegal
>> \"enc_dst_ip\"\n");
>> > > +                           return -1;
>> > > +                   }
>> > > +           } else if (matches(*argv, "enc_src_ip") == 0) {
>> > > +                   NEXT_ARG();
>> > > +                   ret = flower_parse_ip_addr(*argv, 0,
>> > > +
>> TCA_FLOWER_KEY_ENC_IPV4_SRC,
>> > > +
>> TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK,
>> > > +
>> TCA_FLOWER_KEY_ENC_IPV6_SRC,
>> > > +
>> TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK,
>> > > +                                              n);
>> > > +                   if (ret < 0) {
>> > > +                           fprintf(stderr, "Illegal
>> \"enc_src_ip\"\n");
>> > > +                           return -1;
>> > > +                   }
>> > > +           } else if (matches(*argv, "enc_key_id") == 0) {
>> > > +                   NEXT_ARG();
>> > > +                   ret = flower_parse_key_id(*argv,
>> > > +
>>  TCA_FLOWER_KEY_ENC_KEY_ID, n);
>> > > +                   if (ret < 0) {
>> > > +                           fprintf(stderr, "Illegal
>> \"enc_key_id\"\n");
>> > > +                           return -1;
>> > > +                   }
>> > >             } else if (matches(*argv, "action") == 0) {
>> > >                     NEXT_ARG();
>> > >                     ret = parse_action(&argc, &argv, TCA_FLOWER_ACT,
>> n);
>> > > @@ -509,6 +561,14 @@ static void flower_print_port(FILE *f, char
>> *name, __u8 ip_proto,
>> > >     fprintf(f, "\n  %s %d", name, ntohs(rta_getattr_u16(attr)));
>> > >  }
>> > >
>> > > +static void flower_print_key_id(FILE *f, char *name,
>> > > +                           struct rtattr *attr)
>> >
>> > const char *name?
>> ack
>>
>> >
>> >
>> > > +{
>> > > +   if (!attr)
>> > > +           return;
>> > > +   fprintf(f, "\n  %s %d", name, ntohl(rta_getattr_u32(attr)));
>> > > +}
>> > > +
>> >
>> > Why short circuit, just change the order:
>> >
>> >       if (attr)
>> >               fprintf(f, "\n  %s %s", name, ntohl(rta_getattr_u32(attr));
>> >
>> > You might also want to introduce rta_getattr_be32()
>> ack
>>
>> >
>> > Please change, retest and resubmit both patches.
>> ack
>>
>> Thanks for reviewing,
>> Amir
>>

^ permalink raw reply

* Re: [PATCH v4 3/5] net: asix: Fix AX88772x resume failures
From: Jon Hunter @ 2016-11-30  8:07 UTC (permalink / raw)
  To: allan, freddy, Dean_Jenkins, Mark_Craske, davem, robert.foss,
	ivecera, john.stultz, vpalatin, stephen, grundler, changchias,
	andrew, tremyfr, colin.king, linux-usb, netdev, linux-kernel,
	vpalatin
In-Reply-To: <00a601d24ab6$4e9274d0$ebb75e70$@asix.com.tw>

Hi Allan,

On 30/11/16 03:03, ASIX_Allan [Office] wrote:
> The change fixes AX88772x resume failure by
> - Restore incorrect AX88772A PHY registers when resetting
> - Need to stop MAC operation when suspending
> - Need to restart MII when restoring PHY
> 
> Signed-off-by: Allan Chou <allan@asix.com.tw>
> Signed-off-by: Robert Foss <robert.foss@collabora.com>
> Tested-by: Robert Foss <robert.foss@collabora.com>
> Tested-by: Jon Hunter <jonathanh@nvidia.com>
> Tested-by: Allan Chou <allan@asix.com.tw>

V3 of this patch is already in the current mainline branch. So you need
to send a patch on top of V3 (or v4.9-rc7) to get this fixed. Also you
should highlight the fact that this is a fix needed for v4.9.

Cheers
Jon

-- 
nvpublic

^ permalink raw reply

* Re: [RFC PATCH net-next v2] ipv6: implement consistent hashing for equal-cost multipath routing
From: David Lebrun @ 2016-11-30  7:56 UTC (permalink / raw)
  To: Hannes Frederic Sowa, netdev
In-Reply-To: <1480477952.3702850.803295033.367FD66D@webmail.messagingengine.com>

[-- Attachment #1: Type: text/plain, Size: 1667 bytes --]

On 11/30/2016 04:52 AM, Hannes Frederic Sowa wrote:
> In the worst case this causes 2GB (order 19) allocations (x == 31) to
> happen in GFP_ATOMIC (due to write lock) context and could cause update
> failures to the routing table due to fragmentation. Are you sure the
> upper limit of 31 is reasonable? I would very much prefer an upper limit
> of below or equal 25 for x to stay within the bounds of the slab
> allocators (which is still a lot and probably causes errors!).
> Unfortunately because of the nature of the sysctl you can't really
> create its own cache for it. :/
> 

Agreed. I think that even something like 16 would be excessively
sufficient, that would enable 65K slices, which is way more than enough
to have sufficient balancing with a reasonable amount of nexthops (I
wonder whether there are actual deployments with more than 32 nexthops
for a route).

> Also by design, one day this should all be RCU and having that much data
> outstanding worries me a bit during routing table mutation.
> 
> I am a fan of consistent hashing but I am not so sure if it belongs into
> a generic ECMP implementation or into its own ipvs or netfilter module
> where you specifically know how much memory to burn for it.
> 

The complexity of the consistent hashing code might warrant something
like that, but I am ot sure of the implications.

> Also please convert the sysctl to a netlink attribute if you pursue this
> because if I change the sysctl while my quagga is hammering the routing
> table I would like to know which nodes allocate what amount of memory.

Yes, that was the idea.

Thanks for the feedback

David


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 163 bytes --]

^ permalink raw reply

* [PATCH iproute2 V3 3/3] tc/act_tunnel: Introduce ip tunnel action
From: Amir Vadai @ 2016-11-30  7:38 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: netdev, David S. Miller, Or Gerlitz, Hadar Har-Zion, Jiri Pirko,
	Jiri Benc, Amir Vadai
In-Reply-To: <20161130073840.5269-1-amir@vadai.me>

This action could be used before redirecting packets to a shared tunnel
device, or when redirecting packets arriving from a such a device.

The 'unset' action is optional. It is used to explicitly unset the
metadata created by the tunnel device during decap. If not used, the
metadata will be released automatically by the kernel.
The 'set' operation, will set the metadata with the specified values for
the encap.

For example, the following flower filter will forward all ICMP packets
destined to 11.11.11.2 through the shared vxlan device 'vxlan0'. Before
redirecting, a metadata for the vxlan tunnel is created using the
tunnel_key action and it's arguments:

$ tc filter add dev net0 protocol ip parent ffff: \
    flower \
      ip_proto 1 \
      dst_ip 11.11.11.2 \
    action tunnel_key set \
      src_ip 11.11.0.1 \
      dst_ip 11.11.0.2 \
      id 11 \
    action mirred egress redirect dev vxlan0

Signed-off-by: Amir Vadai <amir@vadai.me>
---
 include/linux/tc_act/tc_tunnel_key.h |  42 ++++++
 man/man8/tc-tunnel_key.8             | 113 +++++++++++++++
 tc/Makefile                          |   1 +
 tc/m_tunnel_key.c                    | 258 +++++++++++++++++++++++++++++++++++
 4 files changed, 414 insertions(+)
 create mode 100644 include/linux/tc_act/tc_tunnel_key.h
 create mode 100644 man/man8/tc-tunnel_key.8
 create mode 100644 tc/m_tunnel_key.c

diff --git a/include/linux/tc_act/tc_tunnel_key.h b/include/linux/tc_act/tc_tunnel_key.h
new file mode 100644
index 000000000000..f9ddf5369a45
--- /dev/null
+++ b/include/linux/tc_act/tc_tunnel_key.h
@@ -0,0 +1,42 @@
+/*
+ * Copyright (c) 2016, Amir Vadai <amir@vadai.me>
+ * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef __LINUX_TC_TUNNEL_KEY_H
+#define __LINUX_TC_TUNNEL_KEY_H
+
+#include <linux/pkt_cls.h>
+
+#define TCA_ACT_TUNNEL_KEY 17
+
+#define TCA_TUNNEL_KEY_ACT_SET	    1
+#define TCA_TUNNEL_KEY_ACT_RELEASE  2
+
+struct tc_tunnel_key {
+	tc_gen;
+	int t_action;
+};
+
+enum {
+	TCA_TUNNEL_KEY_UNSPEC,
+	TCA_TUNNEL_KEY_TM,
+	TCA_TUNNEL_KEY_PARMS,
+	TCA_TUNNEL_KEY_ENC_IPV4_SRC,	/* be32 */
+	TCA_TUNNEL_KEY_ENC_IPV4_DST,	/* be32 */
+	TCA_TUNNEL_KEY_ENC_IPV6_SRC,	/* struct in6_addr */
+	TCA_TUNNEL_KEY_ENC_IPV6_DST,	/* struct in6_addr */
+	TCA_TUNNEL_KEY_ENC_KEY_ID,	/* be64 */
+	TCA_TUNNEL_KEY_PAD,
+	__TCA_TUNNEL_KEY_MAX,
+};
+
+#define TCA_TUNNEL_KEY_MAX (__TCA_TUNNEL_KEY_MAX - 1)
+
+#endif
+
diff --git a/man/man8/tc-tunnel_key.8 b/man/man8/tc-tunnel_key.8
new file mode 100644
index 000000000000..d0c333d27158
--- /dev/null
+++ b/man/man8/tc-tunnel_key.8
@@ -0,0 +1,113 @@
+.TH "Tunnel metadata manipulation action in tc" 8 "10 Nov 2016" "iproute2" "Linux"
+
+.SH NAME
+tunnel_key - Tunnel metadata manipulation
+.SH SYNOPSIS
+.in +8
+.ti -8
+.BR tc " ... " "action tunnel_key" " { " unset " | "
+.IR SET " }"
+
+.ti -8
+.IR SET " := "
+.BR set " " src_ip
+.IR ADDRESS
+.BR dst_ip
+.IR ADDRESS
+.BI id " KEY_ID"
+
+.SH DESCRIPTION
+The
+.B tunnel_key
+action combined with a shared IP tunnel device, allows to perform IP tunnel en-
+or decapsulation on a packet, reflected by
+the operation modes
+.IR UNSET " and " SET .
+The
+.I UNSET
+mode is optional - even without using it, the metadata information will be
+released automatically when packet processing will be finished.
+.IR UNSET
+function could be used in cases when traffic is forwarded between two tunnels,
+where the metadata from the first tunnel will be used for encapsulation done by
+the second tunnel.
+It must be used for offloaded filters, such that hardware drivers can
+realize they need to program the HW to do decapsulation.
+.IR SET
+mode requires the source and destination ip
+.I ADDRESS
+and the tunnel key id
+.I KEY_ID
+which will be used by the ip tunnel shared device to create the tunnel header. The
+.B tunnel_key
+action is useful only in combination with a
+.B mirred redirect
+action to a shared IP tunnel device which will use the metadata (for
+.I SET
+) and unset the metadata created by it (for
+.I UNSET
+).
+
+.SH OPTIONS
+.TP
+.B unset
+Decapsulation mode, no further arguments allowed. This function is not
+mandatory and might be used only in some specific use cases.
+.TP
+.B set
+Encapsulation mode. Requires
+.B id
+,
+.B src_ip
+and
+.B dst_ip
+options.
+.RS
+.TP
+.B id
+Tunnel ID (for example VNI in VXLAN tunnel)
+.TP
+.B src_ip
+Outer header source IP address (IPv4 or IPv6)
+.TP
+.B dst_ip
+Outer header destination IP address (IPv4 or IPv6)
+.RE
+.SH EXAMPLES
+The following example encapsulates incoming ICMP packets on eth0 into a vxlan
+tunnel by setting metadata to VNI 11, source IP 11.11.0.1 and destination IP
+11.11.0.2 by forwarding the skb with the metadata to device vxlan0, which will
+prepare the VXLAN headers:
+
+.RS
+.EX
+#tc qdisc add dev eth0 handle ffff: ingress
+#tc filter add dev eth0 protocol ip parent ffff: \\
+  flower \\
+    ip_proto icmp \\
+  action tunnel_key set \\
+    src_ip 11.11.0.1 \\
+    dst_ip 11.11.0.2 \\
+    id 11 \\
+  action mirred egress redirect dev vxlan0
+.EE
+.RE
+
+Here is an example of the
+.B unset
+function: Incoming VXLAN packets on vxlan0 with specific outer IP's and VNI 11
+in the metadata are decapsulated and redirected to eth0:
+
+.RS
+.EX
+#tc qdisc add dev eth0 handle ffff: ingress
+#tc filter add dev vxlan0 protocol ip parent ffff: \
+  flower \\
+	  enc_src_ip 11.11.0.2 enc_dst_ip 11.11.0.1 enc_key_id 11 \
+	action tunnel_key unset \
+	action mirred egress redirect dev eth0
+.EE
+.RE
+
+.SH SEE ALSO
+.BR tc (8)
diff --git a/tc/Makefile b/tc/Makefile
index dfa875b5edaf..f6f41ca2bb3d 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -50,6 +50,7 @@ TCMODULES += m_simple.o
 TCMODULES += m_vlan.o
 TCMODULES += m_connmark.o
 TCMODULES += m_bpf.o
+TCMODULES += m_tunnel_key.o
 TCMODULES += p_ip.o
 TCMODULES += p_icmp.o
 TCMODULES += p_tcp.o
diff --git a/tc/m_tunnel_key.c b/tc/m_tunnel_key.c
new file mode 100644
index 000000000000..f4a20e24e0bf
--- /dev/null
+++ b/tc/m_tunnel_key.c
@@ -0,0 +1,258 @@
+/*
+ * m_tunnel_key.c	ip tunnel manipulation module
+ *
+ *              This program is free software; you can redistribute it and/or
+ *              modify it under the terms of the GNU General Public License
+ *              as published by the Free Software Foundation; either version
+ *              2 of the License, or (at your option) any later version.
+ *
+ * Authors:     Amir Vadai <amir@vadai.me>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <string.h>
+#include <linux/if_ether.h>
+#include "utils.h"
+#include "rt_names.h"
+#include "tc_util.h"
+#include <linux/tc_act/tc_tunnel_key.h>
+
+static void explain(void)
+{
+	fprintf(stderr, "Usage: tunnel_key unset\n");
+	fprintf(stderr, "       tunnel_key set id TUNNELID src_ip IP dst_ip IP\n");
+}
+
+static void usage(void)
+{
+	explain();
+	exit(-1);
+}
+
+static int tunnel_key_parse_ip_addr(const char *str, int addr4_type,
+				    int addr6_type, struct nlmsghdr *n)
+{
+	inet_prefix addr;
+	int ret;
+
+	ret = get_addr(&addr, str, AF_UNSPEC);
+	if (ret)
+		return ret;
+
+	addattr_l(n, MAX_MSG, addr.family == AF_INET ? addr4_type : addr6_type,
+		  addr.data, addr.bytelen);
+
+	return 0;
+}
+
+static int tunnel_key_parse_key_id(const char *str, int type,
+				   struct nlmsghdr *n)
+{
+	__be32 key_id;
+	int ret;
+
+	ret = get_be32(&key_id, str, 10);
+	if (!ret)
+		addattr32(n, MAX_MSG, type, key_id);
+
+	return ret;
+}
+
+static int parse_tunnel_key(struct action_util *a, int *argc_p, char ***argv_p,
+			    int tca_id, struct nlmsghdr *n)
+{
+	struct tc_tunnel_key parm = { .action = TC_ACT_PIPE };
+	char **argv = *argv_p;
+	int argc = *argc_p;
+	struct rtattr *tail;
+	int action = 0;
+	int ret;
+	int has_src_ip = 0;
+	int has_dst_ip = 0;
+	int has_key_id = 0;
+
+	if (matches(*argv, "tunnel_key") != 0)
+		return -1;
+
+	tail = NLMSG_TAIL(n);
+	addattr_l(n, MAX_MSG, tca_id, NULL, 0);
+
+	NEXT_ARG();
+
+	while (argc > 0) {
+		if (matches(*argv, "unset") == 0) {
+			if (action) {
+				fprintf(stderr, "unexpected \"%s\" - action already specified\n",
+					*argv);
+				explain();
+				return -1;
+			}
+			action = TCA_TUNNEL_KEY_ACT_RELEASE;
+		} else if (matches(*argv, "set") == 0) {
+			if (action) {
+				fprintf(stderr, "unexpected \"%s\" - action already specified\n",
+					*argv);
+				explain();
+				return -1;
+			}
+			action = TCA_TUNNEL_KEY_ACT_SET;
+		} else if (matches(*argv, "src_ip") == 0) {
+			NEXT_ARG();
+			ret = tunnel_key_parse_ip_addr(*argv,
+						       TCA_TUNNEL_KEY_ENC_IPV4_SRC,
+						       TCA_TUNNEL_KEY_ENC_IPV6_SRC,
+						       n);
+			if (ret < 0) {
+				fprintf(stderr, "Illegal \"src_ip\"\n");
+				return -1;
+			}
+			has_src_ip = 1;
+		} else if (matches(*argv, "dst_ip") == 0) {
+			NEXT_ARG();
+			ret = tunnel_key_parse_ip_addr(*argv,
+						       TCA_TUNNEL_KEY_ENC_IPV4_DST,
+						       TCA_TUNNEL_KEY_ENC_IPV6_DST,
+						       n);
+			if (ret < 0) {
+				fprintf(stderr, "Illegal \"dst_ip\"\n");
+				return -1;
+			}
+			has_dst_ip = 1;
+		} else if (matches(*argv, "id") == 0) {
+			NEXT_ARG();
+			ret = tunnel_key_parse_key_id(*argv, TCA_TUNNEL_KEY_ENC_KEY_ID, n);
+			if (ret < 0) {
+				fprintf(stderr, "Illegal \"id\"\n");
+				return -1;
+			}
+			has_key_id = 1;
+		} else if (matches(*argv, "help") == 0) {
+			usage();
+		} else {
+			break;
+		}
+		NEXT_ARG_FWD();
+	}
+
+	if (argc && !action_a2n(*argv, &parm.action, false))
+		NEXT_ARG_FWD();
+
+	if (argc) {
+		if (matches(*argv, "index") == 0) {
+			NEXT_ARG();
+			if (get_u32(&parm.index, *argv, 10)) {
+				fprintf(stderr, "tunnel_key: Illegal \"index\"\n");
+				return -1;
+			}
+
+			NEXT_ARG_FWD();
+		}
+	}
+
+	if (action == TCA_TUNNEL_KEY_ACT_SET &&
+	    (!has_src_ip || !has_dst_ip || !has_key_id)) {
+		fprintf(stderr, "set needs tunnel_key parameters\n");
+		explain();
+		return -1;
+	}
+
+	parm.t_action = action;
+	addattr_l(n, MAX_MSG, TCA_TUNNEL_KEY_PARMS, &parm, sizeof(parm));
+	tail->rta_len = (char *)NLMSG_TAIL(n) - (char *)tail;
+
+	*argc_p = argc;
+	*argv_p = argv;
+
+	return 0;
+}
+
+static void tunnel_key_print_ip_addr(FILE *f, const char *name,
+				     struct rtattr *attr)
+{
+	int family;
+	size_t len;
+
+	if (!attr)
+		return;
+
+	len = RTA_PAYLOAD(attr);
+
+	if (len == 4)
+		family = AF_INET;
+	else if (len == 16)
+		family = AF_INET6;
+	else
+		return;
+
+	fprintf(f, "\n\t%s %s", name, rt_addr_n2a_rta(family, attr));
+}
+
+static void tunnel_key_print_key_id(FILE *f, const char *name,
+				    struct rtattr *attr)
+{
+	if (!attr)
+		return;
+	fprintf(f, "\n\t%s %d", name, rta_getattr_be32(attr));
+}
+
+static int print_tunnel_key(struct action_util *au, FILE *f, struct rtattr *arg)
+{
+	struct rtattr *tb[TCA_TUNNEL_KEY_MAX + 1];
+	struct tc_tunnel_key *parm;
+
+	if (!arg)
+		return -1;
+
+	parse_rtattr_nested(tb, TCA_TUNNEL_KEY_MAX, arg);
+
+	if (!tb[TCA_TUNNEL_KEY_PARMS]) {
+		fprintf(f, "[NULL tunnel_key parameters]");
+		return -1;
+	}
+	parm = RTA_DATA(tb[TCA_TUNNEL_KEY_PARMS]);
+
+	fprintf(f, "tunnel_key");
+
+	switch (parm->t_action) {
+	case TCA_TUNNEL_KEY_ACT_RELEASE:
+		fprintf(f, " unset");
+		break;
+	case TCA_TUNNEL_KEY_ACT_SET:
+		fprintf(f, " set");
+		tunnel_key_print_ip_addr(f, "src_ip",
+					 tb[TCA_TUNNEL_KEY_ENC_IPV4_SRC]);
+		tunnel_key_print_ip_addr(f, "dst_ip",
+					 tb[TCA_TUNNEL_KEY_ENC_IPV4_DST]);
+		tunnel_key_print_ip_addr(f, "src_ip",
+					 tb[TCA_TUNNEL_KEY_ENC_IPV6_SRC]);
+		tunnel_key_print_ip_addr(f, "dst_ip",
+					 tb[TCA_TUNNEL_KEY_ENC_IPV6_DST]);
+		tunnel_key_print_key_id(f, "key_id",
+					tb[TCA_TUNNEL_KEY_ENC_KEY_ID]);
+		break;
+	}
+	fprintf(f, " %s", action_n2a(parm->action));
+
+	fprintf(f, "\n\tindex %d ref %d bind %d", parm->index, parm->refcnt,
+		parm->bindcnt);
+
+	if (show_stats) {
+		if (tb[TCA_TUNNEL_KEY_TM]) {
+			struct tcf_t *tm = RTA_DATA(tb[TCA_TUNNEL_KEY_TM]);
+
+			print_tm(f, tm);
+		}
+	}
+
+	fprintf(f, "\n ");
+
+	return 0;
+}
+
+struct action_util tunnel_key_action_util = {
+	.id = "tunnel_key",
+	.parse_aopt = parse_tunnel_key,
+	.print_aopt = print_tunnel_key,
+};
-- 
2.10.2

^ permalink raw reply related

* [PATCH iproute2 V3 2/3] tc/cls_flower: Classify packet in ip tunnels
From: Amir Vadai @ 2016-11-30  7:38 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: netdev, David S. Miller, Or Gerlitz, Hadar Har-Zion, Jiri Pirko,
	Jiri Benc, Amir Vadai
In-Reply-To: <20161130073840.5269-1-amir@vadai.me>

Introduce classifying by metadata extracted by the tunnel device.
Outer header fields - source/dest ip and tunnel id, are extracted from
the metadata when classifying.

For example, the following will add a filter on the ingress Qdisc of shared
vxlan device named 'vxlan0'. To forward packets with outer src ip
11.11.0.2, dst ip 11.11.0.1 and tunnel id 11. The packets will be
forwarded to tap device 'vnet0':

$ tc filter add dev vxlan0 protocol ip parent ffff: \
    flower \
      enc_src_ip 11.11.0.2 \
      enc_dst_ip 11.11.0.1 \
      enc_key_id 11 \
      dst_ip 11.11.11.1 \
    action mirred egress redirect dev vnet0

Signed-off-by: Amir Vadai <amir@vadai.me>
---
 man/man8/tc-flower.8 | 17 ++++++++++-
 tc/f_flower.c        | 84 +++++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 96 insertions(+), 5 deletions(-)

diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8
index 74f76647753b..0e0b0cf4bb72 100644
--- a/man/man8/tc-flower.8
+++ b/man/man8/tc-flower.8
@@ -36,7 +36,11 @@ flower \- flow based traffic control filter
 .BR dst_ip " | " src_ip " } { "
 .IR ipv4_address " | " ipv6_address " } | { "
 .BR dst_port " | " src_port " } "
-.IR port_number " }"
+.IR port_number " } | "
+.B enc_key_id
+.IR KEY-ID " | {"
+.BR enc_dst_ip " | " enc_src_ip " } { "
+.IR ipv4_address " | " ipv6_address " } | "
 .SH DESCRIPTION
 The
 .B flower
@@ -121,6 +125,17 @@ which has to be specified in beforehand.
 Match on layer 4 protocol source or destination port number. Only available for
 .BR ip_proto " values " udp " and " tcp ,
 which has to be specified in beforehand.
+.TP
+.BI enc_key_id " NUMBER"
+.TQ
+.BI enc_dst_ip " ADDRESS"
+.TQ
+.BI enc_src_ip " ADDRESS"
+Match on IP tunnel metadata. Key id
+.I NUMBER
+is a 32 bit tunnel key id (e.g. VNI for VXLAN tunnel).
+.I ADDRESS
+must be a valid IPv4 or IPv6 address.
 .SH NOTES
 As stated above where applicable, matches of a certain layer implicitly depend
 on the matches of the next lower layer. Precisely, layer one and two matches (
diff --git a/tc/f_flower.c b/tc/f_flower.c
index 2d31d1aa832d..173cfc20f90b 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -41,7 +41,10 @@ static void explain(void)
 	fprintf(stderr, "                       dst_ip [ IPV4-ADDR | IPV6-ADDR ] |\n");
 	fprintf(stderr, "                       src_ip [ IPV4-ADDR | IPV6-ADDR ] |\n");
 	fprintf(stderr, "                       dst_port PORT-NUMBER |\n");
-	fprintf(stderr, "                       src_port PORT-NUMBER }\n");
+	fprintf(stderr, "                       src_port PORT-NUMBER |\n");
+	fprintf(stderr, "                       enc_dst_ip [ IPV4-ADDR | IPV6-ADDR ] |\n");
+	fprintf(stderr, "                       enc_src_ip [ IPV4-ADDR | IPV6-ADDR ] |\n");
+	fprintf(stderr, "                       enc_key_id [ KEY-ID ] }\n");
 	fprintf(stderr,	"       FILTERID := X:Y:Z\n");
 	fprintf(stderr,	"       ACTION-SPEC := ... look at individual actions\n");
 	fprintf(stderr,	"\n");
@@ -121,8 +124,9 @@ static int flower_parse_ip_addr(char *str, __be16 eth_type,
 		family = AF_INET;
 	} else if (eth_type == htons(ETH_P_IPV6)) {
 		family = AF_INET6;
+	} else if (!eth_type) {
+		family = AF_UNSPEC;
 	} else {
-		fprintf(stderr, "Illegal \"eth_type\" for ip address\n");
 		return -1;
 	}
 
@@ -130,8 +134,10 @@ static int flower_parse_ip_addr(char *str, __be16 eth_type,
 	if (ret)
 		return -1;
 
-	if (addr.family != family)
+	if (family && (addr.family != family)) {
+		fprintf(stderr, "Illegal \"eth_type\" for ip address\n");
 		return -1;
+	}
 
 	addattr_l(n, MAX_MSG, addr.family == AF_INET ? addr4_type : addr6_type,
 		  addr.data, addr.bytelen);
@@ -181,6 +187,18 @@ static int flower_parse_port(char *str, __u8 ip_port,
 	return 0;
 }
 
+static int flower_parse_key_id(const char *str, int type, struct nlmsghdr *n)
+{
+	int ret;
+	__be32 key_id;
+
+	ret = get_be32(&key_id, str, 10);
+	if (!ret)
+		addattr32(n, MAX_MSG, type, key_id);
+
+	return ret;
+}
+
 static int flower_parse_opt(struct filter_util *qu, char *handle,
 			    int argc, char **argv, struct nlmsghdr *n)
 {
@@ -339,6 +357,38 @@ static int flower_parse_opt(struct filter_util *qu, char *handle,
 				fprintf(stderr, "Illegal \"src_port\"\n");
 				return -1;
 			}
+		} else if (matches(*argv, "enc_dst_ip") == 0) {
+			NEXT_ARG();
+			ret = flower_parse_ip_addr(*argv, 0,
+						   TCA_FLOWER_KEY_ENC_IPV4_DST,
+						   TCA_FLOWER_KEY_ENC_IPV4_DST_MASK,
+						   TCA_FLOWER_KEY_ENC_IPV6_DST,
+						   TCA_FLOWER_KEY_ENC_IPV6_DST_MASK,
+						   n);
+			if (ret < 0) {
+				fprintf(stderr, "Illegal \"enc_dst_ip\"\n");
+				return -1;
+			}
+		} else if (matches(*argv, "enc_src_ip") == 0) {
+			NEXT_ARG();
+			ret = flower_parse_ip_addr(*argv, 0,
+						   TCA_FLOWER_KEY_ENC_IPV4_SRC,
+						   TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK,
+						   TCA_FLOWER_KEY_ENC_IPV6_SRC,
+						   TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK,
+						   n);
+			if (ret < 0) {
+				fprintf(stderr, "Illegal \"enc_src_ip\"\n");
+				return -1;
+			}
+		} else if (matches(*argv, "enc_key_id") == 0) {
+			NEXT_ARG();
+			ret = flower_parse_key_id(*argv,
+						  TCA_FLOWER_KEY_ENC_KEY_ID, n);
+			if (ret < 0) {
+				fprintf(stderr, "Illegal \"enc_key_id\"\n");
+				return -1;
+			}
 		} else if (matches(*argv, "action") == 0) {
 			NEXT_ARG();
 			ret = parse_action(&argc, &argv, TCA_FLOWER_ACT, n);
@@ -506,7 +556,14 @@ static void flower_print_port(FILE *f, char *name, __u8 ip_proto,
 		return;
 	if (!attr)
 		return;
-	fprintf(f, "\n  %s %d", name, ntohs(rta_getattr_u16(attr)));
+	fprintf(f, "\n  %s %d", name, rta_getattr_be16(attr));
+}
+
+static void flower_print_key_id(FILE *f, const char *name,
+				struct rtattr *attr)
+{
+	if (attr)
+		fprintf(f, "\n  %s %d", name, rta_getattr_be32(attr));
 }
 
 static int flower_print_opt(struct filter_util *qu, FILE *f,
@@ -577,6 +634,25 @@ static int flower_print_opt(struct filter_util *qu, FILE *f,
 			  tb[TCA_FLOWER_KEY_TCP_SRC],
 			  tb[TCA_FLOWER_KEY_UDP_SRC]);
 
+	flower_print_ip_addr(f, "enc_dst_ip",
+			     tb[TCA_FLOWER_KEY_ENC_IPV4_DST_MASK] ?
+			     htons(ETH_P_IP) : htons(ETH_P_IPV6),
+			     tb[TCA_FLOWER_KEY_ENC_IPV4_DST],
+			     tb[TCA_FLOWER_KEY_ENC_IPV4_DST_MASK],
+			     tb[TCA_FLOWER_KEY_ENC_IPV6_DST],
+			     tb[TCA_FLOWER_KEY_ENC_IPV6_DST_MASK]);
+
+	flower_print_ip_addr(f, "enc_src_ip",
+			     tb[TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK] ?
+			     htons(ETH_P_IP) : htons(ETH_P_IPV6),
+			     tb[TCA_FLOWER_KEY_ENC_IPV4_SRC],
+			     tb[TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK],
+			     tb[TCA_FLOWER_KEY_ENC_IPV6_SRC],
+			     tb[TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK]);
+
+	flower_print_key_id(f, "enc_key_id",
+			    tb[TCA_FLOWER_KEY_ENC_KEY_ID]);
+
 	if (tb[TCA_FLOWER_FLAGS])  {
 		__u32 flags = rta_getattr_u32(tb[TCA_FLOWER_FLAGS]);
 
-- 
2.10.2

^ permalink raw reply related

* [PATCH iproute2 V3 1/3] libnetlink: Introduce rta_getattr_be*()
From: Amir Vadai @ 2016-11-30  7:38 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: netdev, David S. Miller, Or Gerlitz, Hadar Har-Zion, Jiri Pirko,
	Jiri Benc, Amir Vadai
In-Reply-To: <20161130073840.5269-1-amir@vadai.me>

Add the utility functions rta_getattr_be16() and rta_getattr_be32(), and
change existing code to use it.

Signed-off-by: Amir Vadai <amir@vadai.me>
---
 bridge/fdb.c         | 4 ++--
 include/libnetlink.h | 9 +++++++++
 ip/iplink_geneve.c   | 2 +-
 ip/iplink_vxlan.c    | 2 +-
 4 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/bridge/fdb.c b/bridge/fdb.c
index 90f4b154c5dc..a91521776e99 100644
--- a/bridge/fdb.c
+++ b/bridge/fdb.c
@@ -168,10 +168,10 @@ int print_fdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 	if (tb[NDA_PORT]) {
 		if (jw_global)
 			jsonw_uint_field(jw_global, "port",
-					 ntohs(rta_getattr_u16(tb[NDA_PORT])));
+					 rta_getattr_be16(tb[NDA_PORT]));
 		else
 			fprintf(fp, "port %d ",
-				ntohs(rta_getattr_u16(tb[NDA_PORT])));
+				rta_getattr_be16(tb[NDA_PORT]));
 	}
 
 	if (tb[NDA_VNI]) {
diff --git a/include/libnetlink.h b/include/libnetlink.h
index 483509ca9635..751ebf186dd4 100644
--- a/include/libnetlink.h
+++ b/include/libnetlink.h
@@ -10,6 +10,7 @@
 #include <linux/if_addr.h>
 #include <linux/neighbour.h>
 #include <linux/netconf.h>
+#include <arpa/inet.h>
 
 struct rtnl_handle {
 	int			fd;
@@ -140,10 +141,18 @@ static inline __u16 rta_getattr_u16(const struct rtattr *rta)
 {
 	return *(__u16 *)RTA_DATA(rta);
 }
+static inline __be16 rta_getattr_be16(const struct rtattr *rta)
+{
+	return ntohs(rta_getattr_u16(rta));
+}
 static inline __u32 rta_getattr_u32(const struct rtattr *rta)
 {
 	return *(__u32 *)RTA_DATA(rta);
 }
+static inline __be32 rta_getattr_be32(const struct rtattr *rta)
+{
+	return ntohl(rta_getattr_u32(rta));
+}
 static inline __u64 rta_getattr_u64(const struct rtattr *rta)
 {
 	__u64 tmp;
diff --git a/ip/iplink_geneve.c b/ip/iplink_geneve.c
index 3bfba91c644c..1e6669d07d60 100644
--- a/ip/iplink_geneve.c
+++ b/ip/iplink_geneve.c
@@ -234,7 +234,7 @@ static void geneve_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
 
 	if (tb[IFLA_GENEVE_PORT])
 		fprintf(f, "dstport %u ",
-			ntohs(rta_getattr_u16(tb[IFLA_GENEVE_PORT])));
+			rta_getattr_be16(tb[IFLA_GENEVE_PORT]));
 
 	if (tb[IFLA_GENEVE_COLLECT_METADATA])
 		fputs("external ", f);
diff --git a/ip/iplink_vxlan.c b/ip/iplink_vxlan.c
index 93af979a1e97..6d02bb47b2f0 100644
--- a/ip/iplink_vxlan.c
+++ b/ip/iplink_vxlan.c
@@ -413,7 +413,7 @@ static void vxlan_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
 
 	if (tb[IFLA_VXLAN_PORT])
 		fprintf(f, "dstport %u ",
-			ntohs(rta_getattr_u16(tb[IFLA_VXLAN_PORT])));
+			rta_getattr_be16(tb[IFLA_VXLAN_PORT]));
 
 	if (tb[IFLA_VXLAN_LEARNING] &&
 	    !rta_getattr_u8(tb[IFLA_VXLAN_LEARNING]))
-- 
2.10.2

^ permalink raw reply related

* [PATCH iproute2 V3 0/3] tc: Support for ip tunnel metadata set/unset/classify
From: Amir Vadai @ 2016-11-30  7:38 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: netdev, David S. Miller, Or Gerlitz, Hadar Har-Zion, Jiri Pirko,
	Jiri Benc, Amir Vadai

Hi,

This short series adds support for matching and setting metadata for ip tunnel
shared device using the TC system, introduced in kernel 4.9 [1].

Applied and tested on top of commit f3f339e9590a ("cleanup debris from revert")

Example usage:

$ tc filter add dev vxlan0 protocol ip parent ffff: \
    flower \
      enc_src_ip 11.11.0.2 \
      enc_dst_ip 11.11.0.1 \
      enc_key_id 11 \
      dst_ip 11.11.11.1 \
    action mirred egress redirect dev vnet0

$ tc filter add dev net0 protocol ip parent ffff: \
    flower \
      ip_proto 1 \
      dst_ip 11.11.11.2 \
    action tunnel_key set \
      src_ip 11.11.0.1 \
      dst_ip 11.11.0.2 \
      id 11 \
    action mirred egress redirect dev vxlan0

[1] - d1ba24feb466 ("Merge branch 'act_tunnel_key'")

Thanks,
Amir

Changes from V2:
- Use const where needed
- Don't lose return value
- Introduce rta_getattr_be16() and rta_getattr_be32()

Changes from V1:
- Updated Patch 2/2 ("tc/act_tunnel: Introduce ip tunnel action") commit log
	and the man page tc-tunnel_key to reflect the fact that 'unset' operation is
	no mandatory.
	And describe when it might be needed.
- Rename the 'release' operation to 'unset'

Amir Vadai (3):
  libnetlink: Introduce rta_getattr_be*()
  tc/cls_flower: Classify packet in ip tunnels
  tc/act_tunnel: Introduce ip tunnel action

 bridge/fdb.c                         |   4 +-
 include/libnetlink.h                 |   9 ++
 include/linux/tc_act/tc_tunnel_key.h |  42 ++++++
 ip/iplink_geneve.c                   |   2 +-
 ip/iplink_vxlan.c                    |   2 +-
 man/man8/tc-flower.8                 |  17 ++-
 man/man8/tc-tunnel_key.8             | 113 +++++++++++++++
 tc/Makefile                          |   1 +
 tc/f_flower.c                        |  84 +++++++++++-
 tc/m_tunnel_key.c                    | 258 +++++++++++++++++++++++++++++++++++
 10 files changed, 523 insertions(+), 9 deletions(-)
 create mode 100644 include/linux/tc_act/tc_tunnel_key.h
 create mode 100644 man/man8/tc-tunnel_key.8
 create mode 100644 tc/m_tunnel_key.c

-- 
2.10.2

^ permalink raw reply

* Re: [PATCH iproute2 V2 1/2] tc/cls_flower: Classify packet in ip tunnels
From: Amir Vadai @ 2016-11-30  7:42 UTC (permalink / raw)
  To: Stephen Hemminger, Jiri Pirko
  Cc: Linux Netdev List, David S. Miller, Jiri Benc, Or Gerlitz,
	Hadar Har-Zion, Roi Dayan
In-Reply-To: <20161130071753.GA1366@office.localdomain>

On Wed, Nov 30, 2016 at 9:17 AM, Amir Vadai <amir@vadai.me> wrote:
> On Tue, Nov 29, 2016 at 07:26:58PM -0800, Stephen Hemminger wrote:
(Sending again since I just discovered that Google Inbox is adding an
HTML part...)

>> The overall design is fine, just a couple nits with the code.
>>
>> > diff --git a/tc/f_flower.c b/tc/f_flower.c
>> > index 2d31d1aa832d..1cf0750b5b83 100644
>> > --- a/tc/f_flower.c
>> > +++ b/tc/f_flower.c
>>
>> >
>> > +static int flower_parse_key_id(char *str, int type, struct nlmsghdr *n)
>>
>> str is not modified, therefore use: const char *str
> ack
>
>>
>> > +{
>> > +   int ret;
>> > +   __be32 key_id;
>> > +
>> > +   ret = get_be32(&key_id, str, 10);
>> > +   if (ret)
>> > +           return -1;
>>
>> Traditionally netlink attributes are in host order, why was flower
>> chosen to be different?
> I don't know, maybe Jiri (cc'ed) can explain, but it is all over the
> flower code.
Now the right Jiri (Pirko) is CC'ed

>
>>
>> > +
>> > +   addattr32(n, MAX_MSG, type, key_id);
>> > +
>> > +   return 0;
>>
>>
>> Why lose the return value here?  Why not:
>>
>>       ret = get_be32(&key_id, str, 10);
>>       if (!ret)
>>               addattr32(n, MAX_MSG, type, key_id);
> The get_*() function can return only -1 or 0. But you are right, and it
> is better the way you suggested.  Changing accordingly in V3.
>
>>
>> > +}
>> > +
>> >  static int flower_parse_opt(struct filter_util *qu, char *handle,
>> >                         int argc, char **argv, struct nlmsghdr *n)
>> >  {
>> > @@ -339,6 +359,38 @@ static int flower_parse_opt(struct filter_util *qu, char *handle,
>> >                             fprintf(stderr, "Illegal \"src_port\"\n");
>> >                             return -1;
>> >                     }
>> > +           } else if (matches(*argv, "enc_dst_ip") == 0) {
>> > +                   NEXT_ARG();
>> > +                   ret = flower_parse_ip_addr(*argv, 0,
>> > +                                              TCA_FLOWER_KEY_ENC_IPV4_DST,
>> > +                                              TCA_FLOWER_KEY_ENC_IPV4_DST_MASK,
>> > +                                              TCA_FLOWER_KEY_ENC_IPV6_DST,
>> > +                                              TCA_FLOWER_KEY_ENC_IPV6_DST_MASK,
>> > +                                              n);
>> > +                   if (ret < 0) {
>> > +                           fprintf(stderr, "Illegal \"enc_dst_ip\"\n");
>> > +                           return -1;
>> > +                   }
>> > +           } else if (matches(*argv, "enc_src_ip") == 0) {
>> > +                   NEXT_ARG();
>> > +                   ret = flower_parse_ip_addr(*argv, 0,
>> > +                                              TCA_FLOWER_KEY_ENC_IPV4_SRC,
>> > +                                              TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK,
>> > +                                              TCA_FLOWER_KEY_ENC_IPV6_SRC,
>> > +                                              TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK,
>> > +                                              n);
>> > +                   if (ret < 0) {
>> > +                           fprintf(stderr, "Illegal \"enc_src_ip\"\n");
>> > +                           return -1;
>> > +                   }
>> > +           } else if (matches(*argv, "enc_key_id") == 0) {
>> > +                   NEXT_ARG();
>> > +                   ret = flower_parse_key_id(*argv,
>> > +                                             TCA_FLOWER_KEY_ENC_KEY_ID, n);
>> > +                   if (ret < 0) {
>> > +                           fprintf(stderr, "Illegal \"enc_key_id\"\n");
>> > +                           return -1;
>> > +                   }
>> >             } else if (matches(*argv, "action") == 0) {
>> >                     NEXT_ARG();
>> >                     ret = parse_action(&argc, &argv, TCA_FLOWER_ACT, n);
>> > @@ -509,6 +561,14 @@ static void flower_print_port(FILE *f, char *name, __u8 ip_proto,
>> >     fprintf(f, "\n  %s %d", name, ntohs(rta_getattr_u16(attr)));
>> >  }
>> >
>> > +static void flower_print_key_id(FILE *f, char *name,
>> > +                           struct rtattr *attr)
>>
>> const char *name?
> ack
>
>>
>>
>> > +{
>> > +   if (!attr)
>> > +           return;
>> > +   fprintf(f, "\n  %s %d", name, ntohl(rta_getattr_u32(attr)));
>> > +}
>> > +
>>
>> Why short circuit, just change the order:
>>
>>       if (attr)
>>               fprintf(f, "\n  %s %s", name, ntohl(rta_getattr_u32(attr));
>>
>> You might also want to introduce rta_getattr_be32()
> ack
>
>>
>> Please change, retest and resubmit both patches.
> ack
>
> Thanks for reviewing,
> Amir

^ permalink raw reply

* RE: [patch net / RFC] net: fec: increase frame size limitation to actually available buffer
From: Andy Duan @ 2016-11-30  7:35 UTC (permalink / raw)
  To: Nikita Yushchenko, David S. Miller, Troy Kisky, Andrew Lunn,
	Eric Nelson, Philippe Reynes, Johannes Berg,
	netdev@vger.kernel.org
  Cc: Chris Healy, Fabio Estevam, linux-kernel@vger.kernel.org
In-Reply-To: <dab27a3d-821c-8d07-ad98-2bdd8c442bc5@cogentembedded.com>

From: Nikita Yushchenko <nikita.yoush@cogentembedded.com> Sent: Wednesday, November 30, 2016 2:36 PM
 >To: Andy Duan <fugang.duan@nxp.com>; David S. Miller
 ><davem@davemloft.net>; Troy Kisky <troy.kisky@boundarydevices.com>;
 >Andrew Lunn <andrew@lunn.ch>; Eric Nelson <eric@nelint.com>; Philippe
 >Reynes <tremyfr@gmail.com>; Johannes Berg <johannes@sipsolutions.net>;
 >netdev@vger.kernel.org
 >Cc: Chris Healy <cphealy@gmail.com>; Fabio Estevam
 ><fabio.estevam@nxp.com>; linux-kernel@vger.kernel.org
 >Subject: Re: [patch net / RFC] net: fec: increase frame size limitation to
 >actually available buffer
 >
 >> But I think it is not necessary since the driver don't support jumbo frame.
 >
 >Hardcoded 1522 raises two separate issues.
 >
 >(1) When DSA is in use, frames processed by FEC chip contain DSA tag and
 >thus can be larger than hardcoded limit of 1522. This issue is not FEC-specific,
 >any driver that hardcodes maximum frame size to 1522 (many
 >do) will have this issue if used with DSA.
 >
 >Clean solution for this must take into account that difference between MTU
 >and max frame size is no longer known at compile time. Actually this is the
 >case even without DSA, due to VLANs: max frame size is (MTU + 18) without
 >VLANs, but (MTU + 22) with VLANs. However currently drivers tend to ignore
 >this and hardcode 22.  With DSA, 22 is not enough, need to add switch-
 >specific tag size to that.
 >
 >Not yet sure how to handle this. DSA-specific API to find out tag size could be
 >added, but generic solution should handle all cases of dynamic difference
 >between MTU and max frame size, not only DSA.
 >
 >
 >(2) There is some demand to use larger frames for optimization purposes.
 >
 >FEC register fields that limit frame size are 14-bit, thus allowing frames up to
 >(4k-1). I'm about to prepare a larger patch:
 >- add ndo_change_mtu handler, allowing MTU up to (4k - overhead),
 >- set MAX_FL / TRUNC_FL based on configured MTU,
 >- if necessary, do buffer reallocation with larger buffers.
 >
 >Is this suitable for upstreaming?
 >Is there any policy related to handling larger frames?

Of course, welcome to upstream the jumbo frame patches, but hope to also add the transmit jumbo frame, not only receive path, which is helpful for testing with two boards connection.
And, some notes need you to care:
- the maximum jumbo frame should consider the fifo size. Different chip has different fifo size. Like i.MX53 tx and rx share one fifo, i.mx6q/dl/sl have separate 4k fifo for tx and rx, i.mx6sx/i.mx7x have separate 8k fifo for tx and rx.
- rx fifo watermark to generate pause frame in busy loading system to avoid fifo overrun. In general,  little pause frames bring better performance, mass of pause frames cause worse performance.


Regards,
Andy

^ permalink raw reply

* Re: [PATCH v3 net-next 3/3] openvswitch: Fix skb->protocol for vlan frames.
From: Pravin Shelar @ 2016-11-30  7:34 UTC (permalink / raw)
  To: Jarno Rajahalme; +Cc: Linux Kernel Network Developers, Jiri Benc, Eric Garver
In-Reply-To: <1480462253-114713-3-git-send-email-jarno@ovn.org>

On Tue, Nov 29, 2016 at 3:30 PM, Jarno Rajahalme <jarno@ovn.org> wrote:
> Do not always set skb->protocol to be the ethertype of the L3 header.
> For a packet with non-accelerated VLAN tags skb->protocol needs to be
> the ethertype of the outermost non-accelerated VLAN ethertype.
>
> Any VLAN offloading is undone on the OVS netlink interface, and any
> VLAN tags added by userspace are non-accelerated, as are double tagged
> VLAN packets.
>
> Fixes: 018c1dda5f ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes")
> Fixes: 5108bbaddc ("openvswitch: add processing of L3 packets")
> Signed-off-by: Jarno Rajahalme <jarno@ovn.org>

Looks much better now. Thanks for fixing it. I have one minor comment.

Acked-by: Pravin B Shelar <pshelar@ovn.org>

> ---
> v3: Set skb->protocol properly also for double tagged packets, as suggested
>     by Pravin.  This patch is no longer needed for the MTU test failure, as
>     the new patch 2/3 addresses that.
>
>  net/openvswitch/datapath.c |  1 -
>  net/openvswitch/flow.c     | 62 +++++++++++++++++++++++-----------------------
>  2 files changed, 31 insertions(+), 32 deletions(-)
>
> diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> index 08aa926..e2fe2e5 100644
> --- a/net/openvswitch/flow.c
> +++ b/net/openvswitch/flow.c
> @@ -354,6 +354,7 @@ static int parse_vlan(struct sk_buff *skb, struct sw_flow_key *key)
>                 res = parse_vlan_tag(skb, &key->eth.vlan);
>                 if (res <= 0)
>                         return res;
> +               skb->protocol = key->eth.vlan.tpid;
>         }
>
>         /* Parse inner vlan tag. */
> @@ -361,6 +362,11 @@ static int parse_vlan(struct sk_buff *skb, struct sw_flow_key *key)
>         if (res <= 0)
>                 return res;
>
> +       /* If the outer vlan tag was accelerated, skb->protocol should
> +        * refelect the inner vlan type. */
> +       if (!eth_type_vlan(skb->protocol))

Since you would be spinning another version, can you change this
condition to directly check for skb-vlan-tag rather than indirectly
checking for the vlan accelerated case?

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox