Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v3 00/39] put_user_pages(): miscellaneous call sites
From: Mike Marshall @ 2019-08-30  1:29 UTC (permalink / raw)
  To: John Hubbard
  Cc: john.hubbard, Andrew Morton, Christoph Hellwig, Dan Williams,
	Dave Chinner, Dave Hansen, Ira Weiny, Jan Kara, Jason Gunthorpe,
	Jérôme Glisse, LKML, amd-gfx, ceph-devel, devel, devel,
	dri-devel, intel-gfx, kvm, linux-arm-kernel, linux-block,
	linux-crypto, linux-fbdev, linux-fsdevel, linux-media, linux-mm,
	Linux NFS Mailing List, linux-rdma, linux-rpi-kernel, linux-xfs,
	netdev, rds-devel, sparclinux, x86, xen-devel
In-Reply-To: <912eb2bd-4102-05c1-5571-c261617ad30b@nvidia.com>

Hi John...

I added this patch series on top of Linux 5.3rc6 and ran
xfstests with no regressions...

Acked-by: Mike Marshall <hubcap@omnibond.com>

-Mike

On Tue, Aug 6, 2019 at 9:50 PM John Hubbard <jhubbard@nvidia.com> wrote:
>
> On 8/6/19 6:32 PM, john.hubbard@gmail.com wrote:
> > From: John Hubbard <jhubbard@nvidia.com>
> > ...
> >
> > John Hubbard (38):
> >   mm/gup: add make_dirty arg to put_user_pages_dirty_lock()
> ...
> >  54 files changed, 191 insertions(+), 323 deletions(-)
> >
> ahem, yes, apparently this is what happens if I add a few patches while editing
> the cover letter... :)
>
> The subject line should read "00/41", and the list of files affected here is
> therefore under-reported in this cover letter. However, the patch series itself is
> intact and ready for submission.
>
> thanks,
> --
> John Hubbard
> NVIDIA

^ permalink raw reply

* Re: [RFC PATCH v2 net-next 00/15] tc-taprio offload for SJA1105 DSA
From: Jakub Kicinski @ 2019-08-30  1:21 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: f.fainelli, vivien.didelot, andrew, davem, vinicius.gomes,
	vedang.patel, richardcochran, weifeng.voon, jiri, m-karicheri2,
	Jose.Abreu, ilias.apalodimas, jhs, xiyou.wangcong, netdev
In-Reply-To: <20190830004635.24863-1-olteanv@gmail.com>

On Fri, 30 Aug 2019 03:46:20 +0300, Vladimir Oltean wrote:
> - Configuring the switch over SPI cannot apparently be done from this
>   ndo_setup_tc callback because it runs in atomic context. I also have
>   some downstream patches to offload tc clsact matchall with mirred
>   action, but in that case it looks like the atomic context restriction
>   does not apply.

This sounds really surprising ndo_setup_tc should always be allowed to
sleep. Can the taprio limitation be lifted somehow?

^ permalink raw reply

* [PATCH][next] Bluetooth: mgmt: Use struct_size() helper
From: Gustavo A. R. Silva @ 2019-08-30  1:12 UTC (permalink / raw)
  To: Marcel Holtmann, Johan Hedberg, David S. Miller
  Cc: linux-bluetooth, netdev, linux-kernel, Gustavo A. R. Silva

One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct mgmt_rp_get_connections {
	...
        struct mgmt_addr_info addr[0];
} __packed;

Make use of the struct_size() helper instead of an open-coded version
in order to avoid any potential type mistakes.

So, replace the following form:

sizeof(*rp) + (i * sizeof(struct mgmt_addr_info));

with:

struct_size(rp, addr, i)

Also, notice that, in this case, variable rp_len is not necessary,
hence it is removed.

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
 net/bluetooth/mgmt.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/net/bluetooth/mgmt.c b/net/bluetooth/mgmt.c
index 150114e33b20..acb7c6d5643f 100644
--- a/net/bluetooth/mgmt.c
+++ b/net/bluetooth/mgmt.c
@@ -2588,7 +2588,6 @@ static int get_connections(struct sock *sk, struct hci_dev *hdev, void *data,
 {
 	struct mgmt_rp_get_connections *rp;
 	struct hci_conn *c;
-	size_t rp_len;
 	int err;
 	u16 i;
 
@@ -2608,8 +2607,7 @@ static int get_connections(struct sock *sk, struct hci_dev *hdev, void *data,
 			i++;
 	}
 
-	rp_len = sizeof(*rp) + (i * sizeof(struct mgmt_addr_info));
-	rp = kmalloc(rp_len, GFP_KERNEL);
+	rp = kmalloc(struct_size(rp, addr, i), GFP_KERNEL);
 	if (!rp) {
 		err = -ENOMEM;
 		goto unlock;
@@ -2629,10 +2627,8 @@ static int get_connections(struct sock *sk, struct hci_dev *hdev, void *data,
 	rp->conn_count = cpu_to_le16(i);
 
 	/* Recalculate length in case of filtered SCO connections, etc */
-	rp_len = sizeof(*rp) + (i * sizeof(struct mgmt_addr_info));
-
 	err = mgmt_cmd_complete(sk, hdev->id, MGMT_OP_GET_CONNECTIONS, 0, rp,
-				rp_len);
+				struct_size(rp, addr, i));
 
 	kfree(rp);
 
-- 
2.23.0


^ permalink raw reply related

* [PATCH v2 net 2/3] taprio: Set default link speed to 10 Mbps in taprio_set_picos_per_byte
From: Vladimir Oltean @ 2019-08-30  1:07 UTC (permalink / raw)
  To: xiyou.wangcong, jiri, davem, vinicius.gomes, vedang.patel,
	leandro.maciel.dorileo
  Cc: netdev, Vladimir Oltean
In-Reply-To: <20190830010723.32096-1-olteanv@gmail.com>

The taprio budget needs to be adapted at runtime according to interface
link speed. But that handling is problematic.

For one thing, installing a qdisc on an interface that doesn't have
carrier is not illegal. But taprio prints the following stack trace:

[   31.851373] ------------[ cut here ]------------
[   31.856024] WARNING: CPU: 1 PID: 207 at net/sched/sch_taprio.c:481 taprio_dequeue+0x1a8/0x2d4
[   31.864566] taprio: dequeue() called with unknown picos per byte.
[   31.864570] Modules linked in:
[   31.873701] CPU: 1 PID: 207 Comm: tc Not tainted 5.3.0-rc5-01199-g8838fe023cd6 #1689
[   31.881398] Hardware name: Freescale LS1021A
[   31.885661] [<c03133a4>] (unwind_backtrace) from [<c030d8cc>] (show_stack+0x10/0x14)
[   31.893368] [<c030d8cc>] (show_stack) from [<c10ac958>] (dump_stack+0xb4/0xc8)
[   31.900555] [<c10ac958>] (dump_stack) from [<c0349d04>] (__warn+0xe0/0xf8)
[   31.907395] [<c0349d04>] (__warn) from [<c0349d64>] (warn_slowpath_fmt+0x48/0x6c)
[   31.914841] [<c0349d64>] (warn_slowpath_fmt) from [<c0f38db4>] (taprio_dequeue+0x1a8/0x2d4)
[   31.923150] [<c0f38db4>] (taprio_dequeue) from [<c0f227b0>] (__qdisc_run+0x90/0x61c)
[   31.930856] [<c0f227b0>] (__qdisc_run) from [<c0ec82ac>] (net_tx_action+0x12c/0x2bc)
[   31.938560] [<c0ec82ac>] (net_tx_action) from [<c0302298>] (__do_softirq+0x130/0x3c8)
[   31.946350] [<c0302298>] (__do_softirq) from [<c03502a0>] (irq_exit+0xbc/0xd8)
[   31.953536] [<c03502a0>] (irq_exit) from [<c03a4808>] (__handle_domain_irq+0x60/0xb4)
[   31.961328] [<c03a4808>] (__handle_domain_irq) from [<c0754478>] (gic_handle_irq+0x58/0x9c)
[   31.969638] [<c0754478>] (gic_handle_irq) from [<c0301a8c>] (__irq_svc+0x6c/0x90)
[   31.977076] Exception stack(0xe8167b20 to 0xe8167b68)
[   31.982100] 7b20: e9d4bd80 00000cc0 000000cf 00000000 e9d4bd80 c1f38958 00000cc0 c1f38960
[   31.990234] 7b40: 00000001 000000cf 00000004 e9dc0800 00000000 e8167b70 c0f478ec c0f46d94
[   31.998363] 7b60: 60070013 ffffffff
[   32.001833] [<c0301a8c>] (__irq_svc) from [<c0f46d94>] (netlink_trim+0x18/0xd8)
[   32.009104] [<c0f46d94>] (netlink_trim) from [<c0f478ec>] (netlink_broadcast_filtered+0x34/0x414)
[   32.017930] [<c0f478ec>] (netlink_broadcast_filtered) from [<c0f47cec>] (netlink_broadcast+0x20/0x28)
[   32.027102] [<c0f47cec>] (netlink_broadcast) from [<c0eea378>] (rtnetlink_send+0x34/0x88)
[   32.035238] [<c0eea378>] (rtnetlink_send) from [<c0f25890>] (notify_and_destroy+0x2c/0x44)
[   32.043461] [<c0f25890>] (notify_and_destroy) from [<c0f25e08>] (qdisc_graft+0x398/0x470)
[   32.051595] [<c0f25e08>] (qdisc_graft) from [<c0f27a00>] (tc_modify_qdisc+0x3a4/0x724)
[   32.059470] [<c0f27a00>] (tc_modify_qdisc) from [<c0ee4c84>] (rtnetlink_rcv_msg+0x260/0x2ec)
[   32.067864] [<c0ee4c84>] (rtnetlink_rcv_msg) from [<c0f4a988>] (netlink_rcv_skb+0xb8/0x110)
[   32.076172] [<c0f4a988>] (netlink_rcv_skb) from [<c0f4a170>] (netlink_unicast+0x1b4/0x22c)
[   32.084392] [<c0f4a170>] (netlink_unicast) from [<c0f4a5e4>] (netlink_sendmsg+0x33c/0x380)
[   32.092614] [<c0f4a5e4>] (netlink_sendmsg) from [<c0ea9f40>] (sock_sendmsg+0x14/0x24)
[   32.100403] [<c0ea9f40>] (sock_sendmsg) from [<c0eaa780>] (___sys_sendmsg+0x214/0x228)
[   32.108279] [<c0eaa780>] (___sys_sendmsg) from [<c0eabad0>] (__sys_sendmsg+0x50/0x8c)
[   32.116068] [<c0eabad0>] (__sys_sendmsg) from [<c0301000>] (ret_fast_syscall+0x0/0x54)
[   32.123938] Exception stack(0xe8167fa8 to 0xe8167ff0)
[   32.128960] 7fa0:                   b6fa68c8 000000f8 00000003 bea142d0 00000000 00000000
[   32.137093] 7fc0: b6fa68c8 000000f8 0052154c 00000128 5d6468a2 00000000 00000028 00558c9c
[   32.145224] 7fe0: 00000070 bea14278 00530d64 b6e17e64
[   32.150659] ---[ end trace 2139c9827c3e5177 ]---

This happens because the qdisc ->dequeue callback gets called. Which
again is not illegal, the qdisc will dequeue even when the interface is
up but doesn't have carrier (and hence SPEED_UNKNOWN), and the frames
will be dropped further down the stack in dev_direct_xmit().

And, at the end of the day, for what? For calculating the initial budget
of an interface which is non-operational at the moment and where frames
will get dropped anyway.

So if we can't figure out the link speed, default to SPEED_10 and move
along. We can also remove the runtime check now.

Cc: Leandro Dorileo <leandro.maciel.dorileo@intel.com>
Fixes: 7b9eba7ba0c1 ("net/sched: taprio: fix picos_per_byte miscalculation")
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
Changes in v2:
None.

 net/sched/sch_taprio.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 72b5f8c1aef9..84b863e2bdbd 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -477,11 +477,6 @@ static struct sk_buff *taprio_dequeue(struct Qdisc *sch)
 	u32 gate_mask;
 	int i;
 
-	if (atomic64_read(&q->picos_per_byte) == -1) {
-		WARN_ONCE(1, "taprio: dequeue() called with unknown picos per byte.");
-		return NULL;
-	}
-
 	rcu_read_lock();
 	entry = rcu_dereference(q->current_entry);
 	/* if there's no entry, it means that the schedule didn't
@@ -954,12 +949,20 @@ static void taprio_set_picos_per_byte(struct net_device *dev,
 				      struct taprio_sched *q)
 {
 	struct ethtool_link_ksettings ecmd;
-	int picos_per_byte = -1;
+	int speed = SPEED_10;
+	int picos_per_byte;
+	int err;
+
+	err = __ethtool_get_link_ksettings(dev, &ecmd);
+	if (err < 0)
+		goto skip;
+
+	if (ecmd.base.speed != SPEED_UNKNOWN)
+		speed = ecmd.base.speed;
 
-	if (!__ethtool_get_link_ksettings(dev, &ecmd) &&
-	    ecmd.base.speed != SPEED_UNKNOWN)
-		picos_per_byte = div64_s64(NSEC_PER_SEC * 1000LL * 8,
-					   ecmd.base.speed * 1000 * 1000);
+skip:
+	picos_per_byte = div64_s64(NSEC_PER_SEC * 1000LL * 8,
+				   speed * 1000 * 1000);
 
 	atomic64_set(&q->picos_per_byte, picos_per_byte);
 	netdev_dbg(dev, "taprio: set %s's picos_per_byte to: %lld, linkspeed: %d\n",
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 net 3/3] net/sched: cbs: Set default link speed to 10 Mbps in cbs_set_port_rate
From: Vladimir Oltean @ 2019-08-30  1:07 UTC (permalink / raw)
  To: xiyou.wangcong, jiri, davem, vinicius.gomes, vedang.patel,
	leandro.maciel.dorileo
  Cc: netdev, Vladimir Oltean
In-Reply-To: <20190830010723.32096-1-olteanv@gmail.com>

The discussion to be made is absolutely the same as in the case of
previous patch ("taprio: Set default link speed to 10 Mbps in
taprio_set_picos_per_byte"). Nothing is lost when setting a default.

Cc: Leandro Dorileo <leandro.maciel.dorileo@intel.com>
Fixes: e0a7683d30e9 ("net/sched: cbs: fix port_rate miscalculation")
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
Changes in v2:
None.

 net/sched/sch_cbs.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c
index 732e109c3055..810645b5c086 100644
--- a/net/sched/sch_cbs.c
+++ b/net/sched/sch_cbs.c
@@ -181,11 +181,6 @@ static struct sk_buff *cbs_dequeue_soft(struct Qdisc *sch)
 	s64 credits;
 	int len;
 
-	if (atomic64_read(&q->port_rate) == -1) {
-		WARN_ONCE(1, "cbs: dequeue() called with unknown port rate.");
-		return NULL;
-	}
-
 	if (q->credits < 0) {
 		credits = timediff_to_credits(now - q->last, q->idleslope);
 
@@ -303,11 +298,19 @@ static int cbs_enable_offload(struct net_device *dev, struct cbs_sched_data *q,
 static void cbs_set_port_rate(struct net_device *dev, struct cbs_sched_data *q)
 {
 	struct ethtool_link_ksettings ecmd;
+	int speed = SPEED_10;
 	int port_rate = -1;
+	int err;
+
+	err = __ethtool_get_link_ksettings(dev, &ecmd);
+	if (err < 0)
+		goto skip;
+
+	if (ecmd.base.speed != SPEED_UNKNOWN)
+		speed = ecmd.base.speed;
 
-	if (!__ethtool_get_link_ksettings(dev, &ecmd) &&
-	    ecmd.base.speed != SPEED_UNKNOWN)
-		port_rate = ecmd.base.speed * 1000 * BYTES_PER_KBIT;
+skip:
+	port_rate = speed * 1000 * BYTES_PER_KBIT;
 
 	atomic64_set(&q->port_rate, port_rate);
 	netdev_dbg(dev, "cbs: set %s's port_rate to: %lld, linkspeed: %d\n",
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 net 1/3] taprio: Fix kernel panic in taprio_destroy
From: Vladimir Oltean @ 2019-08-30  1:07 UTC (permalink / raw)
  To: xiyou.wangcong, jiri, davem, vinicius.gomes, vedang.patel,
	leandro.maciel.dorileo
  Cc: netdev, Vladimir Oltean
In-Reply-To: <20190830010723.32096-1-olteanv@gmail.com>

taprio_init may fail earlier than this line:

	list_add(&q->taprio_list, &taprio_list);

i.e. due to the net device not being multi queue.

Attempting to remove q from the global taprio_list when it is not part
of it will result in a kernel panic.

Fix it by matching list_add and list_del better to one another in the
order of operations. This way we can keep the deletion unconditional
and with lower complexity - O(1).

Cc: Leandro Dorileo <leandro.maciel.dorileo@intel.com>
Fixes: 7b9eba7ba0c1 ("net/sched: taprio: fix picos_per_byte miscalculation")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
Changes in v2:
Dropped the list iteration approach and just matched the addition better
with the removal.

 net/sched/sch_taprio.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 540bde009ea5..72b5f8c1aef9 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -1245,6 +1245,10 @@ static int taprio_init(struct Qdisc *sch, struct nlattr *opt,
 	 */
 	q->clockid = -1;
 
+	spin_lock(&taprio_list_lock);
+	list_add(&q->taprio_list, &taprio_list);
+	spin_unlock(&taprio_list_lock);
+
 	if (sch->parent != TC_H_ROOT)
 		return -EOPNOTSUPP;
 
@@ -1262,10 +1266,6 @@ static int taprio_init(struct Qdisc *sch, struct nlattr *opt,
 	if (!opt)
 		return -EINVAL;
 
-	spin_lock(&taprio_list_lock);
-	list_add(&q->taprio_list, &taprio_list);
-	spin_unlock(&taprio_list_lock);
-
 	for (i = 0; i < dev->num_tx_queues; i++) {
 		struct netdev_queue *dev_queue;
 		struct Qdisc *qdisc;
-- 
2.17.1


^ permalink raw reply related

* [PATCH v2 net 0/3] Fix issues in tc-taprio and tc-cbs
From: Vladimir Oltean @ 2019-08-30  1:07 UTC (permalink / raw)
  To: xiyou.wangcong, jiri, davem, vinicius.gomes, vedang.patel,
	leandro.maciel.dorileo
  Cc: netdev, Vladimir Oltean

This series fixes one panic and one WARN_ON found in the tc-taprio
qdisc, while trying to apply it:

- On an interface which is not multi-queue
- On an interface which has no carrier

The tc-cbs was also visually found to suffer of the same issue as
tc-taprio, and the fix was only compile-tested in that case.

Vladimir Oltean (3):
  taprio: Fix kernel panic in taprio_destroy
  taprio: Set default link speed to 10 Mbps in taprio_set_picos_per_byte
  net/sched: cbs: Set default link speed to 10 Mbps in cbs_set_port_rate

 net/sched/sch_cbs.c    | 19 +++++++++++--------
 net/sched/sch_taprio.c | 31 +++++++++++++++++--------------
 2 files changed, 28 insertions(+), 22 deletions(-)

-- 
2.17.1


^ permalink raw reply

* Re: [kbuild-all] [PATCH] ipv6: Not to probe neighbourless routes
From: Philip Li @ 2019-08-30  1:00 UTC (permalink / raw)
  To: David Miller
  Cc: lkp, wang.yi59, wang.liang82, yoshfuji, netdev, linux-kernel,
	cheng.lin130, kbuild-all, xue.zhihong, kuznet
In-Reply-To: <20190829.163742.2109211377942652910.davem@davemloft.net>

On Thu, Aug 29, 2019 at 04:37:42PM -0700, David Miller wrote:
> 
> So yeah, this is one instance where the kbuild test robot's report is
> making more rather than less work for us.
sorry for the inconvenience caused. We monitor the lkml, and as you
point out, we will continuously improve to provide faster response.

> 
> We identified the build problem within hours of this patch being
> posted and the updated version was posted more than 24 hours ago.
> 
> The kbuild robot should really have a way to either:
> 
> 1) Report build problems faster, humans find the obvious cases like
>    this one within a day or less.
thanks, we will continue working on this to speed up

> 
> 2) Notice that a new version of the patch was posted or that a human
>    responded to the patch pointing out the build problem.
thanks, we will enhance the patch testing to consider these ideas.

> 
> Otherwise we get postings like this which is just more noise to
> delete.
> 
> Thanks.
> _______________________________________________
> kbuild-all mailing list
> kbuild-all@lists.01.org
> https://lists.01.org/mailman/listinfo/kbuild-all

^ permalink raw reply

* Re: [PATCH] net: spider_net: Use struct_size() helper
From: David Miller @ 2019-08-30  0:54 UTC (permalink / raw)
  To: gustavo; +Cc: kou.ishizaki, netdev, linux-kernel
In-Reply-To: <20190828202108.GA9494@embeddedor>

From: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Date: Wed, 28 Aug 2019 15:21:08 -0500

> One of the more common cases of allocation size calculations is finding
> the size of a structure that has a zero-sized array at the end, along
> with memory for some number of elements for that array. For example:
> 
> struct spider_net_card {
> 	...
>         struct spider_net_descr darray[0];
> };
> 
> Make use of the struct_size() helper instead of an open-coded version
> in order to avoid any potential type mistakes.
> 
> So, replace the following form:
> 
> sizeof(struct spider_net_card) + (tx_descriptors + rx_descriptors) * sizeof(struct spider_net_descr)
> 
> with:
> 
> struct_size(card, darray, tx_descriptors + rx_descriptors)
> 
> Notice that, in this case, variable alloc_size is not necessary, hence it
> is removed.
> 
> Building: allmodconfig powerpc.
> 
> This code was detected with the help of Coccinelle.
> 
> Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>

Applied to net-next, thanks.

^ permalink raw reply

* [PATCH net-next 3/4] net: flow_offload: mangle action at byte level
From: Pablo Neira Ayuso @ 2019-08-30  0:53 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, vishal, jakub.kicinski, saeedm, jiri
In-Reply-To: <20190830005336.23604-1-pablo@netfilter.org>

The flow mangle action is originally modeled after the tc pedit action,
this has a number of shortcomings:

1) The tc pedit offset must be set on the 32-bits boundaries. Many
   protocol header field offsets are not aligned to 32-bits, eg. port
   destination, port source and ethernet destination. This patch adjusts
   the offset accordingly and trim off length in these case, so drivers get
   an exact offset and length to the header fields.

2) The maximum mangle length is one word of 32-bits, hence you need to
   up to four actions to mangle an IPv6 address. This patch coalesces
   consecutive tc pedit actions into one single action so drivers can
   configure the IPv6 mangling in one go. Ethernet address fields now
   require one single action instead of two too.

The following drivers have been updated accordingly to use this new
mangle action layout:

1) The cxgb4 driver does not need to split protocol field matching
   larger than one 32-bit words into multiple definitions. Instead one
   single definition per protocol field is enough. Checking for
   transport protocol ports is also simplified.

2) The mlx5 driver logic to disallow IPv4 ttl and IPv6 hoplimit fields
   becomes more simple too.

3) The nfp driver uses the nfp_fl_set_helper() function to configure the
   payload mangling. The memchr_inv() function is used to check for
   proper initialization of the value and mask. The driver has been
   updated to refer to the exact protocol header offsets too.

As a result, this patch reduces code complexity on the driver side at
the cost of adding ~100 LOC at the core to perform offset and length
adjustment; and to coalesce consecutive actions.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 .../net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c   | 162 +++++------------
 .../net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.h   |  40 ++---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    |  90 ++++------
 drivers/net/ethernet/netronome/nfp/flower/action.c | 200 +++++++++------------
 include/net/flow_offload.h                         |   7 +-
 net/sched/cls_api.c                                | 144 ++++++++++++---
 6 files changed, 304 insertions(+), 339 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
index 5afc15a60199..ba1ced08e41c 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
@@ -44,20 +44,12 @@
 #define STATS_CHECK_PERIOD (HZ / 2)
 
 static struct ch_tc_pedit_fields pedits[] = {
-	PEDIT_FIELDS(ETH_, DMAC_31_0, 4, dmac, 0),
-	PEDIT_FIELDS(ETH_, DMAC_47_32, 2, dmac, 4),
-	PEDIT_FIELDS(ETH_, SMAC_15_0, 2, smac, 0),
-	PEDIT_FIELDS(ETH_, SMAC_47_16, 4, smac, 2),
+	PEDIT_FIELDS(ETH_, DMAC, 6, dmac, 0),
+	PEDIT_FIELDS(ETH_, SMAC, 6, smac, 0),
 	PEDIT_FIELDS(IP4_, SRC, 4, nat_fip, 0),
 	PEDIT_FIELDS(IP4_, DST, 4, nat_lip, 0),
-	PEDIT_FIELDS(IP6_, SRC_31_0, 4, nat_fip, 0),
-	PEDIT_FIELDS(IP6_, SRC_63_32, 4, nat_fip, 4),
-	PEDIT_FIELDS(IP6_, SRC_95_64, 4, nat_fip, 8),
-	PEDIT_FIELDS(IP6_, SRC_127_96, 4, nat_fip, 12),
-	PEDIT_FIELDS(IP6_, DST_31_0, 4, nat_lip, 0),
-	PEDIT_FIELDS(IP6_, DST_63_32, 4, nat_lip, 4),
-	PEDIT_FIELDS(IP6_, DST_95_64, 4, nat_lip, 8),
-	PEDIT_FIELDS(IP6_, DST_127_96, 4, nat_lip, 12),
+	PEDIT_FIELDS(IP6_, SRC, 16, nat_fip, 0),
+	PEDIT_FIELDS(IP6_, DST, 16, nat_lip, 0),
 	PEDIT_FIELDS(TCP_, SPORT, 2, nat_fport, 0),
 	PEDIT_FIELDS(TCP_, DPORT, 2, nat_lport, 0),
 	PEDIT_FIELDS(UDP_, SPORT, 2, nat_fport, 0),
@@ -272,8 +264,8 @@ static int cxgb4_validate_flow_match(struct net_device *dev,
 	return 0;
 }
 
-static void offload_pedit(struct ch_filter_specification *fs, u32 val, u32 mask,
-			  u8 field)
+static void offload_pedit(struct ch_filter_specification *fs,
+			  struct flow_action_entry *act, u8 field)
 {
 	u32 offset = 0;
 	u8 size = 1;
@@ -286,92 +278,68 @@ static void offload_pedit(struct ch_filter_specification *fs, u32 val, u32 mask,
 			break;
 		}
 	}
-	memcpy((u8 *)fs + offset, &val, size);
+	memcpy((u8 *)fs + offset, act->mangle.val, size);
 }
 
-static void process_pedit_field(struct ch_filter_specification *fs, u32 val,
-				u32 mask, u32 offset, u8 htype)
+static void process_pedit_field(struct ch_filter_specification *fs,
+				struct flow_action_entry *act)
 {
+	u32 offset = act->mangle.offset;
+	u8 htype = act->mangle.htype;
+
 	switch (htype) {
 	case FLOW_ACT_MANGLE_HDR_TYPE_ETH:
 		switch (offset) {
-		case PEDIT_ETH_DMAC_31_0:
+		case PEDIT_ETH_DMAC:
 			fs->newdmac = 1;
-			offload_pedit(fs, val, mask, ETH_DMAC_31_0);
-			break;
-		case PEDIT_ETH_DMAC_47_32_SMAC_15_0:
-			if (mask & PEDIT_ETH_DMAC_MASK)
-				offload_pedit(fs, val, mask, ETH_DMAC_47_32);
-			else
-				offload_pedit(fs, val >> 16, mask >> 16,
-					      ETH_SMAC_15_0);
+			offload_pedit(fs, act, ETH_DMAC);
 			break;
-		case PEDIT_ETH_SMAC_47_16:
+		case PEDIT_ETH_SMAC:
 			fs->newsmac = 1;
-			offload_pedit(fs, val, mask, ETH_SMAC_47_16);
+			offload_pedit(fs, act, ETH_SMAC);
+			break;
 		}
 		break;
 	case FLOW_ACT_MANGLE_HDR_TYPE_IP4:
 		switch (offset) {
 		case PEDIT_IP4_SRC:
-			offload_pedit(fs, val, mask, IP4_SRC);
+			offload_pedit(fs, act, IP4_SRC);
 			break;
 		case PEDIT_IP4_DST:
-			offload_pedit(fs, val, mask, IP4_DST);
+			offload_pedit(fs, act, IP4_DST);
 		}
 		fs->nat_mode = NAT_MODE_ALL;
 		break;
 	case FLOW_ACT_MANGLE_HDR_TYPE_IP6:
 		switch (offset) {
-		case PEDIT_IP6_SRC_31_0:
-			offload_pedit(fs, val, mask, IP6_SRC_31_0);
-			break;
-		case PEDIT_IP6_SRC_63_32:
-			offload_pedit(fs, val, mask, IP6_SRC_63_32);
-			break;
-		case PEDIT_IP6_SRC_95_64:
-			offload_pedit(fs, val, mask, IP6_SRC_95_64);
-			break;
-		case PEDIT_IP6_SRC_127_96:
-			offload_pedit(fs, val, mask, IP6_SRC_127_96);
-			break;
-		case PEDIT_IP6_DST_31_0:
-			offload_pedit(fs, val, mask, IP6_DST_31_0);
+		case PEDIT_IP6_SRC:
+			offload_pedit(fs, act, IP6_SRC);
 			break;
-		case PEDIT_IP6_DST_63_32:
-			offload_pedit(fs, val, mask, IP6_DST_63_32);
+		case PEDIT_IP6_DST:
+			offload_pedit(fs, act, IP6_DST);
 			break;
-		case PEDIT_IP6_DST_95_64:
-			offload_pedit(fs, val, mask, IP6_DST_95_64);
-			break;
-		case PEDIT_IP6_DST_127_96:
-			offload_pedit(fs, val, mask, IP6_DST_127_96);
 		}
 		fs->nat_mode = NAT_MODE_ALL;
 		break;
 	case FLOW_ACT_MANGLE_HDR_TYPE_TCP:
 		switch (offset) {
-		case PEDIT_TCP_SPORT_DPORT:
-			if (mask & PEDIT_TCP_UDP_SPORT_MASK)
-				offload_pedit(fs, cpu_to_be32(val) >> 16,
-					      cpu_to_be32(mask) >> 16,
-					      TCP_SPORT);
-			else
-				offload_pedit(fs, cpu_to_be32(val),
-					      cpu_to_be32(mask), TCP_DPORT);
+		case PEDIT_TCP_SPORT:
+			offload_pedit(fs, act, TCP_SPORT);
+			break;
+		case PEDIT_TCP_DPORT:
+			offload_pedit(fs, act, TCP_DPORT);
+			break;
 		}
 		fs->nat_mode = NAT_MODE_ALL;
 		break;
 	case FLOW_ACT_MANGLE_HDR_TYPE_UDP:
 		switch (offset) {
-		case PEDIT_UDP_SPORT_DPORT:
-			if (mask & PEDIT_TCP_UDP_SPORT_MASK)
-				offload_pedit(fs, cpu_to_be32(val) >> 16,
-					      cpu_to_be32(mask) >> 16,
-					      UDP_SPORT);
-			else
-				offload_pedit(fs, cpu_to_be32(val),
-					      cpu_to_be32(mask), UDP_DPORT);
+		case PEDIT_UDP_SPORT:
+			offload_pedit(fs, act, UDP_SPORT);
+			break;
+		case PEDIT_UDP_DPORT:
+			offload_pedit(fs, act, UDP_DPORT);
+			break;
 		}
 		fs->nat_mode = NAT_MODE_ALL;
 	}
@@ -424,17 +392,8 @@ static void cxgb4_process_flow_actions(struct net_device *in,
 			}
 			}
 			break;
-		case FLOW_ACTION_MANGLE: {
-			u32 mask, val, offset;
-			u8 htype;
-
-			htype = act->mangle.htype;
-			mask = act->mangle.mask;
-			val = act->mangle.val;
-			offset = act->mangle.offset;
-
-			process_pedit_field(fs, val, mask, offset, htype);
-			}
+		case FLOW_ACTION_MANGLE:
+			process_pedit_field(fs, act);
 			break;
 		default:
 			break;
@@ -442,35 +401,20 @@ static void cxgb4_process_flow_actions(struct net_device *in,
 	}
 }
 
-static bool valid_l4_mask(u32 mask)
-{
-	u16 hi, lo;
-
-	/* Either the upper 16-bits (SPORT) OR the lower
-	 * 16-bits (DPORT) can be set, but NOT BOTH.
-	 */
-	hi = (mask >> 16) & 0xFFFF;
-	lo = mask & 0xFFFF;
-
-	return hi && lo ? false : true;
-}
-
 static bool valid_pedit_action(struct net_device *dev,
 			       const struct flow_action_entry *act)
 {
-	u32 mask, offset;
+	u32 offset;
 	u8 htype;
 
 	htype = act->mangle.htype;
-	mask = act->mangle.mask;
 	offset = act->mangle.offset;
 
 	switch (htype) {
 	case FLOW_ACT_MANGLE_HDR_TYPE_ETH:
 		switch (offset) {
-		case PEDIT_ETH_DMAC_31_0:
-		case PEDIT_ETH_DMAC_47_32_SMAC_15_0:
-		case PEDIT_ETH_SMAC_47_16:
+		case PEDIT_ETH_DMAC:
+		case PEDIT_ETH_SMAC:
 			break;
 		default:
 			netdev_err(dev, "%s: Unsupported pedit field\n",
@@ -491,14 +435,8 @@ static bool valid_pedit_action(struct net_device *dev,
 		break;
 	case FLOW_ACT_MANGLE_HDR_TYPE_IP6:
 		switch (offset) {
-		case PEDIT_IP6_SRC_31_0:
-		case PEDIT_IP6_SRC_63_32:
-		case PEDIT_IP6_SRC_95_64:
-		case PEDIT_IP6_SRC_127_96:
-		case PEDIT_IP6_DST_31_0:
-		case PEDIT_IP6_DST_63_32:
-		case PEDIT_IP6_DST_95_64:
-		case PEDIT_IP6_DST_127_96:
+		case PEDIT_IP6_SRC:
+		case PEDIT_IP6_DST:
 			break;
 		default:
 			netdev_err(dev, "%s: Unsupported pedit field\n",
@@ -508,12 +446,8 @@ static bool valid_pedit_action(struct net_device *dev,
 		break;
 	case FLOW_ACT_MANGLE_HDR_TYPE_TCP:
 		switch (offset) {
-		case PEDIT_TCP_SPORT_DPORT:
-			if (!valid_l4_mask(mask)) {
-				netdev_err(dev, "%s: Unsupported mask for TCP L4 ports\n",
-					   __func__);
-				return false;
-			}
+		case PEDIT_TCP_SPORT:
+		case PEDIT_TCP_DPORT:
 			break;
 		default:
 			netdev_err(dev, "%s: Unsupported pedit field\n",
@@ -523,12 +457,8 @@ static bool valid_pedit_action(struct net_device *dev,
 		break;
 	case FLOW_ACT_MANGLE_HDR_TYPE_UDP:
 		switch (offset) {
-		case PEDIT_UDP_SPORT_DPORT:
-			if (!valid_l4_mask(mask)) {
-				netdev_err(dev, "%s: Unsupported mask for UDP L4 ports\n",
-					   __func__);
-				return false;
-			}
+		case PEDIT_UDP_SPORT:
+		case PEDIT_UDP_DPORT:
 			break;
 		default:
 			netdev_err(dev, "%s: Unsupported pedit field\n",
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.h
index eb4c95248baf..03892755a18f 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.h
@@ -55,23 +55,14 @@ struct ch_tc_flower_entry {
 };
 
 enum {
-	ETH_DMAC_31_0,	/* dmac bits 0.. 31 */
-	ETH_DMAC_47_32,	/* dmac bits 32..47 */
-	ETH_SMAC_15_0,	/* smac bits 0.. 15 */
-	ETH_SMAC_47_16,	/* smac bits 16..47 */
+	ETH_DMAC,	/* 48-bits dmac bits */
+	ETH_SMAC,	/* 48-bits smac bits */
 
 	IP4_SRC,	/* 32-bit IPv4 src  */
 	IP4_DST,	/* 32-bit IPv4 dst  */
 
-	IP6_SRC_31_0,	/* src bits 0..  31 */
-	IP6_SRC_63_32,	/* src bits 63.. 32 */
-	IP6_SRC_95_64,	/* src bits 95.. 64 */
-	IP6_SRC_127_96,	/* src bits 127..96 */
-
-	IP6_DST_31_0,	/* dst bits 0..  31 */
-	IP6_DST_63_32,	/* dst bits 63.. 32 */
-	IP6_DST_95_64,	/* dst bits 95.. 64 */
-	IP6_DST_127_96,	/* dst bits 127..96 */
+	IP6_SRC,	/* 128-bit IPv6 src */
+	IP6_DST,	/* 128-bit IPv6 dst */
 
 	TCP_SPORT,	/* 16-bit TCP sport */
 	TCP_DPORT,	/* 16-bit TCP dport */
@@ -90,23 +81,16 @@ struct ch_tc_pedit_fields {
 	{ type## field, size, \
 		offsetof(struct ch_filter_specification, fs_field) + (offset) }
 
-#define PEDIT_ETH_DMAC_MASK		0xffff
-#define PEDIT_TCP_UDP_SPORT_MASK	0xffff
-#define PEDIT_ETH_DMAC_31_0		0x0
-#define PEDIT_ETH_DMAC_47_32_SMAC_15_0	0x4
-#define PEDIT_ETH_SMAC_47_16		0x8
+#define PEDIT_ETH_DMAC			0x0
+#define PEDIT_ETH_SMAC			0x6
+#define PEDIT_IP6_SRC			0x8
+#define PEDIT_IP6_DST			0x18
 #define PEDIT_IP4_SRC			0xC
 #define PEDIT_IP4_DST			0x10
-#define PEDIT_IP6_SRC_31_0		0x8
-#define PEDIT_IP6_SRC_63_32		0xC
-#define PEDIT_IP6_SRC_95_64		0x10
-#define PEDIT_IP6_SRC_127_96		0x14
-#define PEDIT_IP6_DST_31_0		0x18
-#define PEDIT_IP6_DST_63_32		0x1C
-#define PEDIT_IP6_DST_95_64		0x20
-#define PEDIT_IP6_DST_127_96		0x24
-#define PEDIT_TCP_SPORT_DPORT		0x0
-#define PEDIT_UDP_SPORT_DPORT		0x0
+#define PEDIT_TCP_SPORT			0x0
+#define PEDIT_TCP_DPORT			0x2
+#define PEDIT_UDP_SPORT			0x0
+#define PEDIT_UDP_DPORT			0x2
 
 int cxgb4_tc_flower_replace(struct net_device *dev,
 			    struct flow_cls_offload *cls);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index f29895b3a947..b7b88bc22cf7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -2201,19 +2201,24 @@ static int pedit_header_offsets[] = {
 
 #define pedit_header(_ph, _htype) ((void *)(_ph) + pedit_header_offsets[_htype])
 
-static int set_pedit_val(u8 hdr_type, u32 mask, u32 val, u32 offset,
+static int set_pedit_val(u8 hdr_type, const struct flow_action_entry *act,
 			 struct pedit_headers_action *hdrs)
 {
-	u32 *curr_pmask, *curr_pval;
+	u32 offset = act->mangle.offset;
+	u8 *curr_pmask, *curr_pval;
+	int i;
 
-	curr_pmask = (u32 *)(pedit_header(&hdrs->masks, hdr_type) + offset);
-	curr_pval  = (u32 *)(pedit_header(&hdrs->vals, hdr_type) + offset);
+	curr_pmask = (u8 *)(pedit_header(&hdrs->masks, hdr_type) + offset);
+	curr_pval  = (u8 *)(pedit_header(&hdrs->vals, hdr_type) + offset);
 
-	if (*curr_pmask & mask)  /* disallow acting twice on the same location */
-		goto out_err;
+	for (i = 0; i < act->mangle.len; i++) {
+		/* disallow acting twice on the same location */
+		if (curr_pmask[i] & act->mangle.mask[i])
+			goto out_err;
 
-	*curr_pmask |= mask;
-	*curr_pval  |= val;
+		curr_pmask[i] |= act->mangle.mask[i];
+		curr_pval[i] |= act->mangle.val[i];
+	}
 
 	return 0;
 
@@ -2487,7 +2492,6 @@ static int parse_tc_pedit_action(struct mlx5e_priv *priv,
 {
 	u8 cmd = (act->id == FLOW_ACTION_MANGLE) ? 0 : 1;
 	int err = -EOPNOTSUPP;
-	u32 mask, val, offset;
 	u8 htype;
 
 	htype = act->mangle.htype;
@@ -2504,11 +2508,7 @@ static int parse_tc_pedit_action(struct mlx5e_priv *priv,
 		goto out_err;
 	}
 
-	mask = act->mangle.mask;
-	val = act->mangle.val;
-	offset = act->mangle.offset;
-
-	err = set_pedit_val(htype, mask, val, offset, &hdrs[cmd]);
+	err = set_pedit_val(htype, act, &hdrs[cmd]);
 	if (err)
 		goto out_err;
 
@@ -2589,50 +2589,18 @@ static bool csum_offload_supported(struct mlx5e_priv *priv,
 	return true;
 }
 
-struct ip_ttl_word {
-	__u8	ttl;
-	__u8	protocol;
-	__sum16	check;
-};
-
-struct ipv6_hoplimit_word {
-	__be16	payload_len;
-	__u8	nexthdr;
-	__u8	hop_limit;
-};
-
 static bool is_action_keys_supported(const struct flow_action_entry *act)
 {
-	u32 mask, offset;
-	u8 htype;
+	u32 offset = act->mangle.offset;
+	u8 htype = act->mangle.htype;
 
-	htype = act->mangle.htype;
-	offset = act->mangle.offset;
-	mask = act->mangle.mask;
-	/* For IPv4 & IPv6 header check 4 byte word,
-	 * to determine that modified fields
-	 * are NOT ttl & hop_limit only.
-	 */
-	if (htype == FLOW_ACT_MANGLE_HDR_TYPE_IP4) {
-		struct ip_ttl_word *ttl_word =
-			(struct ip_ttl_word *)&mask;
-
-		if (offset != offsetof(struct iphdr, ttl) ||
-		    ttl_word->protocol ||
-		    ttl_word->check) {
-			return true;
-		}
-	} else if (htype == FLOW_ACT_MANGLE_HDR_TYPE_IP6) {
-		struct ipv6_hoplimit_word *hoplimit_word =
-			(struct ipv6_hoplimit_word *)&mask;
-
-		if (offset != offsetof(struct ipv6hdr, payload_len) ||
-		    hoplimit_word->payload_len ||
-		    hoplimit_word->nexthdr) {
-			return true;
-		}
-	}
-	return false;
+	if ((htype == FLOW_ACT_MANGLE_HDR_TYPE_IP4 &&
+	     offset == offsetof(struct iphdr, ttl)) ||
+	    (htype == FLOW_ACT_MANGLE_HDR_TYPE_IP6 &&
+	     offset == offsetof(struct ipv6hdr, hop_limit)))
+		return false;
+
+	return true;
 }
 
 static bool modify_header_match_supported(struct mlx5_flow_spec *spec,
@@ -2726,19 +2694,21 @@ static int add_vlan_rewrite_action(struct mlx5e_priv *priv, int namespace,
 				   struct pedit_headers_action *hdrs,
 				   u32 *action, struct netlink_ext_ack *extack)
 {
-	u16 mask16 = VLAN_VID_MASK;
-	u16 val16 = act->vlan.vid & VLAN_VID_MASK;
-	const struct flow_action_entry pedit_act = {
+	__be16 mask16 = htons(VLAN_VID_MASK);
+	__be16 val16 = htons(act->vlan.vid & VLAN_VID_MASK);
+	struct flow_action_entry pedit_act = {
 		.id = FLOW_ACTION_MANGLE,
 		.mangle.htype = FLOW_ACT_MANGLE_HDR_TYPE_ETH,
 		.mangle.offset = offsetof(struct vlan_ethhdr, h_vlan_TCI),
-		.mangle.mask = (u32)be16_to_cpu(*(__be16 *)&mask16),
-		.mangle.val = (u32)be16_to_cpu(*(__be16 *)&val16),
+		.mangle.len = 2,
 	};
 	u8 match_prio_mask, match_prio_val;
 	void *headers_c, *headers_v;
 	int err;
 
+	memcpy(pedit_act.mangle.mask, &mask16, sizeof(__be16));
+	memcpy(pedit_act.mangle.val, &val16, sizeof(__be16));
+
 	headers_c = get_match_headers_criteria(*action, &parse_attr->spec);
 	headers_v = get_match_headers_value(*action, &parse_attr->spec);
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c b/drivers/net/ethernet/netronome/nfp/flower/action.c
index 592c36ba9e3f..49f0dadb0ae3 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -472,38 +472,39 @@ nfp_fl_set_ipv4_tun(struct nfp_app *app, struct nfp_fl_set_ipv4_tun *set_tun,
 	return 0;
 }
 
-static void nfp_fl_set_helper32(u32 value, u32 mask, u8 *p_exact, u8 *p_mask)
+static void nfp_fl_set_helper(const u8 *value, const u8 *mask,
+			      u8 *p_exact, u8 *p_mask, int len)
 {
-	u32 oldvalue = get_unaligned((u32 *)p_exact);
-	u32 oldmask = get_unaligned((u32 *)p_mask);
+	int i;
 
-	value |= oldvalue & ~mask;
-
-	put_unaligned(oldmask | mask, (u32 *)p_mask);
-	put_unaligned(value, (u32 *)p_exact);
+	for (i = 0; i < len; i++) {
+		p_exact[i] |= (p_exact[i] & ~mask[i]) | value[i];
+		p_mask[i] |= mask[i];
+	}
 }
 
 static int
 nfp_fl_set_eth(const struct flow_action_entry *act, u32 off,
 	       struct nfp_fl_set_eth *set_eth, struct netlink_ext_ack *extack)
 {
-	u32 exact, mask;
-
-	if (off + 4 > ETH_ALEN * 2) {
+	switch (off) {
+	case offsetof(struct ethhdr, h_dest):
+	case offsetof(struct ethhdr, h_source):
+		break;
+	default:
 		NL_SET_ERR_MSG_MOD(extack, "unsupported offload: invalid pedit ethernet action");
 		return -EOPNOTSUPP;
 	}
 
-	mask = act->mangle.mask;
-	exact = act->mangle.val;
-
-	if (exact & ~mask) {
+	if (memchr_inv(&act->mangle.val, 0, act->mangle.len) &&
+	    !memchr_inv(&act->mangle.mask, 0, act->mangle.len)) {
 		NL_SET_ERR_MSG_MOD(extack, "unsupported offload: invalid pedit ethernet action");
 		return -EOPNOTSUPP;
 	}
 
-	nfp_fl_set_helper32(exact, mask, &set_eth->eth_addr_val[off],
-			    &set_eth->eth_addr_mask[off]);
+	nfp_fl_set_helper(act->mangle.val, act->mangle.mask,
+			  &set_eth->eth_addr_val[off],
+			  &set_eth->eth_addr_mask[off], act->mangle.len);
 
 	set_eth->reserved = cpu_to_be16(0);
 	set_eth->head.jump_id = NFP_FL_ACTION_OPCODE_SET_ETHERNET;
@@ -524,68 +525,46 @@ nfp_fl_set_ip4(const struct flow_action_entry *act, u32 off,
 	       struct nfp_fl_set_ip4_ttl_tos *set_ip_ttl_tos,
 	       struct netlink_ext_ack *extack)
 {
-	struct ipv4_ttl_word *ttl_word_mask;
-	struct ipv4_ttl_word *ttl_word;
-	struct iphdr *tos_word_mask;
-	struct iphdr *tos_word;
-	__be32 exact, mask;
-
-	/* We are expecting tcf_pedit to return a big endian value */
-	mask = (__force __be32)act->mangle.mask;
-	exact = (__force __be32)act->mangle.val;
-
-	if (exact & ~mask) {
+	if (memchr_inv(act->mangle.val, 0, act->mangle.len) &&
+	    !memchr_inv(act->mangle.mask, 0, act->mangle.len)) {
 		NL_SET_ERR_MSG_MOD(extack, "unsupported offload: invalid pedit IPv4 action");
 		return -EOPNOTSUPP;
 	}
 
 	switch (off) {
 	case offsetof(struct iphdr, daddr):
-		set_ip_addr->ipv4_dst_mask |= mask;
-		set_ip_addr->ipv4_dst &= ~mask;
-		set_ip_addr->ipv4_dst |= exact;
+		nfp_fl_set_helper(act->mangle.val, act->mangle.mask,
+				  (u8 *)&set_ip_addr->ipv4_dst,
+				  (u8 *)&set_ip_addr->ipv4_dst_mask,
+				  act->mangle.len);
 		set_ip_addr->head.jump_id = NFP_FL_ACTION_OPCODE_SET_IPV4_ADDRS;
 		set_ip_addr->head.len_lw = sizeof(*set_ip_addr) >>
 					   NFP_FL_LW_SIZ;
 		break;
 	case offsetof(struct iphdr, saddr):
-		set_ip_addr->ipv4_src_mask |= mask;
-		set_ip_addr->ipv4_src &= ~mask;
-		set_ip_addr->ipv4_src |= exact;
+		nfp_fl_set_helper(act->mangle.val, act->mangle.mask,
+				  (u8 *)&set_ip_addr->ipv4_src,
+				  (u8 *)&set_ip_addr->ipv4_src_mask,
+				  act->mangle.len);
 		set_ip_addr->head.jump_id = NFP_FL_ACTION_OPCODE_SET_IPV4_ADDRS;
 		set_ip_addr->head.len_lw = sizeof(*set_ip_addr) >>
 					   NFP_FL_LW_SIZ;
 		break;
 	case offsetof(struct iphdr, ttl):
-		ttl_word_mask = (struct ipv4_ttl_word *)&mask;
-		ttl_word = (struct ipv4_ttl_word *)&exact;
-
-		if (ttl_word_mask->protocol || ttl_word_mask->check) {
-			NL_SET_ERR_MSG_MOD(extack, "unsupported offload: invalid pedit IPv4 ttl action");
-			return -EOPNOTSUPP;
-		}
-
-		set_ip_ttl_tos->ipv4_ttl_mask |= ttl_word_mask->ttl;
-		set_ip_ttl_tos->ipv4_ttl &= ~ttl_word_mask->ttl;
-		set_ip_ttl_tos->ipv4_ttl |= ttl_word->ttl & ttl_word_mask->ttl;
+		nfp_fl_set_helper(act->mangle.val, act->mangle.mask,
+				  (u8 *)&set_ip_ttl_tos->ipv4_ttl,
+				  (u8 *)&set_ip_ttl_tos->ipv4_ttl_mask,
+				  act->mangle.len);
 		set_ip_ttl_tos->head.jump_id =
 			NFP_FL_ACTION_OPCODE_SET_IPV4_TTL_TOS;
 		set_ip_ttl_tos->head.len_lw = sizeof(*set_ip_ttl_tos) >>
 					      NFP_FL_LW_SIZ;
 		break;
-	case round_down(offsetof(struct iphdr, tos), 4):
-		tos_word_mask = (struct iphdr *)&mask;
-		tos_word = (struct iphdr *)&exact;
-
-		if (tos_word_mask->version || tos_word_mask->ihl ||
-		    tos_word_mask->tot_len) {
-			NL_SET_ERR_MSG_MOD(extack, "unsupported offload: invalid pedit IPv4 tos action");
-			return -EOPNOTSUPP;
-		}
-
-		set_ip_ttl_tos->ipv4_tos_mask |= tos_word_mask->tos;
-		set_ip_ttl_tos->ipv4_tos &= ~tos_word_mask->tos;
-		set_ip_ttl_tos->ipv4_tos |= tos_word->tos & tos_word_mask->tos;
+	case offsetof(struct iphdr, tos):
+		nfp_fl_set_helper(act->mangle.val, act->mangle.mask,
+				  (u8 *)&set_ip_ttl_tos->ipv4_tos,
+				  (u8 *)&set_ip_ttl_tos->ipv4_tos_mask,
+				  act->mangle.len);
 		set_ip_ttl_tos->head.jump_id =
 			NFP_FL_ACTION_OPCODE_SET_IPV4_TTL_TOS;
 		set_ip_ttl_tos->head.len_lw = sizeof(*set_ip_ttl_tos) >>
@@ -600,12 +579,17 @@ nfp_fl_set_ip4(const struct flow_action_entry *act, u32 off,
 }
 
 static void
-nfp_fl_set_ip6_helper(int opcode_tag, u8 word, __be32 exact, __be32 mask,
-		      struct nfp_fl_set_ipv6_addr *ip6)
+nfp_fl_set_ip6_helper(int opcode_tag, const struct flow_action_entry *act,
+		      struct nfp_fl_set_ipv6_addr *ip6,
+		      struct netlink_ext_ack *extack)
 {
-	ip6->ipv6[word].mask |= mask;
-	ip6->ipv6[word].exact &= ~mask;
-	ip6->ipv6[word].exact |= exact;
+	int i, j;
+
+	for (i = 0, j = 0; i < sizeof(struct in6_addr); i++, j += sizeof(u32)) {
+		nfp_fl_set_helper(&act->mangle.val[j], &act->mangle.mask[j],
+				  (u8 *)&ip6->ipv6[i].exact,
+				  (u8 *)&ip6->ipv6[i].mask, sizeof(u32));
+	}
 
 	ip6->reserved = cpu_to_be16(0);
 	ip6->head.jump_id = opcode_tag;
@@ -619,39 +603,34 @@ struct ipv6_hop_limit_word {
 };
 
 static int
-nfp_fl_set_ip6_hop_limit_flow_label(u32 off, __be32 exact, __be32 mask,
+nfp_fl_set_ip6_hop_limit_flow_label(u32 off,
+				    const struct flow_action_entry *act,
 				    struct nfp_fl_set_ipv6_tc_hl_fl *ip_hl_fl,
 				    struct netlink_ext_ack *extack)
 {
-	struct ipv6_hop_limit_word *fl_hl_mask;
-	struct ipv6_hop_limit_word *fl_hl;
-
 	switch (off) {
-	case offsetof(struct ipv6hdr, payload_len):
-		fl_hl_mask = (struct ipv6_hop_limit_word *)&mask;
-		fl_hl = (struct ipv6_hop_limit_word *)&exact;
-
-		if (fl_hl_mask->nexthdr || fl_hl_mask->payload_len) {
-			NL_SET_ERR_MSG_MOD(extack, "unsupported offload: invalid pedit IPv6 hop limit action");
-			return -EOPNOTSUPP;
-		}
-
-		ip_hl_fl->ipv6_hop_limit_mask |= fl_hl_mask->hop_limit;
-		ip_hl_fl->ipv6_hop_limit &= ~fl_hl_mask->hop_limit;
-		ip_hl_fl->ipv6_hop_limit |= fl_hl->hop_limit &
-					    fl_hl_mask->hop_limit;
+	case offsetof(struct ipv6hdr, hop_limit):
+		nfp_fl_set_helper(act->mangle.val, act->mangle.mask,
+				  &ip_hl_fl->ipv6_hop_limit,
+				  &ip_hl_fl->ipv6_hop_limit_mask,
+				  act->mangle.len);
 		break;
-	case round_down(offsetof(struct ipv6hdr, flow_lbl), 4):
-		if (mask & ~IPV6_FLOW_LABEL_MASK ||
-		    exact & ~IPV6_FLOW_LABEL_MASK) {
+	case 1: /* offsetof(struct ipv6hdr, flow_lbl) */
+		/* The initial 4-bits are part of the traffic class field. */
+		if (act->mangle.val[0] & 0xf0 ||
+		    act->mangle.mask[0] & 0xf0) {
 			NL_SET_ERR_MSG_MOD(extack, "unsupported offload: invalid pedit IPv6 flow label action");
 			return -EOPNOTSUPP;
 		}
 
-		ip_hl_fl->ipv6_label_mask |= mask;
-		ip_hl_fl->ipv6_label &= ~mask;
-		ip_hl_fl->ipv6_label |= exact;
+		nfp_fl_set_helper(act->mangle.val, act->mangle.mask,
+				  (u8 *)&ip_hl_fl->ipv6_label,
+				  (u8 *)&ip_hl_fl->ipv6_label_mask,
+				  act->mangle.len);
 		break;
+	default:
+		NL_SET_ERR_MSG_MOD(extack, "unsupported offload: invalid pedit IPv6 action");
+		return -EOPNOTSUPP;
 	}
 
 	ip_hl_fl->head.jump_id = NFP_FL_ACTION_OPCODE_SET_IPV6_TC_HL_FL;
@@ -667,31 +646,23 @@ nfp_fl_set_ip6(const struct flow_action_entry *act, u32 off,
 	       struct nfp_fl_set_ipv6_tc_hl_fl *ip_hl_fl,
 	       struct netlink_ext_ack *extack)
 {
-	__be32 exact, mask;
 	int err = 0;
-	u8 word;
 
-	/* We are expecting tcf_pedit to return a big endian value */
-	mask = (__force __be32)act->mangle.mask;
-	exact = (__force __be32)act->mangle.val;
-
-	if (exact & ~mask) {
+	if (memchr_inv(act->mangle.val, 0, act->mangle.len) &&
+	    !memchr_inv(act->mangle.mask, 0, act->mangle.len)) {
 		NL_SET_ERR_MSG_MOD(extack, "unsupported offload: invalid pedit IPv6 action");
 		return -EOPNOTSUPP;
 	}
 
 	if (off < offsetof(struct ipv6hdr, saddr)) {
-		err = nfp_fl_set_ip6_hop_limit_flow_label(off, exact, mask,
-							  ip_hl_fl, extack);
-	} else if (off < offsetof(struct ipv6hdr, daddr)) {
-		word = (off - offsetof(struct ipv6hdr, saddr)) / sizeof(exact);
-		nfp_fl_set_ip6_helper(NFP_FL_ACTION_OPCODE_SET_IPV6_SRC, word,
-				      exact, mask, ip_src);
-	} else if (off < offsetof(struct ipv6hdr, daddr) +
-		       sizeof(struct in6_addr)) {
-		word = (off - offsetof(struct ipv6hdr, daddr)) / sizeof(exact);
-		nfp_fl_set_ip6_helper(NFP_FL_ACTION_OPCODE_SET_IPV6_DST, word,
-				      exact, mask, ip_dst);
+		err = nfp_fl_set_ip6_hop_limit_flow_label(off, act, ip_hl_fl,
+							  extack);
+	} else if (off == offsetof(struct ipv6hdr, saddr)) {
+		nfp_fl_set_ip6_helper(NFP_FL_ACTION_OPCODE_SET_IPV6_SRC,
+				      act, ip_src, extack);
+	} else if (off == offsetof(struct ipv6hdr, daddr)) {
+		nfp_fl_set_ip6_helper(NFP_FL_ACTION_OPCODE_SET_IPV6_DST,
+				      act, ip_dst, extack);
 	} else {
 		NL_SET_ERR_MSG_MOD(extack, "unsupported offload: pedit on unsupported section of IPv6 header");
 		return -EOPNOTSUPP;
@@ -705,23 +676,30 @@ nfp_fl_set_tport(const struct flow_action_entry *act, u32 off,
 		 struct nfp_fl_set_tport *set_tport, int opcode,
 		 struct netlink_ext_ack *extack)
 {
-	u32 exact, mask;
+	int i;
 
-	if (off) {
+	switch (off) {
+	case offsetof(struct tcphdr, source):
+		i = 0;
+		break;
+	case offsetof(struct tcphdr, dest):
+		i = 2;
+		break;
+	default:
 		NL_SET_ERR_MSG_MOD(extack, "unsupported offload: pedit on unsupported section of L4 header");
 		return -EOPNOTSUPP;
 	}
 
-	mask = act->mangle.mask;
-	exact = act->mangle.val;
-
-	if (exact & ~mask) {
+	if (memchr_inv(act->mangle.val, 0, act->mangle.len) &&
+	    !memchr_inv(act->mangle.mask, 0, act->mangle.len)) {
 		NL_SET_ERR_MSG_MOD(extack, "unsupported offload: invalid pedit L4 action");
 		return -EOPNOTSUPP;
 	}
 
-	nfp_fl_set_helper32(exact, mask, set_tport->tp_port_val,
-			    set_tport->tp_port_mask);
+	nfp_fl_set_helper(act->mangle.val, act->mangle.mask,
+			  &set_tport->tp_port_val[i],
+			  &set_tport->tp_port_mask[i],
+			  act->mangle.len);
 
 	set_tport->reserved = cpu_to_be16(0);
 	set_tport->head.jump_id = opcode;
diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index fc881875f856..a2853b37dded 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -154,6 +154,8 @@ enum flow_action_mangle_base {
 	FLOW_ACT_MANGLE_HDR_TYPE_UDP,
 };
 
+#define FLOW_ACTION_MANGLE_MAXLEN	16
+
 struct flow_action_entry {
 	enum flow_action_id		id;
 	union {
@@ -167,8 +169,9 @@ struct flow_action_entry {
 		struct {				/* FLOW_ACTION_PACKET_EDIT */
 			enum flow_action_mangle_base htype;
 			u32		offset;
-			u32		mask;
-			u32		val;
+			u8		mask[FLOW_ACTION_MANGLE_MAXLEN];
+			u8		val[FLOW_ACTION_MANGLE_MAXLEN];
+			u8		len;
 		} mangle;
 		const struct ip_tunnel_info *tunnel;	/* FLOW_ACTION_TUNNEL_ENCAP */
 		u32			csum_flags;	/* FLOW_ACTION_CSUM */
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index e30a151d8527..e8827fa8263a 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -3289,11 +3289,128 @@ void tc_cleanup_flow_action(struct flow_action *flow_action)
 }
 EXPORT_SYMBOL(tc_cleanup_flow_action);
 
+static unsigned int flow_action_mangle_type(enum pedit_cmd cmd)
+{
+	switch (cmd) {
+	case TCA_PEDIT_KEY_EX_CMD_SET:
+		return FLOW_ACTION_MANGLE;
+	case TCA_PEDIT_KEY_EX_CMD_ADD:
+		return FLOW_ACTION_ADD;
+	default:
+		WARN_ON_ONCE(1);
+	}
+	return 0;
+}
+
+struct flow_action_mangle_ctx {
+	u8	cmd;
+	u8	offset;
+	u8	htype;
+	u8	idx;
+	u8	val[FLOW_ACTION_MANGLE_MAXLEN];
+	u8	mask[FLOW_ACTION_MANGLE_MAXLEN];
+	u32	first_word;
+	u32	last_word;
+};
+
+static void flow_action_mangle_entry(struct flow_action_entry *entry,
+				     const struct flow_action_mangle_ctx *ctx)
+{
+	u32 delta;
+
+	entry->id = ctx->cmd;
+	entry->mangle.htype = ctx->htype;
+	entry->mangle.offset = ctx->offset;
+	entry->mangle.len = ctx->idx;
+
+	/* Adjust offset. */
+	delta = sizeof(u32) - (fls(ctx->first_word) / BITS_PER_BYTE);
+	entry->mangle.offset += delta;
+
+	/* Adjust length. */
+	entry->mangle.len -= ((ffs(ctx->last_word) / BITS_PER_BYTE) + delta);
+
+	/* Copy value and mask from offset using the adjusted length. */
+	memcpy(entry->mangle.val, &ctx->val[delta], entry->mangle.len);
+	memcpy(entry->mangle.mask, &ctx->mask[delta], entry->mangle.len);
+}
+
+static void flow_action_mangle_ctx_update(struct flow_action_mangle_ctx *ctx,
+					  const struct tc_action *act, int k)
+{
+	u32 val, mask;
+
+	val = tcf_pedit_val(act, k);
+	mask = ~tcf_pedit_mask(act, k);
+
+	memcpy(&ctx->val[ctx->idx], &val, sizeof(u32));
+	memcpy(&ctx->mask[ctx->idx], &mask, sizeof(u32));
+	ctx->idx += sizeof(u32);
+}
+
+static void flow_action_mangle_ctx_init(struct flow_action_mangle_ctx *ctx,
+                                        const struct tc_action *act, int k)
+{
+	ctx->cmd = flow_action_mangle_type(tcf_pedit_cmd(act, k));
+	ctx->offset = tcf_pedit_offset(act, k);
+	ctx->htype = tcf_pedit_htype(act, k);
+	ctx->idx = 0;
+
+	ctx->first_word = ntohl(~tcf_pedit_mask(act, k));
+	ctx->last_word = ctx->first_word;
+
+	flow_action_mangle_ctx_update(ctx, act, k);
+}
+
+static int flow_action_mangle(struct flow_action *flow_action,
+			      struct flow_action_entry *entry,
+			      const struct tc_action *act, int j)
+{
+	struct flow_action_mangle_ctx ctx;
+	int k;
+
+	if (tcf_pedit_cmd(act, 0) > TCA_PEDIT_KEY_EX_CMD_ADD)
+		return -1;
+
+	flow_action_mangle_ctx_init(&ctx, act, 0);
+
+	/* Special case: one single 32-bits word. */
+	if (tcf_pedit_nkeys(act) == 1) {
+		flow_action_mangle_entry(entry, &ctx);
+		return j;
+	}
+
+	for (k = 1; k < tcf_pedit_nkeys(act); k++) {
+		if (tcf_pedit_cmd(act, k) > TCA_PEDIT_KEY_EX_CMD_ADD)
+			return -1;
+
+		/* Offset is contiguous and type is the same, coalesce. */
+		if (ctx.idx < FLOW_ACTION_MANGLE_MAXLEN &&
+		    ctx.offset + ctx.idx == tcf_pedit_offset(act, k) &&
+		    ctx.cmd == flow_action_mangle_type(tcf_pedit_cmd(act, k))) {
+			flow_action_mangle_ctx_update(&ctx, act, k);
+			continue;
+		}
+		ctx.last_word = ntohl(~tcf_pedit_mask(act, k - 1));
+
+		/* Cannot coalesce, set up this entry. */
+		flow_action_mangle_entry(entry, &ctx);
+
+		flow_action_mangle_ctx_init(&ctx, act, k);
+		entry = &flow_action->entries[++j];
+	}
+
+	ctx.last_word = ntohl(~tcf_pedit_mask(act, k - 1));
+	flow_action_mangle_entry(entry, &ctx);
+
+	return j;
+}
+
 int tc_setup_flow_action(struct flow_action *flow_action,
 			 const struct tcf_exts *exts, bool rtnl_held)
 {
 	const struct tc_action *act;
-	int i, j, k, err = 0;
+	int i, j, err = 0;
 
 	if (!exts)
 		return 0;
@@ -3366,25 +3483,9 @@ int tc_setup_flow_action(struct flow_action *flow_action,
 		} else if (is_tcf_tunnel_release(act)) {
 			entry->id = FLOW_ACTION_TUNNEL_DECAP;
 		} else if (is_tcf_pedit(act)) {
-			for (k = 0; k < tcf_pedit_nkeys(act); k++) {
-				switch (tcf_pedit_cmd(act, k)) {
-				case TCA_PEDIT_KEY_EX_CMD_SET:
-					entry->id = FLOW_ACTION_MANGLE;
-					break;
-				case TCA_PEDIT_KEY_EX_CMD_ADD:
-					entry->id = FLOW_ACTION_ADD;
-					break;
-				default:
-					err = -EOPNOTSUPP;
-					goto err_out;
-				}
-				entry->mangle.htype = tcf_pedit_htype(act, k);
-				entry->mangle.mask = ~tcf_pedit_mask(act, k);
-				entry->mangle.val = tcf_pedit_val(act, k) &
-							entry->mangle.mask;
-				entry->mangle.offset = tcf_pedit_offset(act, k);
-				entry = &flow_action->entries[++j];
-			}
+			j = flow_action_mangle(flow_action, entry, act, j);
+			if (j < 0)
+				goto err_out;
 		} else if (is_tcf_csum(act)) {
 			entry->id = FLOW_ACTION_CSUM;
 			entry->csum_flags = tcf_csum_update_flags(act);
@@ -3439,8 +3540,7 @@ int tc_setup_flow_action(struct flow_action *flow_action,
 			goto err_out;
 		}
 
-		if (!is_tcf_pedit(act))
-			j++;
+		j++;
 	}
 
 err_out:
-- 
2.11.0


^ permalink raw reply related

* [PATCH net-next 4/4] netfilter: nft_payload: packet mangling offload support
From: Pablo Neira Ayuso @ 2019-08-30  0:53 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, vishal, jakub.kicinski, saeedm, jiri
In-Reply-To: <20190830005336.23604-1-pablo@netfilter.org>

This patch allows for mangling packet fields using hardware offload
infrastructure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_payload.c | 72 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)

diff --git a/net/netfilter/nft_payload.c b/net/netfilter/nft_payload.c
index 22a80eb60222..39882f81ca8d 100644
--- a/net/netfilter/nft_payload.c
+++ b/net/netfilter/nft_payload.c
@@ -562,12 +562,84 @@ static int nft_payload_set_dump(struct sk_buff *skb, const struct nft_expr *expr
 	return -1;
 }
 
+static int nft_payload_offload_set_nh(struct nft_offload_ctx *ctx,
+				      struct nft_flow_rule *flow,
+				      const struct nft_payload_set *priv)
+{
+	int type = FLOW_ACT_MANGLE_UNSPEC;
+
+	switch (ctx->dep.l3num) {
+	case htons(ETH_P_IP):
+		type = FLOW_ACT_MANGLE_HDR_TYPE_IP4;
+		break;
+	case htons(ETH_P_IPV6):
+		type = FLOW_ACT_MANGLE_HDR_TYPE_IP6;
+		break;
+	}
+
+	return type;
+}
+
+static int nft_payload_offload_set_th(struct nft_offload_ctx *ctx,
+				      struct nft_flow_rule *flow,
+				      const struct nft_payload_set *priv)
+{
+	int type = FLOW_ACT_MANGLE_UNSPEC;
+
+	switch (ctx->dep.protonum) {
+	case IPPROTO_TCP:
+		type = FLOW_ACT_MANGLE_HDR_TYPE_TCP;
+		break;
+	case IPPROTO_UDP:
+		type = FLOW_ACT_MANGLE_HDR_TYPE_UDP;
+		break;
+	}
+
+	return type;
+}
+
+static int nft_payload_set_offload(struct nft_offload_ctx *ctx,
+				   struct nft_flow_rule *flow,
+				   const struct nft_expr *expr)
+{
+	const struct nft_payload_set *priv = nft_expr_priv(expr);
+	struct nft_offload_reg *sreg = &ctx->regs[priv->sreg];
+	int type = FLOW_ACT_MANGLE_UNSPEC;
+	struct flow_action_entry *entry;
+
+	switch (priv->base) {
+	case NFT_PAYLOAD_LL_HEADER:
+		type = FLOW_ACT_MANGLE_HDR_TYPE_ETH;
+		break;
+	case NFT_PAYLOAD_NETWORK_HEADER:
+		type = nft_payload_offload_set_nh(ctx, flow, priv);
+		break;
+	case NFT_PAYLOAD_TRANSPORT_HEADER:
+		type = nft_payload_offload_set_th(ctx, flow, priv);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		break;
+	}
+
+	entry = &flow->rule->action.entries[ctx->num_actions++];
+	entry->mangle.htype	= type;
+	entry->mangle.offset	= priv->offset;
+	entry->mangle.len	= priv->len;
+
+	memcpy(entry->mangle.val, sreg->data.data, priv->len);
+	memset(entry->mangle.mask, 0xff, priv->len);
+
+	return type != FLOW_ACT_MANGLE_UNSPEC ? 0 : -EOPNOTSUPP;
+}
+
 static const struct nft_expr_ops nft_payload_set_ops = {
 	.type		= &nft_payload_type,
 	.size		= NFT_EXPR_SIZE(sizeof(struct nft_payload_set)),
 	.eval		= nft_payload_set_eval,
 	.init		= nft_payload_set_init,
 	.dump		= nft_payload_set_dump,
+	.offload	= nft_payload_set_offload,
 };
 
 static const struct nft_expr_ops *
-- 
2.11.0


^ permalink raw reply related

* [PATCH net-next 1/4] net: flow_offload: flip mangle action mask
From: Pablo Neira Ayuso @ 2019-08-30  0:53 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, vishal, jakub.kicinski, saeedm, jiri
In-Reply-To: <20190830005336.23604-1-pablo@netfilter.org>

Userspace tc pedit action performs a bitwise NOT operation on the mask.
All of the existing drivers in the tree undo this operation. Prepare the
mangle mask in the way the drivers expect from the
tc_setup_flow_action() function.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c | 12 ++++++------
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c      |  6 +++---
 drivers/net/ethernet/netronome/nfp/flower/action.c   |  8 ++++----
 net/sched/cls_api.c                                  |  2 +-
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
index e447976bdd3e..2d26dbca701d 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
@@ -275,7 +275,7 @@ static int cxgb4_validate_flow_match(struct net_device *dev,
 static void offload_pedit(struct ch_filter_specification *fs, u32 val, u32 mask,
 			  u8 field)
 {
-	u32 set_val = val & ~mask;
+	u32 set_val = val & mask;
 	u32 offset = 0;
 	u8 size = 1;
 	int i;
@@ -301,7 +301,7 @@ static void process_pedit_field(struct ch_filter_specification *fs, u32 val,
 			offload_pedit(fs, val, mask, ETH_DMAC_31_0);
 			break;
 		case PEDIT_ETH_DMAC_47_32_SMAC_15_0:
-			if (~mask & PEDIT_ETH_DMAC_MASK)
+			if (mask & PEDIT_ETH_DMAC_MASK)
 				offload_pedit(fs, val, mask, ETH_DMAC_47_32);
 			else
 				offload_pedit(fs, val >> 16, mask >> 16,
@@ -353,7 +353,7 @@ static void process_pedit_field(struct ch_filter_specification *fs, u32 val,
 	case FLOW_ACT_MANGLE_HDR_TYPE_TCP:
 		switch (offset) {
 		case PEDIT_TCP_SPORT_DPORT:
-			if (~mask & PEDIT_TCP_UDP_SPORT_MASK)
+			if (mask & PEDIT_TCP_UDP_SPORT_MASK)
 				offload_pedit(fs, cpu_to_be32(val) >> 16,
 					      cpu_to_be32(mask) >> 16,
 					      TCP_SPORT);
@@ -366,7 +366,7 @@ static void process_pedit_field(struct ch_filter_specification *fs, u32 val,
 	case FLOW_ACT_MANGLE_HDR_TYPE_UDP:
 		switch (offset) {
 		case PEDIT_UDP_SPORT_DPORT:
-			if (~mask & PEDIT_TCP_UDP_SPORT_MASK)
+			if (mask & PEDIT_TCP_UDP_SPORT_MASK)
 				offload_pedit(fs, cpu_to_be32(val) >> 16,
 					      cpu_to_be32(mask) >> 16,
 					      UDP_SPORT);
@@ -510,7 +510,7 @@ static bool valid_pedit_action(struct net_device *dev,
 	case FLOW_ACT_MANGLE_HDR_TYPE_TCP:
 		switch (offset) {
 		case PEDIT_TCP_SPORT_DPORT:
-			if (!valid_l4_mask(~mask)) {
+			if (!valid_l4_mask(mask)) {
 				netdev_err(dev, "%s: Unsupported mask for TCP L4 ports\n",
 					   __func__);
 				return false;
@@ -525,7 +525,7 @@ static bool valid_pedit_action(struct net_device *dev,
 	case FLOW_ACT_MANGLE_HDR_TYPE_UDP:
 		switch (offset) {
 		case PEDIT_UDP_SPORT_DPORT:
-			if (!valid_l4_mask(~mask)) {
+			if (!valid_l4_mask(mask)) {
 				netdev_err(dev, "%s: Unsupported mask for UDP L4 ports\n",
 					   __func__);
 				return false;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 5581a8045ede..67e82480b516 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -2508,7 +2508,7 @@ static int parse_tc_pedit_action(struct mlx5e_priv *priv,
 	val = act->mangle.val;
 	offset = act->mangle.offset;
 
-	err = set_pedit_val(htype, ~mask, val, offset, &hdrs[cmd]);
+	err = set_pedit_val(htype, mask, val, offset, &hdrs[cmd]);
 	if (err)
 		goto out_err;
 
@@ -2608,7 +2608,7 @@ static bool is_action_keys_supported(const struct flow_action_entry *act)
 
 	htype = act->mangle.htype;
 	offset = act->mangle.offset;
-	mask = ~act->mangle.mask;
+	mask = act->mangle.mask;
 	/* For IPv4 & IPv6 header check 4 byte word,
 	 * to determine that modified fields
 	 * are NOT ttl & hop_limit only.
@@ -2732,7 +2732,7 @@ static int add_vlan_rewrite_action(struct mlx5e_priv *priv, int namespace,
 		.id = FLOW_ACTION_MANGLE,
 		.mangle.htype = FLOW_ACT_MANGLE_HDR_TYPE_ETH,
 		.mangle.offset = offsetof(struct vlan_ethhdr, h_vlan_TCI),
-		.mangle.mask = ~(u32)be16_to_cpu(*(__be16 *)&mask16),
+		.mangle.mask = (u32)be16_to_cpu(*(__be16 *)&mask16),
 		.mangle.val = (u32)be16_to_cpu(*(__be16 *)&val16),
 	};
 	u8 match_prio_mask, match_prio_val;
diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c b/drivers/net/ethernet/netronome/nfp/flower/action.c
index 1b019fdfcd97..ee0066a7ba87 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -495,7 +495,7 @@ nfp_fl_set_eth(const struct flow_action_entry *act, u32 off,
 		return -EOPNOTSUPP;
 	}
 
-	mask = ~act->mangle.mask;
+	mask = act->mangle.mask;
 	exact = act->mangle.val;
 
 	if (exact & ~mask) {
@@ -532,7 +532,7 @@ nfp_fl_set_ip4(const struct flow_action_entry *act, u32 off,
 	__be32 exact, mask;
 
 	/* We are expecting tcf_pedit to return a big endian value */
-	mask = (__force __be32)~act->mangle.mask;
+	mask = (__force __be32)act->mangle.mask;
 	exact = (__force __be32)act->mangle.val;
 
 	if (exact & ~mask) {
@@ -673,7 +673,7 @@ nfp_fl_set_ip6(const struct flow_action_entry *act, u32 off,
 	u8 word;
 
 	/* We are expecting tcf_pedit to return a big endian value */
-	mask = (__force __be32)~act->mangle.mask;
+	mask = (__force __be32)act->mangle.mask;
 	exact = (__force __be32)act->mangle.val;
 
 	if (exact & ~mask) {
@@ -713,7 +713,7 @@ nfp_fl_set_tport(const struct flow_action_entry *act, u32 off,
 		return -EOPNOTSUPP;
 	}
 
-	mask = ~act->mangle.mask;
+	mask = act->mangle.mask;
 	exact = act->mangle.val;
 
 	if (exact & ~mask) {
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 671ca905dbb5..fbab004d0075 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -3379,7 +3379,7 @@ int tc_setup_flow_action(struct flow_action *flow_action,
 					goto err_out;
 				}
 				entry->mangle.htype = tcf_pedit_htype(act, k);
-				entry->mangle.mask = tcf_pedit_mask(act, k);
+				entry->mangle.mask = ~tcf_pedit_mask(act, k);
 				entry->mangle.val = tcf_pedit_val(act, k);
 				entry->mangle.offset = tcf_pedit_offset(act, k);
 				entry = &flow_action->entries[++j];
-- 
2.11.0


^ permalink raw reply related

* [PATCH net-next 2/4] net: flow_offload: bitwise AND on mangle action value field
From: Pablo Neira Ayuso @ 2019-08-30  0:53 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, vishal, jakub.kicinski, saeedm, jiri
In-Reply-To: <20190830005336.23604-1-pablo@netfilter.org>

Drivers perform a bitwise AND on the value and the mask. Update
tc_setup_flow_action() to perform this operation so drivers do not need
to do this.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c | 3 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c      | 2 +-
 drivers/net/ethernet/netronome/nfp/flower/action.c   | 9 ++++-----
 net/sched/cls_api.c                                  | 3 ++-
 4 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
index 2d26dbca701d..5afc15a60199 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
@@ -275,7 +275,6 @@ static int cxgb4_validate_flow_match(struct net_device *dev,
 static void offload_pedit(struct ch_filter_specification *fs, u32 val, u32 mask,
 			  u8 field)
 {
-	u32 set_val = val & mask;
 	u32 offset = 0;
 	u8 size = 1;
 	int i;
@@ -287,7 +286,7 @@ static void offload_pedit(struct ch_filter_specification *fs, u32 val, u32 mask,
 			break;
 		}
 	}
-	memcpy((u8 *)fs + offset, &set_val, size);
+	memcpy((u8 *)fs + offset, &val, size);
 }
 
 static void process_pedit_field(struct ch_filter_specification *fs, u32 val,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 67e82480b516..f29895b3a947 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -2213,7 +2213,7 @@ static int set_pedit_val(u8 hdr_type, u32 mask, u32 val, u32 offset,
 		goto out_err;
 
 	*curr_pmask |= mask;
-	*curr_pval  |= (val & mask);
+	*curr_pval  |= val;
 
 	return 0;
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/action.c b/drivers/net/ethernet/netronome/nfp/flower/action.c
index ee0066a7ba87..592c36ba9e3f 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/action.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/action.c
@@ -477,7 +477,6 @@ static void nfp_fl_set_helper32(u32 value, u32 mask, u8 *p_exact, u8 *p_mask)
 	u32 oldvalue = get_unaligned((u32 *)p_exact);
 	u32 oldmask = get_unaligned((u32 *)p_mask);
 
-	value &= mask;
 	value |= oldvalue & ~mask;
 
 	put_unaligned(oldmask | mask, (u32 *)p_mask);
@@ -544,7 +543,7 @@ nfp_fl_set_ip4(const struct flow_action_entry *act, u32 off,
 	case offsetof(struct iphdr, daddr):
 		set_ip_addr->ipv4_dst_mask |= mask;
 		set_ip_addr->ipv4_dst &= ~mask;
-		set_ip_addr->ipv4_dst |= exact & mask;
+		set_ip_addr->ipv4_dst |= exact;
 		set_ip_addr->head.jump_id = NFP_FL_ACTION_OPCODE_SET_IPV4_ADDRS;
 		set_ip_addr->head.len_lw = sizeof(*set_ip_addr) >>
 					   NFP_FL_LW_SIZ;
@@ -552,7 +551,7 @@ nfp_fl_set_ip4(const struct flow_action_entry *act, u32 off,
 	case offsetof(struct iphdr, saddr):
 		set_ip_addr->ipv4_src_mask |= mask;
 		set_ip_addr->ipv4_src &= ~mask;
-		set_ip_addr->ipv4_src |= exact & mask;
+		set_ip_addr->ipv4_src |= exact;
 		set_ip_addr->head.jump_id = NFP_FL_ACTION_OPCODE_SET_IPV4_ADDRS;
 		set_ip_addr->head.len_lw = sizeof(*set_ip_addr) >>
 					   NFP_FL_LW_SIZ;
@@ -606,7 +605,7 @@ nfp_fl_set_ip6_helper(int opcode_tag, u8 word, __be32 exact, __be32 mask,
 {
 	ip6->ipv6[word].mask |= mask;
 	ip6->ipv6[word].exact &= ~mask;
-	ip6->ipv6[word].exact |= exact & mask;
+	ip6->ipv6[word].exact |= exact;
 
 	ip6->reserved = cpu_to_be16(0);
 	ip6->head.jump_id = opcode_tag;
@@ -651,7 +650,7 @@ nfp_fl_set_ip6_hop_limit_flow_label(u32 off, __be32 exact, __be32 mask,
 
 		ip_hl_fl->ipv6_label_mask |= mask;
 		ip_hl_fl->ipv6_label &= ~mask;
-		ip_hl_fl->ipv6_label |= exact & mask;
+		ip_hl_fl->ipv6_label |= exact;
 		break;
 	}
 
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index fbab004d0075..e30a151d8527 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -3380,7 +3380,8 @@ int tc_setup_flow_action(struct flow_action *flow_action,
 				}
 				entry->mangle.htype = tcf_pedit_htype(act, k);
 				entry->mangle.mask = ~tcf_pedit_mask(act, k);
-				entry->mangle.val = tcf_pedit_val(act, k);
+				entry->mangle.val = tcf_pedit_val(act, k) &
+							entry->mangle.mask;
 				entry->mangle.offset = tcf_pedit_offset(act, k);
 				entry = &flow_action->entries[++j];
 			}
-- 
2.11.0


^ permalink raw reply related

* [PATCH 0/4 net-next] flow_offload: update mangle action representation
From: Pablo Neira Ayuso @ 2019-08-30  0:53 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, vishal, jakub.kicinski, saeedm, jiri

Hi,

This patch updates the mangle action representation:

Patch 1) Undo bitwise NOT operation on the mangle mask (coming from tc
	 pedit userspace).

Patch 2) mangle value &= mask from the front-end side.

Patch 3) adjust offset, length and coalesce consecutive actions.

Patch 4) add payload mangling for netfilter.

After this patchset:

* Offsets do not need to be on the 32-bits boundaries anymore. This
  patchset adds front-end code to adjust the offset and length coming
  from the tc pedit representation, so drivers get an exact header field
  offset and length.

* The front-end coalesces consecutive pedit actions into one single
  word, so drivers can mangle IPv6 and ethernet address fields in one
  single go.

On the driver side, diffstat -t shows that drivers code to deal with
payload mangling gets simplified:

	INSERTED,DELETED,MODIFIED,FILENAME
	46,116,0,drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c (-70 LOC)
	12,28,0,drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.h (-16 LOC)
	30,60,0,drivers/net/ethernet/mellanox/mlx5/core/en_tc.c (-30 LOC)
	89,111,0,drivers/net/ethernet/netronome/nfp/flower/action.c (-22 LOC)

While, on the front-end side the balance is the following:

	122,21,0,net/sched/cls_api.c (+101 LOC)

Please, apply.

P.S: This patchset comes after the "netfilter: payload mangling offload
     support" series, although it has been heavily reworked.

Pablo Neira Ayuso (4):
  net: flow_offload: flip mangle action mask
  net: flow_offload: bitwise AND on mangle action value field
  net: flow_offload: mangle action at byte level
  netfilter: nft_payload: packet mangling offload support

 .../net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c   | 163 +++++------------
 .../net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.h   |  40 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    |  90 +++------
 drivers/net/ethernet/netronome/nfp/flower/action.c | 201 +++++++++------------
 include/net/flow_offload.h                         |   7 +-
 net/netfilter/nft_payload.c                        |  72 ++++++++
 net/sched/cls_api.c                                | 142 +++++++++++++--
 7 files changed, 376 insertions(+), 339 deletions(-)

-- 
2.11.0


^ permalink raw reply

* [PATCH v3 net-next 1/2] net: bridge: Populate the pvid flag in br_vlan_get_info
From: Vladimir Oltean @ 2019-08-30  0:53 UTC (permalink / raw)
  To: f.fainelli, vivien.didelot, andrew, idosch, roopa, nikolay, davem
  Cc: netdev, Vladimir Oltean
In-Reply-To: <20190830005325.26526-1-olteanv@gmail.com>

Currently this simplified code snippet fails:

	br_vlan_get_pvid(netdev, &pvid);
	br_vlan_get_info(netdev, pvid, &vinfo);
	ASSERT(!(vinfo.flags & BRIDGE_VLAN_INFO_PVID));

It is intuitive that the pvid of a netdevice should have the
BRIDGE_VLAN_INFO_PVID flag set.

However I can't seem to pinpoint a commit where this behavior was
introduced. It seems like it's been like that since forever.

At a first glance it would make more sense to just handle the
BRIDGE_VLAN_INFO_PVID flag in __vlan_add_flags. However, as Nikolay
explains:

  There are a few reasons why we don't do it, most importantly because
  we need to have only one visible pvid at any single time, even if it's
  stale - it must be just one. Right now that rule will not be violated
  by this change, but people will try using this flag and could see two
  pvids simultaneously. You can see that the pvid code is even using
  memory barriers to propagate the new value faster and everywhere the
  pvid is read only once.  That is the reason the flag is set
  dynamically when dumping entries, too.  A second (weaker) argument
  against would be given the above we don't want another way to do the
  same thing, specifically if it can provide us with two pvids (e.g. if
  walking the vlan list) or if it can provide us with a pvid different
  from the one set in the vg. [Obviously, I'm talking about RCU
  pvid/vlan use cases similar to the dumps.  The locked cases are fine.
  I would like to avoid explaining why this shouldn't be relied upon
  without locking]

So instead of introducing the above change and making sure of the pvid
uniqueness under RCU, simply dynamically populate the pvid flag in
br_vlan_get_info().

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
Changes since v2:
None.

 net/bridge/br_vlan.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index f5b2aeebbfe9..bb98984cd27d 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -1281,6 +1281,8 @@ int br_vlan_get_info(const struct net_device *dev, u16 vid,
 
 	p_vinfo->vid = vid;
 	p_vinfo->flags = v->flags;
+	if (vid == br_get_pvid(vg))
+		p_vinfo->flags |= BRIDGE_VLAN_INFO_PVID;
 	return 0;
 }
 EXPORT_SYMBOL_GPL(br_vlan_get_info);
-- 
2.17.1


^ permalink raw reply related

* [PATCH v3 net-next 2/2] net: dsa: tag_8021q: Restore bridge VLANs when enabling vlan_filtering
From: Vladimir Oltean @ 2019-08-30  0:53 UTC (permalink / raw)
  To: f.fainelli, vivien.didelot, andrew, idosch, roopa, nikolay, davem
  Cc: netdev, Vladimir Oltean
In-Reply-To: <20190830005325.26526-1-olteanv@gmail.com>

The bridge core assumes that enabling/disabling vlan_filtering will
translate into the simple toggling of a flag for switchdev drivers.

That is clearly not the case for sja1105, which alters the VLAN table
and the pvids in order to obtain port separation in standalone mode.

There are 2 parts to the issue.

First, tag_8021q changes the pvid to a unique per-port rx_vid for frame
identification. But we need to disable tag_8021q when vlan_filtering
kicks in, and at that point, the VLAN configured as pvid will have to be
removed from the filtering table of the ports. With an invalid pvid, the
ports will drop all traffic.  Since the bridge will not call any vlan
operation through switchdev after enabling vlan_filtering, we need to
ensure we're in a functional state ourselves. Hence read the pvid that
the bridge is aware of, and program that into our ports.

Secondly, tag_8021q uses the 1024-3071 range privately in
vlan_filtering=0 mode. Had the user installed one of these VLANs during
a previous vlan_filtering=1 session, then upon the next tag_8021q
cleanup for vlan_filtering to kick in again, VLANs in that range will
get deleted unconditionally, hence breaking user expectation. So when
deleting the VLANs, check if the bridge had knowledge about them, and if
it did, re-apply the settings. Wrap this logic inside a
dsa_8021q_vid_apply helper function to reduce code duplication.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
Changes since v2:
- Don't error out in br_vlan_get_pvid
- Restore CPU port VID as well

 net/dsa/tag_8021q.c | 102 +++++++++++++++++++++++++++++++++++---------
 1 file changed, 82 insertions(+), 20 deletions(-)

diff --git a/net/dsa/tag_8021q.c b/net/dsa/tag_8021q.c
index 67a1bc635a7b..9c1cc2482b68 100644
--- a/net/dsa/tag_8021q.c
+++ b/net/dsa/tag_8021q.c
@@ -93,6 +93,79 @@ int dsa_8021q_rx_source_port(u16 vid)
 }
 EXPORT_SYMBOL_GPL(dsa_8021q_rx_source_port);
 
+static int dsa_8021q_restore_pvid(struct dsa_switch *ds, int port)
+{
+	struct bridge_vlan_info vinfo;
+	struct net_device *slave;
+	u16 pvid;
+	int err;
+
+	if (!dsa_is_user_port(ds, port))
+		return 0;
+
+	slave = ds->ports[port].slave;
+
+	err = br_vlan_get_pvid(slave, &pvid);
+	if (err < 0)
+		/* There is no pvid on the bridge for this port, which is
+		 * perfectly valid. Nothing to restore, bye-bye!
+		 */
+		return 0;
+
+	err = br_vlan_get_info(slave, pvid, &vinfo);
+	if (err < 0) {
+		dev_err(ds->dev, "Couldn't determine PVID attributes\n");
+		return err;
+	}
+
+	return dsa_port_vid_add(&ds->ports[port], pvid, vinfo.flags);
+}
+
+/* If @enabled is true, installs @vid with @flags into the switch port's HW
+ * filter.
+ * If @enabled is false, deletes @vid (ignores @flags) from the port. Had the
+ * user explicitly configured this @vid through the bridge core, then the @vid
+ * is installed again, but this time with the flags from the bridge layer.
+ */
+static int dsa_8021q_vid_apply(struct dsa_switch *ds, int port, u16 vid,
+			       u16 flags, bool enabled)
+{
+	struct dsa_port *dp = &ds->ports[port];
+	struct bridge_vlan_info vinfo;
+	int err;
+
+	if (enabled)
+		return dsa_port_vid_add(dp, vid, flags);
+
+	err = dsa_port_vid_del(dp, vid);
+	if (err < 0)
+		return err;
+
+	/* Nothing to restore from the bridge for a non-user port.
+	 * The CPU port VLANs are restored implicitly with the user ports,
+	 * similar to how the bridge does in dsa_slave_vlan_add and
+	 * dsa_slave_vlan_del.
+	 */
+	if (!dsa_is_user_port(ds, port))
+		return 0;
+
+	err = br_vlan_get_info(dp->slave, vid, &vinfo);
+	/* Couldn't determine bridge attributes for this vid,
+	 * it means the bridge had not configured it.
+	 */
+	if (err < 0)
+		return 0;
+
+	/* Restore the VID from the bridge */
+	err = dsa_port_vid_add(dp, vid, vinfo.flags);
+	if (err < 0)
+		return err;
+
+	vinfo.flags &= ~BRIDGE_VLAN_INFO_PVID;
+
+	return dsa_port_vid_add(dp->cpu_dp, vid, vinfo.flags);
+}
+
 /* RX VLAN tagging (left) and TX VLAN tagging (right) setup shown for a single
  * front-panel switch port (here swp0).
  *
@@ -148,8 +221,6 @@ EXPORT_SYMBOL_GPL(dsa_8021q_rx_source_port);
 int dsa_port_setup_8021q_tagging(struct dsa_switch *ds, int port, bool enabled)
 {
 	int upstream = dsa_upstream_port(ds, port);
-	struct dsa_port *dp = &ds->ports[port];
-	struct dsa_port *upstream_dp = &ds->ports[upstream];
 	u16 rx_vid = dsa_8021q_rx_vid(ds, port);
 	u16 tx_vid = dsa_8021q_tx_vid(ds, port);
 	int i, err;
@@ -166,7 +237,6 @@ int dsa_port_setup_8021q_tagging(struct dsa_switch *ds, int port, bool enabled)
 	 * restrictions, so there are no concerns about leaking traffic.
 	 */
 	for (i = 0; i < ds->num_ports; i++) {
-		struct dsa_port *other_dp = &ds->ports[i];
 		u16 flags;
 
 		if (i == upstream)
@@ -179,10 +249,7 @@ int dsa_port_setup_8021q_tagging(struct dsa_switch *ds, int port, bool enabled)
 			/* The RX VID is a regular VLAN on all others */
 			flags = BRIDGE_VLAN_INFO_UNTAGGED;
 
-		if (enabled)
-			err = dsa_port_vid_add(other_dp, rx_vid, flags);
-		else
-			err = dsa_port_vid_del(other_dp, rx_vid);
+		err = dsa_8021q_vid_apply(ds, i, rx_vid, flags, enabled);
 		if (err) {
 			dev_err(ds->dev, "Failed to apply RX VID %d to port %d: %d\n",
 				rx_vid, port, err);
@@ -193,10 +260,7 @@ int dsa_port_setup_8021q_tagging(struct dsa_switch *ds, int port, bool enabled)
 	/* CPU port needs to see this port's RX VID
 	 * as tagged egress.
 	 */
-	if (enabled)
-		err = dsa_port_vid_add(upstream_dp, rx_vid, 0);
-	else
-		err = dsa_port_vid_del(upstream_dp, rx_vid);
+	err = dsa_8021q_vid_apply(ds, upstream, rx_vid, 0, enabled);
 	if (err) {
 		dev_err(ds->dev, "Failed to apply RX VID %d to port %d: %d\n",
 			rx_vid, port, err);
@@ -204,26 +268,24 @@ int dsa_port_setup_8021q_tagging(struct dsa_switch *ds, int port, bool enabled)
 	}
 
 	/* Finally apply the TX VID on this port and on the CPU port */
-	if (enabled)
-		err = dsa_port_vid_add(dp, tx_vid, BRIDGE_VLAN_INFO_UNTAGGED);
-	else
-		err = dsa_port_vid_del(dp, tx_vid);
+	err = dsa_8021q_vid_apply(ds, port, tx_vid, BRIDGE_VLAN_INFO_UNTAGGED,
+				  enabled);
 	if (err) {
 		dev_err(ds->dev, "Failed to apply TX VID %d on port %d: %d\n",
 			tx_vid, port, err);
 		return err;
 	}
-	if (enabled)
-		err = dsa_port_vid_add(upstream_dp, tx_vid, 0);
-	else
-		err = dsa_port_vid_del(upstream_dp, tx_vid);
+	err = dsa_8021q_vid_apply(ds, upstream, tx_vid, 0, enabled);
 	if (err) {
 		dev_err(ds->dev, "Failed to apply TX VID %d on port %d: %d\n",
 			tx_vid, upstream, err);
 		return err;
 	}
 
-	return 0;
+	if (!enabled)
+		err = dsa_8021q_restore_pvid(ds, port);
+
+	return err;
 }
 EXPORT_SYMBOL_GPL(dsa_port_setup_8021q_tagging);
 
-- 
2.17.1


^ permalink raw reply related

* [PATCH v3 net-next 0/2] Dynamic toggling of vlan_filtering for SJA1105 DSA
From: Vladimir Oltean @ 2019-08-30  0:53 UTC (permalink / raw)
  To: f.fainelli, vivien.didelot, andrew, idosch, roopa, nikolay, davem
  Cc: netdev, Vladimir Oltean

This patchset addresses a limitation in dsa_8021q where this sequence of
commands was causing the switch to stop forwarding traffic:

  ip link add name br0 type bridge vlan_filtering 0
  ip link set dev swp2 master br0
  echo 1 > /sys/class/net/br0/bridge/vlan_filtering
  echo 0 > /sys/class/net/br0/bridge/vlan_filtering

The issue has to do with the VLAN table manipulations that dsa_8021q
does without notifying the bridge layer. The solution is to always
restore the VLANs that the bridge knows about, when disabling tagging.

Vladimir Oltean (2):
  net: bridge: Populate the pvid flag in br_vlan_get_info
  net: dsa: tag_8021q: Restore bridge VLANs when enabling vlan_filtering

 net/bridge/br_vlan.c |   2 +
 net/dsa/tag_8021q.c  | 102 ++++++++++++++++++++++++++++++++++---------
 2 files changed, 84 insertions(+), 20 deletions(-)

-- 
2.17.1


^ permalink raw reply

* [PATCH RFC bpf-next 00/10] improve/fix cross-compilation for bpf samples
From: Ivan Khoronzhuk @ 2019-08-30  0:50 UTC (permalink / raw)
  To: linux, ast, daniel, yhs, davem, jakub.kicinski, hawk,
	john.fastabend
  Cc: linux-arm-kernel, linux-kernel, netdev, bpf, clang-built-linux,
	Ivan Khoronzhuk

This series contains mainly fixes/improvements for cross-compilation
(also verified on native platform build), tested on arm, but intended
for any arch.

The several patches are related to llvm clang and should be out of this
series or even fixed in another way, and here just to get comments:
  arm: include: asm: swab: mask rev16 instruction for clang
  arm: include: asm: unified: mask .syntax unified for clang

Also, only for armv7, there is one more problem related to long and
void type sizes for 32 bits, while the BPF LLVM back end still
operates in 64 bit, but that's another story.

Smth related not only for cross-compilation and can have impact on other
archs and build environments, so might be good idea to verify it in order
to add appropriate changes, some warn options can be tuned, so comment.

Ivan Khoronzhuk (10):
  samples: bpf: Makefile: use --target from cross-compile
  samples: bpf: Makefile: remove target for native build
  libbpf: Makefile: add C/CXX/LDFLAGS to libbpf.so and test_libpf
    targets
  samples: bpf: use own EXTRA_CFLAGS for clang commands
  samples: bpf: Makefile: use vars from KBUILD_CFLAGS to handle linux
    headers
  samples: bpf: makefile: fix HDR_PROBE
  samples: bpf: add makefile.prog for separate CC build
  samples: bpf: Makefile: base progs build on Makefile.progs
  arm: include: asm: swab: mask rev16 instruction for clang
  arm: include: asm: unified: mask .syntax unified for clang

 arch/arm/include/asm/swab.h    |   3 +
 arch/arm/include/asm/unified.h |   6 +-
 samples/bpf/Makefile           | 177 +++++++++++++++++++--------------
 samples/bpf/Makefile.prog      |  77 ++++++++++++++
 samples/bpf/README.rst         |   7 ++
 tools/lib/bpf/Makefile         |  11 +-
 6 files changed, 205 insertions(+), 76 deletions(-)
 create mode 100644 samples/bpf/Makefile.prog

-- 
2.17.1


^ permalink raw reply

* [PATCH RFC bpf-next 03/10] libbpf: Makefile: add C/CXX/LDFLAGS to libbpf.so and test_libpf targets
From: Ivan Khoronzhuk @ 2019-08-30  0:50 UTC (permalink / raw)
  To: linux, ast, daniel, yhs, davem, jakub.kicinski, hawk,
	john.fastabend
  Cc: linux-arm-kernel, linux-kernel, netdev, bpf, clang-built-linux,
	Ivan Khoronzhuk
In-Reply-To: <20190830005037.24004-1-ivan.khoronzhuk@linaro.org>

In case of LDFLAGS and EXTRA_CC flags there is no way to pass them
correctly to build command, for instance when --sysroot is used or
external libraries are used, like -lelf. In follow patches this is
used for samples/bpf cross-compiling.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
 tools/lib/bpf/Makefile | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index 844f6cd79c03..d606d249e334 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -99,6 +99,10 @@ else
   CFLAGS := -g -Wall
 endif
 
+ifdef EXTRA_CXXFLAGS
+  CXXFLAGS := $(EXTRA_CXXFLAGS)
+endif
+
 ifeq ($(feature-libelf-mmap), 1)
   override CFLAGS += -DHAVE_LIBELF_MMAP_SUPPORT
 endif
@@ -179,8 +183,9 @@ $(BPF_IN): force elfdep bpfdep
 $(OUTPUT)libbpf.so: $(OUTPUT)libbpf.so.$(LIBBPF_VERSION)
 
 $(OUTPUT)libbpf.so.$(LIBBPF_VERSION): $(BPF_IN)
-	$(QUIET_LINK)$(CC) --shared -Wl,-soname,libbpf.so.$(VERSION) \
-				    -Wl,--version-script=$(VERSION_SCRIPT) $^ -lelf -o $@
+	$(QUIET_LINK)$(CC) $(LDFLAGS) \
+		--shared -Wl,-soname,libbpf.so.$(VERSION) \
+		-Wl,--version-script=$(VERSION_SCRIPT) $^ -lelf -o $@
 	@ln -sf $(@F) $(OUTPUT)libbpf.so
 	@ln -sf $(@F) $(OUTPUT)libbpf.so.$(VERSION)
 
@@ -188,7 +193,7 @@ $(OUTPUT)libbpf.a: $(BPF_IN)
 	$(QUIET_LINK)$(RM) $@; $(AR) rcs $@ $^
 
 $(OUTPUT)test_libbpf: test_libbpf.cpp $(OUTPUT)libbpf.a
-	$(QUIET_LINK)$(CXX) $(INCLUDES) $^ -lelf -o $@
+	$(QUIET_LINK)$(CXX) $(CXXFLAGS) $(LDFLAGS) $(INCLUDES) $^ -lelf -o $@
 
 $(OUTPUT)libbpf.pc:
 	$(QUIET_GEN)sed -e "s|@PREFIX@|$(prefix)|" \
-- 
2.17.1


^ permalink raw reply related

* [PATCH RFC bpf-next 02/10] samples: bpf: Makefile: remove target for native build
From: Ivan Khoronzhuk @ 2019-08-30  0:50 UTC (permalink / raw)
  To: linux, ast, daniel, yhs, davem, jakub.kicinski, hawk,
	john.fastabend
  Cc: linux-arm-kernel, linux-kernel, netdev, bpf, clang-built-linux,
	Ivan Khoronzhuk
In-Reply-To: <20190830005037.24004-1-ivan.khoronzhuk@linaro.org>

No need to set --target for native build, at least for arm, the
default target will be used anyway. In case of arm, for at least
clang 5 - 10 it causes error like:

clang: warning: unknown platform, assuming -mfloat-abi=soft
LLVM ERROR: Unsupported calling convention
make[2]: *** [/home/root/snapshot/samples/bpf/Makefile:299:
/home/root/snapshot/samples/bpf/sockex1_kern.o] Error 1

To make the platform to be known, only set to real triple helps:
--target=arm-linux-gnueabihf
or just drop the target key to use default one. Decision to just drop
it and thus default target will be used, looks better.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
 samples/bpf/Makefile | 2 --
 1 file changed, 2 deletions(-)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 61b7394b811e..a2953357927e 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -197,8 +197,6 @@ BTF_PAHOLE ?= pahole
 ifdef CROSS_COMPILE
 HOSTCC = $(CROSS_COMPILE)gcc
 CLANG_ARCH_ARGS = --target=$(notdir $(CROSS_COMPILE:%-=%))
-else
-CLANG_ARCH_ARGS = -target $(ARCH)
 endif
 
 # Don't evaluate probes and warnings if we need to run make recursively
-- 
2.17.1


^ permalink raw reply related

* [PATCH RFC bpf-next 05/10] samples: bpf: Makefile: use vars from KBUILD_CFLAGS to handle linux headers
From: Ivan Khoronzhuk @ 2019-08-30  0:50 UTC (permalink / raw)
  To: linux, ast, daniel, yhs, davem, jakub.kicinski, hawk,
	john.fastabend
  Cc: linux-arm-kernel, linux-kernel, netdev, bpf, clang-built-linux,
	Ivan Khoronzhuk
In-Reply-To: <20190830005037.24004-1-ivan.khoronzhuk@linaro.org>

The kernel headers are reused from samples bpf, and autoconf.h is not
enough to reflect complete configuration for clang. One of such
configurations is __LINUX_ARM_ARCH__ min version used as instruction
set selector. In another case an error like "SMP is not
supported" for arm and others errors are issued and final object is
not correct.
---
 samples/bpf/Makefile | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index cdd742c05200..9232efa2b1b3 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -186,6 +186,13 @@ HOSTLDLIBS_map_perf_test	+= -lrt
 HOSTLDLIBS_test_overhead	+= -lrt
 HOSTLDLIBS_xdpsock		+= -pthread
 
+# Strip all expet -D options needed to handle linux headers
+# for arm it's __LINUX_ARM_ARCH__ and potentially others fork vars
+D_OPTIONS = $(shell echo "$(KBUILD_CFLAGS) " | sed 's/[[:blank:]]/\n/g' | \
+	sed '/^-D/!d' | tr '\n' ' ')
+
+CLANG_EXTRA_CFLAGS += $(D_OPTIONS)
+
 # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
 #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
 LLC ?= llc
-- 
2.17.1


^ permalink raw reply related

* [PATCH RFC bpf-next 04/10] samples: bpf: use own EXTRA_CFLAGS for clang commands
From: Ivan Khoronzhuk @ 2019-08-30  0:50 UTC (permalink / raw)
  To: linux, ast, daniel, yhs, davem, jakub.kicinski, hawk,
	john.fastabend
  Cc: linux-arm-kernel, linux-kernel, netdev, bpf, clang-built-linux,
	Ivan Khoronzhuk
In-Reply-To: <20190830005037.24004-1-ivan.khoronzhuk@linaro.org>

It can overlap with CFLAGS used for libraries built with gcc if
not now then in following patches. Correct it here for simplicity.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
 samples/bpf/Makefile | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index a2953357927e..cdd742c05200 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -219,10 +219,10 @@ BTF_LLVM_PROBE := $(shell echo "int main() { return 0; }" | \
 			  /bin/rm -f ./llvm_btf_verify.o)
 
 ifneq ($(BTF_LLVM_PROBE),)
-	EXTRA_CFLAGS += -g
+	CLANG_EXTRA_CFLAGS += -g
 else
 ifneq ($(and $(BTF_LLC_PROBE),$(BTF_PAHOLE_PROBE),$(BTF_OBJCOPY_PROBE)),)
-	EXTRA_CFLAGS += -g
+	CLANG_EXTRA_CFLAGS += -g
 	LLC_FLAGS += -mattr=dwarfris
 	DWARF2BTF = y
 endif
@@ -281,8 +281,8 @@ $(obj)/hbm_edt_kern.o: $(src)/hbm.h $(src)/hbm_kern.h
 # useless for BPF samples.
 $(obj)/%.o: $(src)/%.c
 	@echo "  CLANG-bpf " $@
-	$(Q)$(CLANG) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) -I$(obj) \
-		-I$(srctree)/tools/testing/selftests/bpf/ \
+	$(Q)$(CLANG) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(CLANG_EXTRA_CFLAGS) \
+		-I$(obj) -I$(srctree)/tools/testing/selftests/bpf/ \
 		-D__KERNEL__ -D__BPF_TRACING__ -Wno-unused-value -Wno-pointer-sign \
 		-D__TARGET_ARCH_$(SRCARCH) -Wno-compare-distinct-pointer-types \
 		-Wno-gnu-variable-sized-type-not-at-end \
-- 
2.17.1


^ permalink raw reply related

* [PATCH RFC bpf-next 06/10] samples: bpf: makefile: fix HDR_PROBE
From: Ivan Khoronzhuk @ 2019-08-30  0:50 UTC (permalink / raw)
  To: linux, ast, daniel, yhs, davem, jakub.kicinski, hawk,
	john.fastabend
  Cc: linux-arm-kernel, linux-kernel, netdev, bpf, clang-built-linux,
	Ivan Khoronzhuk
In-Reply-To: <20190830005037.24004-1-ivan.khoronzhuk@linaro.org>

echo should be replace on echo -e to handle \n correctly, but instead,
replace it on printf as some systems can't handle echo -e.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
 samples/bpf/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 9232efa2b1b3..043f9cc14cdd 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -208,7 +208,7 @@ endif
 
 # Don't evaluate probes and warnings if we need to run make recursively
 ifneq ($(src),)
-HDR_PROBE := $(shell echo "\#include <linux/types.h>\n struct list_head { int a; }; int main() { return 0; }" | \
+HDR_PROBE := $(shell printf "\#include <linux/types.h>\n struct list_head { int a; }; int main() { return 0; }" | \
 	$(HOSTCC) $(KBUILD_HOSTCFLAGS) -x c - -o /dev/null 2>/dev/null && \
 	echo okay)
 
-- 
2.17.1


^ permalink raw reply related

* [PATCH RFC bpf-next 07/10] samples: bpf: add makefile.prog for separate CC build
From: Ivan Khoronzhuk @ 2019-08-30  0:50 UTC (permalink / raw)
  To: linux, ast, daniel, yhs, davem, jakub.kicinski, hawk,
	john.fastabend
  Cc: linux-arm-kernel, linux-kernel, netdev, bpf, clang-built-linux,
	Ivan Khoronzhuk
In-Reply-To: <20190830005037.24004-1-ivan.khoronzhuk@linaro.org>

The HOSTCC is supposed to build binaries and tools running on the host
afterwards, in order to simplify build or so, like "fixdep" or else.
In case of cross compiling "fixdep" is executed on host when the rest
samples should run on target arch. In order to build binaries for
target arch with CC and tools running on host with HOSTCC, lets add
Makefile.prog for simplicity, having definition and routines similar
to ones, used in script/Makefile.host. That allows later add
cross-compilation to samples/bpf with minimum changes.

Makefile.prog contains only stuff needed for samples/bpf, potentially
can be reused and sophisticated for other prog sets later and now
needed only for unblocking tricky samples/bpf cross compilation.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
 samples/bpf/Makefile.prog | 77 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)
 create mode 100644 samples/bpf/Makefile.prog

diff --git a/samples/bpf/Makefile.prog b/samples/bpf/Makefile.prog
new file mode 100644
index 000000000000..d5d02fbb5e6e
--- /dev/null
+++ b/samples/bpf/Makefile.prog
@@ -0,0 +1,77 @@
+# SPDX-License-Identifier: GPL-2.0
+# ==========================================================================
+# Building binaries on the host system
+# Binaries are not used during the compilation of the kernel, and intendent to
+# be build for target board, target board can be host ofc. Added to build
+# binaries to run not on host system.
+#
+# Both C and C++ are supported, but preferred language is C for such utilities.
+#
+# Sample syntax (see Documentation/kbuild/makefiles.rst for reference)
+# progs-y := xdpsock_example
+# Will compile xdpsock_example.c and create an executable named xdpsock_example
+#
+# progs-y    := xdpsock
+# xdpsock-objs := xdpsock_user.o xdpsock_user2.o
+# Will compile xdpsock.c and xdpsock.c, and then link the executable
+# xdpsock, based on xdpsock_user.o and xdpsock_user2.o
+#
+# Inherited from scripts/Makefile.host
+#
+__progs := $(sort $(progs-y))
+
+# C code
+# Executables compiled from a single .c file
+prog-csingle	:= $(foreach m,$(__progs), \
+			$(if $($(m)-objs)$($(m)-cxxobjs),,$(m)))
+
+# C executables linked based on several .o files
+prog-cmulti	:= $(foreach m,$(__progs),\
+		   $(if $($(m)-cxxobjs),,$(if $($(m)-objs),$(m))))
+
+# Object (.o) files compiled from .c files
+prog-cobjs	:= $(sort $(foreach m,$(__progs),$($(m)-objs)))
+
+prog-csingle	:= $(addprefix $(obj)/,$(prog-csingle))
+prog-cmulti	:= $(addprefix $(obj)/,$(prog-cmulti))
+prog-cobjs	:= $(addprefix $(obj)/,$(prog-cobjs))
+
+#####
+# Handle options to gcc. Support building with separate output directory
+
+_progc_flags   = $(PROGS_CFLAGS) \
+                 $(PROGCFLAGS_$(basetarget).o)
+
+# $(objtree)/$(obj) for including generated headers from checkin source files
+ifeq ($(KBUILD_EXTMOD),)
+ifdef building_out_of_srctree
+_progc_flags   += -I $(objtree)/$(obj)
+endif
+endif
+
+progc_flags    = -Wp,-MD,$(depfile) $(_progc_flags)
+
+# Create executable from a single .c file
+# prog-csingle -> Executable
+quiet_cmd_prog-csingle 	= CC  $@
+      cmd_prog-csingle	= $(CC) $(progc_flags) $(PROGS_LDFLAGS) -o $@ $< \
+		$(PROGS_LDLIBS) $(PROGLDLIBS_$(@F))
+$(prog-csingle): $(obj)/%: $(src)/%.c FORCE
+	$(call if_changed_dep,prog-csingle)
+
+# Link an executable based on list of .o files, all plain c
+# prog-cmulti -> executable
+quiet_cmd_prog-cmulti	= LD  $@
+      cmd_prog-cmulti	= $(CC) $(progc_flags) $(PROGS_LDFLAGS) -o $@ \
+			  $(addprefix $(obj)/,$($(@F)-objs)) \
+			  $(PROGS_LDLIBS) $(PROGLDLIBS_$(@F))
+$(prog-cmulti): $(prog-cobjs) FORCE
+	$(call if_changed,prog-cmulti)
+$(call multi_depend, $(prog-cmulti), , -objs)
+
+# Create .o file from a single .c file
+# prog-cobjs -> .o
+quiet_cmd_prog-cobjs	= CC  $@
+      cmd_prog-cobjs	= $(CC) $(progc_flags) -c -o $@ $<
+$(prog-cobjs): $(obj)/%.o: $(src)/%.c FORCE
+	$(call if_changed_dep,prog-cobjs)
-- 
2.17.1


^ permalink raw reply related

* [PATCH RFC bpf-next 08/10] samples: bpf: Makefile: base progs build on Makefile.progs
From: Ivan Khoronzhuk @ 2019-08-30  0:50 UTC (permalink / raw)
  To: linux, ast, daniel, yhs, davem, jakub.kicinski, hawk,
	john.fastabend
  Cc: linux-arm-kernel, linux-kernel, netdev, bpf, clang-built-linux,
	Ivan Khoronzhuk
In-Reply-To: <20190830005037.24004-1-ivan.khoronzhuk@linaro.org>

The main reason for that - HOSTCC and CC have different aims.
It was tested for arm cross compilation, based on linaro toolchain,
but should work for others.

In order to split cross compilation with host build, base bpf samples
on Makefile.progs. I've verified it on arm with adding SYSROOT.
It's also convenient when debug is with NFC.
To cross-compile I've used:

export ARCH=arm
export CROSS_COMPILE=arm-linux-gnueabihf-
make -j4 samples/bpf/ SYSROOT="path/to/sysroot"

Sysroot contains correct headers installed ofc.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
---
 samples/bpf/Makefile   | 164 ++++++++++++++++++++++++-----------------
 samples/bpf/README.rst |   7 ++
 2 files changed, 102 insertions(+), 69 deletions(-)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 043f9cc14cdd..ed7131851172 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -4,55 +4,53 @@ BPF_SAMPLES_PATH ?= $(abspath $(srctree)/$(src))
 TOOLS_PATH := $(BPF_SAMPLES_PATH)/../../tools
 
 # List of programs to build
-hostprogs-y := test_lru_dist
-hostprogs-y += sock_example
-hostprogs-y += fds_example
-hostprogs-y += sockex1
-hostprogs-y += sockex2
-hostprogs-y += sockex3
-hostprogs-y += tracex1
-hostprogs-y += tracex2
-hostprogs-y += tracex3
-hostprogs-y += tracex4
-hostprogs-y += tracex5
-hostprogs-y += tracex6
-hostprogs-y += tracex7
-hostprogs-y += test_probe_write_user
-hostprogs-y += trace_output
-hostprogs-y += lathist
-hostprogs-y += offwaketime
-hostprogs-y += spintest
-hostprogs-y += map_perf_test
-hostprogs-y += test_overhead
-hostprogs-y += test_cgrp2_array_pin
-hostprogs-y += test_cgrp2_attach
-hostprogs-y += test_cgrp2_sock
-hostprogs-y += test_cgrp2_sock2
-hostprogs-y += xdp1
-hostprogs-y += xdp2
-hostprogs-y += xdp_router_ipv4
-hostprogs-y += test_current_task_under_cgroup
-hostprogs-y += trace_event
-hostprogs-y += sampleip
-hostprogs-y += tc_l2_redirect
-hostprogs-y += lwt_len_hist
-hostprogs-y += xdp_tx_iptunnel
-hostprogs-y += test_map_in_map
-hostprogs-y += per_socket_stats_example
-hostprogs-y += xdp_redirect
-hostprogs-y += xdp_redirect_map
-hostprogs-y += xdp_redirect_cpu
-hostprogs-y += xdp_monitor
-hostprogs-y += xdp_rxq_info
-hostprogs-y += syscall_tp
-hostprogs-y += cpustat
-hostprogs-y += xdp_adjust_tail
-hostprogs-y += xdpsock
-hostprogs-y += xdp_fwd
-hostprogs-y += task_fd_query
-hostprogs-y += xdp_sample_pkts
-hostprogs-y += ibumad
-hostprogs-y += hbm
+progs-y := test_lru_dist
+progs-y += sock_example
+progs-y += fds_example
+progs-y += sockex1
+progs-y += sockex2
+progs-y += sockex3
+progs-y += tracex1
+progs-y += tracex2
+progs-y += tracex3
+progs-y += tracex4
+progs-y += tracex5
+progs-y += tracex6
+progs-y += tracex7
+progs-y += test_probe_write_user
+progs-y += trace_output
+progs-y += lathist
+progs-y += offwaketime
+progs-y += spintest
+progs-y += map_perf_test
+progs-y += test_overhead
+progs-y += test_cgrp2_array_pin
+progs-y += test_cgrp2_attach
+progs-y += test_cgrp2_sock
+progs-y += test_cgrp2_sock2
+progs-y += xdp1
+progs-y += xdp2
+progs-y += xdp_router_ipv4
+progs-y += test_current_task_under_cgroup
+progs-y += trace_event
+progs-y += sampleip
+progs-y += tc_l2_redirect
+progs-y += lwt_len_hist
+progs-y += xdp_tx_iptunnel
+progs-y += test_map_in_map
+progs-y += xdp_redirect_map
+progs-y += xdp_redirect_cpu
+progs-y += xdp_monitor
+progs-y += xdp_rxq_info
+progs-y += syscall_tp
+progs-y += cpustat
+progs-y += xdp_adjust_tail
+progs-y += xdpsock
+progs-y += xdp_fwd
+progs-y += task_fd_query
+progs-y += xdp_sample_pkts
+progs-y += ibumad
+progs-y += hbm
 
 # Libbpf dependencies
 LIBBPF = $(TOOLS_PATH)/lib/bpf/libbpf.a
@@ -111,7 +109,7 @@ ibumad-objs := bpf_load.o ibumad_user.o $(TRACE_HELPERS)
 hbm-objs := bpf_load.o hbm.o $(CGROUP_HELPERS)
 
 # Tell kbuild to always build the programs
-always := $(hostprogs-y)
+always := $(progs-y)
 always += sockex1_kern.o
 always += sockex2_kern.o
 always += sockex3_kern.o
@@ -171,26 +169,51 @@ always += ibumad_kern.o
 always += hbm_out_kern.o
 always += hbm_edt_kern.o
 
-KBUILD_HOSTCFLAGS += -I$(objtree)/usr/include
-KBUILD_HOSTCFLAGS += -I$(srctree)/tools/lib/bpf/
-KBUILD_HOSTCFLAGS += -I$(srctree)/tools/testing/selftests/bpf/
-KBUILD_HOSTCFLAGS += -I$(srctree)/tools/lib/ -I$(srctree)/tools/include
-KBUILD_HOSTCFLAGS += -I$(srctree)/tools/perf
-
-HOSTCFLAGS_bpf_load.o += -I$(objtree)/usr/include -Wno-unused-variable
-
-KBUILD_HOSTLDLIBS		+= $(LIBBPF) -lelf
-HOSTLDLIBS_tracex4		+= -lrt
-HOSTLDLIBS_trace_output	+= -lrt
-HOSTLDLIBS_map_perf_test	+= -lrt
-HOSTLDLIBS_test_overhead	+= -lrt
-HOSTLDLIBS_xdpsock		+= -pthread
-
 # Strip all expet -D options needed to handle linux headers
 # for arm it's __LINUX_ARM_ARCH__ and potentially others fork vars
 D_OPTIONS = $(shell echo "$(KBUILD_CFLAGS) " | sed 's/[[:blank:]]/\n/g' | \
 	sed '/^-D/!d' | tr '\n' ' ')
 
+ifdef SYSROOT
+ccflags-y += --sysroot=${SYSROOT}
+ccflags-y += -I${SYSROOT}/usr/include
+CLANG_EXTRA_CFLAGS := $(ccflags-y)
+PROGS_LDFLAGS := -L${SYSROOT}/usr/lib
+endif
+
+ccflags-y += -I$(srctree)/tools/lib/bpf/
+ccflags-y += -I$(srctree)/tools/testing/selftests/bpf/
+ccflags-y += -I$(srctree)/tools/lib/
+ccflags-y += -I$(srctree)/tools/perf
+
+ccflags-y += $(D_OPTIONS)
+ccflags-y += -Wall
+ccflags-y += -Wmissing-prototypes
+ccflags-y += -Wstrict-prototypes
+ccflags-y += -fomit-frame-pointer
+
+PROGS_CFLAGS := $(ccflags-y)
+
+ccflags-y += -I$(objtree)/usr/include
+ccflags-y += -I$(srctree)/tools/include
+
+PROGCFLAGS_bpf_load.o += -I$(objtree)/usr/include -I$(srctree)/tools/include \
+			 -Wno-unused-variable
+PROGCFLAGS_sampleip_user.o += -I$(srctree)/tools/include
+PROGCFLAGS_task_fd_query_user.o += -I$(srctree)/tools/include
+PROGCFLAGS_trace_event_user.o += -I$(srctree)/tools/include
+PROGCFLAGS_trace_output_user.o += -I$(srctree)/tools/include
+PROGCFLAGS_tracex6_user.o += -I$(srctree)/tools/include
+PROGCFLAGS_xdp_sample_pkts_user.o += -I$(srctree)/tools/include
+PROGCFLAGS_xdpsock_user.o += -I$(srctree)/tools/include
+
+PROGS_LDLIBS			:= $(LIBBPF) -lelf
+PROGLDLIBS_tracex4		+= -lrt
+PROGLDLIBS_trace_output		+= -lrt
+PROGLDLIBS_map_perf_test	+= -lrt
+PROGLDLIBS_test_overhead	+= -lrt
+PROGLDLIBS_xdpsock		+= -pthread
+
 CLANG_EXTRA_CFLAGS += $(D_OPTIONS)
 
 # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
@@ -202,15 +225,14 @@ BTF_PAHOLE ?= pahole
 
 # Detect that we're cross compiling and use the cross compiler
 ifdef CROSS_COMPILE
-HOSTCC = $(CROSS_COMPILE)gcc
 CLANG_ARCH_ARGS = --target=$(notdir $(CROSS_COMPILE:%-=%))
 endif
 
 # Don't evaluate probes and warnings if we need to run make recursively
 ifneq ($(src),)
 HDR_PROBE := $(shell printf "\#include <linux/types.h>\n struct list_head { int a; }; int main() { return 0; }" | \
-	$(HOSTCC) $(KBUILD_HOSTCFLAGS) -x c - -o /dev/null 2>/dev/null && \
-	echo okay)
+	$(CC) $(PROGS_CFLAGS) $(PROGS_LDFLAGS) -x c - -o /dev/null \
+	2>/dev/null && echo okay)
 
 ifeq ($(HDR_PROBE),)
 $(warning WARNING: Detected possible issues with include path.)
@@ -246,7 +268,9 @@ clean:
 
 $(LIBBPF): FORCE
 # Fix up variables inherited from Kbuild that tools/ build system won't like
-	$(MAKE) -C $(dir $@) RM='rm -rf' LDFLAGS= srctree=$(BPF_SAMPLES_PATH)/../../ O=
+	$(MAKE) -C $(dir $@) RM='rm -rf' EXTRA_CFLAGS="$(ccflags-y)" \
+		EXTRA_CXXFLAGS="$(ccflags-y)" LDFLAGS=$(PROGS_LDFLAGS) \
+		srctree=$(BPF_SAMPLES_PATH)/../../ O=
 
 $(obj)/syscall_nrs.h:	$(obj)/syscall_nrs.s FORCE
 	$(call filechk,offsets,__SYSCALL_NRS_H__)
@@ -283,6 +307,8 @@ $(obj)/hbm_out_kern.o: $(src)/hbm.h $(src)/hbm_kern.h
 $(obj)/hbm.o: $(src)/hbm.h
 $(obj)/hbm_edt_kern.o: $(src)/hbm.h $(src)/hbm_kern.h
 
+-include $(BPF_SAMPLES_PATH)/Makefile.prog
+
 # asm/sysreg.h - inline assembly used by it is incompatible with llvm.
 # But, there is no easy way to fix it, so just exclude it since it is
 # useless for BPF samples.
diff --git a/samples/bpf/README.rst b/samples/bpf/README.rst
index 5f27e4faca50..6b5e4eace977 100644
--- a/samples/bpf/README.rst
+++ b/samples/bpf/README.rst
@@ -74,3 +74,10 @@ samples for the cross target.
 export ARCH=arm64
 export CROSS_COMPILE="aarch64-linux-gnu-"
 make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
+
+If need to use environment of target board, the SYSROOT also can be set,
+pointing on FS of target board:
+
+make samples/bpf/ LLC=~/git/llvm/build/bin/llc \
+     CLANG=~/git/llvm/build/bin/clang \
+     SYSROOT=~/some_sdk/linux-devkit/sysroots/aarch64-linux-gnu
-- 
2.17.1


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox