Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] net/sched: Make NET_ACT_CT depends on NF_NAT
From: David Miller @ 2019-07-17 19:02 UTC (permalink / raw)
  To: yuehaibing; +Cc: jhs, xiyou.wangcong, jiri, linux-kernel, netdev
In-Reply-To: <20190716071602.27276-1-yuehaibing@huawei.com>

From: YueHaibing <yuehaibing@huawei.com>
Date: Tue, 16 Jul 2019 15:16:02 +0800

> If NF_NAT is m and NET_ACT_CT is y, build fails:
> 
> net/sched/act_ct.o: In function `tcf_ct_act':
> act_ct.c:(.text+0x21ac): undefined reference to `nf_ct_nat_ext_add'
> act_ct.c:(.text+0x229a): undefined reference to `nf_nat_icmp_reply_translation'
> act_ct.c:(.text+0x233a): undefined reference to `nf_nat_setup_info'
> act_ct.c:(.text+0x234a): undefined reference to `nf_nat_alloc_null_binding'
> act_ct.c:(.text+0x237c): undefined reference to `nf_nat_packet'
> 
> Reported-by: Hulk Robot <hulkci@huawei.com>
> Fixes: b57dc7c13ea9 ("net/sched: Introduce action ct")
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>

Applied.

^ permalink raw reply

* Re: [PATCH 2/2] net: apply proc_net_mkdir() harder
From: David Miller @ 2019-07-17 19:04 UTC (permalink / raw)
  To: pablo
  Cc: adobriyan, netdev, netfilter-devel, linux-nfs, j.vosburgh,
	vfalico, andy, kadlec, fw, bfields, chuck.lever
In-Reply-To: <20190716185220.hnlyiievuucdtn7x@salvia>

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Tue, 16 Jul 2019 20:52:20 +0200

> On Sat, Jul 06, 2019 at 07:55:21PM +0300, Alexey Dobriyan wrote:
>> From: "Hallsmark, Per" <Per.Hallsmark@windriver.com>
>> 
>> proc_net_mkdir() should be used to create stuff under /proc/net,
>> so that dentry revalidation kicks in.
>> 
>> See
>> 
>> 	commit 1fde6f21d90f8ba5da3cb9c54ca991ed72696c43
>> 	proc: fix /proc/net/* after setns(2)
>> 
>> 	[added more chunks --adobriyan]
> 
> I don't find this in the tree,

Because there were changes requested.


^ permalink raw reply

* Re: [Patch net v2] net_sched: unset TCQ_F_CAN_BYPASS when adding filters
From: David Miller @ 2019-07-17 19:04 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: netdev, edumazet
In-Reply-To: <20190716205730.19675-1-xiyou.wangcong@gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Tue, 16 Jul 2019 13:57:30 -0700

> For qdisc's that support TC filters and set TCQ_F_CAN_BYPASS,
> notably fq_codel, it makes no sense to let packets bypass the TC
> filters we setup in any scenario, otherwise our packets steering
> policy could not be enforced.
 ...

Eric I think your feedback was addressed, please review to confirm.

Thank you.

^ permalink raw reply

* Re: [PATCH net v3 0/7] net/rds: RDMA fixes
From: David Miller @ 2019-07-17 19:07 UTC (permalink / raw)
  To: gerd.rausch; +Cc: santosh.shilimkar, netdev, linux-rdma, rds-devel
In-Reply-To: <3fd6ddd1-97e2-2c64-1772-b689eb3ee7ba@oracle.com>

From: Gerd Rausch <gerd.rausch@oracle.com>
Date: Tue, 16 Jul 2019 15:28:43 -0700

> A number of net/rds fixes necessary to make "rds_rdma.ko"
> pass some basic Oracle internal tests.

Series applied.

^ permalink raw reply

* Re: [Patch net v2] net_sched: unset TCQ_F_CAN_BYPASS when adding filters
From: Eric Dumazet @ 2019-07-17 19:19 UTC (permalink / raw)
  To: David Miller; +Cc: Cong Wang, netdev
In-Reply-To: <20190717.120446.101023323109287941.davem@davemloft.net>

On Wed, Jul 17, 2019 at 9:04 PM David Miller <davem@davemloft.net> wrote:
>
> From: Cong Wang <xiyou.wangcong@gmail.com>
> Date: Tue, 16 Jul 2019 13:57:30 -0700
>
> > For qdisc's that support TC filters and set TCQ_F_CAN_BYPASS,
> > notably fq_codel, it makes no sense to let packets bypass the TC
> > filters we setup in any scenario, otherwise our packets steering
> > policy could not be enforced.
>  ...
>
> Eric I think your feedback was addressed, please review to confirm.

Yes, this seems good to me, thanks.

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply

* [PATCH net,v3 1/4] net: openvswitch: rename flow_stats to sw_flow_stats
From: Pablo Neira Ayuso @ 2019-07-17 19:42 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, jiri, jakub.kicinski

There is a flow_stats structure defined in include/net/flow_offload.h
which is placed in the networking core. I think that definition takes
precedence on OVS, so rename flow_stats in OVS to sw_flow_stats.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
OVS compilation breaks here after this patchset since flow_stats
structure is already defined in include/net/flow_offload.h. This patch
is new in this batch.

 net/openvswitch/flow.c       | 8 ++++----
 net/openvswitch/flow.h       | 4 ++--
 net/openvswitch/flow_table.c | 2 +-
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index dca3b1e2acf0..bc89e16e0505 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -59,7 +59,7 @@ u64 ovs_flow_used_time(unsigned long flow_jiffies)
 void ovs_flow_stats_update(struct sw_flow *flow, __be16 tcp_flags,
 			   const struct sk_buff *skb)
 {
-	struct flow_stats *stats;
+	struct sw_flow_stats *stats;
 	unsigned int cpu = smp_processor_id();
 	int len = skb->len + (skb_vlan_tag_present(skb) ? VLAN_HLEN : 0);
 
@@ -87,7 +87,7 @@ void ovs_flow_stats_update(struct sw_flow *flow, __be16 tcp_flags,
 			if (likely(flow->stats_last_writer != -1) &&
 			    likely(!rcu_access_pointer(flow->stats[cpu]))) {
 				/* Try to allocate CPU-specific stats. */
-				struct flow_stats *new_stats;
+				struct sw_flow_stats *new_stats;
 
 				new_stats =
 					kmem_cache_alloc_node(flow_stats_cache,
@@ -134,7 +134,7 @@ void ovs_flow_stats_get(const struct sw_flow *flow,
 
 	/* We open code this to make sure cpu 0 is always considered */
 	for (cpu = 0; cpu < nr_cpu_ids; cpu = cpumask_next(cpu, &flow->cpu_used_mask)) {
-		struct flow_stats *stats = rcu_dereference_ovsl(flow->stats[cpu]);
+		struct sw_flow_stats *stats = rcu_dereference_ovsl(flow->stats[cpu]);
 
 		if (stats) {
 			/* Local CPU may write on non-local stats, so we must
@@ -158,7 +158,7 @@ void ovs_flow_stats_clear(struct sw_flow *flow)
 
 	/* We open code this to make sure cpu 0 is always considered */
 	for (cpu = 0; cpu < nr_cpu_ids; cpu = cpumask_next(cpu, &flow->cpu_used_mask)) {
-		struct flow_stats *stats = ovsl_dereference(flow->stats[cpu]);
+		struct sw_flow_stats *stats = ovsl_dereference(flow->stats[cpu]);
 
 		if (stats) {
 			spin_lock_bh(&stats->lock);
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index 3e2cc2202d66..a5506e2d4b7a 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -194,7 +194,7 @@ struct sw_flow_actions {
 	struct nlattr actions[];
 };
 
-struct flow_stats {
+struct sw_flow_stats {
 	u64 packet_count;		/* Number of packets matched. */
 	u64 byte_count;			/* Number of bytes matched. */
 	unsigned long used;		/* Last used time (in jiffies). */
@@ -216,7 +216,7 @@ struct sw_flow {
 	struct cpumask cpu_used_mask;
 	struct sw_flow_mask *mask;
 	struct sw_flow_actions __rcu *sf_acts;
-	struct flow_stats __rcu *stats[]; /* One for each CPU.  First one
+	struct sw_flow_stats __rcu *stats[]; /* One for each CPU.  First one
 					   * is allocated at flow creation time,
 					   * the rest are allocated on demand
 					   * while holding the 'stats[0].lock'.
diff --git a/net/openvswitch/flow_table.c b/net/openvswitch/flow_table.c
index 988fd8a94e43..651093e33351 100644
--- a/net/openvswitch/flow_table.c
+++ b/net/openvswitch/flow_table.c
@@ -66,7 +66,7 @@ void ovs_flow_mask_key(struct sw_flow_key *dst, const struct sw_flow_key *src,
 struct sw_flow *ovs_flow_alloc(void)
 {
 	struct sw_flow *flow;
-	struct flow_stats *stats;
+	struct sw_flow_stats *stats;
 
 	flow = kmem_cache_zalloc(flow_cache, GFP_KERNEL);
 	if (!flow)
-- 
2.11.0


^ permalink raw reply related

* [PATCH net,v3 2/4] net: flow_offload: remove netns parameter from flow_block_cb_alloc()
From: Pablo Neira Ayuso @ 2019-07-17 19:42 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, jiri, jakub.kicinski
In-Reply-To: <20190717194248.2522-1-pablo@netfilter.org>

No need to annotate the netns on the flow block callback object,
flow_block_cb_is_busy() already checks for used blocks.

Fixes: d63db30c8537 ("net: flow_offload: add flow_block_cb_alloc() and flow_block_cb_free()")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
v3: no changes.

 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c    | 3 +--
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c      | 5 ++---
 drivers/net/ethernet/mscc/ocelot_flower.c           | 3 +--
 drivers/net/ethernet/mscc/ocelot_tc.c               | 2 +-
 drivers/net/ethernet/netronome/nfp/flower/offload.c | 6 ++----
 include/net/flow_offload.h                          | 3 +--
 net/core/flow_offload.c                             | 9 +++------
 net/dsa/slave.c                                     | 2 +-
 8 files changed, 12 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 7245d287633d..2162412073c5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -735,8 +735,7 @@ mlx5e_rep_indr_setup_tc_block(struct net_device *netdev,
 		list_add(&indr_priv->list,
 			 &rpriv->uplink_priv.tc_indr_block_priv_list);
 
-		block_cb = flow_block_cb_alloc(f->net,
-					       mlx5e_rep_indr_setup_block_cb,
+		block_cb = flow_block_cb_alloc(mlx5e_rep_indr_setup_block_cb,
 					       indr_priv, indr_priv,
 					       mlx5e_rep_indr_tc_block_unbind);
 		if (IS_ERR(block_cb)) {
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 4d34d42b3b0e..a469035400cf 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -1610,8 +1610,7 @@ mlxsw_sp_setup_tc_block_flower_bind(struct mlxsw_sp_port *mlxsw_sp_port,
 		acl_block = mlxsw_sp_acl_block_create(mlxsw_sp, f->net);
 		if (!acl_block)
 			return -ENOMEM;
-		block_cb = flow_block_cb_alloc(f->net,
-					       mlxsw_sp_setup_tc_block_cb_flower,
+		block_cb = flow_block_cb_alloc(mlxsw_sp_setup_tc_block_cb_flower,
 					       mlxsw_sp, acl_block,
 					       mlxsw_sp_tc_block_flower_release);
 		if (IS_ERR(block_cb)) {
@@ -1702,7 +1701,7 @@ static int mlxsw_sp_setup_tc_block(struct mlxsw_sp_port *mlxsw_sp_port,
 					  &mlxsw_sp_block_cb_list))
 			return -EBUSY;
 
-		block_cb = flow_block_cb_alloc(f->net, cb, mlxsw_sp_port,
+		block_cb = flow_block_cb_alloc(cb, mlxsw_sp_port,
 					       mlxsw_sp_port, NULL);
 		if (IS_ERR(block_cb))
 			return PTR_ERR(block_cb);
diff --git a/drivers/net/ethernet/mscc/ocelot_flower.c b/drivers/net/ethernet/mscc/ocelot_flower.c
index 7aaddc09c185..6a11aea8b186 100644
--- a/drivers/net/ethernet/mscc/ocelot_flower.c
+++ b/drivers/net/ethernet/mscc/ocelot_flower.c
@@ -323,8 +323,7 @@ int ocelot_setup_tc_block_flower_bind(struct ocelot_port *port,
 		if (!port_block)
 			return -ENOMEM;
 
-		block_cb = flow_block_cb_alloc(f->net,
-					       ocelot_setup_tc_block_cb_flower,
+		block_cb = flow_block_cb_alloc(ocelot_setup_tc_block_cb_flower,
 					       port, port_block,
 					       ocelot_tc_block_unbind);
 		if (IS_ERR(block_cb)) {
diff --git a/drivers/net/ethernet/mscc/ocelot_tc.c b/drivers/net/ethernet/mscc/ocelot_tc.c
index 9e6464ffae5d..abbcb66bf5ac 100644
--- a/drivers/net/ethernet/mscc/ocelot_tc.c
+++ b/drivers/net/ethernet/mscc/ocelot_tc.c
@@ -156,7 +156,7 @@ static int ocelot_setup_tc_block(struct ocelot_port *port,
 		if (flow_block_cb_is_busy(cb, port, &ocelot_block_cb_list))
 			return -EBUSY;
 
-		block_cb = flow_block_cb_alloc(f->net, cb, port, port, NULL);
+		block_cb = flow_block_cb_alloc(cb, port, port, NULL);
 		if (IS_ERR(block_cb))
 			return PTR_ERR(block_cb);
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index faa8ba012a37..93ab0db6c504 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -1318,8 +1318,7 @@ static int nfp_flower_setup_tc_block(struct net_device *netdev,
 					  &nfp_block_cb_list))
 			return -EBUSY;
 
-		block_cb = flow_block_cb_alloc(f->net,
-					       nfp_flower_setup_tc_block_cb,
+		block_cb = flow_block_cb_alloc(nfp_flower_setup_tc_block_cb,
 					       repr, repr, NULL);
 		if (IS_ERR(block_cb))
 			return PTR_ERR(block_cb);
@@ -1424,8 +1423,7 @@ nfp_flower_setup_indr_tc_block(struct net_device *netdev, struct nfp_app *app,
 		cb_priv->app = app;
 		list_add(&cb_priv->list, &priv->indr_block_cb_priv);
 
-		block_cb = flow_block_cb_alloc(f->net,
-					       nfp_flower_setup_indr_block_cb,
+		block_cb = flow_block_cb_alloc(nfp_flower_setup_indr_block_cb,
 					       cb_priv, cb_priv,
 					       nfp_flower_setup_indr_tc_release);
 		if (IS_ERR(block_cb)) {
diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index db337299e81e..aa9b5287b231 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -264,7 +264,6 @@ struct flow_block_offload {
 struct flow_block_cb {
 	struct list_head	driver_list;
 	struct list_head	list;
-	struct net		*net;
 	tc_setup_cb_t		*cb;
 	void			*cb_ident;
 	void			*cb_priv;
@@ -272,7 +271,7 @@ struct flow_block_cb {
 	unsigned int		refcnt;
 };
 
-struct flow_block_cb *flow_block_cb_alloc(struct net *net, tc_setup_cb_t *cb,
+struct flow_block_cb *flow_block_cb_alloc(tc_setup_cb_t *cb,
 					  void *cb_ident, void *cb_priv,
 					  void (*release)(void *cb_priv));
 void flow_block_cb_free(struct flow_block_cb *block_cb);
diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c
index 76f8db3841d7..507de4b48815 100644
--- a/net/core/flow_offload.c
+++ b/net/core/flow_offload.c
@@ -165,7 +165,7 @@ void flow_rule_match_enc_opts(const struct flow_rule *rule,
 }
 EXPORT_SYMBOL(flow_rule_match_enc_opts);
 
-struct flow_block_cb *flow_block_cb_alloc(struct net *net, tc_setup_cb_t *cb,
+struct flow_block_cb *flow_block_cb_alloc(tc_setup_cb_t *cb,
 					  void *cb_ident, void *cb_priv,
 					  void (*release)(void *cb_priv))
 {
@@ -175,7 +175,6 @@ struct flow_block_cb *flow_block_cb_alloc(struct net *net, tc_setup_cb_t *cb,
 	if (!block_cb)
 		return ERR_PTR(-ENOMEM);
 
-	block_cb->net = net;
 	block_cb->cb = cb;
 	block_cb->cb_ident = cb_ident;
 	block_cb->cb_priv = cb_priv;
@@ -200,8 +199,7 @@ struct flow_block_cb *flow_block_cb_lookup(struct flow_block_offload *f,
 	struct flow_block_cb *block_cb;
 
 	list_for_each_entry(block_cb, f->driver_block_list, driver_list) {
-		if (block_cb->net == f->net &&
-		    block_cb->cb == cb &&
+		if (block_cb->cb == cb &&
 		    block_cb->cb_ident == cb_ident)
 			return block_cb;
 	}
@@ -261,8 +259,7 @@ int flow_block_cb_setup_simple(struct flow_block_offload *f,
 		if (flow_block_cb_is_busy(cb, cb_ident, driver_block_list))
 			return -EBUSY;
 
-		block_cb = flow_block_cb_alloc(f->net, cb, cb_ident,
-					       cb_priv, NULL);
+		block_cb = flow_block_cb_alloc(cb, cb_ident, cb_priv, NULL);
 		if (IS_ERR(block_cb))
 			return PTR_ERR(block_cb);
 
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 614c38ece104..6ca9ec58f881 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -967,7 +967,7 @@ static int dsa_slave_setup_tc_block(struct net_device *dev,
 		if (flow_block_cb_is_busy(cb, dev, &dsa_slave_block_cb_list))
 			return -EBUSY;
 
-		block_cb = flow_block_cb_alloc(f->net, cb, dev, dev, NULL);
+		block_cb = flow_block_cb_alloc(cb, dev, dev, NULL);
 		if (IS_ERR(block_cb))
 			return PTR_ERR(block_cb);
 
-- 
2.11.0


^ permalink raw reply related

* [PATCH net,v3 3/4] net: flow_offload: rename tc_setup_cb_t to flow_setup_cb_t
From: Pablo Neira Ayuso @ 2019-07-17 19:42 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, jiri, jakub.kicinski
In-Reply-To: <20190717194248.2522-1-pablo@netfilter.org>

Rename this type definition and adapt users.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
v3: no changes. tc_setup_cb_t is now misleading, so better fix this semantic
    inconsistency in the net tree as Jiri suggests.

 drivers/net/ethernet/mellanox/mlxsw/spectrum.c |  2 +-
 drivers/net/ethernet/mscc/ocelot_tc.c          |  2 +-
 include/net/flow_offload.h                     | 16 ++++++++++------
 include/net/pkt_cls.h                          |  4 ++--
 include/net/sch_generic.h                      |  6 ++----
 net/core/flow_offload.c                        |  9 +++++----
 net/dsa/slave.c                                |  2 +-
 net/sched/cls_api.c                            |  2 +-
 net/sched/cls_bpf.c                            |  2 +-
 net/sched/cls_flower.c                         |  2 +-
 net/sched/cls_matchall.c                       |  2 +-
 net/sched/cls_u32.c                            |  6 +++---
 12 files changed, 29 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index a469035400cf..51cd0b6f1f3e 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -1679,7 +1679,7 @@ static int mlxsw_sp_setup_tc_block(struct mlxsw_sp_port *mlxsw_sp_port,
 				   struct flow_block_offload *f)
 {
 	struct flow_block_cb *block_cb;
-	tc_setup_cb_t *cb;
+	flow_setup_cb_t *cb;
 	bool ingress;
 	int err;
 
diff --git a/drivers/net/ethernet/mscc/ocelot_tc.c b/drivers/net/ethernet/mscc/ocelot_tc.c
index abbcb66bf5ac..fba9512e9ca6 100644
--- a/drivers/net/ethernet/mscc/ocelot_tc.c
+++ b/drivers/net/ethernet/mscc/ocelot_tc.c
@@ -134,7 +134,7 @@ static int ocelot_setup_tc_block(struct ocelot_port *port,
 				 struct flow_block_offload *f)
 {
 	struct flow_block_cb *block_cb;
-	tc_setup_cb_t *cb;
+	flow_setup_cb_t *cb;
 	int err;
 
 	netdev_dbg(port->dev, "tc_block command %d, binder_type %d\n",
diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index aa9b5287b231..98bf3af5c84d 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -3,7 +3,6 @@
 
 #include <linux/kernel.h>
 #include <net/flow_dissector.h>
-#include <net/sch_generic.h>
 
 struct flow_match {
 	struct flow_dissector	*dissector;
@@ -261,23 +260,27 @@ struct flow_block_offload {
 	struct netlink_ext_ack *extack;
 };
 
+enum tc_setup_type;
+typedef int flow_setup_cb_t(enum tc_setup_type type, void *type_data,
+			    void *cb_priv);
+
 struct flow_block_cb {
 	struct list_head	driver_list;
 	struct list_head	list;
-	tc_setup_cb_t		*cb;
+	flow_setup_cb_t		*cb;
 	void			*cb_ident;
 	void			*cb_priv;
 	void			(*release)(void *cb_priv);
 	unsigned int		refcnt;
 };
 
-struct flow_block_cb *flow_block_cb_alloc(tc_setup_cb_t *cb,
+struct flow_block_cb *flow_block_cb_alloc(flow_setup_cb_t *cb,
 					  void *cb_ident, void *cb_priv,
 					  void (*release)(void *cb_priv));
 void flow_block_cb_free(struct flow_block_cb *block_cb);
 
 struct flow_block_cb *flow_block_cb_lookup(struct flow_block_offload *offload,
-					   tc_setup_cb_t *cb, void *cb_ident);
+					   flow_setup_cb_t *cb, void *cb_ident);
 
 void *flow_block_cb_priv(struct flow_block_cb *block_cb);
 void flow_block_cb_incref(struct flow_block_cb *block_cb);
@@ -295,11 +298,12 @@ static inline void flow_block_cb_remove(struct flow_block_cb *block_cb,
 	list_move(&block_cb->list, &offload->cb_list);
 }
 
-bool flow_block_cb_is_busy(tc_setup_cb_t *cb, void *cb_ident,
+bool flow_block_cb_is_busy(flow_setup_cb_t *cb, void *cb_ident,
 			   struct list_head *driver_block_list);
 
 int flow_block_cb_setup_simple(struct flow_block_offload *f,
-			       struct list_head *driver_list, tc_setup_cb_t *cb,
+			       struct list_head *driver_list,
+			       flow_setup_cb_t *cb,
 			       void *cb_ident, void *cb_priv, bool ingress_only);
 
 enum flow_cls_command {
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 841faadceb6e..cee651b76a1f 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -126,14 +126,14 @@ static inline struct Qdisc *tcf_block_q(struct tcf_block *block)
 }
 
 static inline
-int tc_setup_cb_block_register(struct tcf_block *block, tc_setup_cb_t *cb,
+int tc_setup_cb_block_register(struct tcf_block *block, flow_setup_cb_t *cb,
 			       void *cb_priv)
 {
 	return 0;
 }
 
 static inline
-void tc_setup_cb_block_unregister(struct tcf_block *block, tc_setup_cb_t *cb,
+void tc_setup_cb_block_unregister(struct tcf_block *block, flow_setup_cb_t *cb,
 				  void *cb_priv)
 {
 }
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 855167bbc372..9482e060483b 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -15,6 +15,7 @@
 #include <linux/mutex.h>
 #include <net/gen_stats.h>
 #include <net/rtnetlink.h>
+#include <net/flow_offload.h>
 
 struct Qdisc_ops;
 struct qdisc_walker;
@@ -22,9 +23,6 @@ struct tcf_walker;
 struct module;
 struct bpf_flow_keys;
 
-typedef int tc_setup_cb_t(enum tc_setup_type type,
-			  void *type_data, void *cb_priv);
-
 typedef int tc_indr_block_bind_cb_t(struct net_device *dev, void *cb_priv,
 				    enum tc_setup_type type, void *type_data);
 
@@ -313,7 +311,7 @@ struct tcf_proto_ops {
 	void			(*walk)(struct tcf_proto *tp,
 					struct tcf_walker *arg, bool rtnl_held);
 	int			(*reoffload)(struct tcf_proto *tp, bool add,
-					     tc_setup_cb_t *cb, void *cb_priv,
+					     flow_setup_cb_t *cb, void *cb_priv,
 					     struct netlink_ext_ack *extack);
 	void			(*bind_class)(void *, u32, unsigned long);
 	void *			(*tmplt_create)(struct net *net,
diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c
index 507de4b48815..a800fa78d96c 100644
--- a/net/core/flow_offload.c
+++ b/net/core/flow_offload.c
@@ -165,7 +165,7 @@ void flow_rule_match_enc_opts(const struct flow_rule *rule,
 }
 EXPORT_SYMBOL(flow_rule_match_enc_opts);
 
-struct flow_block_cb *flow_block_cb_alloc(tc_setup_cb_t *cb,
+struct flow_block_cb *flow_block_cb_alloc(flow_setup_cb_t *cb,
 					  void *cb_ident, void *cb_priv,
 					  void (*release)(void *cb_priv))
 {
@@ -194,7 +194,7 @@ void flow_block_cb_free(struct flow_block_cb *block_cb)
 EXPORT_SYMBOL(flow_block_cb_free);
 
 struct flow_block_cb *flow_block_cb_lookup(struct flow_block_offload *f,
-					   tc_setup_cb_t *cb, void *cb_ident)
+					   flow_setup_cb_t *cb, void *cb_ident)
 {
 	struct flow_block_cb *block_cb;
 
@@ -226,7 +226,7 @@ unsigned int flow_block_cb_decref(struct flow_block_cb *block_cb)
 }
 EXPORT_SYMBOL(flow_block_cb_decref);
 
-bool flow_block_cb_is_busy(tc_setup_cb_t *cb, void *cb_ident,
+bool flow_block_cb_is_busy(flow_setup_cb_t *cb, void *cb_ident,
 			   struct list_head *driver_block_list)
 {
 	struct flow_block_cb *block_cb;
@@ -243,7 +243,8 @@ EXPORT_SYMBOL(flow_block_cb_is_busy);
 
 int flow_block_cb_setup_simple(struct flow_block_offload *f,
 			       struct list_head *driver_block_list,
-			       tc_setup_cb_t *cb, void *cb_ident, void *cb_priv,
+			       flow_setup_cb_t *cb,
+			       void *cb_ident, void *cb_priv,
 			       bool ingress_only)
 {
 	struct flow_block_cb *block_cb;
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 6ca9ec58f881..d697a64fb564 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -951,7 +951,7 @@ static int dsa_slave_setup_tc_block(struct net_device *dev,
 				    struct flow_block_offload *f)
 {
 	struct flow_block_cb *block_cb;
-	tc_setup_cb_t *cb;
+	flow_setup_cb_t *cb;
 
 	if (f->binder_type == FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS)
 		cb = dsa_slave_setup_tc_block_cb_ig;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 278014e26aec..51fbe6e95a92 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1514,7 +1514,7 @@ void tcf_block_put(struct tcf_block *block)
 EXPORT_SYMBOL(tcf_block_put);
 
 static int
-tcf_block_playback_offloads(struct tcf_block *block, tc_setup_cb_t *cb,
+tcf_block_playback_offloads(struct tcf_block *block, flow_setup_cb_t *cb,
 			    void *cb_priv, bool add, bool offload_in_use,
 			    struct netlink_ext_ack *extack)
 {
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 691f71830134..3f7a9c02b70c 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -651,7 +651,7 @@ static void cls_bpf_walk(struct tcf_proto *tp, struct tcf_walker *arg,
 	}
 }
 
-static int cls_bpf_reoffload(struct tcf_proto *tp, bool add, tc_setup_cb_t *cb,
+static int cls_bpf_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb,
 			     void *cb_priv, struct netlink_ext_ack *extack)
 {
 	struct cls_bpf_head *head = rtnl_dereference(tp->root);
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 38d6e85693fc..054123742e32 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -1800,7 +1800,7 @@ fl_get_next_hw_filter(struct tcf_proto *tp, struct cls_fl_filter *f, bool add)
 	return NULL;
 }
 
-static int fl_reoffload(struct tcf_proto *tp, bool add, tc_setup_cb_t *cb,
+static int fl_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb,
 			void *cb_priv, struct netlink_ext_ack *extack)
 {
 	struct tcf_block *block = tp->chain->block;
diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index a30d2f8feb32..455ea2793f9b 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -282,7 +282,7 @@ static void mall_walk(struct tcf_proto *tp, struct tcf_walker *arg,
 	arg->count++;
 }
 
-static int mall_reoffload(struct tcf_proto *tp, bool add, tc_setup_cb_t *cb,
+static int mall_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb,
 			  void *cb_priv, struct netlink_ext_ack *extack)
 {
 	struct cls_mall_head *head = rtnl_dereference(tp->root);
diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index be9e46c77e8b..8614088edd1b 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -1152,7 +1152,7 @@ static void u32_walk(struct tcf_proto *tp, struct tcf_walker *arg,
 }
 
 static int u32_reoffload_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht,
-			       bool add, tc_setup_cb_t *cb, void *cb_priv,
+			       bool add, flow_setup_cb_t *cb, void *cb_priv,
 			       struct netlink_ext_ack *extack)
 {
 	struct tc_cls_u32_offload cls_u32 = {};
@@ -1172,7 +1172,7 @@ static int u32_reoffload_hnode(struct tcf_proto *tp, struct tc_u_hnode *ht,
 }
 
 static int u32_reoffload_knode(struct tcf_proto *tp, struct tc_u_knode *n,
-			       bool add, tc_setup_cb_t *cb, void *cb_priv,
+			       bool add, flow_setup_cb_t *cb, void *cb_priv,
 			       struct netlink_ext_ack *extack)
 {
 	struct tc_u_hnode *ht = rtnl_dereference(n->ht_down);
@@ -1213,7 +1213,7 @@ static int u32_reoffload_knode(struct tcf_proto *tp, struct tc_u_knode *n,
 	return 0;
 }
 
-static int u32_reoffload(struct tcf_proto *tp, bool add, tc_setup_cb_t *cb,
+static int u32_reoffload(struct tcf_proto *tp, bool add, flow_setup_cb_t *cb,
 			 void *cb_priv, struct netlink_ext_ack *extack)
 {
 	struct tc_u_common *tp_c = tp->data;
-- 
2.11.0


^ permalink raw reply related

* [PATCH net,v3 4/4] net: flow_offload: add flow_block structure and use it
From: Pablo Neira Ayuso @ 2019-07-17 19:42 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev, jiri, jakub.kicinski
In-Reply-To: <20190717194248.2522-1-pablo@netfilter.org>

This object stores the flow block callbacks that are attached to this
block. Update flow_block_cb_lookup() to take this new object.

This patch restores the block sharing feature.

Fixes: da3eeb904ff4 ("net: flow_offload: add list handling functions")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
v3: add flow_block_init() - Jiri Pirko.
    rename flow/flow_block/ - Jiri Pirko.
    fix documentation in nft_stats - Jiri Pirko.
    update flow_block_cb_lookup() argument to take flow_block
    and adapt callers - Jiri Pirko.

 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c    |  2 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c      |  8 +++++---
 drivers/net/ethernet/mscc/ocelot_flower.c           |  8 ++++----
 drivers/net/ethernet/mscc/ocelot_tc.c               |  2 +-
 drivers/net/ethernet/netronome/nfp/flower/offload.c |  5 +++--
 include/net/flow_offload.h                          | 12 +++++++++++-
 include/net/netfilter/nf_tables.h                   |  5 +++--
 include/net/sch_generic.h                           |  2 +-
 net/core/flow_offload.c                             |  6 +++---
 net/dsa/slave.c                                     |  2 +-
 net/netfilter/nf_tables_api.c                       |  2 +-
 net/netfilter/nf_tables_offload.c                   |  5 +++--
 net/sched/cls_api.c                                 | 10 +++++++---
 13 files changed, 44 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 2162412073c5..7f747cb1a4f4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -752,7 +752,7 @@ mlx5e_rep_indr_setup_tc_block(struct net_device *netdev,
 		if (!indr_priv)
 			return -ENOENT;
 
-		block_cb = flow_block_cb_lookup(f,
+		block_cb = flow_block_cb_lookup(f->block,
 						mlx5e_rep_indr_setup_block_cb,
 						indr_priv);
 		if (!block_cb)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 51cd0b6f1f3e..650638152bbc 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -1604,7 +1604,8 @@ mlxsw_sp_setup_tc_block_flower_bind(struct mlxsw_sp_port *mlxsw_sp_port,
 	bool register_block = false;
 	int err;
 
-	block_cb = flow_block_cb_lookup(f, mlxsw_sp_setup_tc_block_cb_flower,
+	block_cb = flow_block_cb_lookup(f->block,
+					mlxsw_sp_setup_tc_block_cb_flower,
 					mlxsw_sp);
 	if (!block_cb) {
 		acl_block = mlxsw_sp_acl_block_create(mlxsw_sp, f->net);
@@ -1656,7 +1657,8 @@ mlxsw_sp_setup_tc_block_flower_unbind(struct mlxsw_sp_port *mlxsw_sp_port,
 	struct flow_block_cb *block_cb;
 	int err;
 
-	block_cb = flow_block_cb_lookup(f, mlxsw_sp_setup_tc_block_cb_flower,
+	block_cb = flow_block_cb_lookup(f->block,
+					mlxsw_sp_setup_tc_block_cb_flower,
 					mlxsw_sp);
 	if (!block_cb)
 		return;
@@ -1717,7 +1719,7 @@ static int mlxsw_sp_setup_tc_block(struct mlxsw_sp_port *mlxsw_sp_port,
 	case FLOW_BLOCK_UNBIND:
 		mlxsw_sp_setup_tc_block_flower_unbind(mlxsw_sp_port,
 						      f, ingress);
-		block_cb = flow_block_cb_lookup(f, cb, mlxsw_sp_port);
+		block_cb = flow_block_cb_lookup(f->block, cb, mlxsw_sp_port);
 		if (!block_cb)
 			return -ENOENT;
 
diff --git a/drivers/net/ethernet/mscc/ocelot_flower.c b/drivers/net/ethernet/mscc/ocelot_flower.c
index 6a11aea8b186..59487d446a09 100644
--- a/drivers/net/ethernet/mscc/ocelot_flower.c
+++ b/drivers/net/ethernet/mscc/ocelot_flower.c
@@ -316,8 +316,8 @@ int ocelot_setup_tc_block_flower_bind(struct ocelot_port *port,
 	if (f->binder_type == FLOW_BLOCK_BINDER_TYPE_CLSACT_EGRESS)
 		return -EOPNOTSUPP;
 
-	block_cb = flow_block_cb_lookup(f, ocelot_setup_tc_block_cb_flower,
-					port);
+	block_cb = flow_block_cb_lookup(f->block,
+					ocelot_setup_tc_block_cb_flower, port);
 	if (!block_cb) {
 		port_block = ocelot_port_block_create(port);
 		if (!port_block)
@@ -350,8 +350,8 @@ void ocelot_setup_tc_block_flower_unbind(struct ocelot_port *port,
 {
 	struct flow_block_cb *block_cb;
 
-	block_cb = flow_block_cb_lookup(f, ocelot_setup_tc_block_cb_flower,
-					port);
+	block_cb = flow_block_cb_lookup(f->block,
+					ocelot_setup_tc_block_cb_flower, port);
 	if (!block_cb)
 		return;
 
diff --git a/drivers/net/ethernet/mscc/ocelot_tc.c b/drivers/net/ethernet/mscc/ocelot_tc.c
index fba9512e9ca6..16a6db71ca5e 100644
--- a/drivers/net/ethernet/mscc/ocelot_tc.c
+++ b/drivers/net/ethernet/mscc/ocelot_tc.c
@@ -169,7 +169,7 @@ static int ocelot_setup_tc_block(struct ocelot_port *port,
 		list_add_tail(&block_cb->driver_list, f->driver_block_list);
 		return 0;
 	case FLOW_BLOCK_UNBIND:
-		block_cb = flow_block_cb_lookup(f, cb, port);
+		block_cb = flow_block_cb_lookup(f->block, cb, port);
 		if (!block_cb)
 			return -ENOENT;
 
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 93ab0db6c504..e209f150c5f2 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -1327,7 +1327,8 @@ static int nfp_flower_setup_tc_block(struct net_device *netdev,
 		list_add_tail(&block_cb->driver_list, &nfp_block_cb_list);
 		return 0;
 	case FLOW_BLOCK_UNBIND:
-		block_cb = flow_block_cb_lookup(f, nfp_flower_setup_tc_block_cb,
+		block_cb = flow_block_cb_lookup(f->block,
+						nfp_flower_setup_tc_block_cb,
 						repr);
 		if (!block_cb)
 			return -ENOENT;
@@ -1440,7 +1441,7 @@ nfp_flower_setup_indr_tc_block(struct net_device *netdev, struct nfp_app *app,
 		if (!cb_priv)
 			return -ENOENT;
 
-		block_cb = flow_block_cb_lookup(f,
+		block_cb = flow_block_cb_lookup(f->block,
 						nfp_flower_setup_indr_block_cb,
 						cb_priv);
 		if (!block_cb)
diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 98bf3af5c84d..2b555ba78a02 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -248,6 +248,10 @@ enum flow_block_binder_type {
 	FLOW_BLOCK_BINDER_TYPE_CLSACT_EGRESS,
 };
 
+struct flow_block {
+	struct list_head cb_list;
+};
+
 struct netlink_ext_ack;
 
 struct flow_block_offload {
@@ -255,6 +259,7 @@ struct flow_block_offload {
 	enum flow_block_binder_type binder_type;
 	bool block_shared;
 	struct net *net;
+	struct flow_block *block;
 	struct list_head cb_list;
 	struct list_head *driver_block_list;
 	struct netlink_ext_ack *extack;
@@ -279,7 +284,7 @@ struct flow_block_cb *flow_block_cb_alloc(flow_setup_cb_t *cb,
 					  void (*release)(void *cb_priv));
 void flow_block_cb_free(struct flow_block_cb *block_cb);
 
-struct flow_block_cb *flow_block_cb_lookup(struct flow_block_offload *offload,
+struct flow_block_cb *flow_block_cb_lookup(struct flow_block *block,
 					   flow_setup_cb_t *cb, void *cb_ident);
 
 void *flow_block_cb_priv(struct flow_block_cb *block_cb);
@@ -336,4 +341,9 @@ flow_cls_offload_flow_rule(struct flow_cls_offload *flow_cmd)
 	return flow_cmd->rule;
 }
 
+static inline void flow_block_init(struct flow_block *flow_block)
+{
+	INIT_LIST_HEAD(&flow_block->cb_list);
+}
+
 #endif /* _NET_FLOW_OFFLOAD_H */
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 35dfdd9f69b3..9b624566b82d 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -11,6 +11,7 @@
 #include <linux/rhashtable.h>
 #include <net/netfilter/nf_flow_table.h>
 #include <net/netlink.h>
+#include <net/flow_offload.h>
 
 struct module;
 
@@ -951,7 +952,7 @@ struct nft_stats {
  *	@stats: per-cpu chain stats
  *	@chain: the chain
  *	@dev_name: device name that this base chain is attached to (if any)
- *	@cb_list: list of flow block callbacks (for hardware offload)
+ *	@flow_block: flow block (for hardware offload)
  */
 struct nft_base_chain {
 	struct nf_hook_ops		ops;
@@ -961,7 +962,7 @@ struct nft_base_chain {
 	struct nft_stats __percpu	*stats;
 	struct nft_chain		chain;
 	char 				dev_name[IFNAMSIZ];
-	struct list_head		cb_list;
+	struct flow_block		flow_block;
 };
 
 static inline struct nft_base_chain *nft_base_chain(const struct nft_chain *chain)
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 9482e060483b..6b6b01234dd9 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -399,7 +399,7 @@ struct tcf_block {
 	refcount_t refcnt;
 	struct net *net;
 	struct Qdisc *q;
-	struct list_head cb_list;
+	struct flow_block flow_block;
 	struct list_head owner_list;
 	bool keep_dst;
 	unsigned int offloadcnt; /* Number of oddloaded filters */
diff --git a/net/core/flow_offload.c b/net/core/flow_offload.c
index a800fa78d96c..d63b970784dc 100644
--- a/net/core/flow_offload.c
+++ b/net/core/flow_offload.c
@@ -193,12 +193,12 @@ void flow_block_cb_free(struct flow_block_cb *block_cb)
 }
 EXPORT_SYMBOL(flow_block_cb_free);
 
-struct flow_block_cb *flow_block_cb_lookup(struct flow_block_offload *f,
+struct flow_block_cb *flow_block_cb_lookup(struct flow_block *block,
 					   flow_setup_cb_t *cb, void *cb_ident)
 {
 	struct flow_block_cb *block_cb;
 
-	list_for_each_entry(block_cb, f->driver_block_list, driver_list) {
+	list_for_each_entry(block_cb, &block->cb_list, list) {
 		if (block_cb->cb == cb &&
 		    block_cb->cb_ident == cb_ident)
 			return block_cb;
@@ -268,7 +268,7 @@ int flow_block_cb_setup_simple(struct flow_block_offload *f,
 		list_add_tail(&block_cb->driver_list, driver_block_list);
 		return 0;
 	case FLOW_BLOCK_UNBIND:
-		block_cb = flow_block_cb_lookup(f, cb, cb_ident);
+		block_cb = flow_block_cb_lookup(f->block, cb, cb_ident);
 		if (!block_cb)
 			return -ENOENT;
 
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index d697a64fb564..33f41178afcc 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -975,7 +975,7 @@ static int dsa_slave_setup_tc_block(struct net_device *dev,
 		list_add_tail(&block_cb->driver_list, &dsa_slave_block_cb_list);
 		return 0;
 	case FLOW_BLOCK_UNBIND:
-		block_cb = flow_block_cb_lookup(f, cb, dev);
+		block_cb = flow_block_cb_lookup(f->block, cb, dev);
 		if (!block_cb)
 			return -ENOENT;
 
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 014e06b0b5cf..605a7cfe7ca7 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1662,7 +1662,7 @@ static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask,
 
 		chain->flags |= NFT_BASE_CHAIN | flags;
 		basechain->policy = NF_ACCEPT;
-		INIT_LIST_HEAD(&basechain->cb_list);
+		flow_block_init(&basechain->flow_block);
 	} else {
 		chain = kzalloc(sizeof(*chain), GFP_KERNEL);
 		if (chain == NULL)
diff --git a/net/netfilter/nf_tables_offload.c b/net/netfilter/nf_tables_offload.c
index 2c3302845f67..64f5fd5f240e 100644
--- a/net/netfilter/nf_tables_offload.c
+++ b/net/netfilter/nf_tables_offload.c
@@ -116,7 +116,7 @@ static int nft_setup_cb_call(struct nft_base_chain *basechain,
 	struct flow_block_cb *block_cb;
 	int err;
 
-	list_for_each_entry(block_cb, &basechain->cb_list, list) {
+	list_for_each_entry(block_cb, &basechain->flow_block.cb_list, list) {
 		err = block_cb->cb(type, type_data, block_cb->cb_priv);
 		if (err < 0)
 			return err;
@@ -154,7 +154,7 @@ static int nft_flow_offload_rule(struct nft_trans *trans,
 static int nft_flow_offload_bind(struct flow_block_offload *bo,
 				 struct nft_base_chain *basechain)
 {
-	list_splice(&bo->cb_list, &basechain->cb_list);
+	list_splice(&bo->cb_list, &basechain->flow_block.cb_list);
 	return 0;
 }
 
@@ -198,6 +198,7 @@ static int nft_flow_offload_chain(struct nft_trans *trans,
 		return -EOPNOTSUPP;
 
 	bo.command = cmd;
+	bo.block = &basechain->flow_block;
 	bo.binder_type = FLOW_BLOCK_BINDER_TYPE_CLSACT_INGRESS;
 	bo.extack = &extack;
 	INIT_LIST_HEAD(&bo.cb_list);
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 51fbe6e95a92..3b9720fe1e60 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -691,6 +691,8 @@ static void tc_indr_block_ing_cmd(struct tc_indr_block_dev *indr_dev,
 	if (!indr_dev->block)
 		return;
 
+	bo.block = &indr_dev->block->flow_block;
+
 	indr_block_cb->cb(indr_dev->dev, indr_block_cb->cb_priv, TC_SETUP_BLOCK,
 			  &bo);
 	tcf_block_setup(indr_dev->block, &bo);
@@ -775,6 +777,7 @@ static void tc_indr_block_call(struct tcf_block *block, struct net_device *dev,
 		.command	= command,
 		.binder_type	= ei->binder_type,
 		.net		= dev_net(dev),
+		.block		= &block->flow_block,
 		.block_shared	= tcf_block_shared(block),
 		.extack		= extack,
 	};
@@ -810,6 +813,7 @@ static int tcf_block_offload_cmd(struct tcf_block *block,
 	bo.net = dev_net(dev);
 	bo.command = command;
 	bo.binder_type = ei->binder_type;
+	bo.block = &block->flow_block;
 	bo.block_shared = tcf_block_shared(block);
 	bo.extack = extack;
 	INIT_LIST_HEAD(&bo.cb_list);
@@ -987,8 +991,8 @@ static struct tcf_block *tcf_block_create(struct net *net, struct Qdisc *q,
 		return ERR_PTR(-ENOMEM);
 	}
 	mutex_init(&block->lock);
+	flow_block_init(&block->flow_block);
 	INIT_LIST_HEAD(&block->chain_list);
-	INIT_LIST_HEAD(&block->cb_list);
 	INIT_LIST_HEAD(&block->owner_list);
 	INIT_LIST_HEAD(&block->chain0.filter_chain_list);
 
@@ -1570,7 +1574,7 @@ static int tcf_block_bind(struct tcf_block *block,
 
 		i++;
 	}
-	list_splice(&bo->cb_list, &block->cb_list);
+	list_splice(&bo->cb_list, &block->flow_block.cb_list);
 
 	return 0;
 
@@ -3155,7 +3159,7 @@ int tc_setup_cb_call(struct tcf_block *block, enum tc_setup_type type,
 	if (block->nooffloaddevcnt && err_stop)
 		return -EOPNOTSUPP;
 
-	list_for_each_entry(block_cb, &block->cb_list, list) {
+	list_for_each_entry(block_cb, &block->flow_block.cb_list, list) {
 		err = block_cb->cb(type, type_data, block_cb->cb_priv);
 		if (err) {
 			if (err_stop)
-- 
2.11.0


^ permalink raw reply related

* [PATCHv2] net: ag71xx: Add missing header
From: Rosen Penev @ 2019-07-17 19:46 UTC (permalink / raw)
  To: netdev

ag71xx uses devm_ioremap_nocache. This fixes usage of an implicit function

Fixes: d51b6ce441d356369387d20bc1de5f2edb0ab71e

Signed-off-by: Rosen Penev <rosenp@gmail.com>
---
 v2: Target net instead of net-next
 drivers/net/ethernet/atheros/ag71xx.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/atheros/ag71xx.c b/drivers/net/ethernet/atheros/ag71xx.c
index 72a57c6cd254..8f450a03a885 100644
--- a/drivers/net/ethernet/atheros/ag71xx.c
+++ b/drivers/net/ethernet/atheros/ag71xx.c
@@ -35,6 +35,7 @@
 #include <linux/regmap.h>
 #include <linux/reset.h>
 #include <linux/clk.h>
+#include <linux/io.h>
 
 /* For our NAPI weight bigger does *NOT* mean better - it means more
  * D-cache misses and lots more wasted cycles than we'll ever
-- 
2.17.1


^ permalink raw reply related

* regression with napi/softirq ?
From: Sudip Mukherjee @ 2019-07-17 20:19 UTC (permalink / raw)
  To: peterz, davem; +Cc: netdev, linux-kernel

Hi All,

I am using v4.14.55 on an Intel Atom based board and I am seeing network
packet drops frequently on wireshark logs. After lots of debugging it
seems that when this happens softirq is taking huge time to start after
it has been raised. This is a small snippet from ftrace:

           <...>-2110  [001] dNH1   466.634916: irq_handler_entry: irq=126 name=eth0-TxRx-0
           <...>-2110  [001] dNH1   466.634917: softirq_raise: vec=3 [action=NET_RX]
           <...>-2110  [001] dNH1   466.634918: irq_handler_exit: irq=126 ret=handled
     ksoftirqd/1-15    [001] ..s.   466.635826: softirq_entry: vec=3 [action=NET_RX]
     ksoftirqd/1-15    [001] ..s.   466.635852: softirq_exit: vec=3 [action=NET_RX]
     ksoftirqd/1-15    [001] d.H.   466.635856: irq_handler_entry: irq=126 name=eth0-TxRx-0
     ksoftirqd/1-15    [001] d.H.   466.635857: softirq_raise: vec=3 [action=NET_RX]
     ksoftirqd/1-15    [001] d.H.   466.635858: irq_handler_exit: irq=126 ret=handled
     ksoftirqd/1-15    [001] ..s.   466.635860: softirq_entry: vec=3 [action=NET_RX]
     ksoftirqd/1-15    [001] ..s.   466.635863: softirq_exit: vec=3 [action=NET_RX]

So, softirq was raised at 466.634917 but it started at 466.635826 almost
909 usec after it was raised.

If I move back to v4.4 kernel I still see similar behaviour but the maximum
delay I get is in the range of 500usec. But if I move back to v3.8 kernel I
can see there is no packet loss and the maximum delay between softirq_raise
and irq_handler_entry is 103usec.

Is this a known issue?
Will really appreciate your help in this problem.

--
Regards
Sudip

^ permalink raw reply

* [PATCH net 0/2] mlxsw: Two fixes
From: Ido Schimmel @ 2019-07-17 20:29 UTC (permalink / raw)
  To: netdev; +Cc: davem, jiri, petrm, mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

This patchset contains two fixes for mlxsw.

Patch #1 from Petr fixes an issue in which DSCP rewrite can occur even
if the egress port was switched to Trust L2 mode where priority mapping
is based on PCP.

Patch #2 fixes a problem where packets can be learned on a non-existing
FID if a tc filter with a redirect action is configured on a bridged
port. The problem and fix are explained in detail in the commit message.

Please consider both patches for 5.2.y

Ido Schimmel (1):
  mlxsw: spectrum: Do not process learned records with a dummy FID

Petr Machata (1):
  mlxsw: spectrum_dcb: Configure DSCP map as the last rule is removed

 drivers/net/ethernet/mellanox/mlxsw/spectrum.h   |  1 +
 .../net/ethernet/mellanox/mlxsw/spectrum_dcb.c   | 16 ++++++++--------
 .../net/ethernet/mellanox/mlxsw/spectrum_fid.c   | 10 ++++++++++
 .../ethernet/mellanox/mlxsw/spectrum_switchdev.c |  6 ++++++
 4 files changed, 25 insertions(+), 8 deletions(-)

-- 
2.21.0

^ permalink raw reply

* [PATCH net 2/2] mlxsw: spectrum: Do not process learned records with a dummy FID
From: Ido Schimmel @ 2019-07-17 20:29 UTC (permalink / raw)
  To: netdev; +Cc: davem, jiri, petrm, mlxsw, Ido Schimmel
In-Reply-To: <20190717202908.1547-1-idosch@idosch.org>

From: Ido Schimmel <idosch@mellanox.com>

The switch periodically sends notifications about learned FDB entries.
Among other things, the notification includes the FID (Filtering
Identifier) and the port on which the MAC was learned.

In case the driver does not have the FID defined on the relevant port,
the following error will be periodically generated:

mlxsw_spectrum2 0000:06:00.0 swp32: Failed to find a matching {Port, VID} following FDB notification

This is not supposed to happen under normal conditions, but can happen
if an ingress tc filter with a redirect action is installed on a bridged
port. The redirect action will cause the packet's FID to be changed to
the dummy FID and a learning notification will be emitted with this FID
- which is not defined on the bridged port.

Fix this by having the driver ignore learning notifications generated
with the dummy FID and delete them from the device.

Another option is to chain an ignore action after the redirect action
which will cause the device to disable learning, but this means that we
need to consume another action whenever a redirect action is used. In
addition, the scenario described above is merely a corner case.

Fixes: cedbb8b25948 ("mlxsw: spectrum_flower: Set dummy FID before forward action")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alex Kushnarov <alexanderk@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Alex Kushnarov <alexanderk@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h         |  1 +
 drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c     | 10 ++++++++++
 .../net/ethernet/mellanox/mlxsw/spectrum_switchdev.c   |  6 ++++++
 3 files changed, 17 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index a252b080dda9..131f62ce9297 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -830,6 +830,7 @@ int mlxsw_sp_setup_tc_prio(struct mlxsw_sp_port *mlxsw_sp_port,
 			   struct tc_prio_qopt_offload *p);
 
 /* spectrum_fid.c */
+bool mlxsw_sp_fid_is_dummy(struct mlxsw_sp *mlxsw_sp, u16 fid_index);
 bool mlxsw_sp_fid_lag_vid_valid(const struct mlxsw_sp_fid *fid);
 struct mlxsw_sp_fid *mlxsw_sp_fid_lookup_by_index(struct mlxsw_sp *mlxsw_sp,
 						  u16 fid_index);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
index 46baf3b44309..8df3cb21baa6 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c
@@ -126,6 +126,16 @@ static const int *mlxsw_sp_packet_type_sfgc_types[] = {
 	[MLXSW_SP_FLOOD_TYPE_MC]	= mlxsw_sp_sfgc_mc_packet_types,
 };
 
+bool mlxsw_sp_fid_is_dummy(struct mlxsw_sp *mlxsw_sp, u16 fid_index)
+{
+	enum mlxsw_sp_fid_type fid_type = MLXSW_SP_FID_TYPE_DUMMY;
+	struct mlxsw_sp_fid_family *fid_family;
+
+	fid_family = mlxsw_sp->fid_core->fid_family_arr[fid_type];
+
+	return fid_family->start_index == fid_index;
+}
+
 bool mlxsw_sp_fid_lag_vid_valid(const struct mlxsw_sp_fid *fid)
 {
 	return fid->fid_family->lag_vid_valid;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 50111f228d77..5ecb45118400 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -2468,6 +2468,9 @@ static void mlxsw_sp_fdb_notify_mac_process(struct mlxsw_sp *mlxsw_sp,
 		goto just_remove;
 	}
 
+	if (mlxsw_sp_fid_is_dummy(mlxsw_sp, fid))
+		goto just_remove;
+
 	mlxsw_sp_port_vlan = mlxsw_sp_port_vlan_find_by_fid(mlxsw_sp_port, fid);
 	if (!mlxsw_sp_port_vlan) {
 		netdev_err(mlxsw_sp_port->dev, "Failed to find a matching {Port, VID} following FDB notification\n");
@@ -2527,6 +2530,9 @@ static void mlxsw_sp_fdb_notify_mac_lag_process(struct mlxsw_sp *mlxsw_sp,
 		goto just_remove;
 	}
 
+	if (mlxsw_sp_fid_is_dummy(mlxsw_sp, fid))
+		goto just_remove;
+
 	mlxsw_sp_port_vlan = mlxsw_sp_port_vlan_find_by_fid(mlxsw_sp_port, fid);
 	if (!mlxsw_sp_port_vlan) {
 		netdev_err(mlxsw_sp_port->dev, "Failed to find a matching {Port, VID} following FDB notification\n");
-- 
2.21.0


^ permalink raw reply related

* [PATCH net 1/2] mlxsw: spectrum_dcb: Configure DSCP map as the last rule is removed
From: Ido Schimmel @ 2019-07-17 20:29 UTC (permalink / raw)
  To: netdev; +Cc: davem, jiri, petrm, mlxsw, Ido Schimmel
In-Reply-To: <20190717202908.1547-1-idosch@idosch.org>

From: Petr Machata <petrm@mellanox.com>

Spectrum systems use DSCP rewrite map to update DSCP field in egressing
packets to correspond to priority that the packet has. Whether rewriting
will take place is determined at the point when the packet ingresses the
switch: if the port is in Trust L3 mode, packet priority is determined from
the DSCP map at the port, and DSCP rewrite will happen. If the port is in
Trust L2 mode, 802.1p is used for packet prioritization, and no DSCP
rewrite will happen.

The driver determines the port trust mode based on whether any DSCP
prioritization rules are in effect at given port. If there are any, trust
level is L3, otherwise it's L2. When the last DSCP rule is removed, the
port is switched to trust L2. Under that scenario, if DSCP of a packet
should be rewritten, it should be rewritten to 0.

However, when switching to Trust L2, the driver neglects to also update the
DSCP rewrite map. The last DSCP rule thus remains in effect, and packets
egressing through this port, if they have the right priority, will have
their DSCP set according to this rule.

Fix by first configuring the rewrite map, and only then switching to trust
L2 and bailing out.

Fixes: b2b1dab6884e ("mlxsw: spectrum: Support ieee_setapp, ieee_delapp")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reported-by: Alex Veber <alexve@mellanox.com>
Tested-by: Alex Veber <alexve@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 .../net/ethernet/mellanox/mlxsw/spectrum_dcb.c   | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_dcb.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_dcb.c
index b25048c6c761..21296fa7f7fb 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_dcb.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_dcb.c
@@ -408,14 +408,6 @@ static int mlxsw_sp_port_dcb_app_update(struct mlxsw_sp_port *mlxsw_sp_port)
 	have_dscp = mlxsw_sp_port_dcb_app_prio_dscp_map(mlxsw_sp_port,
 							&prio_map);

-	if (!have_dscp) {
-		err = mlxsw_sp_port_dcb_toggle_trust(mlxsw_sp_port,
-					MLXSW_REG_QPTS_TRUST_STATE_PCP);
-		if (err)
-			netdev_err(mlxsw_sp_port->dev, "Couldn't switch to trust L2\n");
-		return err;
-	}
-
 	mlxsw_sp_port_dcb_app_dscp_prio_map(mlxsw_sp_port, default_prio,
 					    &dscp_map);
 	err = mlxsw_sp_port_dcb_app_update_qpdpm(mlxsw_sp_port,
@@ -432,6 +424,14 @@ static int mlxsw_sp_port_dcb_app_update(struct mlxsw_sp_port *mlxsw_sp_port)
 		return err;
 	}

+	if (!have_dscp) {
+		err = mlxsw_sp_port_dcb_toggle_trust(mlxsw_sp_port,
+					MLXSW_REG_QPTS_TRUST_STATE_PCP);
+		if (err)
+			netdev_err(mlxsw_sp_port->dev, "Couldn't switch to trust L2\n");
+		return err;
+	}
+
 	err = mlxsw_sp_port_dcb_toggle_trust(mlxsw_sp_port,
 					     MLXSW_REG_QPTS_TRUST_STATE_DSCP);
 	if (err) {
-- 
2.21.0

^ permalink raw reply related

* Re: [Patch net v2] net_sched: unset TCQ_F_CAN_BYPASS when adding filters
From: David Miller @ 2019-07-17 20:34 UTC (permalink / raw)
  To: edumazet; +Cc: xiyou.wangcong, netdev
In-Reply-To: <CANn89iKKidoeN8fwZLcem8BRK1FTJptwfYkO3Jn61ya7PKQLJA@mail.gmail.com>

From: Eric Dumazet <edumazet@google.com>
Date: Wed, 17 Jul 2019 21:19:51 +0200

> On Wed, Jul 17, 2019 at 9:04 PM David Miller <davem@davemloft.net> wrote:
>>
>> From: Cong Wang <xiyou.wangcong@gmail.com>
>> Date: Tue, 16 Jul 2019 13:57:30 -0700
>>
>> > For qdisc's that support TC filters and set TCQ_F_CAN_BYPASS,
>> > notably fq_codel, it makes no sense to let packets bypass the TC
>> > filters we setup in any scenario, otherwise our packets steering
>> > policy could not be enforced.
>>  ...
>>
>> Eric I think your feedback was addressed, please review to confirm.
> 
> Yes, this seems good to me, thanks.
> 
> Reviewed-by: Eric Dumazet <edumazet@google.com>

Great, applied and queued up for -stable.

^ permalink raw reply

* [PATCH net] ipv6: Unlink sibling route in case of failure
From: Ido Schimmel @ 2019-07-17 20:39 UTC (permalink / raw)
  To: netdev; +Cc: davem, dsahern, alexpe, mlxsw, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

When a route needs to be appended to an existing multipath route,
fib6_add_rt2node() first appends it to the siblings list and increments
the number of sibling routes on each sibling.

Later, the function notifies the route via call_fib6_entry_notifiers().
In case the notification is vetoed, the route is not unlinked from the
siblings list, which can result in a use-after-free.

Fix this by unlinking the route from the siblings list before returning
an error.

Audited the rest of the call sites from which the FIB notification chain
is called and could not find more problems.

Fixes: 2233000cba40 ("net/ipv6: Move call_fib6_entry_notifiers up for route adds")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
---
Dave, this will not apply cleanly to stable trees due to recent changes
in net-next. I can prepare another patch for stable if needed.
---
 net/ipv6/ip6_fib.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 49884f96232b..87f47bc55c5e 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -1151,8 +1151,24 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt,
 			err = call_fib6_entry_notifiers(info->nl_net,
 							FIB_EVENT_ENTRY_ADD,
 							rt, extack);
-			if (err)
+			if (err) {
+				struct fib6_info *sibling, *next_sibling;
+
+				/* If the route has siblings, then it first
+				 * needs to be unlinked from them.
+				 */
+				if (!rt->fib6_nsiblings)
+					return err;
+
+				list_for_each_entry_safe(sibling, next_sibling,
+							 &rt->fib6_siblings,
+							 fib6_siblings)
+					sibling->fib6_nsiblings--;
+				rt->fib6_nsiblings = 0;
+				list_del_init(&rt->fib6_siblings);
+				rt6_multipath_rebalance(next_sibling);
 				return err;
+			}
 		}
 
 		rcu_assign_pointer(rt->fib6_next, iter);
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH bpf] bpf: fix narrower loads on s390
From: Ilya Leoshkevich @ 2019-07-17 20:51 UTC (permalink / raw)
  To: Y Song; +Cc: bpf, netdev, gor, heiko.carstens
In-Reply-To: <CAH3MdRV-qsJnyZVV1GnxRZ4=3KXTvKSgETp90fyevxycmAiHmA@mail.gmail.com>

> Am 17.07.2019 um 18:25 schrieb Y Song <ys114321@gmail.com>:
> 
> On Wed, Jul 17, 2019 at 3:36 AM Ilya Leoshkevich <iii@linux.ibm.com> wrote:
>> 
>> 
>> Here is a better one: len=0x11223344 and we would like to do
>> ((u8 *)&len)[3].
>> 
>> len is represented as `11 22 33 44` in memory, so the desired result is
>> 0x44. It can be obtained by doing (*(u32 *)&len) & 0xff, but today the
>> verifier does ((*(u32 *)&len) >> 24) & 0xff instead.
> 
> What you described above for the memory layout all makes sense.
> The root cause is for big endian, we should do *((u8 *)&len + 3).
> This is exactly what macros in test_pkt_md_access.c tries to do.
> 
> if  __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
> #define TEST_FIELD(TYPE, FIELD, MASK)                                   \
>        {                                                               \
>                TYPE tmp = *(volatile TYPE *)&skb->FIELD;               \
>                if (tmp != ((*(volatile __u32 *)&skb->FIELD) & MASK))   \
>                        return TC_ACT_SHOT;                             \
>        }
> #else
> #define TEST_FIELD_OFFSET(a, b) ((sizeof(a) - sizeof(b)) / sizeof(b))
> #define TEST_FIELD(TYPE, FIELD, MASK)                                   \
>        {                                                               \
>                TYPE tmp = *((volatile TYPE *)&skb->FIELD +             \
>                              TEST_FIELD_OFFSET(skb->FIELD, TYPE));     \
>                if (tmp != ((*(volatile __u32 *)&skb->FIELD) & MASK))   \
>                        return TC_ACT_SHOT;                             \
>        }
> #endif
> 
> Could you check whether your __BYTE_ORDER__ is set
> correctly or not for this case? You may need to tweak Makefile
> if you are doing cross compilation, I am not sure how as I
> did not have environment.

I’m building natively on s390.

Here is the (formatted) preprocessed C code for the first condition:

{
	__u8 tmp = *((volatile __u8 *)&skb->len +
		((sizeof(skb->len) - sizeof(__u8)) / sizeof(__u8)));
	if (tmp != ((*(volatile __u32 *)&skb->len) & 0xFF)) return 2;
};

So I believe the endianness is chosen correctly.

Here is the clang-generated BPF bytecode for the first condition:

# llvm-objdump -d test_pkt_md_access.o
0000000000000000 process:
       0:	71 21 00 03 00 00 00 00	r2 = *(u8 *)(r1 + 3)
       1:	61 31 00 00 00 00 00 00	r3 = *(u32 *)(r1 + 0)
       2:	57 30 00 00 00 00 00 ff	r3 &= 255
       3:	5d 23 00 1d 00 00 00 00	if r2 != r3 goto +29 <LBB0_10>

This also looks good to me.

Finally, here is the verifier-generated BPF bytecode:

# bpftool prog dump xlated id 14
; TEST_FIELD(__u8,  len, 0xFF);
   0: (61) r2 = *(u32 *)(r1 +104)
   1: (bc) w2 = w2
   2: (74) w2 >>= 24
   3: (bc) w2 = w2
   4: (54) w2 &= 255
   5: (bc) w2 = w2

Here we can see the shift that I'm referring to. I believe we should
translate *(u8 *)(r1 + 3) in this case without this shift on big-endian
machines.

Best regards,
Ilya

^ permalink raw reply

* Re: regression with napi/softirq ?
From: Thomas Gleixner @ 2019-07-17 20:53 UTC (permalink / raw)
  To: Sudip Mukherjee; +Cc: peterz, davem, netdev, linux-kernel
In-Reply-To: <20190717201925.fur57qfs2x3ha6aq@debian>

On Wed, 17 Jul 2019, Sudip Mukherjee wrote:
> I am using v4.14.55 on an Intel Atom based board and I am seeing network
> packet drops frequently on wireshark logs. After lots of debugging it
> seems that when this happens softirq is taking huge time to start after
> it has been raised. This is a small snippet from ftrace:
> 
>            <...>-2110  [001] dNH1   466.634916: irq_handler_entry: irq=126 name=eth0-TxRx-0
>            <...>-2110  [001] dNH1   466.634917: softirq_raise: vec=3 [action=NET_RX]
>            <...>-2110  [001] dNH1   466.634918: irq_handler_exit: irq=126 ret=handled
>      ksoftirqd/1-15    [001] ..s.   466.635826: softirq_entry: vec=3 [action=NET_RX]
>      ksoftirqd/1-15    [001] ..s.   466.635852: softirq_exit: vec=3 [action=NET_RX]
>      ksoftirqd/1-15    [001] d.H.   466.635856: irq_handler_entry: irq=126 name=eth0-TxRx-0
>      ksoftirqd/1-15    [001] d.H.   466.635857: softirq_raise: vec=3 [action=NET_RX]
>      ksoftirqd/1-15    [001] d.H.   466.635858: irq_handler_exit: irq=126 ret=handled
>      ksoftirqd/1-15    [001] ..s.   466.635860: softirq_entry: vec=3 [action=NET_RX]
>      ksoftirqd/1-15    [001] ..s.   466.635863: softirq_exit: vec=3 [action=NET_RX]
> 
> So, softirq was raised at 466.634917 but it started at 466.635826 almost
> 909 usec after it was raised.

This is a situation where the network softirq decided to delegate softirq
processing to ksoftirqd. That happens when too much work is available while
processing softirqs on return from interrupt.

That means that softirq processing happens under scheduler control. So if
there are other runnable tasks on the same CPU ksoftirqd can be delayed
until their time slice expired. As a consequence ksoftirqd might not be
able to catch up with the incoming packet flood and the NIC starts to drop.

You can hack ksoftirq_running() to return always false to avoid this, but
that might cause application starvation and a huge packet buffer backlog
when the amount of incoming packets makes the CPU do nothing else than
softirq processing.

Thanks,

	tglx

^ permalink raw reply

* phylink: flow control on fixed-link not working.
From: René van Dorst @ 2019-07-17 21:31 UTC (permalink / raw)
  To: netdev; +Cc: Russell King

Hi,

I am trying to enable flow control/pause on PHYLINK and fixed-link.

My setup SOC mac (mt7621) <-> RGMII <-> SWITCH mac (mt7530).

It seems that in fixed-link mode all the flow control/pause bits are  
cleared in
phylink_parse_fixedlink(). If I read phylink_parse_fixedlink() [0] correctly,
I see that pl->link_config.advertising is AND with pl->supprted which has only
the PHY_SETTING() modes bits set. pl->link_config.advertising is losing Pause
bits. pl->link_config.advertising is used in phylink_resolve_flow() to set the
MLO_PAUSE_RX/TX BITS.

I think this is an error.
Because in phylink_start() see this part [1].

  /* Apply the link configuration to the MAC when starting. This allows
   * a fixed-link to start with the correct parameters, and also
   * ensures that we set the appropriate advertisement for Serdes links.
   */
  phylink_resolve_flow(pl, &pl->link_config);
  phylink_mac_config(pl, &pl->link_config);

If I add a this hacky patch below, flow control is enabled on the fixed-link.
         if (s) {
                 __set_bit(s->bit, pl->supported);
+               if (phylink_test(pl->link_config.advertising, Pause))
+                       phylink_set(pl->supported, Pause);
         } else {

So is phylink_parse_fixedlink() broken or should it handled in a other way?

Greats,

René

[0]:  
https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/phylink.c#L196
[1]:  
https://elixir.bootlin.com/linux/latest/source/drivers/net/phy/phylink.c#L897

^ permalink raw reply

* Re: regression with napi/softirq ?
From: Sudip Mukherjee @ 2019-07-17 21:40 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra (Intel), David S. Miller, netdev, linux-kernel
In-Reply-To: <alpine.DEB.2.21.1907172238490.1778@nanos.tec.linutronix.de>

Hi,

On Wed, Jul 17, 2019 at 9:53 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Wed, 17 Jul 2019, Sudip Mukherjee wrote:
> > I am using v4.14.55 on an Intel Atom based board and I am seeing network
> > packet drops frequently on wireshark logs. After lots of debugging it
> > seems that when this happens softirq is taking huge time to start after
> > it has been raised. This is a small snippet from ftrace:
> >
> >            <...>-2110  [001] dNH1   466.634916: irq_handler_entry: irq=126 name=eth0-TxRx-0
> >            <...>-2110  [001] dNH1   466.634917: softirq_raise: vec=3 [action=NET_RX]
> >            <...>-2110  [001] dNH1   466.634918: irq_handler_exit: irq=126 ret=handled
> >      ksoftirqd/1-15    [001] ..s.   466.635826: softirq_entry: vec=3 [action=NET_RX]
> >      ksoftirqd/1-15    [001] ..s.   466.635852: softirq_exit: vec=3 [action=NET_RX]
> >      ksoftirqd/1-15    [001] d.H.   466.635856: irq_handler_entry: irq=126 name=eth0-TxRx-0
> >      ksoftirqd/1-15    [001] d.H.   466.635857: softirq_raise: vec=3 [action=NET_RX]
> >      ksoftirqd/1-15    [001] d.H.   466.635858: irq_handler_exit: irq=126 ret=handled
> >      ksoftirqd/1-15    [001] ..s.   466.635860: softirq_entry: vec=3 [action=NET_RX]
> >      ksoftirqd/1-15    [001] ..s.   466.635863: softirq_exit: vec=3 [action=NET_RX]
> >
> > So, softirq was raised at 466.634917 but it started at 466.635826 almost
> > 909 usec after it was raised.
>
> This is a situation where the network softirq decided to delegate softirq
> processing to ksoftirqd. That happens when too much work is available while
> processing softirqs on return from interrupt.
>
> That means that softirq processing happens under scheduler control. So if
> there are other runnable tasks on the same CPU ksoftirqd can be delayed
> until their time slice expired. As a consequence ksoftirqd might not be
> able to catch up with the incoming packet flood and the NIC starts to drop.

Yes, and I see in the ftrace that there are many other userspace processes
getting scheduled in that time.

>
> You can hack ksoftirq_running() to return always false to avoid this, but
> that might cause application starvation and a huge packet buffer backlog
> when the amount of incoming packets makes the CPU do nothing else than
> softirq processing.

I tried that now, it is better but still not as good as v3.8
Now I am getting 375.9usec as the maximum time between raising the softirq
and it starting to execute and packet drops still there.

And just a thought, do you think there should be a CONFIG_ option for
this feature
of ksoftirqd_running() so that it can be disabled if needed by users like us?

Can you please think of anything else that might have changed which I still need
to change to make the time comparable to v3.8..


-- 
Regards
Sudip

^ permalink raw reply

* Re: [PATCH RFC 0/4] Add support to directly attach BPF program to ftrace
From: Alexei Starovoitov @ 2019-07-17 21:40 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: LKML, bpf, Daniel Borkmann, Android Kernel Team,
	Network Development, Steven Rostedt
In-Reply-To: <20190717130119.GA138030@google.com>

On Wed, Jul 17, 2019 at 6:01 AM Joel Fernandes <joel@joelfernandes.org> wrote:

I trimmed cc. some emails were bouncing.

> > I think allowing one tracepoint and disallowing another is pointless
> > from security point of view. Tracing bpf program can do bpf_probe_read
> > of anything.
>
> I think the assumption here is the user controls the program instructions at
> runtime, but that's not the case. The BPF program we are loading is not
> dynamically generated, it is built at build time and it is loaded from a
> secure verified partition, so even though it can do bpf_probe_read, it is
> still not something that the user can change.

so you're saying that by having a set of signed bpf programs which
instructions are known to be non-malicious and allowed set of tracepoints
to attach via selinux whitelist, such setup will be safe?
Have you considered how mix and match will behave?

> And, we are planning to make it
> even more secure by making it kernel verify the program at load time as well
> (you were on some discussions about that a few months ago).

It sounds like api decisions for this sticky raw_tp feature are
driven by security choices which are not actually secure.
I'm suggesting to avoid bringing up point of security as a reason for
this api design, since it's making the opposite effect.

^ permalink raw reply

* [Patch net v3 0/2] ipv4: relax source validation check for loopback packets
From: Cong Wang @ 2019-07-17 21:41 UTC (permalink / raw)
  To: netdev; +Cc: dsahern, Cong Wang

This patchset fixes a corner case when loopback packets get dropped
by rp_filter when we route them from veth to lo. Patch 1 is the fix
and patch 2 provides a simplified test case for this scenario.

Cong Wang (2):
  fib: relax source validation check for loopback packets
  selftests: add a test case for rp_filter

---
v3: use dummy1 instead of dummy0 in the test case
v2: remove a redundant if check and add a test case

 net/ipv4/fib_frontend.c                  |  5 ++++
 tools/testing/selftests/net/fib_tests.sh | 35 +++++++++++++++++++++++-
 2 files changed, 39 insertions(+), 1 deletion(-)

-- 
2.21.0


^ permalink raw reply

* [Patch net v3 2/2] selftests: add a test case for rp_filter
From: Cong Wang @ 2019-07-17 21:41 UTC (permalink / raw)
  To: netdev; +Cc: dsahern, Cong Wang
In-Reply-To: <20190717214159.25959-1-xiyou.wangcong@gmail.com>

Add a test case to simulate the loopback packet case fixed
in the previous patch.

This test gets passed after the fix:

IPv4 rp_filter tests
    TEST: rp_filter passes local packets                                [ OK ]
    TEST: rp_filter passes loopback packets                             [ OK ]

Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 tools/testing/selftests/net/fib_tests.sh | 35 +++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh
index 9457aaeae092..4465fc2dae14 100755
--- a/tools/testing/selftests/net/fib_tests.sh
+++ b/tools/testing/selftests/net/fib_tests.sh
@@ -9,12 +9,13 @@ ret=0
 ksft_skip=4
 
 # all tests in this script. Can be overridden with -t option
-TESTS="unregister down carrier nexthop ipv6_rt ipv4_rt ipv6_addr_metric ipv4_addr_metric ipv6_route_metrics ipv4_route_metrics ipv4_route_v6_gw"
+TESTS="unregister down carrier nexthop ipv6_rt ipv4_rt ipv6_addr_metric ipv4_addr_metric ipv6_route_metrics ipv4_route_metrics ipv4_route_v6_gw rp_filter"
 
 VERBOSE=0
 PAUSE_ON_FAIL=no
 PAUSE=no
 IP="ip -netns ns1"
+NS_EXEC="ip netns exec ns1"
 
 log_test()
 {
@@ -433,6 +434,37 @@ fib_carrier_test()
 	fib_carrier_unicast_test
 }
 
+fib_rp_filter_test()
+{
+	echo
+	echo "IPv4 rp_filter tests"
+
+	setup
+
+	set -e
+	$IP link set dev lo address 52:54:00:6a:c7:5e
+	$IP link set dummy0 address 52:54:00:6a:c7:5e
+	$IP link add dummy1 type dummy
+	$IP link set dummy1 address 52:54:00:6a:c7:5e
+	$IP link set dev dummy1 up
+	$NS_EXEC sysctl -qw net.ipv4.conf.all.rp_filter=1
+	$NS_EXEC sysctl -qw net.ipv4.conf.all.accept_local=1
+	$NS_EXEC sysctl -qw net.ipv4.conf.all.route_localnet=1
+
+	$NS_EXEC tc qd add dev dummy1 parent root handle 1: fq_codel
+	$NS_EXEC tc filter add dev dummy1 parent 1: protocol arp basic action mirred egress redirect dev lo
+	$NS_EXEC tc filter add dev dummy1 parent 1: protocol ip basic action mirred egress redirect dev lo
+	set +e
+
+	run_cmd "ip netns exec ns1 ping -I dummy1 -w1 -c1 198.51.100.1"
+	log_test $? 0 "rp_filter passes local packets"
+
+	run_cmd "ip netns exec ns1 ping -I dummy1 -w1 -c1 127.0.0.1"
+	log_test $? 0 "rp_filter passes loopback packets"
+
+	cleanup
+}
+
 ################################################################################
 # Tests on nexthop spec
 
@@ -1557,6 +1589,7 @@ do
 	fib_unreg_test|unregister)	fib_unreg_test;;
 	fib_down_test|down)		fib_down_test;;
 	fib_carrier_test|carrier)	fib_carrier_test;;
+	fib_rp_filter_test|rp_filter)	fib_rp_filter_test;;
 	fib_nexthop_test|nexthop)	fib_nexthop_test;;
 	ipv6_route_test|ipv6_rt)	ipv6_route_test;;
 	ipv4_route_test|ipv4_rt)	ipv4_route_test;;
-- 
2.21.0


^ permalink raw reply related

* [Patch net v3 1/2] fib: relax source validation check for loopback packets
From: Cong Wang @ 2019-07-17 21:41 UTC (permalink / raw)
  To: netdev; +Cc: dsahern, Cong Wang, Julian Anastasov
In-Reply-To: <20190717214159.25959-1-xiyou.wangcong@gmail.com>

In a rare case where we redirect local packets from veth to lo,
these packets fail to pass the source validation when rp_filter
is turned on, as the tracing shows:

  <...>-311708 [040] ..s1 7951180.957825: fib_table_lookup: table 254 oif 0 iif 1 src 10.53.180.130 dst 10.53.180.130 tos 0 scope 0 flags 0
  <...>-311708 [040] ..s1 7951180.957826: fib_table_lookup_nh: nexthop dev eth0 oif 4 src 10.53.180.130

So, the fib table lookup returns eth0 as the nexthop even though
the packets are local and should be routed to loopback nonetheless,
but they can't pass the dev match check in fib_info_nh_uses_dev()
without this patch.

It should be safe to relax this check for this special case, as
normally packets coming out of loopback device still have skb_dst
so they won't even hit this slow path.

Cc: Julian Anastasov <ja@ssi.bg>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/ipv4/fib_frontend.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 317339cd7f03..e8bc939b56dd 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -388,6 +388,11 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
 	fib_combine_itag(itag, &res);
 
 	dev_match = fib_info_nh_uses_dev(res.fi, dev);
+	/* This is not common, loopback packets retain skb_dst so normally they
+	 * would not even hit this slow path.
+	 */
+	dev_match = dev_match || (res.type == RTN_LOCAL &&
+				  dev == net->loopback_dev);
 	if (dev_match) {
 		ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_HOST;
 		return ret;
-- 
2.21.0


^ permalink raw reply related

* [net  1/1] tipc: initialize 'validated' field of received packets
From: Jon Maloy @ 2019-07-17 21:43 UTC (permalink / raw)
  To: davem, netdev
  Cc: gordan.mihaljevic, tung.q.nguyen, hoang.h.le, jon.maloy,
	canh.d.luu, ying.xue, tipc-discussion

The tipc_msg_validate() function leaves a boolean flag 'validated' in
the validated buffer's control block, to avoid performing this action
more than once. However, at reception of new packets, the position of
this field may already have been set by lower layer protocols, so
that the packet is erroneously perceived as already validated by TIPC.

We fix this by initializing the said field to 'false' before performing
the initial validation.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/node.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/tipc/node.c b/net/tipc/node.c
index 324a1f9..3a5be1d 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -1807,6 +1807,7 @@ void tipc_rcv(struct net *net, struct sk_buff *skb, struct tipc_bearer *b)
 	__skb_queue_head_init(&xmitq);

 	/* Ensure message is well-formed before touching the header */
+	TIPC_SKB_CB(skb)->validated = false;
 	if (unlikely(!tipc_msg_validate(&skb)))
 		goto discard;
 	hdr = buf_msg(skb);
-- 
2.1.4

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox