netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/4] act_mirred: Ingress actions support
@ 2016-09-22 13:21 Shmulik Ladkani
  2016-09-22 13:21 ` [PATCH net-next 1/4] net/sched: act_mirred: Rename tcfm_ok_push to tcfm_mac_header_xmit Shmulik Ladkani
                   ` (3 more replies)
  0 siblings, 4 replies; 44+ messages in thread
From: Shmulik Ladkani @ 2016-09-22 13:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: Jamal Hadi Salim, WANG Cong, Eric Dumazet, netdev,
	Shmulik Ladkani

This patch series implements action mirred 'ingress' actions
TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR.

This allows attaching filters whose target is to hand matching skbs into
the rx processing of a specified device.

Shmulik Ladkani (4):
  net/sched: act_mirred: Rename tcfm_ok_push to tcfm_mac_header_xmit
  net/sched: act_mirred: Refactor detection whether dev needs xmit at
    mac header
  net/sched: tc_mirred: Rename public predicates
    'is_tcf_mirred_redirect' and 'is_tcf_mirred_mirror'
  net/sched: act_mirred: Implement ingress actions

 drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_u32.c  |  2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c      |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    |  2 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c     |  4 +-
 .../net/ethernet/netronome/nfp/nfp_net_offload.c   |  2 +-
 include/net/tc_act/tc_mirred.h                     |  6 +-
 net/sched/act_mirred.c                             | 80 ++++++++++++++++------
 7 files changed, 69 insertions(+), 29 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH net-next 1/4] net/sched: act_mirred: Rename tcfm_ok_push to tcfm_mac_header_xmit
  2016-09-22 13:21 [PATCH net-next 0/4] act_mirred: Ingress actions support Shmulik Ladkani
@ 2016-09-22 13:21 ` Shmulik Ladkani
  2016-09-27 10:30   ` Daniel Borkmann
  2016-09-22 13:21 ` [PATCH net-next 2/4] net/sched: act_mirred: Refactor detection whether dev needs xmit at mac header Shmulik Ladkani
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 44+ messages in thread
From: Shmulik Ladkani @ 2016-09-22 13:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: Jamal Hadi Salim, WANG Cong, Eric Dumazet, netdev,
	Shmulik Ladkani

From: Shmulik Ladkani <shmulik.ladkani@gmail.com>

'tcfm_ok_push' specifies whether a mac_len sized push is needed upon
egress to the target device (if action is performed at ingress).

Rename it to 'tcfm_mac_header_xmit' as this is actually an attribute of
the target device.
This allows to decouple the attribute from the action to be taken.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
---
 include/net/tc_act/tc_mirred.h |  2 +-
 net/sched/act_mirred.c         | 10 +++++-----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/net/tc_act/tc_mirred.h b/include/net/tc_act/tc_mirred.h
index 62770ad..5275158 100644
--- a/include/net/tc_act/tc_mirred.h
+++ b/include/net/tc_act/tc_mirred.h
@@ -8,7 +8,7 @@ struct tcf_mirred {
 	struct tc_action	common;
 	int			tcfm_eaction;
 	int			tcfm_ifindex;
-	int			tcfm_ok_push;
+	int			tcfm_mac_header_xmit;
 	struct net_device __rcu	*tcfm_dev;
 	struct list_head	tcfm_list;
 };
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 667dc38..7b03b13 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -63,7 +63,7 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
 	struct tc_mirred *parm;
 	struct tcf_mirred *m;
 	struct net_device *dev;
-	int ret, ok_push = 0;
+	int ret, mac_header_xmit = 0;
 	bool exists = false;
 
 	if (nla == NULL)
@@ -102,10 +102,10 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
 		case ARPHRD_IPGRE:
 		case ARPHRD_VOID:
 		case ARPHRD_NONE:
-			ok_push = 0;
+			mac_header_xmit = 0;
 			break;
 		default:
-			ok_push = 1;
+			mac_header_xmit = 1;
 			break;
 		}
 	} else {
@@ -136,7 +136,7 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
 			dev_put(rcu_dereference_protected(m->tcfm_dev, 1));
 		dev_hold(dev);
 		rcu_assign_pointer(m->tcfm_dev, dev);
-		m->tcfm_ok_push = ok_push;
+		m->tcfm_mac_header_xmit = mac_header_xmit;
 	}
 
 	if (ret == ACT_P_CREATED) {
@@ -181,7 +181,7 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
 		goto out;
 
 	if (!(at & AT_EGRESS)) {
-		if (m->tcfm_ok_push)
+		if (m->tcfm_mac_header_xmit)
 			skb_push_rcsum(skb2, skb->mac_len);
 	}
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH net-next 2/4] net/sched: act_mirred: Refactor detection whether dev needs xmit at mac header
  2016-09-22 13:21 [PATCH net-next 0/4] act_mirred: Ingress actions support Shmulik Ladkani
  2016-09-22 13:21 ` [PATCH net-next 1/4] net/sched: act_mirred: Rename tcfm_ok_push to tcfm_mac_header_xmit Shmulik Ladkani
@ 2016-09-22 13:21 ` Shmulik Ladkani
  2016-09-22 13:21 ` [PATCH net-next 3/4] net/sched: tc_mirred: Rename public predicates 'is_tcf_mirred_redirect' and 'is_tcf_mirred_mirror' Shmulik Ladkani
  2016-09-22 13:21 ` [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions Shmulik Ladkani
  3 siblings, 0 replies; 44+ messages in thread
From: Shmulik Ladkani @ 2016-09-22 13:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: Jamal Hadi Salim, WANG Cong, Eric Dumazet, netdev,
	Shmulik Ladkani

From: Shmulik Ladkani <shmulik.ladkani@gmail.com>

Move detection logic that tests whether device expects skb data to point
to mac_header upon xmit into a function.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
---
 net/sched/act_mirred.c | 28 +++++++++++++++-------------
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 7b03b13..28629d3 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -54,6 +54,20 @@ static const struct nla_policy mirred_policy[TCA_MIRRED_MAX + 1] = {
 static int mirred_net_id;
 static struct tc_action_ops act_mirred_ops;
 
+static int dev_is_mac_header_xmit(const struct net_device *dev)
+{
+	switch (dev->type) {
+	case ARPHRD_TUNNEL:
+	case ARPHRD_TUNNEL6:
+	case ARPHRD_SIT:
+	case ARPHRD_IPGRE:
+	case ARPHRD_VOID:
+	case ARPHRD_NONE:
+		return 0;
+	}
+	return 1;
+}
+
 static int tcf_mirred_init(struct net *net, struct nlattr *nla,
 			   struct nlattr *est, struct tc_action **a, int ovr,
 			   int bind)
@@ -95,19 +109,7 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
 				tcf_hash_release(*a, bind);
 			return -ENODEV;
 		}
-		switch (dev->type) {
-		case ARPHRD_TUNNEL:
-		case ARPHRD_TUNNEL6:
-		case ARPHRD_SIT:
-		case ARPHRD_IPGRE:
-		case ARPHRD_VOID:
-		case ARPHRD_NONE:
-			mac_header_xmit = 0;
-			break;
-		default:
-			mac_header_xmit = 1;
-			break;
-		}
+		mac_header_xmit = dev_is_mac_header_xmit(dev);
 	} else {
 		dev = NULL;
 	}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH net-next 3/4] net/sched: tc_mirred: Rename public predicates 'is_tcf_mirred_redirect' and 'is_tcf_mirred_mirror'
  2016-09-22 13:21 [PATCH net-next 0/4] act_mirred: Ingress actions support Shmulik Ladkani
  2016-09-22 13:21 ` [PATCH net-next 1/4] net/sched: act_mirred: Rename tcfm_ok_push to tcfm_mac_header_xmit Shmulik Ladkani
  2016-09-22 13:21 ` [PATCH net-next 2/4] net/sched: act_mirred: Refactor detection whether dev needs xmit at mac header Shmulik Ladkani
@ 2016-09-22 13:21 ` Shmulik Ladkani
  2016-09-22 13:21 ` [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions Shmulik Ladkani
  3 siblings, 0 replies; 44+ messages in thread
From: Shmulik Ladkani @ 2016-09-22 13:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: Jamal Hadi Salim, WANG Cong, Eric Dumazet, netdev,
	Shmulik Ladkani, Hariprasad S, Jeff Kirsher, Saeed Mahameed,
	Jiri Pirko, Ido Schimmel, Jakub Kicinski

From: Shmulik Ladkani <shmulik.ladkani@gmail.com>

These accessors are used in various drivers that support tc offloading,
to detect properties of a given 'tc_action'.

'is_tcf_mirred_redirect' tests that the action is TCA_EGRESS_REDIR.
'is_tcf_mirred_mirror' tests that the action is TCA_EGRESS_MIRROR.

As a prep towards supporting INGRESS redir/mirror, rename these
predicates to reflect their true meaning:
  s/is_tcf_mirred_redirect/is_tcf_mirred_egress_redirect/
  s/is_tcf_mirred_mirror/is_tcf_mirred_egress_mirror/

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Cc: Hariprasad S <hariprasad@chelsio.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Ido Schimmel <idosch@mellanox.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_u32.c    | 2 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c        | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c      | 2 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c       | 4 +++-
 drivers/net/ethernet/netronome/nfp/nfp_net_offload.c | 2 +-
 include/net/tc_act/tc_mirred.h                       | 4 ++--
 6 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_u32.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_u32.c
index 49d2deb..52af62e 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_u32.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_u32.c
@@ -113,7 +113,7 @@ static int fill_action_fields(struct adapter *adap,
 		}
 
 		/* Re-direct to specified port in hardware. */
-		if (is_tcf_mirred_redirect(a)) {
+		if (is_tcf_mirred_egress_redirect(a)) {
 			struct net_device *n_dev;
 			unsigned int i, index;
 			bool found = false;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index d76bc1a..ba03da2 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8425,7 +8425,7 @@ static int parse_tc_actions(struct ixgbe_adapter *adapter,
 		}
 
 		/* Redirect to a VF or a offloaded macvlan */
-		if (is_tcf_mirred_redirect(a)) {
+		if (is_tcf_mirred_egress_redirect(a)) {
 			int ifindex = tcf_mirred_ifindex(a);
 
 			err = handle_redirect_action(adapter, ifindex, queue,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 22cfc4a..7301dff 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -383,7 +383,7 @@ static int parse_tc_fdb_actions(struct mlx5e_priv *priv, struct tcf_exts *exts,
 			continue;
 		}
 
-		if (is_tcf_mirred_redirect(a)) {
+		if (is_tcf_mirred_egress_redirect(a)) {
 			int ifindex = tcf_mirred_ifindex(a);
 			struct net_device *out_dev;
 			struct mlx5e_priv *out_priv;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 80f27b5..c5bac29 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -1237,8 +1237,10 @@ static int mlxsw_sp_port_add_cls_matchall(struct mlxsw_sp_port *mlxsw_sp_port,
 
 	tcf_exts_to_list(cls->exts, &actions);
 	list_for_each_entry(a, &actions, list) {
-		if (!is_tcf_mirred_mirror(a) || protocol != htons(ETH_P_ALL))
+		if (!is_tcf_mirred_egress_mirror(a) ||
+		    protocol != htons(ETH_P_ALL)) {
 			return -ENOTSUPP;
+		}
 
 		err = mlxsw_sp_port_add_cls_matchall_mirror(mlxsw_sp_port, cls,
 							    a, ingress);
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_offload.c b/drivers/net/ethernet/netronome/nfp/nfp_net_offload.c
index 43f42f8..2b1fa527 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_offload.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_offload.c
@@ -128,7 +128,7 @@ nfp_net_bpf_get_act(struct nfp_net *nn, struct tc_cls_bpf_offload *cls_bpf)
 		if (is_tcf_gact_shot(a))
 			return NN_ACT_TC_DROP;
 
-		if (is_tcf_mirred_redirect(a) &&
+		if (is_tcf_mirred_egress_redirect(a) &&
 		    tcf_mirred_ifindex(a) == nn->netdev->ifindex)
 			return NN_ACT_TC_REDIR;
 	}
diff --git a/include/net/tc_act/tc_mirred.h b/include/net/tc_act/tc_mirred.h
index 5275158..7b72c6b 100644
--- a/include/net/tc_act/tc_mirred.h
+++ b/include/net/tc_act/tc_mirred.h
@@ -14,7 +14,7 @@ struct tcf_mirred {
 };
 #define to_mirred(a) ((struct tcf_mirred *)a)
 
-static inline bool is_tcf_mirred_redirect(const struct tc_action *a)
+static inline bool is_tcf_mirred_egress_redirect(const struct tc_action *a)
 {
 #ifdef CONFIG_NET_CLS_ACT
 	if (a->ops && a->ops->type == TCA_ACT_MIRRED)
@@ -23,7 +23,7 @@ static inline bool is_tcf_mirred_redirect(const struct tc_action *a)
 	return false;
 }
 
-static inline bool is_tcf_mirred_mirror(const struct tc_action *a)
+static inline bool is_tcf_mirred_egress_mirror(const struct tc_action *a)
 {
 #ifdef CONFIG_NET_CLS_ACT
 	if (a->ops && a->ops->type == TCA_ACT_MIRRED)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-22 13:21 [PATCH net-next 0/4] act_mirred: Ingress actions support Shmulik Ladkani
                   ` (2 preceding siblings ...)
  2016-09-22 13:21 ` [PATCH net-next 3/4] net/sched: tc_mirred: Rename public predicates 'is_tcf_mirred_redirect' and 'is_tcf_mirred_mirror' Shmulik Ladkani
@ 2016-09-22 13:21 ` Shmulik Ladkani
  2016-09-22 14:54   ` Eric Dumazet
                     ` (3 more replies)
  3 siblings, 4 replies; 44+ messages in thread
From: Shmulik Ladkani @ 2016-09-22 13:21 UTC (permalink / raw)
  To: David S. Miller
  Cc: Jamal Hadi Salim, WANG Cong, Eric Dumazet, netdev,
	Shmulik Ladkani

From: Shmulik Ladkani <shmulik.ladkani@gmail.com>

Up until now, 'action mirred' supported only egress actions (either
TCA_EGRESS_REDIR or TCA_EGRESS_MIRROR).

This patch implements the corresponding ingress actions
TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR.

This allows attaching filters whose target is to hand matching skbs into
the rx processing of a specified device.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
---
 Was wondering, whether netif_receive_skb or dev_forward_skb should be
 used for the rx bouncing. Used netif_receive_skb as in ifb device.

 net/sched/act_mirred.c | 48 ++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 42 insertions(+), 6 deletions(-)

diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 28629d3..942120e 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -33,6 +33,25 @@
 static LIST_HEAD(mirred_list);
 static DEFINE_SPINLOCK(mirred_list_lock);
 
+static bool tcf_mirred_is_act_redirect(int action)
+{
+	return action == TCA_EGRESS_REDIR || action == TCA_INGRESS_REDIR;
+}
+
+static u32 tcf_mirred_act_direction(int action)
+{
+	switch (action) {
+	case TCA_EGRESS_REDIR:
+	case TCA_EGRESS_MIRROR:
+		return AT_EGRESS;
+	case TCA_INGRESS_REDIR:
+	case TCA_INGRESS_MIRROR:
+		return AT_INGRESS;
+	default:
+		BUG();
+	}
+}
+
 static void tcf_mirred_release(struct tc_action *a, int bind)
 {
 	struct tcf_mirred *m = to_mirred(a);
@@ -96,6 +115,8 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
 	switch (parm->eaction) {
 	case TCA_EGRESS_MIRROR:
 	case TCA_EGRESS_REDIR:
+	case TCA_INGRESS_REDIR:
+	case TCA_INGRESS_MIRROR:
 		break;
 	default:
 		if (exists)
@@ -157,7 +178,8 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
 	struct tcf_mirred *m = to_mirred(a);
 	struct net_device *dev;
 	struct sk_buff *skb2;
-	int retval, err;
+	int retval, err = 0;
+	int mac_len;
 	u32 at;
 
 	tcf_lastuse_update(&m->tcf_tm);
@@ -182,23 +204,37 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
 	if (!skb2)
 		goto out;
 
-	if (!(at & AT_EGRESS)) {
-		if (m->tcfm_mac_header_xmit)
+	/* If action's target direction differs than filter's direction,
+	 * and devices expect a mac header on xmit, then mac push/pull is
+	 * needed.
+	 */
+	if (at != tcf_mirred_act_direction(m->tcfm_eaction) &&
+	    m->tcfm_mac_header_xmit) {
+		if (at & AT_EGRESS) {
+			/* caught at egress, act ingress: pull mac */
+			mac_len = skb_network_header(skb) - skb_mac_header(skb);
+			skb_pull_rcsum(skb2, mac_len);
+		} else {
+			/* caught at ingress, act egress: push mac */
 			skb_push_rcsum(skb2, skb->mac_len);
+		}
 	}
 
 	/* mirror is always swallowed */
-	if (m->tcfm_eaction != TCA_EGRESS_MIRROR)
+	if (tcf_mirred_is_act_redirect(m->tcfm_eaction))
 		skb2->tc_verd = SET_TC_FROM(skb2->tc_verd, at);
 
 	skb2->skb_iif = skb->dev->ifindex;
 	skb2->dev = dev;
-	err = dev_queue_xmit(skb2);
+	if (tcf_mirred_act_direction(m->tcfm_eaction) & AT_EGRESS)
+		err = dev_queue_xmit(skb2);
+	else
+		netif_receive_skb(skb2);
 
 	if (err) {
 out:
 		qstats_overlimit_inc(this_cpu_ptr(m->common.cpu_qstats));
-		if (m->tcfm_eaction != TCA_EGRESS_MIRROR)
+		if (tcf_mirred_is_act_redirect(m->tcfm_eaction))
 			retval = TC_ACT_SHOT;
 	}
 	rcu_read_unlock();
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-22 13:21 ` [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions Shmulik Ladkani
@ 2016-09-22 14:54   ` Eric Dumazet
  2016-09-22 18:27     ` Shmulik Ladkani
  2016-09-22 23:40   ` Jamal Hadi Salim
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 44+ messages in thread
From: Eric Dumazet @ 2016-09-22 14:54 UTC (permalink / raw)
  To: Shmulik Ladkani
  Cc: David S. Miller, Jamal Hadi Salim, WANG Cong, Eric Dumazet,
	netdev, Shmulik Ladkani

On Thu, 2016-09-22 at 16:21 +0300, Shmulik Ladkani wrote:
> From: Shmulik Ladkani <shmulik.ladkani@gmail.com>
> 
> Up until now, 'action mirred' supported only egress actions (either
> TCA_EGRESS_REDIR or TCA_EGRESS_MIRROR).
> 
> This patch implements the corresponding ingress actions
> TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR.
> 
> This allows attaching filters whose target is to hand matching skbs into
> the rx processing of a specified device.
> 
> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> ---
>  Was wondering, whether netif_receive_skb or dev_forward_skb should be
>  used for the rx bouncing. Used netif_receive_skb as in ifb device.

Hmm... we probably need to apply the full rcu protection before this
patch.

https://patchwork.ozlabs.org/patch/667680/

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-22 14:54   ` Eric Dumazet
@ 2016-09-22 18:27     ` Shmulik Ladkani
  2016-09-22 18:42       ` Eric Dumazet
  0 siblings, 1 reply; 44+ messages in thread
From: Shmulik Ladkani @ 2016-09-22 18:27 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S. Miller, Jamal Hadi Salim, WANG Cong, Eric Dumazet,
	netdev, Shmulik Ladkani

On Thu, 22 Sep 2016 07:54:13 -0700 Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Hmm... we probably need to apply the full rcu protection before this
> patch.
> 
> https://patchwork.ozlabs.org/patch/667680/

Are you referring to order of application into net-next?

This patch seems to present no new tcf_mirred_params members nor
need-to-be-protected code regions (please, correct me if wrong).
So it does not _depend_ on the 'full rcu fixes', does it?

Thanks,
Shmulik

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-22 18:27     ` Shmulik Ladkani
@ 2016-09-22 18:42       ` Eric Dumazet
  0 siblings, 0 replies; 44+ messages in thread
From: Eric Dumazet @ 2016-09-22 18:42 UTC (permalink / raw)
  To: Shmulik Ladkani
  Cc: David S. Miller, Jamal Hadi Salim, WANG Cong, Eric Dumazet,
	netdev

On Thu, 2016-09-22 at 21:27 +0300, Shmulik Ladkani wrote:
> On Thu, 22 Sep 2016 07:54:13 -0700 Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Hmm... we probably need to apply the full rcu protection before this
> > patch.
> > 
> > https://patchwork.ozlabs.org/patch/667680/
> 
> Are you referring to order of application into net-next?
> 
> This patch seems to present no new tcf_mirred_params members nor
> need-to-be-protected code regions (please, correct me if wrong).
> So it does not _depend_ on the 'full rcu fixes', does it?

No, simply a reminder that we run lockless there, so you might need to
read some control variables once, and in a consistent way.

(Or a concurrent writer could change params in the middle of the
function)

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-22 13:21 ` [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions Shmulik Ladkani
  2016-09-22 14:54   ` Eric Dumazet
@ 2016-09-22 23:40   ` Jamal Hadi Salim
  2016-09-23  5:11     ` Shmulik Ladkani
  2016-09-24 23:50   ` Cong Wang
  2016-09-27  5:56   ` David Miller
  3 siblings, 1 reply; 44+ messages in thread
From: Jamal Hadi Salim @ 2016-09-22 23:40 UTC (permalink / raw)
  To: Shmulik Ladkani, David S. Miller
  Cc: WANG Cong, Eric Dumazet, netdev, Shmulik Ladkani

On 16-09-22 09:21 AM, Shmulik Ladkani wrote:
> From: Shmulik Ladkani <shmulik.ladkani@gmail.com>
>
> Up until now, 'action mirred' supported only egress actions (either
> TCA_EGRESS_REDIR or TCA_EGRESS_MIRROR).
>
> This patch implements the corresponding ingress actions
> TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR.
>
> This allows attaching filters whose target is to hand matching skbs into
> the rx processing of a specified device.
>

Thank you for doing this. There was something that made me remove
initial support for this feature - I am blanking out right now but
will find my notes and give more details. It may be around preventing
loops maybe. If that was the thought then:
I am just wondering is there a use case for a packet that is redirected
from egress ethx to ingress of ethy that then requires ingress of ethy
classify? Otherwise you could just set the "dont classify" flag.
i.e SET_TC_NCLS()

cheers,
jamal

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-22 23:40   ` Jamal Hadi Salim
@ 2016-09-23  5:11     ` Shmulik Ladkani
  2016-09-23 12:48       ` Jamal Hadi Salim
  2016-09-25  0:07       ` Cong Wang
  0 siblings, 2 replies; 44+ messages in thread
From: Shmulik Ladkani @ 2016-09-23  5:11 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: David S. Miller, WANG Cong, Eric Dumazet, netdev, Shmulik Ladkani

Hi,

On Thu, 22 Sep 2016 19:40:15 -0400 Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> On 16-09-22 09:21 AM, Shmulik Ladkani wrote:
> > From: Shmulik Ladkani <shmulik.ladkani@gmail.com>
> >
> > Up until now, 'action mirred' supported only egress actions (either
> > TCA_EGRESS_REDIR or TCA_EGRESS_MIRROR).
> >
> > This patch implements the corresponding ingress actions
> > TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR.
> >
> > This allows attaching filters whose target is to hand matching skbs into
> > the rx processing of a specified device.
> 
> Thank you for doing this. There was something that made me remove
> initial support for this feature - I am blanking out right now but
> will find my notes and give more details.

Thanks Jamal, appreciate any details.

Was wondering why it's missing, googled a bit with no meaningful
results, so speculated the following:

Some time long ago, initial 'mirred' purpose was to facilitate ifb.
Therefore 'egress redirect' was implemented. Jamal probably left the
'ingress' support for a later time :)

One interesting usecase for 'ingress redirect' is creating "rx bouncing"
construct (like macvlan/macvtap/ipvlan) but applied according to custom
logic.

> It may be around preventing loops maybe.

Could be, but personally, I treat these constructs as (powerful)
building blocks, and "with great power comes great responsibility".

Even today, one may create loops using existing 'egress redirect',
e.g. this rediculously errorneous construct:

 # ip l add v0 type veth peer name v0p 
 # tc filter add dev v0p parent ffff: basic \
    action mirred egress redirect dev v0

Regards,
Shmulik

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-23  5:11     ` Shmulik Ladkani
@ 2016-09-23 12:48       ` Jamal Hadi Salim
  2016-09-23 15:40         ` Shmulik Ladkani
  2016-09-25  0:07       ` Cong Wang
  1 sibling, 1 reply; 44+ messages in thread
From: Jamal Hadi Salim @ 2016-09-23 12:48 UTC (permalink / raw)
  To: Shmulik Ladkani; +Cc: David S. Miller, WANG Cong, Eric Dumazet, netdev

On 16-09-23 01:11 AM, Shmulik Ladkani wrote:
> Hi,
>
> On Thu, 22 Sep 2016 19:40:15 -0400 Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>> On 16-09-22 09:21 AM, Shmulik Ladkani wrote:
>>> From: Shmulik Ladkani <shmulik.ladkani@gmail.com>
>>>
>>> Up until now, 'action mirred' supported only egress actions (either
>>> TCA_EGRESS_REDIR or TCA_EGRESS_MIRROR).
>>>
>>> This patch implements the corresponding ingress actions
>>> TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR.
>>>
>>> This allows attaching filters whose target is to hand matching skbs into
>>> the rx processing of a specified device.
>>
>> Thank you for doing this. There was something that made me remove
>> initial support for this feature - I am blanking out right now but
>> will find my notes and give more details.
>
> Thanks Jamal, appreciate any details.
>
> Was wondering why it's missing, googled a bit with no meaningful
> results, so speculated the following:
>
> Some time long ago, initial 'mirred' purpose was to facilitate ifb.
> Therefore 'egress redirect' was implemented. Jamal probably left the
> 'ingress' support for a later time :)
>

History is mirror/redirect were first introduced to do just those
plain vanilla-free features. IFB came later. Up until recently there
were still some bits to support the ingress features that were removed
by Florian W. to save some skb bits.


> One interesting usecase for 'ingress redirect' is creating "rx bouncing"
> construct (like macvlan/macvtap/ipvlan) but applied according to custom
> logic.
>

I thought that was the motivation as well.

>> It may be around preventing loops maybe.
>
> Could be, but personally, I treat these constructs as (powerful)
> building blocks, and "with great power comes great responsibility".
>

Amen.
I am a believer in let-the-user-shoot-their-big-toe-if-they-want.


> Even today, one may create loops using existing 'egress redirect',
> e.g. this rediculously errorneous construct:
>
>  # ip l add v0 type veth peer name v0p
>  # tc filter add dev v0p parent ffff: basic \
>     action mirred egress redirect dev v0
>

I think we actually recover from this one by eventually
dropping (theres a ttl field). We should at least not lock
the kernel forever.
The other question is what to set skb->dev and skb->iif?
Some information will be lost if you move around netdevs a
bit.

BTW: You have motivated me to start looking again at redirect
to socket that was also left out. I am getting tired of redirecting
to tuntap with all its bells and whistles.

cheers,
jamal

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-23 12:48       ` Jamal Hadi Salim
@ 2016-09-23 15:40         ` Shmulik Ladkani
  2016-09-25  0:20           ` Cong Wang
  2016-09-25 13:05           ` Jamal Hadi Salim
  0 siblings, 2 replies; 44+ messages in thread
From: Shmulik Ladkani @ 2016-09-23 15:40 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: David S. Miller, WANG Cong, Eric Dumazet, netdev

On Fri, 23 Sep 2016 08:48:33 -0400 Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> > Even today, one may create loops using existing 'egress redirect',
> > e.g. this rediculously errorneous construct:
> >
> >  # ip l add v0 type veth peer name v0p
> >  # tc filter add dev v0p parent ffff: basic \
> >     action mirred egress redirect dev v0
> 
> I think we actually recover from this one by eventually
> dropping (theres a ttl field).

[off topic]

Don't know about that :) cpu fan got very noisy, 3 of 4 cores at 100%,
and after one second I got:

# ip -s l show type veth
16: v0p@v0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether a2:64:ff:10:dd:85 brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast   
    71660305923 469890864 0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    3509       24       0       0       0       0       
17: v0@v0p: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 52:a2:34:f6:7c:ec brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast   
    3509       24       0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    71660713017 469893555 0       0       0       0

> The other question is what to set skb->dev and skb->iif?
> Some information will be lost if you move around netdevs a
> bit.

[back to topic]

Good point.

Similarly to all constructs injecting skbs to device rx (bond/team,
vlan, macvlan, tunnels, ifb, __dev_forward_skb callers, etc..), we are
obligated to assign 'skb2->dev' as the new rx device.

Regarding 'skb2->skb_iif', original act_mirred code already has:

 	skb2->skb_iif = skb->dev->ifindex;   <--- THIS IS ORIG DEV IIF
 	skb2->dev = dev;                     <--- THIS IS TARGET DEV
	err = dev_queue_xmit(skb2);

I'm preserving this; OTOH the suggested modification in the patch is

-	err = dev_queue_xmit(skb2);
+	if (tcf_mirred_act_direction(m->tcfm_eaction) & AT_EGRESS)
+		err = dev_queue_xmit(skb2);
+	else
+		netif_receive_skb(skb2);

now, the call to 'netif_receive_skb' will eventually override skb_iif to
the target RX dev's index, upon entry to __netif_receive_skb_core.

I think this IS the expected behavior - as done by other "rx injection"
constructs.

My doubts were around whether we should call 'dev_forward_skb' instead
of 'netif_receive_skb'.
The former does some things I assumed we're not interested of, like
testing 'is_skb_forwardable' and re-running 'eth_type_trans'.
OTOH, it DOES scrub the skb.
Maybe we should scrub it as well prior the netif_receive_skb call?

Thanks,
Shmulik

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-22 13:21 ` [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions Shmulik Ladkani
  2016-09-22 14:54   ` Eric Dumazet
  2016-09-22 23:40   ` Jamal Hadi Salim
@ 2016-09-24 23:50   ` Cong Wang
  2016-09-27  5:56   ` David Miller
  3 siblings, 0 replies; 44+ messages in thread
From: Cong Wang @ 2016-09-24 23:50 UTC (permalink / raw)
  To: Shmulik Ladkani
  Cc: David S. Miller, Jamal Hadi Salim, Eric Dumazet,
	Linux Kernel Network Developers, Shmulik Ladkani

On Thu, Sep 22, 2016 at 6:21 AM, Shmulik Ladkani
<shmulik.ladkani@ravellosystems.com> wrote:
> From: Shmulik Ladkani <shmulik.ladkani@gmail.com>
>
> Up until now, 'action mirred' supported only egress actions (either
> TCA_EGRESS_REDIR or TCA_EGRESS_MIRROR).
>
> This patch implements the corresponding ingress actions
> TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR.
>
> This allows attaching filters whose target is to hand matching skbs into
> the rx processing of a specified device.

I like this idea, this idea actually came to my mind when we
tried to redirect packets from eth0 to veth device in containers.
I remember I already brought this up to Jamal before (either personally
or publicly), but I forgot why I stopped.

This would reduce some latency for our Mesos containers, we
would just skip one veth TX in our scenario.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-23  5:11     ` Shmulik Ladkani
  2016-09-23 12:48       ` Jamal Hadi Salim
@ 2016-09-25  0:07       ` Cong Wang
  2016-09-25 13:39         ` Jamal Hadi Salim
  2016-09-25 17:59         ` Shmulik Ladkani
  1 sibling, 2 replies; 44+ messages in thread
From: Cong Wang @ 2016-09-25  0:07 UTC (permalink / raw)
  To: Shmulik Ladkani
  Cc: Jamal Hadi Salim, David S. Miller, Eric Dumazet,
	Linux Kernel Network Developers

On Thu, Sep 22, 2016 at 10:11 PM, Shmulik Ladkani
<shmulik.ladkani@gmail.com> wrote:
> Was wondering why it's missing, googled a bit with no meaningful
> results, so speculated the following:
>
> Some time long ago, initial 'mirred' purpose was to facilitate ifb.
> Therefore 'egress redirect' was implemented. Jamal probably left the
> 'ingress' support for a later time :)
>
> One interesting usecase for 'ingress redirect' is creating "rx bouncing"
> construct (like macvlan/macvtap/ipvlan) but applied according to custom
> logic.

We have done this for our containers for a long time. We simply
redirect packets to veth TX then flow to veth RX of course.

One problem to use your code for us is that, the RX side of veth
is inside containers, not visible to outside, perhaps we need some
more parameter to tell the netns before the device name/index?
Thoughts?

>
>> It may be around preventing loops maybe.
>
> Could be, but personally, I treat these constructs as (powerful)
> building blocks, and "with great power comes great responsibility".
>
> Even today, one may create loops using existing 'egress redirect',
> e.g. this rediculously errorneous construct:
>
>  # ip l add v0 type veth peer name v0p
>  # tc filter add dev v0p parent ffff: basic \
>     action mirred egress redirect dev v0

Detecting such loops should not be hard technically, like we do
for reclassification. We might need some bits in skb to detect
this specific case. Anyway, I don't think it is a blocker, just need
more tests to catch some corner cases.

Thanks.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-23 15:40         ` Shmulik Ladkani
@ 2016-09-25  0:20           ` Cong Wang
  2016-09-25 13:05           ` Jamal Hadi Salim
  1 sibling, 0 replies; 44+ messages in thread
From: Cong Wang @ 2016-09-25  0:20 UTC (permalink / raw)
  To: Shmulik Ladkani
  Cc: Jamal Hadi Salim, David S. Miller, Eric Dumazet,
	Linux Kernel Network Developers

On Fri, Sep 23, 2016 at 8:40 AM, Shmulik Ladkani
<shmulik.ladkani@gmail.com> wrote:
> On Fri, 23 Sep 2016 08:48:33 -0400 Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>> > Even today, one may create loops using existing 'egress redirect',
>> > e.g. this rediculously errorneous construct:
>> >
>> >  # ip l add v0 type veth peer name v0p
>> >  # tc filter add dev v0p parent ffff: basic \
>> >     action mirred egress redirect dev v0
>>
>> I think we actually recover from this one by eventually
>> dropping (theres a ttl field).
>
> [off topic]
>
> Don't know about that :) cpu fan got very noisy, 3 of 4 cores at 100%,
> and after one second I got:
>
> # ip -s l show type veth
> 16: v0p@v0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
>     link/ether a2:64:ff:10:dd:85 brd ff:ff:ff:ff:ff:ff
>     RX: bytes  packets  errors  dropped overrun mcast
>     71660305923 469890864 0       0       0       0
>     TX: bytes  packets  errors  dropped carrier collsns
>     3509       24       0       0       0       0
> 17: v0@v0p: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
>     link/ether 52:a2:34:f6:7c:ec brd ff:ff:ff:ff:ff:ff
>     RX: bytes  packets  errors  dropped overrun mcast
>     3509       24       0       0       0       0
>     TX: bytes  packets  errors  dropped carrier collsns
>     71660713017 469893555 0       0       0       0

These ghost packets never enter IP stack, I don't think TTL
helps.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-23 15:40         ` Shmulik Ladkani
  2016-09-25  0:20           ` Cong Wang
@ 2016-09-25 13:05           ` Jamal Hadi Salim
  2016-09-25 16:26             ` Daniel Borkmann
  2016-09-25 17:33             ` Shmulik Ladkani
  1 sibling, 2 replies; 44+ messages in thread
From: Jamal Hadi Salim @ 2016-09-25 13:05 UTC (permalink / raw)
  To: Shmulik Ladkani
  Cc: David S. Miller, WANG Cong, Eric Dumazet, netdev,
	Florian Westphal

On 16-09-23 11:40 AM, Shmulik Ladkani wrote:
> On Fri, 23 Sep 2016 08:48:33 -0400 Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>>> Even today, one may create loops using existing 'egress redirect',
>>> e.g. this rediculously errorneous construct:
>>>
>>>  # ip l add v0 type veth peer name v0p
>>>  # tc filter add dev v0p parent ffff: basic \
>>>     action mirred egress redirect dev v0
>>
>> I think we actually recover from this one by eventually
>> dropping (theres a ttl field).
>
> [off topic]
>
> Don't know about that :) cpu fan got very noisy, 3 of 4 cores at 100%,
> and after one second I got:
>
> # ip -s l show type veth
> 16: v0p@v0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
>     link/ether a2:64:ff:10:dd:85 brd ff:ff:ff:ff:ff:ff
>     RX: bytes  packets  errors  dropped overrun mcast
>     71660305923 469890864 0       0       0       0
>     TX: bytes  packets  errors  dropped carrier collsns
>     3509       24       0       0       0       0
> 17: v0@v0p: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
>     link/ether 52:a2:34:f6:7c:ec brd ff:ff:ff:ff:ff:ff
>     RX: bytes  packets  errors  dropped overrun mcast
>     3509       24       0       0       0       0
>     TX: bytes  packets  errors  dropped carrier collsns
>     71660713017 469893555 0       0       0       0
>

I think this is still on topic!

Now I realize that code we took out around 4.2.x is still useful
for such a use case (I wasnt thinking about veth when Florian was
slimming the skb);
+Cc Florian W.

This snippet from 4.2:
-------------
3525 static int ing_filter(struct sk_buff *skb, struct netdev_queue *rxq)
3526 {
3527         struct net_device *dev = skb->dev;
3528         u32 ttl = G_TC_RTTL(skb->tc_verd);
3529         int result = TC_ACT_OK;
3530         struct Qdisc *q;
3531
3532         if (unlikely(MAX_RED_LOOP < ttl++)) {
3533                 net_warn_ratelimited("Redir loop detected Dropping 
packet (%d->%d)\n",
3534                                      skb->skb_iif, dev->ifindex);
3535                 return TC_ACT_SHOT;
3536         }
3537
3538         skb->tc_verd = SET_TC_RTTL(skb->tc_verd, ttl);
3539         skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS);
3540
3541         q = rcu_dereference(rxq->qdisc);
3542         if (q != &noop_qdisc) {
3543                 spin_lock(qdisc_lock(q));
3544                 if (likely(!test_bit(__QDISC_STATE_DEACTIVATED, 
&q->state)))
3545                         result = qdisc_enqueue_root(skb, q);
3546                 spin_unlock(qdisc_lock(q));
3547         }
3548
3549         return result;
3550 }
--------------------

MAX_RED_LOOP (stands for "Maximum Redirect loop") still exists in
current code. The idea above was that we would increment the rttl
counter  once and if we saw it again upto MAX_RED_LOOP we would assume
a loop and drop the packet (at the time i didnt think it was wise to
let the actions be in charge of setting the RTTL; it had to be central
core code - but it may not be neccessary)

Florian, when we discussed I said it was fine to reclaim those 3 bits
on tc verdict for RTTL at the time because i had taken out the
feature and never added it back. Your comment at the time was we can
add it back when someone shows up with the feature.
Shmulik is looking to add it.

> Similarly to all constructs injecting skbs to device rx (bond/team,
> vlan, macvlan, tunnels, ifb, __dev_forward_skb callers, etc..), we are
> obligated to assign 'skb2->dev' as the new rx device.
>
> Regarding 'skb2->skb_iif', original act_mirred code already has:
>
>  	skb2->skb_iif = skb->dev->ifindex;   <--- THIS IS ORIG DEV IIF
>  	skb2->dev = dev;                     <--- THIS IS TARGET DEV
> 	err = dev_queue_xmit(skb2);
>
> I'm preserving this; OTOH the suggested modification in the patch is
>
> -	err = dev_queue_xmit(skb2);
> +	if (tcf_mirred_act_direction(m->tcfm_eaction) & AT_EGRESS)
> +		err = dev_queue_xmit(skb2);
> +	else
> +		netif_receive_skb(skb2);
>
> now, the call to 'netif_receive_skb' will eventually override skb_iif to
> the target RX dev's index, upon entry to __netif_receive_skb_core.
>
> I think this IS the expected behavior - as done by other "rx injection"
> constructs.
>

Sounds fine.
I am wondering if we can have a tracing feature to show the lifetime of
the packet as it is being cycled around the kernel? It would help
debugging if some policy misbehaves.

> My doubts were around whether we should call 'dev_forward_skb' instead
> of 'netif_receive_skb'.
> The former does some things I assumed we're not interested of, like
> testing 'is_skb_forwardable' and re-running 'eth_type_trans'.
> OTOH, it DOES scrub the skb.
> Maybe we should scrub it as well prior the netif_receive_skb call?
>

Scrubbing the skb could be a bad idea if it gets rid of global state
like the RTTL if you add it back.

cheers,
jamal

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-25  0:07       ` Cong Wang
@ 2016-09-25 13:39         ` Jamal Hadi Salim
  2016-09-26  4:55           ` Cong Wang
  2016-09-25 17:59         ` Shmulik Ladkani
  1 sibling, 1 reply; 44+ messages in thread
From: Jamal Hadi Salim @ 2016-09-25 13:39 UTC (permalink / raw)
  To: Cong Wang, Shmulik Ladkani
  Cc: David S. Miller, Eric Dumazet, Linux Kernel Network Developers,
	Florian Westphal

On 16-09-24 08:07 PM, Cong Wang wrote:
> On Thu, Sep 22, 2016 at 10:11 PM, Shmulik Ladkani

>
> One problem to use your code for us is that, the RX side of veth
> is inside containers, not visible to outside, perhaps we need some
> more parameter to tell the netns before the device name/index?
> Thoughts?
>

Intriguing - but this would apply for only veth?

>>
>>> It may be around preventing loops maybe.
>>
>> Could be, but personally, I treat these constructs as (powerful)
>> building blocks, and "with great power comes great responsibility".
>>
>> Even today, one may create loops using existing 'egress redirect',
>> e.g. this rediculously errorneous construct:
>>
>>  # ip l add v0 type veth peer name v0p
>>  # tc filter add dev v0p parent ffff: basic \
>>     action mirred egress redirect dev v0
>
> Detecting such loops should not be hard technically, like we do
> for reclassification. We might need some bits in skb to detect
> this specific case.

Note my other email. We had the feature but we took it out to save bits
on the skb.


cheers,
jamal

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-25 13:05           ` Jamal Hadi Salim
@ 2016-09-25 16:26             ` Daniel Borkmann
  2016-09-25 18:33               ` Florian Westphal
  2016-09-25 23:31               ` Jamal Hadi Salim
  2016-09-25 17:33             ` Shmulik Ladkani
  1 sibling, 2 replies; 44+ messages in thread
From: Daniel Borkmann @ 2016-09-25 16:26 UTC (permalink / raw)
  To: Jamal Hadi Salim, Shmulik Ladkani
  Cc: David S. Miller, WANG Cong, Eric Dumazet, netdev,
	Florian Westphal

On 09/25/2016 03:05 PM, Jamal Hadi Salim wrote:
> On 16-09-23 11:40 AM, Shmulik Ladkani wrote:
>> On Fri, 23 Sep 2016 08:48:33 -0400 Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>>>> Even today, one may create loops using existing 'egress redirect',
>>>> e.g. this rediculously errorneous construct:
>>>>
>>>>  # ip l add v0 type veth peer name v0p
>>>>  # tc filter add dev v0p parent ffff: basic \
>>>>     action mirred egress redirect dev v0
>>>
>>> I think we actually recover from this one by eventually
>>> dropping (theres a ttl field).
>>
>> [off topic]
>>
>> Don't know about that :) cpu fan got very noisy, 3 of 4 cores at 100%,
>> and after one second I got:
>>
>> # ip -s l show type veth
>> 16: v0p@v0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
>>     link/ether a2:64:ff:10:dd:85 brd ff:ff:ff:ff:ff:ff
>>     RX: bytes  packets  errors  dropped overrun mcast
>>     71660305923 469890864 0       0       0       0
>>     TX: bytes  packets  errors  dropped carrier collsns
>>     3509       24       0       0       0       0
>> 17: v0@v0p: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
>>     link/ether 52:a2:34:f6:7c:ec brd ff:ff:ff:ff:ff:ff
>>     RX: bytes  packets  errors  dropped overrun mcast
>>     3509       24       0       0       0       0
>>     TX: bytes  packets  errors  dropped carrier collsns
>>     71660713017 469893555 0       0       0       0
>>
>
> I think this is still on topic!
>
> Now I realize that code we took out around 4.2.x is still useful
> for such a use case (I wasnt thinking about veth when Florian was
> slimming the skb);
> +Cc Florian W.
>
> This snippet from 4.2:
> -------------
> 3525 static int ing_filter(struct sk_buff *skb, struct netdev_queue *rxq)
> 3526 {
> 3527         struct net_device *dev = skb->dev;
> 3528         u32 ttl = G_TC_RTTL(skb->tc_verd);
> 3529         int result = TC_ACT_OK;
> 3530         struct Qdisc *q;
> 3531
> 3532         if (unlikely(MAX_RED_LOOP < ttl++)) {
> 3533                 net_warn_ratelimited("Redir loop detected Dropping packet (%d->%d)\n",
> 3534                                      skb->skb_iif, dev->ifindex);
> 3535                 return TC_ACT_SHOT;
> 3536         }
> 3537
> 3538         skb->tc_verd = SET_TC_RTTL(skb->tc_verd, ttl);
> 3539         skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS);
> 3540
> 3541         q = rcu_dereference(rxq->qdisc);
> 3542         if (q != &noop_qdisc) {
> 3543                 spin_lock(qdisc_lock(q));
> 3544                 if (likely(!test_bit(__QDISC_STATE_DEACTIVATED, &q->state)))
> 3545                         result = qdisc_enqueue_root(skb, q);
> 3546                 spin_unlock(qdisc_lock(q));
> 3547         }
> 3548
> 3549         return result;
> 3550 }
> --------------------
>
> MAX_RED_LOOP (stands for "Maximum Redirect loop") still exists in
> current code. The idea above was that we would increment the rttl
> counter  once and if we saw it again upto MAX_RED_LOOP we would assume
> a loop and drop the packet (at the time i didnt think it was wise to
> let the actions be in charge of setting the RTTL; it had to be central
> core code - but it may not be neccessary)
>
> Florian, when we discussed I said it was fine to reclaim those 3 bits
> on tc verdict for RTTL at the time because i had taken out the
> feature and never added it back. Your comment at the time was we can
> add it back when someone shows up with the feature.
> Shmulik is looking to add it.

Why not just reuse xmit_recursion, which is what we did in tc cls_bpf
programs f.e. see __bpf_tx_skb()? Would be a pity to waste 3 bits on
this in the skb.

>> Similarly to all constructs injecting skbs to device rx (bond/team,
>> vlan, macvlan, tunnels, ifb, __dev_forward_skb callers, etc..), we are
>> obligated to assign 'skb2->dev' as the new rx device.
>>
>> Regarding 'skb2->skb_iif', original act_mirred code already has:
>>
>>      skb2->skb_iif = skb->dev->ifindex;   <--- THIS IS ORIG DEV IIF
>>      skb2->dev = dev;                     <--- THIS IS TARGET DEV
>>     err = dev_queue_xmit(skb2);
>>
>> I'm preserving this; OTOH the suggested modification in the patch is
>>
>> -    err = dev_queue_xmit(skb2);
>> +    if (tcf_mirred_act_direction(m->tcfm_eaction) & AT_EGRESS)
>> +        err = dev_queue_xmit(skb2);
>> +    else
>> +        netif_receive_skb(skb2);
>>
>> now, the call to 'netif_receive_skb' will eventually override skb_iif to
>> the target RX dev's index, upon entry to __netif_receive_skb_core.
>>
>> I think this IS the expected behavior - as done by other "rx injection"
>> constructs.
>
> Sounds fine.
> I am wondering if we can have a tracing feature to show the lifetime of
> the packet as it is being cycled around the kernel? It would help
> debugging if some policy misbehaves.
>
>> My doubts were around whether we should call 'dev_forward_skb' instead
>> of 'netif_receive_skb'.
>> The former does some things I assumed we're not interested of, like
>> testing 'is_skb_forwardable' and re-running 'eth_type_trans'.
>> OTOH, it DOES scrub the skb.
>> Maybe we should scrub it as well prior the netif_receive_skb call?
>
> Scrubbing the skb could be a bad idea if it gets rid of global state
> like the RTTL if you add it back.
>
> cheers,
> jamal
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-25 13:05           ` Jamal Hadi Salim
  2016-09-25 16:26             ` Daniel Borkmann
@ 2016-09-25 17:33             ` Shmulik Ladkani
  2016-09-25 18:31               ` Florian Westphal
  2016-09-25 23:45               ` Jamal Hadi Salim
  1 sibling, 2 replies; 44+ messages in thread
From: Shmulik Ladkani @ 2016-09-25 17:33 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: David S. Miller, WANG Cong, Eric Dumazet, netdev,
	Florian Westphal, Daniel Borkmann

On Sun, 25 Sep 2016 09:05:08 -0400 Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> On 16-09-23 11:40 AM, Shmulik Ladkani wrote:
> >
> > [off topic]
> 
> I think this is still on topic!

Sorry, wasn't too clear on that.

What I meant is that _existing_ "egress redirect" already gets us into
crazy loops - the veth misconfig being just one example of, but
many more exist (many device stacking constructs, with lower dev issuing
an egress-redirect back to the topmost dev).

Point is, IMO loop detection (whether/how addressed), is orthogonal to
this series implementing "ingress redirect", and doesn't seem as a
strict prerequisite to adding the "ingress redirect" functionality to
act_mirred.

We can later address any loop-detection improvements in mirred.
WDYT?

Thanks,
Shmulik

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-25  0:07       ` Cong Wang
  2016-09-25 13:39         ` Jamal Hadi Salim
@ 2016-09-25 17:59         ` Shmulik Ladkani
  2016-09-26  4:56           ` Cong Wang
  1 sibling, 1 reply; 44+ messages in thread
From: Shmulik Ladkani @ 2016-09-25 17:59 UTC (permalink / raw)
  To: Cong Wang
  Cc: Jamal Hadi Salim, David S. Miller, Eric Dumazet,
	Linux Kernel Network Developers

Hi,

On Sat, 24 Sep 2016 17:07:12 -0700 Cong Wang <xiyou.wangcong@gmail.com> wrote:
> One problem to use your code for us is that, the RX side of veth
> is inside containers, not visible to outside, perhaps we need some
> more parameter to tell the netns before the device name/index?
> Thoughts?

Well, this is way trickier...

tc_mirred doesn't cope with netns movement of the target device.
See 'mirred_device_event': upon NETDEV_UNREGISTER the 'tcfm_dev' gets
nullified.

(dev_change_net_namespace sequence includes NETDEV_UNREGISTER,
 dev_net_set, NETDEV_REGISTER).

As upposed to veth, which keeps the peer netdev pointer (since veth peers
lifetime is coupled), here in act_mirred we can't easily distinguish a
"real" NETDEV_UNREGISTER vs a namespace change...

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-25 17:33             ` Shmulik Ladkani
@ 2016-09-25 18:31               ` Florian Westphal
  2016-09-26  1:15                 ` Jamal Hadi Salim
  2016-09-25 23:45               ` Jamal Hadi Salim
  1 sibling, 1 reply; 44+ messages in thread
From: Florian Westphal @ 2016-09-25 18:31 UTC (permalink / raw)
  To: Shmulik Ladkani
  Cc: Jamal Hadi Salim, David S. Miller, WANG Cong, Eric Dumazet,
	netdev, Florian Westphal, Daniel Borkmann

Shmulik Ladkani <shmulik.ladkani@gmail.com> wrote:
> We can later address any loop-detection improvements in mirred.
> WDYT?

You can address this after fixing infamous spinlock recursion hard
lockup (which has existed forever):

tc qdisc add dev eth0 root handle 1: prio
tc filter add dev eth0 parent 1: protocol ip u32 match u32 0 0 flowid
1:2 action mirred egress redirect dev eth0

(only do this on toy vm)

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-25 16:26             ` Daniel Borkmann
@ 2016-09-25 18:33               ` Florian Westphal
  2016-09-25 23:47                 ` Jamal Hadi Salim
  2016-09-25 23:31               ` Jamal Hadi Salim
  1 sibling, 1 reply; 44+ messages in thread
From: Florian Westphal @ 2016-09-25 18:33 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Jamal Hadi Salim, Shmulik Ladkani, David S. Miller, WANG Cong,
	Eric Dumazet, netdev, Florian Westphal

Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 09/25/2016 03:05 PM, Jamal Hadi Salim wrote:
> >MAX_RED_LOOP (stands for "Maximum Redirect loop") still exists in
> >current code. The idea above was that we would increment the rttl
> >counter  once and if we saw it again upto MAX_RED_LOOP we would assume
> >a loop and drop the packet (at the time i didnt think it was wise to
> >let the actions be in charge of setting the RTTL; it had to be central
> >core code - but it may not be neccessary)
> >
> >Florian, when we discussed I said it was fine to reclaim those 3 bits
> >on tc verdict for RTTL at the time because i had taken out the
> >feature and never added it back. Your comment at the time was we can
> >add it back when someone shows up with the feature.
> >Shmulik is looking to add it.
> 
> Why not just reuse xmit_recursion, which is what we did in tc cls_bpf
> programs f.e. see __bpf_tx_skb()? Would be a pity to waste 3 bits on
> this in the skb.

+1, don't (yet) understand why a per-skb counter is required for this.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-25 16:26             ` Daniel Borkmann
  2016-09-25 18:33               ` Florian Westphal
@ 2016-09-25 23:31               ` Jamal Hadi Salim
  1 sibling, 0 replies; 44+ messages in thread
From: Jamal Hadi Salim @ 2016-09-25 23:31 UTC (permalink / raw)
  To: Daniel Borkmann, Shmulik Ladkani
  Cc: David S. Miller, WANG Cong, Eric Dumazet, netdev,
	Florian Westphal

On 16-09-25 12:26 PM, Daniel Borkmann wrote:
> On 09/25/2016 03:05 PM, Jamal Hadi Salim wrote:

[..]

>> MAX_RED_LOOP (stands for "Maximum Redirect loop") still exists in
>> current code. The idea above was that we would increment the rttl
>> counter  once and if we saw it again upto MAX_RED_LOOP we would assume
>> a loop and drop the packet (at the time i didnt think it was wise to
>> let the actions be in charge of setting the RTTL; it had to be central
>> core code - but it may not be neccessary)
>>
>> Florian, when we discussed I said it was fine to reclaim those 3 bits
>> on tc verdict for RTTL at the time because i had taken out the
>> feature and never added it back. Your comment at the time was we can
>> add it back when someone shows up with the feature.
>> Shmulik is looking to add it.
>
> Why not just reuse xmit_recursion, which is what we did in tc cls_bpf
> programs f.e. see __bpf_tx_skb()? Would be a pity to waste 3 bits on
> this in the skb.

If it is going to work, I'd be happy to save those bits.
xmit_recursion is going to prevent recursing into
dev_xmit(), no?

In our case we want to preventing looping of a singular
skb.

cheers,
jamal

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-25 17:33             ` Shmulik Ladkani
  2016-09-25 18:31               ` Florian Westphal
@ 2016-09-25 23:45               ` Jamal Hadi Salim
  1 sibling, 0 replies; 44+ messages in thread
From: Jamal Hadi Salim @ 2016-09-25 23:45 UTC (permalink / raw)
  To: Shmulik Ladkani
  Cc: David S. Miller, WANG Cong, Eric Dumazet, netdev,
	Florian Westphal, Daniel Borkmann

On 16-09-25 01:33 PM, Shmulik Ladkani wrote:
> On Sun, 25 Sep 2016 09:05:08 -0400 Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>> On 16-09-23 11:40 AM, Shmulik Ladkani wrote:
>>>
>>> [off topic]
>>
>> I think this is still on topic!
>
> Sorry, wasn't too clear on that.
>
> What I meant is that _existing_ "egress redirect" already gets us into
> crazy loops - the veth misconfig being just one example of, but
> many more exist (many device stacking constructs, with lower dev issuing
> an egress-redirect back to the topmost dev).
>

So this is stopped by the xmit_recursion Daniel mentioned, correct?

> Point is, IMO loop detection (whether/how addressed), is orthogonal to
> this series implementing "ingress redirect", and doesn't seem as a
> strict prerequisite to adding the "ingress redirect" functionality to
> act_mirred.
>
> We can later address any loop-detection improvements in mirred.
> WDYT?
>

If indeed the xmit_recursion solves the egress->egress problem then I
would suggest we need to address the egress->ingress issue.
BTW: plans to also address ingress->ingress?

cheers,
jamal

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-25 18:33               ` Florian Westphal
@ 2016-09-25 23:47                 ` Jamal Hadi Salim
  0 siblings, 0 replies; 44+ messages in thread
From: Jamal Hadi Salim @ 2016-09-25 23:47 UTC (permalink / raw)
  To: Florian Westphal, Daniel Borkmann
  Cc: Shmulik Ladkani, David S. Miller, WANG Cong, Eric Dumazet, netdev

On 16-09-25 02:33 PM, Florian Westphal wrote:
> Daniel Borkmann <daniel@iogearbox.net> wrote:
[..]
>>
>> Why not just reuse xmit_recursion, which is what we did in tc cls_bpf
>> programs f.e. see __bpf_tx_skb()? Would be a pity to waste 3 bits on
>> this in the skb.
>
> +1, don't (yet) understand why a per-skb counter is required for this.
>

If you could solve redirecting from ingress->egress with xmit_recursion
then we are set ;->

cheers,
jamal

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-25 18:31               ` Florian Westphal
@ 2016-09-26  1:15                 ` Jamal Hadi Salim
  2016-09-26  1:35                   ` Florian Westphal
  0 siblings, 1 reply; 44+ messages in thread
From: Jamal Hadi Salim @ 2016-09-26  1:15 UTC (permalink / raw)
  To: Florian Westphal, Shmulik Ladkani
  Cc: David S. Miller, WANG Cong, Eric Dumazet, netdev, Daniel Borkmann

On 16-09-25 02:31 PM, Florian Westphal wrote:
> Shmulik Ladkani <shmulik.ladkani@gmail.com> wrote:
>> We can later address any loop-detection improvements in mirred.
>> WDYT?
>
> You can address this after fixing infamous spinlock recursion hard
> lockup (which has existed forever):
>
> tc qdisc add dev eth0 root handle 1: prio
> tc filter add dev eth0 parent 1: protocol ip u32 match u32 0 0 flowid
> 1:2 action mirred egress redirect dev eth0
>
> (only do this on toy vm)
>

Realize didnt respond to this. Seems very simple to fix:
if skb->dev->ifindex and m->tcfm_dev->ifindex are the
same, then you can drop the packet.
But it may be possible to reject earlier the policy
entirely earlier.
And true given this is egress->egress redirection
one could also check with xmit_recursion check.


cheers,
jamal

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-26  1:15                 ` Jamal Hadi Salim
@ 2016-09-26  1:35                   ` Florian Westphal
  2016-09-26  1:40                     ` Jamal Hadi Salim
  2016-09-26 14:43                     ` Hannes Frederic Sowa
  0 siblings, 2 replies; 44+ messages in thread
From: Florian Westphal @ 2016-09-26  1:35 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Florian Westphal, Shmulik Ladkani, David S. Miller, WANG Cong,
	Eric Dumazet, netdev, Daniel Borkmann

Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> On 16-09-25 02:31 PM, Florian Westphal wrote:
> >Shmulik Ladkani <shmulik.ladkani@gmail.com> wrote:
> >>We can later address any loop-detection improvements in mirred.
> >>WDYT?
> >
> >You can address this after fixing infamous spinlock recursion hard
> >lockup (which has existed forever):
> >
> >tc qdisc add dev eth0 root handle 1: prio
> >tc filter add dev eth0 parent 1: protocol ip u32 match u32 0 0 flowid
> >1:2 action mirred egress redirect dev eth0
> >
> >(only do this on toy vm)
> >
> 
> Realize didnt respond to this. Seems very simple to fix:
> if skb->dev->ifindex and m->tcfm_dev->ifindex are the
> same, then you can drop the packet.

Yes, but I think we get same issue when we deal with stacked
interfaces, and redirect is to e.g. vlan on top of physical device.

And we have such loops even without tc, for instance when placing
both veth ends in same bridge :-(

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-26  1:35                   ` Florian Westphal
@ 2016-09-26  1:40                     ` Jamal Hadi Salim
  2016-09-26 14:43                     ` Hannes Frederic Sowa
  1 sibling, 0 replies; 44+ messages in thread
From: Jamal Hadi Salim @ 2016-09-26  1:40 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Shmulik Ladkani, David S. Miller, WANG Cong, Eric Dumazet, netdev,
	Daniel Borkmann

On 16-09-25 09:35 PM, Florian Westphal wrote:
> Jamal Hadi Salim <jhs@mojatatu.com> wrote:

>>
>> Realize didnt respond to this. Seems very simple to fix:
>> if skb->dev->ifindex and m->tcfm_dev->ifindex are the
>> same, then you can drop the packet.
>
> Yes, but I think we get same issue when we deal with stacked
> interfaces, and redirect is to e.g. vlan on top of physical device.
>
> And we have such loops even without tc, for instance when placing
> both veth ends in same bridge :-(


For egress->egress the xmit recursion should help (maybe an
audit needs to be done).
For the case Shmulik is trying to achieve I am not sure that
would help.
In general, catching such a loop (or broadcast), if cheap should
be attempted.

cheers,
jamal

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-25 13:39         ` Jamal Hadi Salim
@ 2016-09-26  4:55           ` Cong Wang
  0 siblings, 0 replies; 44+ messages in thread
From: Cong Wang @ 2016-09-26  4:55 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Shmulik Ladkani, David S. Miller, Eric Dumazet,
	Linux Kernel Network Developers, Florian Westphal

On Sun, Sep 25, 2016 at 6:39 AM, Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> On 16-09-24 08:07 PM, Cong Wang wrote:
>>
>> On Thu, Sep 22, 2016 at 10:11 PM, Shmulik Ladkani
>
>
>>
>> One problem to use your code for us is that, the RX side of veth
>> is inside containers, not visible to outside, perhaps we need some
>> more parameter to tell the netns before the device name/index?
>> Thoughts?
>>
>
> Intriguing - but this would apply for only veth?

Yes, my case is only about veth.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-25 17:59         ` Shmulik Ladkani
@ 2016-09-26  4:56           ` Cong Wang
  0 siblings, 0 replies; 44+ messages in thread
From: Cong Wang @ 2016-09-26  4:56 UTC (permalink / raw)
  To: Shmulik Ladkani
  Cc: Jamal Hadi Salim, David S. Miller, Eric Dumazet,
	Linux Kernel Network Developers

On Sun, Sep 25, 2016 at 10:59 AM, Shmulik Ladkani
<shmulik.ladkani@gmail.com> wrote:
> Hi,
>
> On Sat, 24 Sep 2016 17:07:12 -0700 Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> One problem to use your code for us is that, the RX side of veth
>> is inside containers, not visible to outside, perhaps we need some
>> more parameter to tell the netns before the device name/index?
>> Thoughts?
>
> Well, this is way trickier...
>
> tc_mirred doesn't cope with netns movement of the target device.
> See 'mirred_device_event': upon NETDEV_UNREGISTER the 'tcfm_dev' gets
> nullified.
>
> (dev_change_net_namespace sequence includes NETDEV_UNREGISTER,
>  dev_net_set, NETDEV_REGISTER).
>
> As upposed to veth, which keeps the peer netdev pointer (since veth peers
> lifetime is coupled), here in act_mirred we can't easily distinguish a
> "real" NETDEV_UNREGISTER vs a namespace change...

Yeah, isolation is a barrier for this, so we probably can't use
this feature. But I still think it could be useful.

Thanks.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-26  1:35                   ` Florian Westphal
  2016-09-26  1:40                     ` Jamal Hadi Salim
@ 2016-09-26 14:43                     ` Hannes Frederic Sowa
  2016-09-26 14:53                       ` Daniel Borkmann
  2016-09-26 15:26                       ` Shmulik Ladkani
  1 sibling, 2 replies; 44+ messages in thread
From: Hannes Frederic Sowa @ 2016-09-26 14:43 UTC (permalink / raw)
  To: Florian Westphal, Jamal Hadi Salim
  Cc: Shmulik Ladkani, David S. Miller, WANG Cong, Eric Dumazet, netdev,
	Daniel Borkmann

On 26.09.2016 03:35, Florian Westphal wrote:
> Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>> On 16-09-25 02:31 PM, Florian Westphal wrote:
>>> Shmulik Ladkani <shmulik.ladkani@gmail.com> wrote:
>>>> We can later address any loop-detection improvements in mirred.
>>>> WDYT?
>>>
>>> You can address this after fixing infamous spinlock recursion hard
>>> lockup (which has existed forever):
>>>
>>> tc qdisc add dev eth0 root handle 1: prio
>>> tc filter add dev eth0 parent 1: protocol ip u32 match u32 0 0 flowid
>>> 1:2 action mirred egress redirect dev eth0
>>>
>>> (only do this on toy vm)
>>>
>>
>> Realize didnt respond to this. Seems very simple to fix:
>> if skb->dev->ifindex and m->tcfm_dev->ifindex are the
>> same, then you can drop the packet.
> 
> Yes, but I think we get same issue when we deal with stacked
> interfaces, and redirect is to e.g. vlan on top of physical device.

We do have the adjacent upper lists in all netdevices, calculating if a
mirred actions would insert the skb on a stacked device above us should
be as easy as querying netdev_has_upper_dev and should be possible to
check that during config time.

> And we have such loops even without tc, for instance when placing
> both veth ends in same bridge :-(

We can't fix that without a ttl in the sk_buff struct.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-26 14:43                     ` Hannes Frederic Sowa
@ 2016-09-26 14:53                       ` Daniel Borkmann
  2016-09-26 15:12                         ` Hannes Frederic Sowa
  2016-09-26 15:26                       ` Shmulik Ladkani
  1 sibling, 1 reply; 44+ messages in thread
From: Daniel Borkmann @ 2016-09-26 14:53 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Florian Westphal, Jamal Hadi Salim
  Cc: Shmulik Ladkani, David S. Miller, WANG Cong, Eric Dumazet, netdev

On 09/26/2016 04:43 PM, Hannes Frederic Sowa wrote:
> On 26.09.2016 03:35, Florian Westphal wrote:
>> Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>>> On 16-09-25 02:31 PM, Florian Westphal wrote:
>>>> Shmulik Ladkani <shmulik.ladkani@gmail.com> wrote:
>>>>> We can later address any loop-detection improvements in mirred.
>>>>> WDYT?
>>>>
>>>> You can address this after fixing infamous spinlock recursion hard
>>>> lockup (which has existed forever):
>>>>
>>>> tc qdisc add dev eth0 root handle 1: prio
>>>> tc filter add dev eth0 parent 1: protocol ip u32 match u32 0 0 flowid
>>>> 1:2 action mirred egress redirect dev eth0
>>>>
>>>> (only do this on toy vm)
>>>
>>> Realize didnt respond to this. Seems very simple to fix:
>>> if skb->dev->ifindex and m->tcfm_dev->ifindex are the
>>> same, then you can drop the packet.
>>
>> Yes, but I think we get same issue when we deal with stacked
>> interfaces, and redirect is to e.g. vlan on top of physical device.
>
> We do have the adjacent upper lists in all netdevices, calculating if a
> mirred actions would insert the skb on a stacked device above us should
> be as easy as querying netdev_has_upper_dev and should be possible to
> check that during config time.

But that would still not be enough, no? In the sense that with above
scenario, you could redirect to some arbitrary device that redirects
this back to the original device if on purpose configured as such,
thus they don't necessarily need to have a stacked relationship.

>> And we have such loops even without tc, for instance when placing
>> both veth ends in same bridge :-(
>
> We can't fix that without a ttl in the sk_buff struct.
>
> Bye,
> Hannes

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-26 14:53                       ` Daniel Borkmann
@ 2016-09-26 15:12                         ` Hannes Frederic Sowa
  2016-09-26 15:53                           ` Daniel Borkmann
  0 siblings, 1 reply; 44+ messages in thread
From: Hannes Frederic Sowa @ 2016-09-26 15:12 UTC (permalink / raw)
  To: Daniel Borkmann, Florian Westphal, Jamal Hadi Salim
  Cc: Shmulik Ladkani, David S. Miller, WANG Cong, Eric Dumazet, netdev

On 26.09.2016 16:53, Daniel Borkmann wrote:
> On 09/26/2016 04:43 PM, Hannes Frederic Sowa wrote:
>> On 26.09.2016 03:35, Florian Westphal wrote:
>>> Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>>>> On 16-09-25 02:31 PM, Florian Westphal wrote:
>>>>> Shmulik Ladkani <shmulik.ladkani@gmail.com> wrote:
>>>>>> We can later address any loop-detection improvements in mirred.
>>>>>> WDYT?
>>>>>
>>>>> You can address this after fixing infamous spinlock recursion hard
>>>>> lockup (which has existed forever):
>>>>>
>>>>> tc qdisc add dev eth0 root handle 1: prio
>>>>> tc filter add dev eth0 parent 1: protocol ip u32 match u32 0 0 flowid
>>>>> 1:2 action mirred egress redirect dev eth0
>>>>>
>>>>> (only do this on toy vm)
>>>>
>>>> Realize didnt respond to this. Seems very simple to fix:
>>>> if skb->dev->ifindex and m->tcfm_dev->ifindex are the
>>>> same, then you can drop the packet.
>>>
>>> Yes, but I think we get same issue when we deal with stacked
>>> interfaces, and redirect is to e.g. vlan on top of physical device.
>>
>> We do have the adjacent upper lists in all netdevices, calculating if a
>> mirred actions would insert the skb on a stacked device above us should
>> be as easy as querying netdev_has_upper_dev and should be possible to
>> check that during config time.
> 
> But that would still not be enough, no? In the sense that with above
> scenario, you could redirect to some arbitrary device that redirects
> this back to the original device if on purpose configured as such,
> thus they don't necessarily need to have a stacked relationship.

Yes, it would only help with the scenario Florian described above.

Personally, I would only try to fix and warn against the easy to detect
cases. It is easy enough to just create a loop with your local attached
L2 which brings your box into a endless loop processing the same packet
again and again. Because it is out of control of the kernel you cannot
do anything at all.

I would just care that we sometimes reschedule and don't do everything
in one stack so we don't corrupt the machine and an admin has still a
chance to solve the problem.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-26 14:43                     ` Hannes Frederic Sowa
  2016-09-26 14:53                       ` Daniel Borkmann
@ 2016-09-26 15:26                       ` Shmulik Ladkani
  1 sibling, 0 replies; 44+ messages in thread
From: Shmulik Ladkani @ 2016-09-26 15:26 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Florian Westphal, Jamal Hadi Salim, David S. Miller, WANG Cong,
	Eric Dumazet, netdev, Daniel Borkmann

Hi,

On Mon, 26 Sep 2016 16:43:16 +0200 Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:
> On 26.09.2016 03:35, Florian Westphal wrote:
> > 
> > Yes, but I think we get same issue when we deal with stacked
> > interfaces, and redirect is to e.g. vlan on top of physical device.  
> 
> We do have the adjacent upper lists in all netdevices, calculating if a
> mirred actions would insert the skb on a stacked device above us should
> be as easy as querying netdev_has_upper_dev and should be possible to
> check that during config time.

This does not seem the right direction:

- We should NOT ban egress redirect to an "upper device", this limits
  flexibility available today:
  One may validly redirect to an "upper device" according to a very
  specific criteria such that the skb will NOT be caught again by the
  filter when re-iterated down the stack.

- And even if we did, at "config time" as you say, one can always stack
  the target device ontop the filtered device at a LATER time.
  And obviously, testing "is upper" while executing action is a bad idea.

- Moreover, as Daniel noted, this is just a band-aid which limitely
  addresses only few of the already existing troubles with stacked
  virtual netdevices (with or without act_mirred).

Frankly, I think the loop-detection suggestions overload act_mirred
with AI that might imply limitations not well assessed (even on existing
usecases).

If needed, I'd go with the "recursion detection" suggested by Daniel,
which is simple and aids with the issues of "egress" redirect that
already exist today.

Moreover, as the patch series is all about "ingress redirect" support,
the "recursion detection" deserves a series of its own, we shouldn't
bundle these two distinct features.

Regards,
Shmulik

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-26 15:12                         ` Hannes Frederic Sowa
@ 2016-09-26 15:53                           ` Daniel Borkmann
  0 siblings, 0 replies; 44+ messages in thread
From: Daniel Borkmann @ 2016-09-26 15:53 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Florian Westphal, Jamal Hadi Salim
  Cc: Shmulik Ladkani, David S. Miller, WANG Cong, Eric Dumazet, netdev

On 09/26/2016 05:12 PM, Hannes Frederic Sowa wrote:
[...]
> I would just care that we sometimes reschedule and don't do everything
> in one stack so we don't corrupt the machine and an admin has still a
> chance to solve the problem.

Sounds reasonable to me, and which is what dev_forward_skb() is doing
implicitly as well.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-22 13:21 ` [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions Shmulik Ladkani
                     ` (2 preceding siblings ...)
  2016-09-24 23:50   ` Cong Wang
@ 2016-09-27  5:56   ` David Miller
  2016-09-27  8:07     ` Shmulik Ladkani
  3 siblings, 1 reply; 44+ messages in thread
From: David Miller @ 2016-09-27  5:56 UTC (permalink / raw)
  To: shmulik.ladkani; +Cc: jhs, xiyou.wangcong, edumazet, netdev, shmulik.ladkani

From: Shmulik Ladkani <shmulik.ladkani@ravellosystems.com>
Date: Thu, 22 Sep 2016 16:21:52 +0300

> From: Shmulik Ladkani <shmulik.ladkani@gmail.com>
> 
> Up until now, 'action mirred' supported only egress actions (either
> TCA_EGRESS_REDIR or TCA_EGRESS_MIRROR).
> 
> This patch implements the corresponding ingress actions
> TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR.
> 
> This allows attaching filters whose target is to hand matching skbs into
> the rx processing of a specified device.
> 
> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> ---
>  Was wondering, whether netif_receive_skb or dev_forward_skb should be
>  used for the rx bouncing. Used netif_receive_skb as in ifb device.

The discussion on this patch has ventured off into what to do about
recursion.

But it unclear to me where this specific patch, and this series,
stands right now.  Someone please clear this up for me.

Thanks.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-27  5:56   ` David Miller
@ 2016-09-27  8:07     ` Shmulik Ladkani
  2016-09-27 10:39       ` Daniel Borkmann
  2016-09-27 14:06       ` Jamal Hadi Salim
  0 siblings, 2 replies; 44+ messages in thread
From: Shmulik Ladkani @ 2016-09-27  8:07 UTC (permalink / raw)
  To: David Miller; +Cc: jhs, xiyou.wangcong, edumazet, netdev, shmulik.ladkani

Hi David,

On Tue, 27 Sep 2016 01:56:06 -0400 (EDT), davem@davemloft.net wrote:
> The discussion on this patch has ventured off into what to do about
> recursion.
> 
> But it unclear to me where this specific patch, and this series,
> stands right now.  Someone please clear this up for me.

Status:
 - Series adds "ingress redirect/mirror" support
 - Positive feedback for the feature
 - So far no comments regarding code itself
 - Questions raised regarding "recursion handling"
   Expressed that existing mirred code (i.e egress redirect) is *already*
   loop-unsafe (and also, some non-tc netdev constructs, as exampled by
   others).
   Discussion then wandered to "recursion handling".

Regards,
Shmulik	

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 1/4] net/sched: act_mirred: Rename tcfm_ok_push to tcfm_mac_header_xmit
  2016-09-22 13:21 ` [PATCH net-next 1/4] net/sched: act_mirred: Rename tcfm_ok_push to tcfm_mac_header_xmit Shmulik Ladkani
@ 2016-09-27 10:30   ` Daniel Borkmann
  2016-09-27 18:24     ` Shmulik Ladkani
  0 siblings, 1 reply; 44+ messages in thread
From: Daniel Borkmann @ 2016-09-27 10:30 UTC (permalink / raw)
  To: Shmulik Ladkani, David S. Miller
  Cc: Jamal Hadi Salim, WANG Cong, Eric Dumazet, netdev,
	Shmulik Ladkani

On 09/22/2016 03:21 PM, Shmulik Ladkani wrote:
> From: Shmulik Ladkani <shmulik.ladkani@gmail.com>
>
> 'tcfm_ok_push' specifies whether a mac_len sized push is needed upon
> egress to the target device (if action is performed at ingress).
>
> Rename it to 'tcfm_mac_header_xmit' as this is actually an attribute of
> the target device.
> This allows to decouple the attribute from the action to be taken.
>
> Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
> ---
>   include/net/tc_act/tc_mirred.h |  2 +-
>   net/sched/act_mirred.c         | 10 +++++-----
>   2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/include/net/tc_act/tc_mirred.h b/include/net/tc_act/tc_mirred.h
> index 62770ad..5275158 100644
> --- a/include/net/tc_act/tc_mirred.h
> +++ b/include/net/tc_act/tc_mirred.h
> @@ -8,7 +8,7 @@ struct tcf_mirred {
>   	struct tc_action	common;
>   	int			tcfm_eaction;
>   	int			tcfm_ifindex;
> -	int			tcfm_ok_push;
> +	int			tcfm_mac_header_xmit;

Since you already touch this here and in patch 2/4 anyway, maybe
make that a bool along the way?

Perhaps instead of tcfm_mac_header_xmit, tcfm_mac_header_push
might be a better name?

>   	struct net_device __rcu	*tcfm_dev;
>   	struct list_head	tcfm_list;
>   };
> diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
> index 667dc38..7b03b13 100644
> --- a/net/sched/act_mirred.c
> +++ b/net/sched/act_mirred.c
> @@ -63,7 +63,7 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
>   	struct tc_mirred *parm;
>   	struct tcf_mirred *m;
>   	struct net_device *dev;
> -	int ret, ok_push = 0;
> +	int ret, mac_header_xmit = 0;
>   	bool exists = false;
>
>   	if (nla == NULL)
> @@ -102,10 +102,10 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
>   		case ARPHRD_IPGRE:
>   		case ARPHRD_VOID:
>   		case ARPHRD_NONE:
> -			ok_push = 0;
> +			mac_header_xmit = 0;
>   			break;
>   		default:
> -			ok_push = 1;
> +			mac_header_xmit = 1;
>   			break;
>   		}
>   	} else {
> @@ -136,7 +136,7 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
>   			dev_put(rcu_dereference_protected(m->tcfm_dev, 1));
>   		dev_hold(dev);
>   		rcu_assign_pointer(m->tcfm_dev, dev);
> -		m->tcfm_ok_push = ok_push;
> +		m->tcfm_mac_header_xmit = mac_header_xmit;
>   	}
>
>   	if (ret == ACT_P_CREATED) {
> @@ -181,7 +181,7 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
>   		goto out;
>
>   	if (!(at & AT_EGRESS)) {
> -		if (m->tcfm_ok_push)
> +		if (m->tcfm_mac_header_xmit)
>   			skb_push_rcsum(skb2, skb->mac_len);
>   	}
>
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-27  8:07     ` Shmulik Ladkani
@ 2016-09-27 10:39       ` Daniel Borkmann
  2016-09-27 13:44         ` David Miller
  2016-09-27 14:06       ` Jamal Hadi Salim
  1 sibling, 1 reply; 44+ messages in thread
From: Daniel Borkmann @ 2016-09-27 10:39 UTC (permalink / raw)
  To: Shmulik Ladkani, David Miller
  Cc: jhs, xiyou.wangcong, edumazet, netdev, shmulik.ladkani

On 09/27/2016 10:07 AM, Shmulik Ladkani wrote:
> Hi David,
>
> On Tue, 27 Sep 2016 01:56:06 -0400 (EDT), davem@davemloft.net wrote:
>> The discussion on this patch has ventured off into what to do about
>> recursion.
>>
>> But it unclear to me where this specific patch, and this series,
>> stands right now.  Someone please clear this up for me.
>
> Status:
>   - Series adds "ingress redirect/mirror" support
>   - Positive feedback for the feature
>   - So far no comments regarding code itself
>   - Questions raised regarding "recursion handling"
>     Expressed that existing mirred code (i.e egress redirect) is *already*
>     loop-unsafe (and also, some non-tc netdev constructs, as exampled by
>     others).
>     Discussion then wandered to "recursion handling".

Any reason why dev_forward_skb() is not preferred over direct
netif_receive_skb() you're using? It would, for example, implicitly
assure that pkt_type is always PACKET_HOST, etc.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-27 10:39       ` Daniel Borkmann
@ 2016-09-27 13:44         ` David Miller
  2016-09-27 14:18           ` Shmulik Ladkani
  0 siblings, 1 reply; 44+ messages in thread
From: David Miller @ 2016-09-27 13:44 UTC (permalink / raw)
  To: daniel
  Cc: shmulik.ladkani, jhs, xiyou.wangcong, edumazet, netdev,
	shmulik.ladkani

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Tue, 27 Sep 2016 12:39:34 +0200

> Any reason why dev_forward_skb() is not preferred over direct
> netif_receive_skb() you're using? It would, for example, implicitly
> assure that pkt_type is always PACKET_HOST, etc.

dev_forward_skb() will pull the ethernet header.

And since a direct call to netif_receive_skb() will not, one of these
two choices won't work properly.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-27  8:07     ` Shmulik Ladkani
  2016-09-27 10:39       ` Daniel Borkmann
@ 2016-09-27 14:06       ` Jamal Hadi Salim
  1 sibling, 0 replies; 44+ messages in thread
From: Jamal Hadi Salim @ 2016-09-27 14:06 UTC (permalink / raw)
  To: Shmulik Ladkani, David Miller
  Cc: xiyou.wangcong, edumazet, netdev, shmulik.ladkani

On 16-09-27 04:07 AM, Shmulik Ladkani wrote:
> Hi David,
>
> On Tue, 27 Sep 2016 01:56:06 -0400 (EDT), davem@davemloft.net wrote:
>> The discussion on this patch has ventured off into what to do about
>> recursion.
>>
>> But it unclear to me where this specific patch, and this series,
>> stands right now.  Someone please clear this up for me.
>
> Status:
>  - Series adds "ingress redirect/mirror" support
>  - Positive feedback for the feature
>  - So far no comments regarding code itself
>  - Questions raised regarding "recursion handling"
 >
>    Expressed that existing mirred code (i.e egress redirect) is *already*
>    loop-unsafe (and also, some non-tc netdev constructs, as exampled by
>    others).
>    Discussion then wandered to "recursion handling".

not totaly bike-shed discussion; legit issues are being raised
(and the egress issue you point out is fixable now that we are paying
attention to it).
We need to take care of loops. I pointed to how the original thought
process was. I _dont_ see this as resolvable via recursion handling
since this is per-skb and not per entry point.
You can add my Acked-by if you promise to take care of this issue next.

cheers,
jamal

PS:- the code looks straight forward

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-27 13:44         ` David Miller
@ 2016-09-27 14:18           ` Shmulik Ladkani
  2016-09-27 14:47             ` Daniel Borkmann
  0 siblings, 1 reply; 44+ messages in thread
From: Shmulik Ladkani @ 2016-09-27 14:18 UTC (permalink / raw)
  To: David Miller, daniel
  Cc: jhs, xiyou.wangcong, edumazet, netdev, shmulik.ladkani

On Tue, 27 Sep 2016 09:44:41 -0400 (EDT), davem@davemloft.net wrote:
> From: Daniel Borkmann <daniel@iogearbox.net>
> Date: Tue, 27 Sep 2016 12:39:34 +0200
> 
> > Any reason why dev_forward_skb() is not preferred over direct
> > netif_receive_skb() you're using? It would, for example, implicitly
> > assure that pkt_type is always PACKET_HOST, etc.
> 
> dev_forward_skb() will pull the ethernet header.
> 
> And since a direct call to netif_receive_skb() will not, one of these
> two choices won't work properly.

In the patch, I'm issuing a skb_pull_rcsum() prior the netif_receive_skb,
snip:

+	/* If action's target direction differs than filter's direction,
+	 * and devices expect a mac header on xmit, then mac push/pull is
+	 * needed.
+	 */
+	if (at != tcf_mirred_act_direction(m->tcfm_eaction) &&
+	    m->tcfm_mac_header_xmit) {
+		if (at & AT_EGRESS) {
+			/* caught at egress, act ingress: pull mac */
+			mac_len = skb_network_header(skb) - skb_mac_header(skb);
+			skb_pull_rcsum(skb2, mac_len);

Existing *egress* mir/red already supported pairing two non-eth devices.
Therefore I allow it for the new *ingress* mir/red as well.

Now, following this premise, the skb_pull_rcsum() shown above is executed
for devices whose xmit is mac_header based.

I've tested both ARPHRD_ETHER devices and some non ARPHRD_ETHER devices.

(an *existing* symmetric skb_push_rcsum() is invoked if packet is caught at
 ingress and redirected for egress on a mac_header xmit device)

This was the reason for picking netif_receive_skb() over dev_forward_skb().

Regards,
Shmulik

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions
  2016-09-27 14:18           ` Shmulik Ladkani
@ 2016-09-27 14:47             ` Daniel Borkmann
  0 siblings, 0 replies; 44+ messages in thread
From: Daniel Borkmann @ 2016-09-27 14:47 UTC (permalink / raw)
  To: Shmulik Ladkani, David Miller
  Cc: jhs, xiyou.wangcong, edumazet, netdev, shmulik.ladkani

On 09/27/2016 04:18 PM, Shmulik Ladkani wrote:
> On Tue, 27 Sep 2016 09:44:41 -0400 (EDT), davem@davemloft.net wrote:
>> From: Daniel Borkmann <daniel@iogearbox.net>
>> Date: Tue, 27 Sep 2016 12:39:34 +0200
>>
>>> Any reason why dev_forward_skb() is not preferred over direct
>>> netif_receive_skb() you're using? It would, for example, implicitly
>>> assure that pkt_type is always PACKET_HOST, etc.
>>
>> dev_forward_skb() will pull the ethernet header.
>>
>> And since a direct call to netif_receive_skb() will not, one of these
>> two choices won't work properly.
>
> In the patch, I'm issuing a skb_pull_rcsum() prior the netif_receive_skb,
> snip:
>
[...]
>
> Existing *egress* mir/red already supported pairing two non-eth devices.
> Therefore I allow it for the new *ingress* mir/red as well.
[...]

Yeah, makes sense then. Should skb->pkt_type become an issue, you might
probably just use act_skbedit for such cases.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH net-next 1/4] net/sched: act_mirred: Rename tcfm_ok_push to tcfm_mac_header_xmit
  2016-09-27 10:30   ` Daniel Borkmann
@ 2016-09-27 18:24     ` Shmulik Ladkani
  0 siblings, 0 replies; 44+ messages in thread
From: Shmulik Ladkani @ 2016-09-27 18:24 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: David S. Miller, Jamal Hadi Salim, WANG Cong, Eric Dumazet,
	netdev, Shmulik Ladkani

Hi,

On Tue, 27 Sep 2016 12:30:20 +0200 Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 09/22/2016 03:21 PM, Shmulik Ladkani wrote:
> > From: Shmulik Ladkani <shmulik.ladkani@gmail.com>
> >
> > 'tcfm_ok_push' specifies whether a mac_len sized push is needed upon
> > egress to the target device (if action is performed at ingress).
> >
> > Rename it to 'tcfm_mac_header_xmit' as this is actually an attribute of
> > the target device.
> > This allows to decouple the attribute from the action to be taken.
> >
> > Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
> > ---
> >   include/net/tc_act/tc_mirred.h |  2 +-
> >   net/sched/act_mirred.c         | 10 +++++-----
> >   2 files changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/net/tc_act/tc_mirred.h b/include/net/tc_act/tc_mirred.h
> > index 62770ad..5275158 100644
> > --- a/include/net/tc_act/tc_mirred.h
> > +++ b/include/net/tc_act/tc_mirred.h
> > @@ -8,7 +8,7 @@ struct tcf_mirred {
> >   	struct tc_action	common;
> >   	int			tcfm_eaction;
> >   	int			tcfm_ifindex;
> > -	int			tcfm_ok_push;
> > +	int			tcfm_mac_header_xmit;  
> 
> Since you already touch this here and in patch 2/4 anyway, maybe
> make that a bool along the way?

Ok.
(Thought of it, but my urge to lessen the diff eventually won)

> Perhaps instead of tcfm_mac_header_xmit, tcfm_mac_header_push
> might be a better name?

Don't think so.

Eventually this serves as the decision to either push or pull, so prefer
not to name it as the action (push/pull) but rather what is target
device's property (xmits at mh?).

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2016-09-27 18:24 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-09-22 13:21 [PATCH net-next 0/4] act_mirred: Ingress actions support Shmulik Ladkani
2016-09-22 13:21 ` [PATCH net-next 1/4] net/sched: act_mirred: Rename tcfm_ok_push to tcfm_mac_header_xmit Shmulik Ladkani
2016-09-27 10:30   ` Daniel Borkmann
2016-09-27 18:24     ` Shmulik Ladkani
2016-09-22 13:21 ` [PATCH net-next 2/4] net/sched: act_mirred: Refactor detection whether dev needs xmit at mac header Shmulik Ladkani
2016-09-22 13:21 ` [PATCH net-next 3/4] net/sched: tc_mirred: Rename public predicates 'is_tcf_mirred_redirect' and 'is_tcf_mirred_mirror' Shmulik Ladkani
2016-09-22 13:21 ` [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions Shmulik Ladkani
2016-09-22 14:54   ` Eric Dumazet
2016-09-22 18:27     ` Shmulik Ladkani
2016-09-22 18:42       ` Eric Dumazet
2016-09-22 23:40   ` Jamal Hadi Salim
2016-09-23  5:11     ` Shmulik Ladkani
2016-09-23 12:48       ` Jamal Hadi Salim
2016-09-23 15:40         ` Shmulik Ladkani
2016-09-25  0:20           ` Cong Wang
2016-09-25 13:05           ` Jamal Hadi Salim
2016-09-25 16:26             ` Daniel Borkmann
2016-09-25 18:33               ` Florian Westphal
2016-09-25 23:47                 ` Jamal Hadi Salim
2016-09-25 23:31               ` Jamal Hadi Salim
2016-09-25 17:33             ` Shmulik Ladkani
2016-09-25 18:31               ` Florian Westphal
2016-09-26  1:15                 ` Jamal Hadi Salim
2016-09-26  1:35                   ` Florian Westphal
2016-09-26  1:40                     ` Jamal Hadi Salim
2016-09-26 14:43                     ` Hannes Frederic Sowa
2016-09-26 14:53                       ` Daniel Borkmann
2016-09-26 15:12                         ` Hannes Frederic Sowa
2016-09-26 15:53                           ` Daniel Borkmann
2016-09-26 15:26                       ` Shmulik Ladkani
2016-09-25 23:45               ` Jamal Hadi Salim
2016-09-25  0:07       ` Cong Wang
2016-09-25 13:39         ` Jamal Hadi Salim
2016-09-26  4:55           ` Cong Wang
2016-09-25 17:59         ` Shmulik Ladkani
2016-09-26  4:56           ` Cong Wang
2016-09-24 23:50   ` Cong Wang
2016-09-27  5:56   ` David Miller
2016-09-27  8:07     ` Shmulik Ladkani
2016-09-27 10:39       ` Daniel Borkmann
2016-09-27 13:44         ` David Miller
2016-09-27 14:18           ` Shmulik Ladkani
2016-09-27 14:47             ` Daniel Borkmann
2016-09-27 14:06       ` Jamal Hadi Salim

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).