* [PATCH -next 1/5] net: sched: replace NCLS macro with tc_nocls bit flag
2015-05-04 18:48 [PATCH -next 0/5] replace skb tc_verd member with 3 dedicated bit flags Florian Westphal
@ 2015-05-04 18:48 ` Florian Westphal
2015-05-05 10:38 ` Daniel Borkmann
2015-05-05 23:15 ` David Miller
2015-05-04 18:48 ` [PATCH -next 2/5] net: sched: use counter to break reclassify loops Florian Westphal
` (4 subsequent siblings)
5 siblings, 2 replies; 19+ messages in thread
From: Florian Westphal @ 2015-05-04 18:48 UTC (permalink / raw)
To: netdev; +Cc: jhs, alexei.starovoitov, Florian Westphal
Signed-off-by: Florian Westphal <fw@strlen.de>
---
drivers/net/ifb.c | 2 +-
drivers/staging/octeon/ethernet-tx.c | 1 +
include/linux/skbuff.h | 5 ++++-
include/uapi/linux/pkt_cls.h | 2 ++
net/core/dev.c | 4 ++--
net/sched/act_api.c | 4 ++--
6 files changed, 12 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c
index 94570aa..3ed70302 100644
--- a/drivers/net/ifb.c
+++ b/drivers/net/ifb.c
@@ -82,7 +82,7 @@ static void ri_tasklet(unsigned long dev)
u32 from = G_TC_FROM(skb->tc_verd);
skb->tc_verd = 0;
- skb->tc_verd = SET_TC_NCLS(skb->tc_verd);
+ skb->tc_nocls = 1;
u64_stats_update_begin(&dp->tsync);
dp->tx_packets++;
diff --git a/drivers/staging/octeon/ethernet-tx.c b/drivers/staging/octeon/ethernet-tx.c
index 5b9ac1f..24e0ac6 100644
--- a/drivers/staging/octeon/ethernet-tx.c
+++ b/drivers/staging/octeon/ethernet-tx.c
@@ -403,6 +403,7 @@ int cvm_oct_xmit(struct sk_buff *skb, struct net_device *dev)
#ifdef CONFIG_NET_SCHED
skb->tc_index = 0;
#ifdef CONFIG_NET_CLS_ACT
+ skb->tc_nocls = 0;
skb->tc_verd = 0;
#endif /* CONFIG_NET_CLS_ACT */
#endif /* CONFIG_NET_SCHED */
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index acb83e2..c2112cd 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -487,6 +487,7 @@ static inline u32 skb_mstamp_us_delta(const struct skb_mstamp *t1,
* @hash: the packet hash
* @queue_mapping: Queue mapping for multiqueue devices
* @xmit_more: More SKBs are pending for this queue
+ * @tc_nocls: skip classification on ingress
* @ndisc_nodetype: router type (from link layer)
* @ooo_okay: allow the mapping of a socket to a queue to be changed
* @l4_hash: indicate hash is a canonical 4-tuple hash over transport
@@ -566,8 +567,10 @@ struct sk_buff {
fclone:2,
peeked:1,
head_frag:1,
+#ifdef CONFIG_NET_CLS_ACT
+ tc_nocls:1,
+#endif
xmit_more:1;
- /* one bit hole */
kmemcheck_bitfield_end(flags1);
/* fields enclosed in headers_start/headers_end are copied
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 596ffa0..ef8eb88 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -61,9 +61,11 @@ bits 9,10,11: redirect counter - redirect TTL. Loop avoidance
#define AT_INGRESS 0x1
#define AT_EGRESS 0x2
+#ifndef __KERNEL__
#define TC_NCLS _TC_MAKEMASK1(8)
#define SET_TC_NCLS(v) ( TC_NCLS | (v & ~TC_NCLS))
#define CLR_TC_NCLS(v) ( v & ~TC_NCLS)
+#endif
#ifndef __KERNEL__
#define S_TC_RTTL _TC_MAKE32(9)
diff --git a/net/core/dev.c b/net/core/dev.c
index 862875e..68bf73a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3678,8 +3678,8 @@ another_round:
}
#ifdef CONFIG_NET_CLS_ACT
- if (skb->tc_verd & TC_NCLS) {
- skb->tc_verd = CLR_TC_NCLS(skb->tc_verd);
+ if (skb->tc_nocls) {
+ skb->tc_nocls = 0;
goto ncls;
}
#endif
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index af427a3..022ed23 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -384,8 +384,8 @@ int tcf_action_exec(struct sk_buff *skb, const struct list_head *actions,
const struct tc_action *a;
int ret = -1;
- if (skb->tc_verd & TC_NCLS) {
- skb->tc_verd = CLR_TC_NCLS(skb->tc_verd);
+ if (skb->tc_nocls) {
+ skb->tc_nocls = 0;
ret = TC_ACT_OK;
goto exec_done;
}
--
2.0.5
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH -next 1/5] net: sched: replace NCLS macro with tc_nocls bit flag
2015-05-04 18:48 ` [PATCH -next 1/5] net: sched: replace NCLS macro with tc_nocls bit flag Florian Westphal
@ 2015-05-05 10:38 ` Daniel Borkmann
2015-05-05 23:15 ` David Miller
1 sibling, 0 replies; 19+ messages in thread
From: Daniel Borkmann @ 2015-05-05 10:38 UTC (permalink / raw)
To: Florian Westphal, netdev; +Cc: jhs, alexei.starovoitov
On 05/04/2015 08:48 PM, Florian Westphal wrote:
> Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH -next 1/5] net: sched: replace NCLS macro with tc_nocls bit flag
2015-05-04 18:48 ` [PATCH -next 1/5] net: sched: replace NCLS macro with tc_nocls bit flag Florian Westphal
2015-05-05 10:38 ` Daniel Borkmann
@ 2015-05-05 23:15 ` David Miller
1 sibling, 0 replies; 19+ messages in thread
From: David Miller @ 2015-05-05 23:15 UTC (permalink / raw)
To: fw; +Cc: netdev, jhs, alexei.starovoitov
From: Florian Westphal <fw@strlen.de>
Date: Mon, 4 May 2015 20:48:34 +0200
> @@ -566,8 +567,10 @@ struct sk_buff {
> fclone:2,
> peeked:1,
> head_frag:1,
> +#ifdef CONFIG_NET_CLS_ACT
> + tc_nocls:1,
> +#endif
> xmit_more:1;
> - /* one bit hole */
> kmemcheck_bitfield_end(flags1);
You can now remove the "one bit hole" comment?
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH -next 2/5] net: sched: use counter to break reclassify loops
2015-05-04 18:48 [PATCH -next 0/5] replace skb tc_verd member with 3 dedicated bit flags Florian Westphal
2015-05-04 18:48 ` [PATCH -next 1/5] net: sched: replace NCLS macro with tc_nocls bit flag Florian Westphal
@ 2015-05-04 18:48 ` Florian Westphal
2015-05-05 10:47 ` Daniel Borkmann
2015-05-04 18:48 ` [PATCH -next 3/5] net: sched: remove FROM INGRESS/EGRESS Florian Westphal
` (3 subsequent siblings)
5 siblings, 1 reply; 19+ messages in thread
From: Florian Westphal @ 2015-05-04 18:48 UTC (permalink / raw)
To: netdev; +Cc: jhs, alexei.starovoitov, Florian Westphal
Seems all we want here is to avoid endless 'goto reclassify' loop.
tc_classify_compat even resets this counter when something other
than TC_ACT_RECLASSIFY is returned, so this skb-counter doesn't
break hypothetical loops induced by something other than perpetual
TC_ACT_RECLASSIFY return values.
skb_act_clone is now identical to skb_clone, so just use that.
Tested with following (bogus) filter:
tc filter add dev eth0 parent ffff: \
protocol ip u32 match u32 0 0 police rate 10Kbit burst \
64000 mtu 1500 action reclassify
Signed-off-by: Florian Westphal <fw@strlen.de>
---
Documentation/networking/tc-actions-env-rules.txt | 4 ----
include/net/sch_generic.h | 15 ---------------
include/uapi/linux/pkt_cls.h | 4 +---
net/sched/act_mirred.c | 2 +-
net/sched/sch_api.c | 12 +++---------
5 files changed, 5 insertions(+), 32 deletions(-)
diff --git a/Documentation/networking/tc-actions-env-rules.txt b/Documentation/networking/tc-actions-env-rules.txt
index 95c7171..f378146 100644
--- a/Documentation/networking/tc-actions-env-rules.txt
+++ b/Documentation/networking/tc-actions-env-rules.txt
@@ -8,10 +8,6 @@ For example if your action queues a packet to be processed later,
or intentionally branches by redirecting a packet, then you need to
clone the packet.
-There are certain fields in the skb tc_verd that need to be reset so we
-avoid loops, etc. A few are generic enough that skb_act_clone()
-resets them for you, so invoke skb_act_clone() rather than skb_clone().
-
2) If you munge any packet thou shalt call pskb_expand_head in the case
someone else is referencing the skb. After that you "own" the skb.
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 994b5a0..a611ed4 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -745,21 +745,6 @@ static inline u32 qdisc_l2t(struct qdisc_rate_table* rtab, unsigned int pktlen)
return rtab->data[slot];
}
-#ifdef CONFIG_NET_CLS_ACT
-static inline struct sk_buff *skb_act_clone(struct sk_buff *skb, gfp_t gfp_mask,
- int action)
-{
- struct sk_buff *n;
-
- n = skb_clone(skb, gfp_mask);
-
- if (n) {
- n->tc_verd = SET_TC_VERD(n->tc_verd, 0);
- }
- return n;
-}
-#endif
-
struct psched_ratecfg {
u64 rate_bytes_ps; /* bytes per second */
u32 mult;
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index ef8eb88..02dca42 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -44,13 +44,13 @@ bits 9,10,11: redirect counter - redirect TTL. Loop avoidance
#define TC_OK2MUNGE _TC_MAKEMASK1(1)
#define SET_TC_OK2MUNGE(v) ( TC_OK2MUNGE | (v & ~TC_OK2MUNGE))
#define CLR_TC_OK2MUNGE(v) ( v & ~TC_OK2MUNGE)
-#endif
#define S_TC_VERD _TC_MAKE32(2)
#define M_TC_VERD _TC_MAKEMASK(4,S_TC_VERD)
#define G_TC_VERD(x) _TC_GETVALUE(x,S_TC_VERD,M_TC_VERD)
#define V_TC_VERD(x) _TC_MAKEVALUE(x,S_TC_VERD)
#define SET_TC_VERD(v,n) ((V_TC_VERD(n)) | (v & ~M_TC_VERD))
+#endif
#define S_TC_FROM _TC_MAKE32(6)
#define M_TC_FROM _TC_MAKEMASK(2,S_TC_FROM)
@@ -65,9 +65,7 @@ bits 9,10,11: redirect counter - redirect TTL. Loop avoidance
#define TC_NCLS _TC_MAKEMASK1(8)
#define SET_TC_NCLS(v) ( TC_NCLS | (v & ~TC_NCLS))
#define CLR_TC_NCLS(v) ( v & ~TC_NCLS)
-#endif
-#ifndef __KERNEL__
#define S_TC_RTTL _TC_MAKE32(9)
#define M_TC_RTTL _TC_MAKEMASK(3,S_TC_RTTL)
#define G_TC_RTTL(x) _TC_GETVALUE(x,S_TC_RTTL,M_TC_RTTL)
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 3f63cea..a42a3b2 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -151,7 +151,7 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
}
at = G_TC_AT(skb->tc_verd);
- skb2 = skb_act_clone(skb, GFP_ATOMIC, m->tcf_action);
+ skb2 = skb_clone(skb, GFP_ATOMIC);
if (skb2 == NULL)
goto out;
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index ad9eed7..0b74dc0 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -1816,13 +1816,8 @@ int tc_classify_compat(struct sk_buff *skb, const struct tcf_proto *tp,
continue;
err = tp->classify(skb, tp, res);
- if (err >= 0) {
-#ifdef CONFIG_NET_CLS_ACT
- if (err != TC_ACT_RECLASSIFY && skb->tc_verd)
- skb->tc_verd = SET_TC_VERD(skb->tc_verd, 0);
-#endif
+ if (err >= 0)
return err;
- }
}
return -1;
}
@@ -1834,23 +1829,22 @@ int tc_classify(struct sk_buff *skb, const struct tcf_proto *tp,
int err = 0;
#ifdef CONFIG_NET_CLS_ACT
const struct tcf_proto *otp = tp;
+ int limit = 0;
reclassify:
#endif
err = tc_classify_compat(skb, tp, res);
#ifdef CONFIG_NET_CLS_ACT
if (err == TC_ACT_RECLASSIFY) {
- u32 verd = G_TC_VERD(skb->tc_verd);
tp = otp;
- if (verd++ >= MAX_REC_LOOP) {
+ if (unlikely(limit++ >= MAX_REC_LOOP)) {
net_notice_ratelimited("%s: packet reclassify loop rule prio %u protocol %02x\n",
tp->q->ops->id,
tp->prio & 0xffff,
ntohs(tp->protocol));
return TC_ACT_SHOT;
}
- skb->tc_verd = SET_TC_VERD(skb->tc_verd, verd);
goto reclassify;
}
#endif
--
2.0.5
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH -next 2/5] net: sched: use counter to break reclassify loops
2015-05-04 18:48 ` [PATCH -next 2/5] net: sched: use counter to break reclassify loops Florian Westphal
@ 2015-05-05 10:47 ` Daniel Borkmann
0 siblings, 0 replies; 19+ messages in thread
From: Daniel Borkmann @ 2015-05-05 10:47 UTC (permalink / raw)
To: Florian Westphal, netdev; +Cc: jhs, alexei.starovoitov
On 05/04/2015 08:48 PM, Florian Westphal wrote:
> Seems all we want here is to avoid endless 'goto reclassify' loop.
> tc_classify_compat even resets this counter when something other
> than TC_ACT_RECLASSIFY is returned, so this skb-counter doesn't
> break hypothetical loops induced by something other than perpetual
> TC_ACT_RECLASSIFY return values.
>
> skb_act_clone is now identical to skb_clone, so just use that.
>
> Tested with following (bogus) filter:
> tc filter add dev eth0 parent ffff: \
> protocol ip u32 match u32 0 0 police rate 10Kbit burst \
> 64000 mtu 1500 action reclassify
>
> Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH -next 3/5] net: sched: remove FROM INGRESS/EGRESS
2015-05-04 18:48 [PATCH -next 0/5] replace skb tc_verd member with 3 dedicated bit flags Florian Westphal
2015-05-04 18:48 ` [PATCH -next 1/5] net: sched: replace NCLS macro with tc_nocls bit flag Florian Westphal
2015-05-04 18:48 ` [PATCH -next 2/5] net: sched: use counter to break reclassify loops Florian Westphal
@ 2015-05-04 18:48 ` Florian Westphal
2015-05-05 10:51 ` Daniel Borkmann
2015-05-04 18:48 ` [PATCH -next 4/5] net: sched: remove AT INGRESS/EGRESS Florian Westphal
` (2 subsequent siblings)
5 siblings, 1 reply; 19+ messages in thread
From: Florian Westphal @ 2015-05-04 18:48 UTC (permalink / raw)
To: netdev; +Cc: jhs, alexei.starovoitov, Florian Westphal
use a single bit marker to indicate if skb originates from rx path.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
drivers/net/ifb.c | 12 ++++--------
drivers/staging/octeon/ethernet-tx.c | 1 +
include/linux/skbuff.h | 6 +++++-
include/uapi/linux/pkt_cls.h | 2 +-
net/sched/act_mirred.c | 2 +-
net/sched/sch_netem.c | 2 +-
6 files changed, 13 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c
index 3ed70302..5ec6797 100644
--- a/drivers/net/ifb.c
+++ b/drivers/net/ifb.c
@@ -79,8 +79,6 @@ static void ri_tasklet(unsigned long dev)
}
while ((skb = __skb_dequeue(&dp->tq)) != NULL) {
- u32 from = G_TC_FROM(skb->tc_verd);
-
skb->tc_verd = 0;
skb->tc_nocls = 1;
@@ -102,13 +100,12 @@ static void ri_tasklet(unsigned long dev)
rcu_read_unlock();
skb->skb_iif = _dev->ifindex;
- if (from & AT_EGRESS) {
+ if (!skb->tc_from_ingress) {
dev_queue_xmit(skb);
- } else if (from & AT_INGRESS) {
+ } else {
skb_pull(skb, skb->mac_len);
netif_receive_skb(skb);
- } else
- BUG();
+ }
}
if (__netif_tx_trylock(txq)) {
@@ -193,14 +190,13 @@ static void ifb_setup(struct net_device *dev)
static netdev_tx_t ifb_xmit(struct sk_buff *skb, struct net_device *dev)
{
struct ifb_private *dp = netdev_priv(dev);
- u32 from = G_TC_FROM(skb->tc_verd);
u64_stats_update_begin(&dp->rsync);
dp->rx_packets++;
dp->rx_bytes += skb->len;
u64_stats_update_end(&dp->rsync);
- if (!(from & (AT_INGRESS|AT_EGRESS)) || !skb->skb_iif) {
+ if (!skb->skb_iif) {
dev_kfree_skb(skb);
dev->stats.rx_dropped++;
return NETDEV_TX_OK;
diff --git a/drivers/staging/octeon/ethernet-tx.c b/drivers/staging/octeon/ethernet-tx.c
index 24e0ac6..bcefebd 100644
--- a/drivers/staging/octeon/ethernet-tx.c
+++ b/drivers/staging/octeon/ethernet-tx.c
@@ -404,6 +404,7 @@ int cvm_oct_xmit(struct sk_buff *skb, struct net_device *dev)
skb->tc_index = 0;
#ifdef CONFIG_NET_CLS_ACT
skb->tc_nocls = 0;
+ skb->tc_from_ingress = 0;
skb->tc_verd = 0;
#endif /* CONFIG_NET_CLS_ACT */
#endif /* CONFIG_NET_SCHED */
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index c2112cd..911d84e 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -487,6 +487,7 @@ static inline u32 skb_mstamp_us_delta(const struct skb_mstamp *t1,
* @hash: the packet hash
* @queue_mapping: Queue mapping for multiqueue devices
* @xmit_more: More SKBs are pending for this queue
+ * @tc_from_ingress: skb is processed during rx, not transmit
* @tc_nocls: skip classification on ingress
* @ndisc_nodetype: router type (from link layer)
* @ooo_okay: allow the mapping of a socket to a queue to be changed
@@ -617,7 +618,10 @@ struct sk_buff {
__u8 ipvs_property:1;
__u8 inner_protocol_type:1;
__u8 remcsum_offload:1;
- /* 3 or 5 bit hole */
+#ifdef CONFIG_NET_CLS_ACT
+ __u8 tc_from_ingress:1;
+#endif
+ /* 2 or 4 bit hole */
#ifdef CONFIG_NET_SCHED
__u16 tc_index; /* traffic control index */
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 02dca42..ac88cbb 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -50,13 +50,13 @@ bits 9,10,11: redirect counter - redirect TTL. Loop avoidance
#define G_TC_VERD(x) _TC_GETVALUE(x,S_TC_VERD,M_TC_VERD)
#define V_TC_VERD(x) _TC_MAKEVALUE(x,S_TC_VERD)
#define SET_TC_VERD(v,n) ((V_TC_VERD(n)) | (v & ~M_TC_VERD))
-#endif
#define S_TC_FROM _TC_MAKE32(6)
#define M_TC_FROM _TC_MAKEMASK(2,S_TC_FROM)
#define G_TC_FROM(x) _TC_GETVALUE(x,S_TC_FROM,M_TC_FROM)
#define V_TC_FROM(x) _TC_MAKEVALUE(x,S_TC_FROM)
#define SET_TC_FROM(v,n) ((V_TC_FROM(n)) | (v & ~M_TC_FROM))
+#endif
#define AT_STACK 0x0
#define AT_INGRESS 0x1
#define AT_EGRESS 0x2
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index a42a3b2..9d51baa 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -162,7 +162,7 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
/* mirror is always swallowed */
if (m->tcfm_eaction != TCA_EGRESS_MIRROR)
- skb2->tc_verd = SET_TC_FROM(skb2->tc_verd, at);
+ skb2->tc_from_ingress = at & AT_INGRESS ? 1 : 0;
skb2->skb_iif = skb->dev->ifindex;
skb2->dev = dev;
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 956ead2..b5e4396 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -588,7 +588,7 @@ deliver:
* If it's at ingress let's pretend the delay is
* from the network (tstamp will be updated).
*/
- if (G_TC_FROM(skb->tc_verd) & AT_INGRESS)
+ if (skb->tc_from_ingress)
skb->tstamp.tv64 = 0;
#endif
--
2.0.5
^ permalink raw reply related [flat|nested] 19+ messages in thread* [PATCH -next 4/5] net: sched: remove AT INGRESS/EGRESS
2015-05-04 18:48 [PATCH -next 0/5] replace skb tc_verd member with 3 dedicated bit flags Florian Westphal
` (2 preceding siblings ...)
2015-05-04 18:48 ` [PATCH -next 3/5] net: sched: remove FROM INGRESS/EGRESS Florian Westphal
@ 2015-05-04 18:48 ` Florian Westphal
2015-05-05 11:06 ` Daniel Borkmann
2015-05-04 18:48 ` [PATCH -next 5/5] skbuff: remove tc_verd member Florian Westphal
2015-05-05 11:39 ` [PATCH -next 0/5] replace skb tc_verd member with 3 dedicated bit flags Jamal Hadi Salim
5 siblings, 1 reply; 19+ messages in thread
From: Florian Westphal @ 2015-05-04 18:48 UTC (permalink / raw)
To: netdev; +Cc: jhs, alexei.starovoitov, Florian Westphal
use single marker to propagate location.
tc_at_ingress 1: ingress, 0 is egress.
The new flag is set/unset in sch_ingress instead of the core.
We will also no longer set skb->tc_verd to AT_EGRESS in the xmit
handler.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
drivers/staging/octeon/ethernet-tx.c | 1 +
include/linux/skbuff.h | 4 +++-
include/uapi/linux/pkt_cls.h | 4 +---
net/core/dev.c | 5 -----
net/sched/act_mirred.c | 7 +++----
net/sched/sch_ingress.c | 2 ++
6 files changed, 10 insertions(+), 13 deletions(-)
diff --git a/drivers/staging/octeon/ethernet-tx.c b/drivers/staging/octeon/ethernet-tx.c
index bcefebd..2852d8a 100644
--- a/drivers/staging/octeon/ethernet-tx.c
+++ b/drivers/staging/octeon/ethernet-tx.c
@@ -405,6 +405,7 @@ int cvm_oct_xmit(struct sk_buff *skb, struct net_device *dev)
#ifdef CONFIG_NET_CLS_ACT
skb->tc_nocls = 0;
skb->tc_from_ingress = 0;
+ skb->tc_at_ingress = 0;
skb->tc_verd = 0;
#endif /* CONFIG_NET_CLS_ACT */
#endif /* CONFIG_NET_SCHED */
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 911d84e..d794077 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -488,6 +488,7 @@ static inline u32 skb_mstamp_us_delta(const struct skb_mstamp *t1,
* @queue_mapping: Queue mapping for multiqueue devices
* @xmit_more: More SKBs are pending for this queue
* @tc_from_ingress: skb is processed during rx, not transmit
+ * @tc_at_ingress: skb is processed during rx, not transmit
* @tc_nocls: skip classification on ingress
* @ndisc_nodetype: router type (from link layer)
* @ooo_okay: allow the mapping of a socket to a queue to be changed
@@ -619,9 +620,10 @@ struct sk_buff {
__u8 inner_protocol_type:1;
__u8 remcsum_offload:1;
#ifdef CONFIG_NET_CLS_ACT
+ __u8 tc_at_ingress:1;
__u8 tc_from_ingress:1;
#endif
- /* 2 or 4 bit hole */
+ /* 1 or 3 bit hole */
#ifdef CONFIG_NET_SCHED
__u16 tc_index; /* traffic control index */
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index ac88cbb..621d9d1 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -56,12 +56,10 @@ bits 9,10,11: redirect counter - redirect TTL. Loop avoidance
#define G_TC_FROM(x) _TC_GETVALUE(x,S_TC_FROM,M_TC_FROM)
#define V_TC_FROM(x) _TC_MAKEVALUE(x,S_TC_FROM)
#define SET_TC_FROM(v,n) ((V_TC_FROM(n)) | (v & ~M_TC_FROM))
-#endif
#define AT_STACK 0x0
#define AT_INGRESS 0x1
#define AT_EGRESS 0x2
-#ifndef __KERNEL__
#define TC_NCLS _TC_MAKEMASK1(8)
#define SET_TC_NCLS(v) ( TC_NCLS | (v & ~TC_NCLS))
#define CLR_TC_NCLS(v) ( v & ~TC_NCLS)
@@ -71,13 +69,13 @@ bits 9,10,11: redirect counter - redirect TTL. Loop avoidance
#define G_TC_RTTL(x) _TC_GETVALUE(x,S_TC_RTTL,M_TC_RTTL)
#define V_TC_RTTL(x) _TC_MAKEVALUE(x,S_TC_RTTL)
#define SET_TC_RTTL(v,n) ((V_TC_RTTL(n)) | (v & ~M_TC_RTTL))
-#endif
#define S_TC_AT _TC_MAKE32(12)
#define M_TC_AT _TC_MAKEMASK(2,S_TC_AT)
#define G_TC_AT(x) _TC_GETVALUE(x,S_TC_AT,M_TC_AT)
#define V_TC_AT(x) _TC_MAKEVALUE(x,S_TC_AT)
#define SET_TC_AT(v,n) ((V_TC_AT(n)) | (v & ~M_TC_AT))
+#endif
/* Action attributes */
enum {
diff --git a/net/core/dev.c b/net/core/dev.c
index 68bf73a..e2549fd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2964,9 +2964,6 @@ static int __dev_queue_xmit(struct sk_buff *skb, void *accel_priv)
txq = netdev_pick_tx(dev, skb, accel_priv);
q = rcu_dereference_bh(txq->qdisc);
-#ifdef CONFIG_NET_CLS_ACT
- skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_EGRESS);
-#endif
trace_net_dev_queue(skb);
if (q->enqueue) {
rc = __dev_xmit_skb(skb, q, dev, txq);
@@ -3534,8 +3531,6 @@ static int ing_filter(struct sk_buff *skb, struct netdev_queue *rxq)
int result = TC_ACT_OK;
struct Qdisc *q;
- skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_INGRESS);
-
q = rcu_dereference(rxq->qdisc);
if (q != &noop_qdisc) {
if (likely(!test_bit(__QDISC_STATE_DEACTIVATED, &q->state)))
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 9d51baa..63bd83b 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -131,7 +131,6 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
struct tcf_mirred *m = a->priv;
struct net_device *dev;
struct sk_buff *skb2;
- u32 at;
int retval, err = 1;
spin_lock(&m->tcf_lock);
@@ -150,19 +149,19 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
goto out;
}
- at = G_TC_AT(skb->tc_verd);
skb2 = skb_clone(skb, GFP_ATOMIC);
if (skb2 == NULL)
goto out;
- if (!(at & AT_EGRESS)) {
+ if (skb->tc_at_ingress) {
if (m->tcfm_ok_push)
skb_push(skb2, skb->mac_len);
+ skb2->tc_at_ingress = 0;
}
/* mirror is always swallowed */
if (m->tcfm_eaction != TCA_EGRESS_MIRROR)
- skb2->tc_from_ingress = at & AT_INGRESS ? 1 : 0;
+ skb2->tc_from_ingress = skb->tc_at_ingress;
skb2->skb_iif = skb->dev->ifindex;
skb2->dev = dev;
diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c
index a89cc32..b163658 100644
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ -63,7 +63,9 @@ static int ingress_enqueue(struct sk_buff *skb, struct Qdisc *sch)
struct tcf_proto *fl = rcu_dereference_bh(p->filter_list);
int result;
+ skb->tc_at_ingress = 1;
result = tc_classify(skb, fl, &res);
+ skb->tc_at_ingress = 0;
qdisc_bstats_update_cpu(sch, skb);
switch (result) {
--
2.0.5
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH -next 4/5] net: sched: remove AT INGRESS/EGRESS
2015-05-04 18:48 ` [PATCH -next 4/5] net: sched: remove AT INGRESS/EGRESS Florian Westphal
@ 2015-05-05 11:06 ` Daniel Borkmann
2015-05-05 11:11 ` Florian Westphal
0 siblings, 1 reply; 19+ messages in thread
From: Daniel Borkmann @ 2015-05-05 11:06 UTC (permalink / raw)
To: Florian Westphal, netdev; +Cc: jhs, alexei.starovoitov
On 05/04/2015 08:48 PM, Florian Westphal wrote:
> use single marker to propagate location.
> tc_at_ingress 1: ingress, 0 is egress.
>
> The new flag is set/unset in sch_ingress instead of the core.
> We will also no longer set skb->tc_verd to AT_EGRESS in the xmit
> handler.
>
> Signed-off-by: Florian Westphal <fw@strlen.de>
...
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 911d84e..d794077 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -488,6 +488,7 @@ static inline u32 skb_mstamp_us_delta(const struct skb_mstamp *t1,
> * @queue_mapping: Queue mapping for multiqueue devices
> * @xmit_more: More SKBs are pending for this queue
> * @tc_from_ingress: skb is processed during rx, not transmit
> + * @tc_at_ingress: skb is processed during rx, not transmit
Minor nit:
I think both comments needs to be a bit more clear. I.e. tc_at_ingress
tells the *current* location whether we are invoked from ingress or
egress. And, tc_from_ingress explains the *previous* location, whether
the skbs has been mirred from ingress path or egress path.
Otherwise, the rest looks good to me.
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH -next 4/5] net: sched: remove AT INGRESS/EGRESS
2015-05-05 11:06 ` Daniel Borkmann
@ 2015-05-05 11:11 ` Florian Westphal
0 siblings, 0 replies; 19+ messages in thread
From: Florian Westphal @ 2015-05-05 11:11 UTC (permalink / raw)
To: Daniel Borkmann; +Cc: Florian Westphal, netdev, jhs, alexei.starovoitov
Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 05/04/2015 08:48 PM, Florian Westphal wrote:
> >use single marker to propagate location.
> >tc_at_ingress 1: ingress, 0 is egress.
> >
> >The new flag is set/unset in sch_ingress instead of the core.
> >We will also no longer set skb->tc_verd to AT_EGRESS in the xmit
> >handler.
> >
> >Signed-off-by: Florian Westphal <fw@strlen.de>
> ...
> >diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> >index 911d84e..d794077 100644
> >--- a/include/linux/skbuff.h
> >+++ b/include/linux/skbuff.h
> >@@ -488,6 +488,7 @@ static inline u32 skb_mstamp_us_delta(const struct skb_mstamp *t1,
> > * @queue_mapping: Queue mapping for multiqueue devices
> > * @xmit_more: More SKBs are pending for this queue
> > * @tc_from_ingress: skb is processed during rx, not transmit
> >+ * @tc_at_ingress: skb is processed during rx, not transmit
>
> Minor nit:
>
> I think both comments needs to be a bit more clear. I.e. tc_at_ingress
> tells the *current* location whether we are invoked from ingress or
> egress. And, tc_from_ingress explains the *previous* location, whether
> the skbs has been mirred from ingress path or egress path.
Agree, the reason is that I tried to go with single tc_ingress:1 but it
turned out to be not as trivial as I thought.
I'll fix it up in v2, I'll wait a bit more though to give others
a chance to comment.
Thanks for reviewing!
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH -next 5/5] skbuff: remove tc_verd member
2015-05-04 18:48 [PATCH -next 0/5] replace skb tc_verd member with 3 dedicated bit flags Florian Westphal
` (3 preceding siblings ...)
2015-05-04 18:48 ` [PATCH -next 4/5] net: sched: remove AT INGRESS/EGRESS Florian Westphal
@ 2015-05-04 18:48 ` Florian Westphal
2015-05-05 11:09 ` Daniel Borkmann
2015-05-05 11:39 ` [PATCH -next 0/5] replace skb tc_verd member with 3 dedicated bit flags Jamal Hadi Salim
5 siblings, 1 reply; 19+ messages in thread
From: Florian Westphal @ 2015-05-04 18:48 UTC (permalink / raw)
To: netdev; +Cc: jhs, alexei.starovoitov, Florian Westphal
not used anymore; replaced with tc_nocls and tc_{at,from}_ingress flags.
Signed-off-by: Florian Westphal <fw@strlen.de>
---
drivers/net/ifb.c | 1 -
drivers/staging/octeon/ethernet-tx.c | 1 -
include/linux/skbuff.h | 4 ----
net/core/dev.c | 1 -
net/core/skbuff.c | 3 ---
5 files changed, 10 deletions(-)
diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c
index 5ec6797..9f88ac5 100644
--- a/drivers/net/ifb.c
+++ b/drivers/net/ifb.c
@@ -79,7 +79,6 @@ static void ri_tasklet(unsigned long dev)
}
while ((skb = __skb_dequeue(&dp->tq)) != NULL) {
- skb->tc_verd = 0;
skb->tc_nocls = 1;
u64_stats_update_begin(&dp->tsync);
diff --git a/drivers/staging/octeon/ethernet-tx.c b/drivers/staging/octeon/ethernet-tx.c
index 2852d8a..bac1685 100644
--- a/drivers/staging/octeon/ethernet-tx.c
+++ b/drivers/staging/octeon/ethernet-tx.c
@@ -406,7 +406,6 @@ int cvm_oct_xmit(struct sk_buff *skb, struct net_device *dev)
skb->tc_nocls = 0;
skb->tc_from_ingress = 0;
skb->tc_at_ingress = 0;
- skb->tc_verd = 0;
#endif /* CONFIG_NET_CLS_ACT */
#endif /* CONFIG_NET_SCHED */
#endif /* REUSE_SKBUFFS_WITHOUT_FREE */
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index d794077..eebbb79 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -483,7 +483,6 @@ static inline u32 skb_mstamp_us_delta(const struct skb_mstamp *t1,
* @nf_bridge: Saved data about a bridged frame - see br_netfilter.c
* @skb_iif: ifindex of device we arrived on
* @tc_index: Traffic control index
- * @tc_verd: traffic control verdict
* @hash: the packet hash
* @queue_mapping: Queue mapping for multiqueue devices
* @xmit_more: More SKBs are pending for this queue
@@ -627,9 +626,6 @@ struct sk_buff {
#ifdef CONFIG_NET_SCHED
__u16 tc_index; /* traffic control index */
-#ifdef CONFIG_NET_CLS_ACT
- __u16 tc_verd; /* traffic control verdict */
-#endif
#endif
union {
diff --git a/net/core/dev.c b/net/core/dev.c
index e2549fd..7178602 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3702,7 +3702,6 @@ skip_taps:
goto unlock;
}
- skb->tc_verd = 0;
ncls:
#endif
if (pfmemalloc && !skb_pfmemalloc_protocol(skb))
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 3cfff2a..81dd87f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -848,9 +848,6 @@ static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
#endif
#ifdef CONFIG_NET_SCHED
CHECK_SKB_FIELD(tc_index);
-#ifdef CONFIG_NET_CLS_ACT
- CHECK_SKB_FIELD(tc_verd);
-#endif
#endif
}
--
2.0.5
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH -next 0/5] replace skb tc_verd member with 3 dedicated bit flags
2015-05-04 18:48 [PATCH -next 0/5] replace skb tc_verd member with 3 dedicated bit flags Florian Westphal
` (4 preceding siblings ...)
2015-05-04 18:48 ` [PATCH -next 5/5] skbuff: remove tc_verd member Florian Westphal
@ 2015-05-05 11:39 ` Jamal Hadi Salim
2015-05-05 11:47 ` Florian Westphal
5 siblings, 1 reply; 19+ messages in thread
From: Jamal Hadi Salim @ 2015-05-05 11:39 UTC (permalink / raw)
To: Florian Westphal, netdev; +Cc: alexei.starovoitov
Florian,
Initial feedback on the series:
- Can you keep the macros around? eg SET_TC_NCLS is more readable
than skb->tc_nocls = 1 also hides the bit details.
Those macros are not used in user space (there was need earlier but
it is no longer necessary; dont worry about out of tree code).
So this is an opportunity to redefine them. Move them to
include/net/pkt_cls.h and redefine them so readability is better. Maybe
use set_bit
instead of the complex ones that exist there.
I think the ones that are no longer needed should just be deleted
as opposed to what you and Alexei did earlier.
- We need two bits for the location (ingress, egress, from stack)
from stack being 0 i.e when it is not set implicitly it is from the
host stack then we can check for ingress or egress when we choose.
Please keep the two bits.
cheers,
jamal
On 05/04/15 14:48, Florian Westphal wrote:
> This series removes the tc_verd member from sk_buff
> and uses 3 new flag bits instead. It is on top of
> http://patchwork.ozlabs.org/patch/467186/
> (tc: remove unused redirect ttl) from Alexei.
>
> The patch set tries to not affect tc inner workings.
> I tested a few scenarios, namely:
>
> - htb on egress
> - netem attached to ifb with mirred redirect from ingress
> qdisc
> - mirred to different device
> - bogus packet reclassify loop
>
> All of that works as expected (i.e. in last case most packets
> are dropped).
>
> motivation is two-fold:
> 1) provide better documentation as to what tc_verd is
> used for and try to move some flag set/clear operations
> into sch_ingress
>
> 2) provide a way to reduce skb size by 8 bytes
> (s/u16 mac_len/u8 mac_len/ would result in
> two 2 byte and one 4 byte hole, i.e. 8 byte reduction with
> minor reshuffling).
>
> Florian Westphal (5):
> net: sched: replace NCLS macro with tc_nocls bit flag
> net: sched: use counter to break reclassify loops
> net: sched: remove FROM INGRESS/EGRESS
> net: sched: remove AT INGRESS/EGRESS
> skbuff: remove tc_verd member
>
> Documentation/networking/tc-actions-env-rules.txt | 4 ----
> drivers/net/ifb.c | 15 +++++----------
> drivers/staging/octeon/ethernet-tx.c | 4 +++-
> include/linux/skbuff.h | 17 +++++++++++------
> include/net/sch_generic.h | 15 ---------------
> include/uapi/linux/pkt_cls.h | 4 +---
> net/core/dev.c | 10 ++--------
> net/core/skbuff.c | 3 ---
> net/sched/act_api.c | 4 ++--
> net/sched/act_mirred.c | 9 ++++-----
> net/sched/sch_api.c | 12 +++---------
> net/sched/sch_ingress.c | 2 ++
> net/sched/sch_netem.c | 2 +-
> 13 files changed, 34 insertions(+), 67 deletions(-)
>
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH -next 0/5] replace skb tc_verd member with 3 dedicated bit flags
2015-05-05 11:39 ` [PATCH -next 0/5] replace skb tc_verd member with 3 dedicated bit flags Jamal Hadi Salim
@ 2015-05-05 11:47 ` Florian Westphal
2015-05-05 11:58 ` Jamal Hadi Salim
0 siblings, 1 reply; 19+ messages in thread
From: Florian Westphal @ 2015-05-05 11:47 UTC (permalink / raw)
To: Jamal Hadi Salim; +Cc: Florian Westphal, netdev, alexei.starovoitov
Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> Initial feedback on the series:
> - Can you keep the macros around? eg SET_TC_NCLS is more readable
> than skb->tc_nocls = 1 also hides the bit details.
I beg to differ, sorry :-/
We use blah:1 everywhere else in sk_buff, only tc is different and
its not obvious (to me) how tc_verd is being used and for what.
Or are you saying that should redefine SET_TC_NCLS to something like
#define SET_TC_NCLS(skb) (skb)->tc_nocls = 1
?
> I think the ones that are no longer needed should just be deleted
> as opposed to what you and Alexei did earlier.
Fair enough, I can do that.
> - We need two bits for the location (ingress, egress, from stack)
> from stack being 0 i.e when it is not set implicitly it is from the
> host stack then we can check for ingress or egress when we choose.
Hmm, are you sure? How is that used?
In fact ifb will BUG() if neither AT_INGRESS or AT_EGRESS was set
in G_TC_FROM().
Thanks for reviewing!
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH -next 0/5] replace skb tc_verd member with 3 dedicated bit flags
2015-05-05 11:47 ` Florian Westphal
@ 2015-05-05 11:58 ` Jamal Hadi Salim
2015-05-05 12:37 ` Daniel Borkmann
2015-05-05 13:06 ` Florian Westphal
0 siblings, 2 replies; 19+ messages in thread
From: Jamal Hadi Salim @ 2015-05-05 11:58 UTC (permalink / raw)
To: Florian Westphal; +Cc: netdev, alexei.starovoitov
On 05/05/15 07:47, Florian Westphal wrote:
> Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>> Initial feedback on the series:
>> - Can you keep the macros around? eg SET_TC_NCLS is more readable
>> than skb->tc_nocls = 1 also hides the bit details.
>
> I beg to differ, sorry :-/
>
> We use blah:1 everywhere else in sk_buff, only tc is different and
> its not obvious (to me) how tc_verd is being used and for what.
>
> Or are you saying that should redefine SET_TC_NCLS to something like
>
> #define SET_TC_NCLS(skb) (skb)->tc_nocls = 1
>
> ?
>
It is borderline questionable for 1 bit but for consistency i
suggest you do what was there before. I pointed to nocls but
i meant the comment generically because previous code you are
changing intended to use the macros.
In any case I will leave it up to you.
>> I think the ones that are no longer needed should just be deleted
>> as opposed to what you and Alexei did earlier.
>
> Fair enough, I can do that.
Perhaps even as a first patch that move would be useful.
>
>> - We need two bits for the location (ingress, egress, from stack)
>> from stack being 0 i.e when it is not set implicitly it is from the
>> host stack then we can check for ingress or egress when we choose.
>
> Hmm, are you sure? How is that used?
>
As example, when something like
if (!(at & AT_EGRESS))
implies it is either from ingress or the stack.
It does not only from ingress.
> In fact ifb will BUG() if neither AT_INGRESS or AT_EGRESS was set
> in G_TC_FROM().
>
Yes, because you cant send directly from the stack host to ifb. You
can only redirect to it. If we ever end there from the host we should
bug()
cheers,
jamal
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH -next 0/5] replace skb tc_verd member with 3 dedicated bit flags
2015-05-05 11:58 ` Jamal Hadi Salim
@ 2015-05-05 12:37 ` Daniel Borkmann
2015-05-05 13:22 ` Jamal Hadi Salim
2015-05-05 13:06 ` Florian Westphal
1 sibling, 1 reply; 19+ messages in thread
From: Daniel Borkmann @ 2015-05-05 12:37 UTC (permalink / raw)
To: Jamal Hadi Salim, Florian Westphal; +Cc: netdev, alexei.starovoitov
On 05/05/2015 01:58 PM, Jamal Hadi Salim wrote:
> On 05/05/15 07:47, Florian Westphal wrote:
>> Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>>> Initial feedback on the series:
>>> - Can you keep the macros around? eg SET_TC_NCLS is more readable
>>> than skb->tc_nocls = 1 also hides the bit details.
(see below)
>> I beg to differ, sorry :-/
>>
>> We use blah:1 everywhere else in sk_buff, only tc is different and
>> its not obvious (to me) how tc_verd is being used and for what.
>>
>> Or are you saying that should redefine SET_TC_NCLS to something like
>>
>> #define SET_TC_NCLS(skb) (skb)->tc_nocls = 1
>>
>> ?
>
> It is borderline questionable for 1 bit but for consistency i
> suggest you do what was there before. I pointed to nocls but
> i meant the comment generically because previous code you are
> changing intended to use the macros.
> In any case I will leave it up to you.
So, just to give my comment on the macros ... if I look at them ...
#define TC_MUNGED _TC_MAKEMASK1(0)
#define SET_TC_MUNGED(v) ( TC_MUNGED | (v & ~TC_MUNGED))
#define CLR_TC_MUNGED(v) ( v & ~TC_MUNGED)
#define TC_OK2MUNGE _TC_MAKEMASK1(1)
#define SET_TC_OK2MUNGE(v) ( TC_OK2MUNGE | (v & ~TC_OK2MUNGE))
#define CLR_TC_OK2MUNGE(v) ( v & ~TC_OK2MUNGE)
#define S_TC_VERD _TC_MAKE32(2)
#define M_TC_VERD _TC_MAKEMASK(4,S_TC_VERD)
#define G_TC_VERD(x) _TC_GETVALUE(x,S_TC_VERD,M_TC_VERD)
#define V_TC_VERD(x) _TC_MAKEVALUE(x,S_TC_VERD)
#define SET_TC_VERD(v,n) ((V_TC_VERD(n)) | (v & ~M_TC_VERD))
#define S_TC_FROM _TC_MAKE32(6)
#define M_TC_FROM _TC_MAKEMASK(2,S_TC_FROM)
#define G_TC_FROM(x) _TC_GETVALUE(x,S_TC_FROM,M_TC_FROM)
#define V_TC_FROM(x) _TC_MAKEVALUE(x,S_TC_FROM)
#define SET_TC_FROM(v,n) ((V_TC_FROM(n)) | (v & ~M_TC_FROM))
... I quite frankly find the transformation after Florian's series
*MUCH*, *MUCH* more intuitive, also given that we use such kind of
bit flags already extensively everywhere in else the stack.
What's more is that we reduce skbuff usage by 13-12 bits (given the
follow up fix with AT_STACK is addressed).
I really think we should get rid of these macros from tc code.
>>> I think the ones that are no longer needed should just be deleted
>>> as opposed to what you and Alexei did earlier.
>>
>> Fair enough, I can do that.
>
> Perhaps even as a first patch that move would be useful.
I think that can be done as a follow-up *after* the series. Given
it's uapi (which probably never should have been?) it's a different
question on its own.
Looking at git log include/uapi/linux/pkt_cls.h, it slipped in via
David Howells uapi script ...
commit 607ca46e97a1b6594b29647d98a32d545c24bdff
Author: David Howells <dhowells@redhat.com>
Date: Sat Oct 13 10:46:48 2012 +0100
UAPI: (Scripted) Disintegrate include/linux
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>
... if that was accidental, we should remove all the macros that are
then undef'ed for kernel space. I currently fail to see how iproute2
could have ever used them.
>>> - We need two bits for the location (ingress, egress, from stack)
>>> from stack being 0 i.e when it is not set implicitly it is from the
>>> host stack then we can check for ingress or egress when we choose.
>>
>> Hmm, are you sure? How is that used?
>
> As example, when something like
> if (!(at & AT_EGRESS))
> implies it is either from ingress or the stack.
> It does not only from ingress.
>
>> In fact ifb will BUG() if neither AT_INGRESS or AT_EGRESS was set
>> in G_TC_FROM().
>
> Yes, because you cant send directly from the stack host to ifb. You
> can only redirect to it. If we ever end there from the host we should
> bug()
Hm, I actually missed we have AT_STACK (== 0) as it's not set explicitly,
good point, that means tc_from_ingress needs to be one single bit more.
Thanks,
Daniel
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH -next 0/5] replace skb tc_verd member with 3 dedicated bit flags
2015-05-05 12:37 ` Daniel Borkmann
@ 2015-05-05 13:22 ` Jamal Hadi Salim
0 siblings, 0 replies; 19+ messages in thread
From: Jamal Hadi Salim @ 2015-05-05 13:22 UTC (permalink / raw)
To: Daniel Borkmann, Florian Westphal; +Cc: netdev, alexei.starovoitov
On 05/05/15 08:37, Daniel Borkmann wrote:
> On 05/05/2015 01:58 PM, Jamal Hadi Salim wrote:
>
> ... I quite frankly find the transformation after Florian's series
> *MUCH*, *MUCH* more intuitive, also given that we use such kind of
> bit flags already extensively everywhere in else the stack.
>
But he has to go around and change all occurrences where the macros are
invoked.
In some cases nothing has changed; in such a case the macros are useful
for hiding what goes on.
In any case - this is not as a big issue.
> What's more is that we reduce skbuff usage by 13-12 bits (given the
> follow up fix with AT_STACK is addressed).
>
Thats an orthogonal issue. Those bits were useful a few years ago,
and the use cases didnt pan out. So iam not against getting recycling.
> I think that can be done as a follow-up *after* the series.
Sure. Would of course be better to do it in this series if changes
are going to be made.
> Given
> it's uapi (which probably never should have been?) it's a different
> question on its own.
>
> Looking at git log include/uapi/linux/pkt_cls.h, it slipped in via
> David Howells uapi script ...
>
> commit 607ca46e97a1b6594b29647d98a32d545c24bdff
> Author: David Howells <dhowells@redhat.com>
> Date: Sat Oct 13 10:46:48 2012 +0100
>
Should never have been in uapi.
I dont think i saw that submission you pointed to.
But the problem may have existed before that patch.
cheers,
jamal
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH -next 0/5] replace skb tc_verd member with 3 dedicated bit flags
2015-05-05 11:58 ` Jamal Hadi Salim
2015-05-05 12:37 ` Daniel Borkmann
@ 2015-05-05 13:06 ` Florian Westphal
1 sibling, 0 replies; 19+ messages in thread
From: Florian Westphal @ 2015-05-05 13:06 UTC (permalink / raw)
To: Jamal Hadi Salim; +Cc: Florian Westphal, netdev, alexei.starovoitov
Jamal Hadi Salim <jhs@mojatatu.com> wrote:
> It is borderline questionable for 1 bit but for consistency i
> suggest you do what was there before. I pointed to nocls but
> i meant the comment generically because previous code you are
> changing intended to use the macros.
True.
> In any case I will leave it up to you.
Okay, thanks Jamal.
I'll have to think about this wrt. how to proceed.
> >>- We need two bits for the location (ingress, egress, from stack)
> >>from stack being 0 i.e when it is not set implicitly it is from the
> >>host stack then we can check for ingress or egress when we choose.
> >
> >Hmm, are you sure? How is that used?
> >
>
> As example, when something like
> if (!(at & AT_EGRESS))
> implies it is either from ingress or the stack.
> It does not only from ingress.
> >In fact ifb will BUG() if neither AT_INGRESS or AT_EGRESS was set
> >in G_TC_FROM().
> >
>
> Yes, because you cant send directly from the stack host to ifb. You
> can only redirect to it. If we ever end there from the host we should
> bug()
$ git grep G_TC_FROM | nl
1 drivers/net/ifb.c: u32 from = G_TC_FROM(skb->tc_verd);
2 drivers/net/ifb.c: u32 from = G_TC_FROM(skb->tc_verd);
3 include/uapi/linux/pkt_cls.h:#define G_TC_FROM(x) _TC_GETVALUE(x,S_TC_FROM,M_TC_FROM)
4 net/sched/sch_netem.c: if (G_TC_FROM(skb->tc_verd) & AT_INGRESS)
#1 will BUG() if G_TC_FROM yields 0
#4 only cares about AT_INGRESS
I can indeed get #2 to return 'from 0', via:
ip link set dev ifb0 up
ip addr add 10.2.1.1/8 dev ifb0
ping 10.2.1.2
However, there is no user-visible behaviour change since skb->iif is 0 for locally generated
skb. Since we do '(from == 0 || skb->iif == 0) -> drop' ifb will still be a /dev/null sink
without skb coming in via mirred action.
This also seems to work:
ip link set dev ifb0 up
ip link set dev eth1 up
ip addr add 192.168.42.1/24 dev eth1
tc qdisc add dev eth1 root handle 1: htb default 1
tc filter add dev eth1 parent 1: protocol all u32 match u32 0 0 action mirred egress redirect dev ifb0
ping -c 1 -b 192.168.42.255
iif is set to eth1 iif by the mirred action, so its nonzero by the time skb ends up in ifb and is redirected.
I fully admit that I did not consider this beforehand though ;)
Is there anything else I am missing?
Cheers,
Florian
^ permalink raw reply [flat|nested] 19+ messages in thread