netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] qos: Limit a filter's priority to a 16 bit value
@ 2009-05-01 18:42 Robert Love
  2009-05-01 22:51 ` David Miller
  0 siblings, 1 reply; 7+ messages in thread
From: Robert Love @ 2009-05-01 18:42 UTC (permalink / raw)
  To: hadi, netdev; +Cc: davem, robert.w.love, jeffrey.t.kirsher

Currently the kernel is storing the priority value for
a filter as a 32 bit value. However, the netlink messages
between the kernel and userspace are passing only 16 bits
for the priority. This means that the kernel will always
try to compare the passed down filter's priority using the
correct upper 16 bits, but all 0s for the lower 16 bits.

Most comparisons will fail when checking against the filter
chain unless the actual filter has a priority with all 0s
for the last 16 bits.

The fallout is that users cannot modify or delete most
filters that are added.

This patch makes it so the kernel only uses 16 bits for
a filter's priority. It has no impact on the kernel/user
interaction, but does reduce the number of potential filters.

I sent this as an RFC becuase I'm not sure if the correct
fix is to switch the kernel's priority value to 32 bits or
to change the netlink messages to pass 32bit priorities.

I implemented the change to 16bit approach becuase I didn't
want to change kernel/user messages, I think that changing
this API may be undesirable. Also 16 bits still allows 65535
filter priorities, which to me seems like enough, but I may
not understand all usages of the filters.

Signed-off-by: Robert Love <robert.w.love@intel.com>
---

 include/net/sch_generic.h |    2 +-
 net/sched/cls_api.c       |   16 ++++++++--------
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 964ffa0..2a03027 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -170,7 +170,7 @@ struct tcf_proto
 	__be16			protocol;
 
 	/* All the rest */
-	u32			prio;
+	u16			prio;
 	u32			classid;
 	struct Qdisc		*q;
 	void			*data;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 173fcc4..b754c22 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -104,9 +104,9 @@ static int tfilter_notify(struct sk_buff *oskb, struct nlmsghdr *n,
 
 /* Select new prio value from the range, managed by kernel. */
 
-static inline u32 tcf_auto_prio(struct tcf_proto *tp)
+static inline u16 tcf_auto_prio(struct tcf_proto *tp)
 {
-	u32 first = TC_H_MAKE(0xC0000000U, 0U);
+	u16 first = 0xC000U;
 
 	if (tp)
 		first = tp->prio-1;
@@ -123,8 +123,8 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n, void *arg)
 	spinlock_t *root_lock;
 	struct tcmsg *t;
 	u32 protocol;
-	u32 prio;
-	u32 nprio;
+	u16 prio;
+	u16 nprio;
 	u32 parent;
 	struct net_device *dev;
 	struct Qdisc  *q;
@@ -142,7 +142,7 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n, void *arg)
 replay:
 	t = NLMSG_DATA(n);
 	protocol = TC_H_MIN(t->tcm_info);
-	prio = TC_H_MAJ(t->tcm_info);
+	prio = TC_H_MAJ(t->tcm_info) >> 16;
 	nprio = prio;
 	parent = t->tcm_parent;
 	cl = 0;
@@ -151,7 +151,7 @@ replay:
 		/* If no priority is given, user wants we allocated it. */
 		if (n->nlmsg_type != RTM_NEWTFILTER || !(n->nlmsg_flags&NLM_F_CREATE))
 			return -ENOENT;
-		prio = TC_H_MAKE(0x80000000U, 0U);
+		prio = 0x8000U;
 	}
 
 	/* Find head of filter chain. */
@@ -340,7 +340,7 @@ static int tcf_fill_node(struct sk_buff *skb, struct tcf_proto *tp,
 	tcm->tcm__pad1 = 0;
 	tcm->tcm_ifindex = qdisc_dev(tp->q)->ifindex;
 	tcm->tcm_parent = tp->classid;
-	tcm->tcm_info = TC_H_MAKE(tp->prio, tp->protocol);
+	tcm->tcm_info = TC_H_MAKE((u32)tp->prio << 16, tp->protocol);
 	NLA_PUT_STRING(skb, TCA_KIND, tp->ops->kind);
 	tcm->tcm_handle = fh;
 	if (RTM_DELTFILTER != event) {
@@ -436,7 +436,7 @@ static int tc_dump_tfilter(struct sk_buff *skb, struct netlink_callback *cb)
 	for (tp=*chain, t=0; tp; tp = tp->next, t++) {
 		if (t < s_t) continue;
 		if (TC_H_MAJ(tcm->tcm_info) &&
-		    TC_H_MAJ(tcm->tcm_info) != tp->prio)
+		    (TC_H_MAJ(tcm->tcm_info) >> 16) != tp->prio)
 			continue;
 		if (TC_H_MIN(tcm->tcm_info) &&
 		    TC_H_MIN(tcm->tcm_info) != tp->protocol)


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] qos: Limit a filter's priority to a 16 bit value
  2009-05-01 18:42 [RFC PATCH] qos: Limit a filter's priority to a 16 bit value Robert Love
@ 2009-05-01 22:51 ` David Miller
  2009-05-01 23:30   ` Love, Robert W
  0 siblings, 1 reply; 7+ messages in thread
From: David Miller @ 2009-05-01 22:51 UTC (permalink / raw)
  To: robert.w.love; +Cc: hadi, netdev, jeffrey.t.kirsher

From: Robert Love <robert.w.love@intel.com>
Date: Fri, 01 May 2009 11:42:42 -0700

> Currently the kernel is storing the priority value for
> a filter as a 32 bit value. However, the netlink messages
> between the kernel and userspace are passing only 16 bits
> for the priority. This means that the kernel will always
> try to compare the passed down filter's priority using the
> correct upper 16 bits, but all 0s for the lower 16 bits.
> 
> Most comparisons will fail when checking against the filter
> chain unless the actual filter has a priority with all 0s
> for the last 16 bits.
> 
> The fallout is that users cannot modify or delete most
> filters that are added.

Really?

I'm looking at the code and I don't see this failure case.
Can you describe the problem with specific examples?

All of the code I see either 1) uses u32's to store the
priorities and just compares them (cls_u32.c) or 2)
always uses TC_H_MAJ() of the user's provided value
and uses that to compare with tp->prio values.

Both of which should always just work.

We're just wasting the low 16-bits, but that shouldn't
cause problems like the one you're describing as long
as everyone extracts and compares the top bits properly
which as far as I can tell is the case.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [RFC PATCH] qos: Limit a filter's priority to a 16 bit value
  2009-05-01 22:51 ` David Miller
@ 2009-05-01 23:30   ` Love, Robert W
  2009-05-02  1:52     ` jamal
  0 siblings, 1 reply; 7+ messages in thread
From: Love, Robert W @ 2009-05-01 23:30 UTC (permalink / raw)
  To: David Miller; +Cc: hadi@cyberus.ca, netdev@vger.kernel.org, Kirsher, Jeffrey T

David Miller wrote:
> From: Robert Love <robert.w.love@intel.com>
> Date: Fri, 01 May 2009 11:42:42 -0700
> 
>> Currently the kernel is storing the priority value for
>> a filter as a 32 bit value. However, the netlink messages
>> between the kernel and userspace are passing only 16 bits
>> for the priority. This means that the kernel will always
>> try to compare the passed down filter's priority using the
>> correct upper 16 bits, but all 0s for the lower 16 bits.
>> 
>> Most comparisons will fail when checking against the filter
>> chain unless the actual filter has a priority with all 0s for the
>> last 16 bits. 
>> 
>> The fallout is that users cannot modify or delete most
>> filters that are added.
> 
> Really?
> 
> I'm looking at the code and I don't see this failure case.
> Can you describe the problem with specific examples?
> 
Sure.

I should be more clear by saying that this should only
fail with filter priorities assigned by the kernel.
I think if the user passes down the priority when
creating the filter it should always be 16bits and it's
fine.

However, when the kernel is assigning priorities, the
first assigned priority for a filter is 0xC0000000,
the second is "the lowest priority - 1" so 0xBFFFFFFF.

It will assign this value in tcf_auto_prio() which will
directly assign the 32bit value to to tp->prio with-

tp->prio = nprio ? : tcf_auto_prio(*back);

(nprio == 0, since it wasn't specified by the user)

static inline u32 tcf_auto_prio(struct tcf_proto *tp)
{
 	u32 first = TC_H_MAKE(0xC0000000U, 0U);
	if (tp)
                first = tp->prio-1;
	return first;
}

If you want to delete that second filter, you'll use
tc to query the filter list to determine the priority.

The kernel will take the 32 bit 'prio' member of 'struct
tcf_proto' (which is 0xBFFFFFFF) and use the TC_H_MAJ
macro to zero-out the lower 16 bits (so that it can
fit into the top half of the 'tcm_info' member of
'struct tcmsg'). This has already mangled the priority
as we're now telling the user that the priority of the
filter is 0xBFFF0000, when in fact it is 0xBFFFFFFF.

The user will then pass this back down to the kernel to
delete the filter, however in tc_ctl_tfilter() we're
comapring 0xBFFF0000 against 0xBFFFFFFF and it never
matches.

Here's my userspace output (this is going to look ugly)-

[root@itchy ~]# tc qdisc add dev eth3 root handle 0: multiq

[root@itchy ~]# tc filter add dev eth3 parent 0: protocol 35078 handle 0x3003039 basic match 'cmp(u16' at 12 layer 1 mask 0xffff eq '35078)' action skbedit queue_mapping 3

[root@itchy ~]# tc filter add dev eth3 parent 0: protocol 35092 handle 0x3010932 basic match 'cmp(u16' at 12 layer 1 mask 0xffff eq '35092)' action skbedit queue_mapping 3

[root@itchy ~]# tc filter show dev eth3
filter parent 8002: protocol [35092] pref 49151 basic 
filter parent 8002: protocol [35092] pref 49151 basic handle 0x3010932 
  cmp(u16 at 12 layer 1 mask 0xffff eq 35092)

	action order 1:  skbedit queue_mapping 3
filter parent 8002: protocol [35078] pref 49152 basic 
filter parent 8002: protocol [35078] pref 49152 basic handle 0x3003039 
  cmp(u16 at 12 layer 1 mask 0xffff eq 35078)

	action order 33:  skbedit queue_mapping 3

[root@itchy ~]# tc filter delete dev eth3 parent 8002: protocol 35092 pref 49151 handle 0x3010932 basic match 'cmp(u16' at 12 layer 1 mask 0xffff eq '35092)' action skbedit queue_mapping 3

RTNETLINK answers: No such file or directory
We have an error talking to the kernel


You can see that the kernel would be storing 0xBFFFFFFF
with tp->prio = nprio ? : tcf_auto_prio(*back);

But the 'show' query shows the prio as 49151 (0xBFFF)
which is passed down with the 'delete' command, expanded
to 0xBFFF0000, and doesn't match the filter's actual
priority.

> All of the code I see either 1) uses u32's to store the
> priorities and just compares them (cls_u32.c) or 2)
> always uses TC_H_MAJ() of the user's provided value
> and uses that to compare with tp->prio values.
> 
> Both of which should always just work.
> 
> We're just wasting the low 16-bits, but that shouldn't
> cause problems like the one you're describing as long
> as everyone extracts and compares the top bits properly
> which as far as I can tell is the case.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [RFC PATCH] qos: Limit a filter's priority to a 16 bit value
  2009-05-01 23:30   ` Love, Robert W
@ 2009-05-02  1:52     ` jamal
  2009-05-02  2:10       ` Love, Robert W
  0 siblings, 1 reply; 7+ messages in thread
From: jamal @ 2009-05-02  1:52 UTC (permalink / raw)
  To: Love, Robert W; +Cc: David Miller, netdev@vger.kernel.org, Kirsher, Jeffrey T

On Fri, 2009-05-01 at 16:30 -0700, Love, Robert W wrote:

> I should be more clear by saying that this should only
> fail with filter priorities assigned by the kernel.
> I think if the user passes down the priority when
> creating the filter it should always be 16bits and it's
> fine.
> 
> However, when the kernel is assigning priorities, the
> first assigned priority for a filter is 0xC0000000,
> the second is "the lowest priority - 1" so 0xBFFFFFFF.
> 
> It will assign this value in tcf_auto_prio() which will
> directly assign the 32bit value to to tp->prio with-
> 
> tp->prio = nprio ? : tcf_auto_prio(*back);

I think the above should read:
tp->prio = nprio ? : TC_H_MAJ(tcf_auto_prio(*back));

More importantly however, you should _never_ specify
a filter you intend to delete individually without specifying 
its priority.
Common practise in scripts is to just delete the root 
qdisc or class to delete all filters.

cheers,
jamal


^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [RFC PATCH] qos: Limit a filter's priority to a 16 bit value
  2009-05-02  1:52     ` jamal
@ 2009-05-02  2:10       ` Love, Robert W
  2009-05-02  2:18         ` David Miller
  0 siblings, 1 reply; 7+ messages in thread
From: Love, Robert W @ 2009-05-02  2:10 UTC (permalink / raw)
  To: hadi@cyberus.ca; +Cc: David Miller, netdev@vger.kernel.org, Kirsher, Jeffrey T

jamal wrote:
> On Fri, 2009-05-01 at 16:30 -0700, Love, Robert W wrote:
> 
>> I should be more clear by saying that this should only
>> fail with filter priorities assigned by the kernel.
>> I think if the user passes down the priority when
>> creating the filter it should always be 16bits and it's fine.
>> 
>> However, when the kernel is assigning priorities, the
>> first assigned priority for a filter is 0xC0000000,
>> the second is "the lowest priority - 1" so 0xBFFFFFFF.
>> 
>> It will assign this value in tcf_auto_prio() which will
>> directly assign the 32bit value to to tp->prio with-
>> 
>> tp->prio = nprio ? : tcf_auto_prio(*back);
> 
> I think the above should read:
> tp->prio = nprio ? : TC_H_MAJ(tcf_auto_prio(*back));
> 
I think you might be right, although, I don't see why
we use a u32 when we're always storing a u16. There might
be a good reason, but it seems error-prone.

With your change the kernel would be selecting the
following priorities.

0xC0000000
0xC0000000 - 1 = 0xBFFFFFFF -> TC_H_MAJ makes 0xBFFF
0xBFFF0000 - 1 = 0xBFFEFFFF -> TC_H_MAJ makes 0xBFFE
...

so the low bits are irrelevant.

> More importantly however, you should _never_ specify
> a filter you intend to delete individually without specifying
> its priority.

Absolutely, the kernel would never find it. :)

> Common practise in scripts is to just delete the root
> qdisc or class to delete all filters.
> 
Just for a little background. The netdev based FCoE solution
w/ DCB uses the multiq qdisc to filter FCoE packets into
specific queues. I don't want our script to yank out the
multiq qdisc just because FCoE traffic has stopped. The FCoE
script doesn't know if the user has other filters applied for
that interface/qdisc. The script should just remove its
filters and not disrupt anything else on the chain.

I have a patch to our user space script that filters FIP
(FCoE Initialization Protocol) to the same FCoE queue that
DCB has negotiated with the switch. As soon as I added the
second filter I realized that I couldn't delete or modify it.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] qos: Limit a filter's priority to a 16 bit value
  2009-05-02  2:10       ` Love, Robert W
@ 2009-05-02  2:18         ` David Miller
  2009-05-02  2:35           ` jamal
  0 siblings, 1 reply; 7+ messages in thread
From: David Miller @ 2009-05-02  2:18 UTC (permalink / raw)
  To: robert.w.love; +Cc: hadi, netdev, jeffrey.t.kirsher

From: "Love, Robert W" <robert.w.love@intel.com>
Date: Fri, 1 May 2009 19:10:36 -0700

> jamal wrote:
>> On Fri, 2009-05-01 at 16:30 -0700, Love, Robert W wrote:
>> 
>>> I should be more clear by saying that this should only
>>> fail with filter priorities assigned by the kernel.
>>> I think if the user passes down the priority when
>>> creating the filter it should always be 16bits and it's fine.
>>> 
>>> However, when the kernel is assigning priorities, the
>>> first assigned priority for a filter is 0xC0000000,
>>> the second is "the lowest priority - 1" so 0xBFFFFFFF.
>>> 
>>> It will assign this value in tcf_auto_prio() which will
>>> directly assign the 32bit value to to tp->prio with-
>>> 
>>> tp->prio = nprio ? : tcf_auto_prio(*back);
>> 
>> I think the above should read:
>> tp->prio = nprio ? : TC_H_MAJ(tcf_auto_prio(*back));
>> 
> I think you might be right, although, I don't see why
> we use a u32 when we're always storing a u16.

That's irrelevant.

You've essentially found what turned out to be a one-line bug, and
fixed it by changing and rearranging a lot of things.

So could I just get that one-liner fix patch from somebody?

Thanks :-)

Meanwhile, on the issue of how this value is stored, perhaps someone
intended to support "classful" filter priorities, just as we support
classful qdiscs.  Or perhaps someone had the idea to store the
protocol in there as well, who knows :-)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH] qos: Limit a filter's priority to a 16 bit value
  2009-05-02  2:18         ` David Miller
@ 2009-05-02  2:35           ` jamal
  0 siblings, 0 replies; 7+ messages in thread
From: jamal @ 2009-05-02  2:35 UTC (permalink / raw)
  To: David Miller; +Cc: robert.w.love, netdev, jeffrey.t.kirsher

On Fri, 2009-05-01 at 19:10 -0700, Love, Robert W wrote:

> I don't want our script to yank out the
> multiq qdisc just because FCoE traffic has stopped. The FCoE
> script doesn't know if the user has other filters applied for
> that interface/qdisc. The script should just remove its
> filters and not disrupt anything else on the chain.

True - but if you specify the prio on add/del you should be fine
though.
I looked at about 15-20 scripts and i noticed i always specified
the priority; i suspect most people do thats why noone bumped into
this over the years.

On Fri, 2009-05-01 at 19:18 -0700, David Miller wrote:

> So could I just get that one-liner fix patch from somebody?

Robert, would you do the honors of submitting the patch since
you did all the hardwork?

> Thanks :-)
> 
> Meanwhile, on the issue of how this value is stored, perhaps someone
> intended to support "classful" filter priorities, just as we support
> classful qdiscs.  Or perhaps someone had the idea to store the
> protocol in there as well, who knows :-)

I cant remember the reason. Alexey probably would - but i think the
important point is it is consistent everywhere.

cheers,
jamal


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2009-05-02  2:35 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-05-01 18:42 [RFC PATCH] qos: Limit a filter's priority to a 16 bit value Robert Love
2009-05-01 22:51 ` David Miller
2009-05-01 23:30   ` Love, Robert W
2009-05-02  1:52     ` jamal
2009-05-02  2:10       ` Love, Robert W
2009-05-02  2:18         ` David Miller
2009-05-02  2:35           ` jamal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).