* Qdisc->u32_node - licence to kill
@ 2017-08-07 16:41 Jiri Pirko
2017-08-07 17:47 ` John Fastabend
0 siblings, 1 reply; 7+ messages in thread
From: Jiri Pirko @ 2017-08-07 16:41 UTC (permalink / raw)
To: jhs, xiyou.wangcong, davem; +Cc: netdev, mlxsw
Hi Jamal/Cong/David/all.
Digging in the u32 code deeper now. I need to get rid of tp->q for shared
blocks, but I found out about this:
struct Qdisc {
......
void *u32_node;
......
};
Yeah, ugly. u32 uses it to store some shared data, tp_c. It actually
stores a linked list of all hashtables added to one qdiscs.
So basically what you have is, you have 1 root ht per prio/pref. Then
you can have multiple hts, linked from any other ht, does not matter in
which prio/pref they are.
Do I understand that correctly that prio/pref only has meaning if
linking does not take place, because if there is linking, the prio/pref
of inserted rule is simply ignored?
That is the most confusing thing I saw in net/sched/ so far.
Is this a bug? Sounds like one.
Did someone introduce *u32_node (formerly static struct tc_u_common
*u32_list;) just to allow this weirdness?
Can I just remove this shared tp_c and make the linking to other
hashtables only possible within the same prio/pref? That would make
sense to me.
Thanks.
Jiri
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Qdisc->u32_node - licence to kill
2017-08-07 16:41 Qdisc->u32_node - licence to kill Jiri Pirko
@ 2017-08-07 17:47 ` John Fastabend
2017-08-07 19:06 ` Jiri Pirko
0 siblings, 1 reply; 7+ messages in thread
From: John Fastabend @ 2017-08-07 17:47 UTC (permalink / raw)
To: Jiri Pirko, jhs, xiyou.wangcong, davem; +Cc: netdev, mlxsw
On 08/07/2017 09:41 AM, Jiri Pirko wrote:
> Hi Jamal/Cong/David/all.
>
> Digging in the u32 code deeper now. I need to get rid of tp->q for shared
> blocks, but I found out about this:
>
> struct Qdisc {
> ......
> void *u32_node;
> ......
> };
>
> Yeah, ugly. u32 uses it to store some shared data, tp_c. It actually
> stores a linked list of all hashtables added to one qdiscs.
>
> So basically what you have is, you have 1 root ht per prio/pref. Then
> you can have multiple hts, linked from any other ht, does not matter in
> which prio/pref they are.
>
We can create arbitrary hash tables here independent of prio/pref via
TCA_U32_DIVISOR. Then these can be linked to other hash tables via
TCA_U32_LINK commands.
prio/pref does not really play any part here from my reading, except as
a further specifier in the walk callbacks. Making it a useful filter on
dump operations.
> Do I understand that correctly that prio/pref only has meaning if
> linking does not take place, because if there is linking, the prio/pref
> of inserted rule is simply ignored?
I think even then the prio/pref meaning is dubious, from u32_change,
for (pins = rtnl_dereference(*ins); pins;
ins = &pins->next, pins = rtnl_dereference(*ins))
if (TC_U32_NODE(handle) < TC_U32_NODE(pins->handle))
break;
I think the list insert is done via handle not via prio/pref.
>
> That is the most confusing thing I saw in net/sched/ so far.
> Is this a bug? Sounds like one.
>
I don't think this is a bug at very least I don't see how we can
change it without breaking users. I know people depend on the hash map
capabilities and linking logic.
> Did someone introduce *u32_node (formerly static struct tc_u_common
> *u32_list;) just to allow this weirdness?
>
> Can I just remove this shared tp_c and make the linking to other
> hashtables only possible within the same prio/pref? That would make
> sense to me.
>
The idea to make linking hash tables only possible within the same
prio/pref will break existing programs. We can't do this its part of
UAPI now and people depend on it.
> Thanks.
>
> Jiri
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Qdisc->u32_node - licence to kill
2017-08-07 17:47 ` John Fastabend
@ 2017-08-07 19:06 ` Jiri Pirko
2017-08-07 19:54 ` John Fastabend
0 siblings, 1 reply; 7+ messages in thread
From: Jiri Pirko @ 2017-08-07 19:06 UTC (permalink / raw)
To: John Fastabend; +Cc: jhs, xiyou.wangcong, davem, netdev, mlxsw
Mon, Aug 07, 2017 at 07:47:14PM CEST, john.fastabend@gmail.com wrote:
>On 08/07/2017 09:41 AM, Jiri Pirko wrote:
>> Hi Jamal/Cong/David/all.
>>
>> Digging in the u32 code deeper now. I need to get rid of tp->q for shared
>> blocks, but I found out about this:
>>
>> struct Qdisc {
>> ......
>> void *u32_node;
>> ......
>> };
>>
>> Yeah, ugly. u32 uses it to store some shared data, tp_c. It actually
>> stores a linked list of all hashtables added to one qdiscs.
>>
>> So basically what you have is, you have 1 root ht per prio/pref. Then
>> you can have multiple hts, linked from any other ht, does not matter in
>> which prio/pref they are.
>>
>
>We can create arbitrary hash tables here independent of prio/pref via
>TCA_U32_DIVISOR. Then these can be linked to other hash tables via
>TCA_U32_LINK commands.
Yeah, that's what I thought.
>
>prio/pref does not really play any part here from my reading, except as
>a further specifier in the walk callbacks. Making it a useful filter on
>dump operations.
Not correct. prio/pref is one level up priority, independent on specific
cls implementation. You can have cls_u32 instance on prio 10 and
cls_flower instance on prio 20. Both work.
In fact, the current u32 "linking" ignores the upper level
prio/pref and breakes user assumptions when he inserts rules with
specific prio.
>
>> Do I understand that correctly that prio/pref only has meaning if
>> linking does not take place, because if there is linking, the prio/pref
>> of inserted rule is simply ignored?
>
>I think even then the prio/pref meaning is dubious, from u32_change,
Please see tc_ctl_tfilter. That is where prio/pref is processed. What
you describe is one level down.
>
> for (pins = rtnl_dereference(*ins); pins;
> ins = &pins->next, pins = rtnl_dereference(*ins))
> if (TC_U32_NODE(handle) < TC_U32_NODE(pins->handle))
> break;
>
>I think the list insert is done via handle not via prio/pref.
>
>>
>> That is the most confusing thing I saw in net/sched/ so far.
>> Is this a bug? Sounds like one.
>>
>
>I don't think this is a bug at very least I don't see how we can
>change it without breaking users. I know people depend on the hash map
>capabilities and linking logic.
Do they insert rules into multiple hashtables with different prio? Why?
What is the usecase?
>
>> Did someone introduce *u32_node (formerly static struct tc_u_common
>> *u32_list;) just to allow this weirdness?
>>
>> Can I just remove this shared tp_c and make the linking to other
>> hashtables only possible within the same prio/pref? That would make
>> sense to me.
>>
>
>The idea to make linking hash tables only possible within the same
>prio/pref will break existing programs. We can't do this its part of
>UAPI now and people depend on it.
That's why I asked if that is a bug. I still feel it is. But I
definitelly understand your concern. I'm just trying to figure out how
to resolve this misdesign :(
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Qdisc->u32_node - licence to kill
2017-08-07 19:06 ` Jiri Pirko
@ 2017-08-07 19:54 ` John Fastabend
2017-08-07 23:21 ` Cong Wang
0 siblings, 1 reply; 7+ messages in thread
From: John Fastabend @ 2017-08-07 19:54 UTC (permalink / raw)
To: Jiri Pirko; +Cc: jhs, xiyou.wangcong, davem, netdev, mlxsw
On 08/07/2017 12:06 PM, Jiri Pirko wrote:
> Mon, Aug 07, 2017 at 07:47:14PM CEST, john.fastabend@gmail.com wrote:
>> On 08/07/2017 09:41 AM, Jiri Pirko wrote:
>>> Hi Jamal/Cong/David/all.
>>>
>>> Digging in the u32 code deeper now. I need to get rid of tp->q for shared
>>> blocks, but I found out about this:
>>>
>>> struct Qdisc {
>>> ......
>>> void *u32_node;
>>> ......
>>> };
>>>
>>> Yeah, ugly. u32 uses it to store some shared data, tp_c. It actually
>>> stores a linked list of all hashtables added to one qdiscs.
>>>
>>> So basically what you have is, you have 1 root ht per prio/pref. Then
>>> you can have multiple hts, linked from any other ht, does not matter in
>>> which prio/pref they are.
>>>
>>
>> We can create arbitrary hash tables here independent of prio/pref via
>> TCA_U32_DIVISOR. Then these can be linked to other hash tables via
>> TCA_U32_LINK commands.
>
> Yeah, that's what I thought.
>
>
>>
>> prio/pref does not really play any part here from my reading, except as
>> a further specifier in the walk callbacks. Making it a useful filter on
>> dump operations.
>
> Not correct. prio/pref is one level up priority, independent on specific
> cls implementation. You can have cls_u32 instance on prio 10 and
> cls_flower instance on prio 20. Both work.
ah right, lets make sure I got this right then (its been awhile since I've
read this code). So the tcf_ctl_tfilter hook walks classifiers, inserting the
classifier by prio. Then tcf_classify walks the list of classifiers looking
for any matches, specifically any return codes it recognizes or a return code
greater than zero. u32 though has this link notion that allows users to jump
to other u32 classifiers that are in this list, because it has a global hash
table list. So the per prio classifier isolation is not true in u32 case.
>
> In fact, the current u32 "linking" ignores the upper level
> prio/pref and breakes user assumptions when he inserts rules with
> specific prio.
>
>
hmm yep, I guess users of u32 have a "different" set of assumptions when
working with u32 hash tables than the rest of the classifiers.
>>
>>> Do I understand that correctly that prio/pref only has meaning if
>>> linking does not take place, because if there is linking, the prio/pref
>>> of inserted rule is simply ignored?
>>
>> I think even then the prio/pref meaning is dubious, from u32_change,
>
> Please see tc_ctl_tfilter. That is where prio/pref is processed. What
> you describe is one level down.
>
got it.
>
>>
>> for (pins = rtnl_dereference(*ins); pins;
>> ins = &pins->next, pins = rtnl_dereference(*ins))
>> if (TC_U32_NODE(handle) < TC_U32_NODE(pins->handle))
>> break;
>>
>> I think the list insert is done via handle not via prio/pref.
>>
>>>
>>> That is the most confusing thing I saw in net/sched/ so far.
>>> Is this a bug? Sounds like one.
>>>
>>
>> I don't think this is a bug at very least I don't see how we can
>> change it without breaking users. I know people depend on the hash map
>> capabilities and linking logic.
>
> Do they insert rules into multiple hashtables with different prio? Why?
> What is the usecase?
>
Single u32 classifier with multiple hash tables linked together I would
think is the normal way. I guess because the API never disallowed it
and the user api is a bit tricky its possible users may use multiple prios,
but probably it is not needed.
Maybe Jamal has some use case where this is required?
>
>>
>>> Did someone introduce *u32_node (formerly static struct tc_u_common
>>> *u32_list;) just to allow this weirdness?
>>>
>>> Can I just remove this shared tp_c and make the linking to other
>>> hashtables only possible within the same prio/pref? That would make
>>> sense to me.
>>>
>>
>> The idea to make linking hash tables only possible within the same
>> prio/pref will break existing programs. We can't do this its part of
>> UAPI now and people depend on it.
>
> That's why I asked if that is a bug. I still feel it is. But I
> definitelly understand your concern. I'm just trying to figure out how
> to resolve this misdesign :(
>
I don't have a good argument for the current design, but just want to be
sure we don't break existing users.
.John
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Qdisc->u32_node - licence to kill
2017-08-07 19:54 ` John Fastabend
@ 2017-08-07 23:21 ` Cong Wang
2017-08-09 12:40 ` Jamal Hadi Salim
0 siblings, 1 reply; 7+ messages in thread
From: Cong Wang @ 2017-08-07 23:21 UTC (permalink / raw)
To: John Fastabend
Cc: Jiri Pirko, Jamal Hadi Salim, David Miller,
Linux Kernel Network Developers, mlxsw
On Mon, Aug 7, 2017 at 12:54 PM, John Fastabend
<john.fastabend@gmail.com> wrote:
> On 08/07/2017 12:06 PM, Jiri Pirko wrote:
>> Mon, Aug 07, 2017 at 07:47:14PM CEST, john.fastabend@gmail.com wrote:
>>> On 08/07/2017 09:41 AM, Jiri Pirko wrote:
>>>> Hi Jamal/Cong/David/all.
>>>>
>>>> Digging in the u32 code deeper now. I need to get rid of tp->q for shared
>>>> blocks, but I found out about this:
>>>>
>>>> struct Qdisc {
>>>> ......
>>>> void *u32_node;
>>>> ......
>>>> };
>>>>
>>>> Yeah, ugly. u32 uses it to store some shared data, tp_c. It actually
>>>> stores a linked list of all hashtables added to one qdiscs.
>>>>
>>>> So basically what you have is, you have 1 root ht per prio/pref. Then
>>>> you can have multiple hts, linked from any other ht, does not matter in
>>>> which prio/pref they are.
>>>>
>>>
>>> We can create arbitrary hash tables here independent of prio/pref via
>>> TCA_U32_DIVISOR. Then these can be linked to other hash tables via
>>> TCA_U32_LINK commands.
>>
>> Yeah, that's what I thought.
>>
>>
>>>
>>> prio/pref does not really play any part here from my reading, except as
>>> a further specifier in the walk callbacks. Making it a useful filter on
>>> dump operations.
>>
>> Not correct. prio/pref is one level up priority, independent on specific
>> cls implementation. You can have cls_u32 instance on prio 10 and
>> cls_flower instance on prio 20. Both work.
>
> ah right, lets make sure I got this right then (its been awhile since I've
> read this code). So the tcf_ctl_tfilter hook walks classifiers, inserting the
> classifier by prio. Then tcf_classify walks the list of classifiers looking
> for any matches, specifically any return codes it recognizes or a return code
> greater than zero. u32 though has this link notion that allows users to jump
> to other u32 classifiers that are in this list, because it has a global hash
> table list. So the per prio classifier isolation is not true in u32 case.
u32 filter supports multiple hash tables within a qdisc, struct
tc_u_common is supposed to link them together. This has to be
per qdisc because all of these hash tables belong to one qdisc
and their ID's are unique within the qdisc.
I dislike it too, and I actually tried to improve it in the past,
unfortunately didn't make any real progress. I think we can
definitely make it less ugly, but I don't think we can totally
get rid of it because of the design of u32.
Similar for tp->data.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Qdisc->u32_node - licence to kill
2017-08-07 23:21 ` Cong Wang
@ 2017-08-09 12:40 ` Jamal Hadi Salim
2017-08-09 12:48 ` Jiri Pirko
0 siblings, 1 reply; 7+ messages in thread
From: Jamal Hadi Salim @ 2017-08-09 12:40 UTC (permalink / raw)
To: Cong Wang, John Fastabend
Cc: Jiri Pirko, David Miller, Linux Kernel Network Developers, mlxsw
On 17-08-07 07:21 PM, Cong Wang wrote:
> On Mon, Aug 7, 2017 at 12:54 PM, John Fastabend
> <john.fastabend@gmail.com> wrote:
>> On 08/07/2017 12:06 PM, Jiri Pirko wrote:
>>> Mon, Aug 07, 2017 at 07:47:14PM CEST, john.fastabend@gmail.com wrote:
>>>> On 08/07/2017 09:41 AM, Jiri Pirko wrote:
>>>>> Hi Jamal/Cong/David/all.
>>>>>
=
>>>
>>> Not correct. prio/pref is one level up priority, independent on specific
>>> cls implementation. You can have cls_u32 instance on prio 10 and
>>> cls_flower instance on prio 20. Both work.
>>
>> ah right, lets make sure I got this right then (its been awhile since I've
>> read this code). So the tcf_ctl_tfilter hook walks classifiers, inserting the
>> classifier by prio. Then tcf_classify walks the list of classifiers looking
>> for any matches, specifically any return codes it recognizes or a return code
>> greater than zero. u32 though has this link notion that allows users to jump
>> to other u32 classifiers that are in this list, because it has a global hash
>> table list. So the per prio classifier isolation is not true in u32 case.
>
> u32 filter supports multiple hash tables within a qdisc, struct
> tc_u_common is supposed to link them together. This has to be
> per qdisc because all of these hash tables belong to one qdisc
> and their ID's are unique within the qdisc.
>
I think historically this used to sit within the u32 code as a
static linked list; i cant recall why it got attached to the
qdisc.
Indeed, the idea is that hash tables can be added independently
without linking and then linked afterwards. They have to be held
somewhere in transient. And they have priorities (at least the
prios are used in the dump)
> I dislike it too, and I actually tried to improve it in the past,
> unfortunately didn't make any real progress. I think we can
> definitely make it less ugly, but I don't think we can totally
> get rid of it because of the design of u32.
>
> Similar for tp->data.
>
tp->q maybe harder to deal with. I agree with getting rid of
this dependency. Could this info be stored in the block instead?
cheers,
jamal
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Qdisc->u32_node - licence to kill
2017-08-09 12:40 ` Jamal Hadi Salim
@ 2017-08-09 12:48 ` Jiri Pirko
0 siblings, 0 replies; 7+ messages in thread
From: Jiri Pirko @ 2017-08-09 12:48 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: Cong Wang, John Fastabend, David Miller,
Linux Kernel Network Developers, mlxsw
Wed, Aug 09, 2017 at 02:40:30PM CEST, jhs@mojatatu.com wrote:
>On 17-08-07 07:21 PM, Cong Wang wrote:
>> On Mon, Aug 7, 2017 at 12:54 PM, John Fastabend
>> <john.fastabend@gmail.com> wrote:
>> > On 08/07/2017 12:06 PM, Jiri Pirko wrote:
>> > > Mon, Aug 07, 2017 at 07:47:14PM CEST, john.fastabend@gmail.com wrote:
>> > > > On 08/07/2017 09:41 AM, Jiri Pirko wrote:
>> > > > > Hi Jamal/Cong/David/all.
>> > > > >
>=
>> > >
>> > > Not correct. prio/pref is one level up priority, independent on specific
>> > > cls implementation. You can have cls_u32 instance on prio 10 and
>> > > cls_flower instance on prio 20. Both work.
>> >
>> > ah right, lets make sure I got this right then (its been awhile since I've
>> > read this code). So the tcf_ctl_tfilter hook walks classifiers, inserting the
>> > classifier by prio. Then tcf_classify walks the list of classifiers looking
>> > for any matches, specifically any return codes it recognizes or a return code
>> > greater than zero. u32 though has this link notion that allows users to jump
>> > to other u32 classifiers that are in this list, because it has a global hash
>> > table list. So the per prio classifier isolation is not true in u32 case.
>>
>> u32 filter supports multiple hash tables within a qdisc, struct
>> tc_u_common is supposed to link them together. This has to be
>> per qdisc because all of these hash tables belong to one qdisc
>> and their ID's are unique within the qdisc.
>>
>
>I think historically this used to sit within the u32 code as a
>static linked list; i cant recall why it got attached to the
>qdisc.
>Indeed, the idea is that hash tables can be added independently
>without linking and then linked afterwards. They have to be held
>somewhere in transient. And they have priorities (at least the
>prios are used in the dump)
>
>> I dislike it too, and I actually tried to improve it in the past,
>> unfortunately didn't make any real progress. I think we can
>> definitely make it less ugly, but I don't think we can totally
>> get rid of it because of the design of u32.
>>
>> Similar for tp->data.
>>
>
>tp->q maybe harder to deal with. I agree with getting rid of
>this dependency. Could this info be stored in the block instead?
Yeah, I will have to do that. I just wanted to get rid of it if possible :/
>
>cheers,
>jamal
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2017-08-09 12:48 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-08-07 16:41 Qdisc->u32_node - licence to kill Jiri Pirko
2017-08-07 17:47 ` John Fastabend
2017-08-07 19:06 ` Jiri Pirko
2017-08-07 19:54 ` John Fastabend
2017-08-07 23:21 ` Cong Wang
2017-08-09 12:40 ` Jamal Hadi Salim
2017-08-09 12:48 ` Jiri Pirko
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).