* Qdisc->u32_node - licence to kill
@ 2017-08-07 16:41 Jiri Pirko
2017-08-07 17:47 ` John Fastabend
0 siblings, 1 reply; 7+ messages in thread
From: Jiri Pirko @ 2017-08-07 16:41 UTC (permalink / raw)
To: jhs, xiyou.wangcong, davem; +Cc: netdev, mlxsw
Hi Jamal/Cong/David/all.
Digging in the u32 code deeper now. I need to get rid of tp->q for shared
blocks, but I found out about this:
struct Qdisc {
......
void *u32_node;
......
};
Yeah, ugly. u32 uses it to store some shared data, tp_c. It actually
stores a linked list of all hashtables added to one qdiscs.
So basically what you have is, you have 1 root ht per prio/pref. Then
you can have multiple hts, linked from any other ht, does not matter in
which prio/pref they are.
Do I understand that correctly that prio/pref only has meaning if
linking does not take place, because if there is linking, the prio/pref
of inserted rule is simply ignored?
That is the most confusing thing I saw in net/sched/ so far.
Is this a bug? Sounds like one.
Did someone introduce *u32_node (formerly static struct tc_u_common
*u32_list;) just to allow this weirdness?
Can I just remove this shared tp_c and make the linking to other
hashtables only possible within the same prio/pref? That would make
sense to me.
Thanks.
Jiri
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: Qdisc->u32_node - licence to kill 2017-08-07 16:41 Qdisc->u32_node - licence to kill Jiri Pirko @ 2017-08-07 17:47 ` John Fastabend 2017-08-07 19:06 ` Jiri Pirko 0 siblings, 1 reply; 7+ messages in thread From: John Fastabend @ 2017-08-07 17:47 UTC (permalink / raw) To: Jiri Pirko, jhs, xiyou.wangcong, davem; +Cc: netdev, mlxsw On 08/07/2017 09:41 AM, Jiri Pirko wrote: > Hi Jamal/Cong/David/all. > > Digging in the u32 code deeper now. I need to get rid of tp->q for shared > blocks, but I found out about this: > > struct Qdisc { > ...... > void *u32_node; > ...... > }; > > Yeah, ugly. u32 uses it to store some shared data, tp_c. It actually > stores a linked list of all hashtables added to one qdiscs. > > So basically what you have is, you have 1 root ht per prio/pref. Then > you can have multiple hts, linked from any other ht, does not matter in > which prio/pref they are. > We can create arbitrary hash tables here independent of prio/pref via TCA_U32_DIVISOR. Then these can be linked to other hash tables via TCA_U32_LINK commands. prio/pref does not really play any part here from my reading, except as a further specifier in the walk callbacks. Making it a useful filter on dump operations. > Do I understand that correctly that prio/pref only has meaning if > linking does not take place, because if there is linking, the prio/pref > of inserted rule is simply ignored? I think even then the prio/pref meaning is dubious, from u32_change, for (pins = rtnl_dereference(*ins); pins; ins = &pins->next, pins = rtnl_dereference(*ins)) if (TC_U32_NODE(handle) < TC_U32_NODE(pins->handle)) break; I think the list insert is done via handle not via prio/pref. > > That is the most confusing thing I saw in net/sched/ so far. > Is this a bug? Sounds like one. > I don't think this is a bug at very least I don't see how we can change it without breaking users. I know people depend on the hash map capabilities and linking logic. > Did someone introduce *u32_node (formerly static struct tc_u_common > *u32_list;) just to allow this weirdness? > > Can I just remove this shared tp_c and make the linking to other > hashtables only possible within the same prio/pref? That would make > sense to me. > The idea to make linking hash tables only possible within the same prio/pref will break existing programs. We can't do this its part of UAPI now and people depend on it. > Thanks. > > Jiri > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Qdisc->u32_node - licence to kill 2017-08-07 17:47 ` John Fastabend @ 2017-08-07 19:06 ` Jiri Pirko 2017-08-07 19:54 ` John Fastabend 0 siblings, 1 reply; 7+ messages in thread From: Jiri Pirko @ 2017-08-07 19:06 UTC (permalink / raw) To: John Fastabend; +Cc: jhs, xiyou.wangcong, davem, netdev, mlxsw Mon, Aug 07, 2017 at 07:47:14PM CEST, john.fastabend@gmail.com wrote: >On 08/07/2017 09:41 AM, Jiri Pirko wrote: >> Hi Jamal/Cong/David/all. >> >> Digging in the u32 code deeper now. I need to get rid of tp->q for shared >> blocks, but I found out about this: >> >> struct Qdisc { >> ...... >> void *u32_node; >> ...... >> }; >> >> Yeah, ugly. u32 uses it to store some shared data, tp_c. It actually >> stores a linked list of all hashtables added to one qdiscs. >> >> So basically what you have is, you have 1 root ht per prio/pref. Then >> you can have multiple hts, linked from any other ht, does not matter in >> which prio/pref they are. >> > >We can create arbitrary hash tables here independent of prio/pref via >TCA_U32_DIVISOR. Then these can be linked to other hash tables via >TCA_U32_LINK commands. Yeah, that's what I thought. > >prio/pref does not really play any part here from my reading, except as >a further specifier in the walk callbacks. Making it a useful filter on >dump operations. Not correct. prio/pref is one level up priority, independent on specific cls implementation. You can have cls_u32 instance on prio 10 and cls_flower instance on prio 20. Both work. In fact, the current u32 "linking" ignores the upper level prio/pref and breakes user assumptions when he inserts rules with specific prio. > >> Do I understand that correctly that prio/pref only has meaning if >> linking does not take place, because if there is linking, the prio/pref >> of inserted rule is simply ignored? > >I think even then the prio/pref meaning is dubious, from u32_change, Please see tc_ctl_tfilter. That is where prio/pref is processed. What you describe is one level down. > > for (pins = rtnl_dereference(*ins); pins; > ins = &pins->next, pins = rtnl_dereference(*ins)) > if (TC_U32_NODE(handle) < TC_U32_NODE(pins->handle)) > break; > >I think the list insert is done via handle not via prio/pref. > >> >> That is the most confusing thing I saw in net/sched/ so far. >> Is this a bug? Sounds like one. >> > >I don't think this is a bug at very least I don't see how we can >change it without breaking users. I know people depend on the hash map >capabilities and linking logic. Do they insert rules into multiple hashtables with different prio? Why? What is the usecase? > >> Did someone introduce *u32_node (formerly static struct tc_u_common >> *u32_list;) just to allow this weirdness? >> >> Can I just remove this shared tp_c and make the linking to other >> hashtables only possible within the same prio/pref? That would make >> sense to me. >> > >The idea to make linking hash tables only possible within the same >prio/pref will break existing programs. We can't do this its part of >UAPI now and people depend on it. That's why I asked if that is a bug. I still feel it is. But I definitelly understand your concern. I'm just trying to figure out how to resolve this misdesign :( ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Qdisc->u32_node - licence to kill 2017-08-07 19:06 ` Jiri Pirko @ 2017-08-07 19:54 ` John Fastabend 2017-08-07 23:21 ` Cong Wang 0 siblings, 1 reply; 7+ messages in thread From: John Fastabend @ 2017-08-07 19:54 UTC (permalink / raw) To: Jiri Pirko; +Cc: jhs, xiyou.wangcong, davem, netdev, mlxsw On 08/07/2017 12:06 PM, Jiri Pirko wrote: > Mon, Aug 07, 2017 at 07:47:14PM CEST, john.fastabend@gmail.com wrote: >> On 08/07/2017 09:41 AM, Jiri Pirko wrote: >>> Hi Jamal/Cong/David/all. >>> >>> Digging in the u32 code deeper now. I need to get rid of tp->q for shared >>> blocks, but I found out about this: >>> >>> struct Qdisc { >>> ...... >>> void *u32_node; >>> ...... >>> }; >>> >>> Yeah, ugly. u32 uses it to store some shared data, tp_c. It actually >>> stores a linked list of all hashtables added to one qdiscs. >>> >>> So basically what you have is, you have 1 root ht per prio/pref. Then >>> you can have multiple hts, linked from any other ht, does not matter in >>> which prio/pref they are. >>> >> >> We can create arbitrary hash tables here independent of prio/pref via >> TCA_U32_DIVISOR. Then these can be linked to other hash tables via >> TCA_U32_LINK commands. > > Yeah, that's what I thought. > > >> >> prio/pref does not really play any part here from my reading, except as >> a further specifier in the walk callbacks. Making it a useful filter on >> dump operations. > > Not correct. prio/pref is one level up priority, independent on specific > cls implementation. You can have cls_u32 instance on prio 10 and > cls_flower instance on prio 20. Both work. ah right, lets make sure I got this right then (its been awhile since I've read this code). So the tcf_ctl_tfilter hook walks classifiers, inserting the classifier by prio. Then tcf_classify walks the list of classifiers looking for any matches, specifically any return codes it recognizes or a return code greater than zero. u32 though has this link notion that allows users to jump to other u32 classifiers that are in this list, because it has a global hash table list. So the per prio classifier isolation is not true in u32 case. > > In fact, the current u32 "linking" ignores the upper level > prio/pref and breakes user assumptions when he inserts rules with > specific prio. > > hmm yep, I guess users of u32 have a "different" set of assumptions when working with u32 hash tables than the rest of the classifiers. >> >>> Do I understand that correctly that prio/pref only has meaning if >>> linking does not take place, because if there is linking, the prio/pref >>> of inserted rule is simply ignored? >> >> I think even then the prio/pref meaning is dubious, from u32_change, > > Please see tc_ctl_tfilter. That is where prio/pref is processed. What > you describe is one level down. > got it. > >> >> for (pins = rtnl_dereference(*ins); pins; >> ins = &pins->next, pins = rtnl_dereference(*ins)) >> if (TC_U32_NODE(handle) < TC_U32_NODE(pins->handle)) >> break; >> >> I think the list insert is done via handle not via prio/pref. >> >>> >>> That is the most confusing thing I saw in net/sched/ so far. >>> Is this a bug? Sounds like one. >>> >> >> I don't think this is a bug at very least I don't see how we can >> change it without breaking users. I know people depend on the hash map >> capabilities and linking logic. > > Do they insert rules into multiple hashtables with different prio? Why? > What is the usecase? > Single u32 classifier with multiple hash tables linked together I would think is the normal way. I guess because the API never disallowed it and the user api is a bit tricky its possible users may use multiple prios, but probably it is not needed. Maybe Jamal has some use case where this is required? > >> >>> Did someone introduce *u32_node (formerly static struct tc_u_common >>> *u32_list;) just to allow this weirdness? >>> >>> Can I just remove this shared tp_c and make the linking to other >>> hashtables only possible within the same prio/pref? That would make >>> sense to me. >>> >> >> The idea to make linking hash tables only possible within the same >> prio/pref will break existing programs. We can't do this its part of >> UAPI now and people depend on it. > > That's why I asked if that is a bug. I still feel it is. But I > definitelly understand your concern. I'm just trying to figure out how > to resolve this misdesign :( > I don't have a good argument for the current design, but just want to be sure we don't break existing users. .John ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Qdisc->u32_node - licence to kill 2017-08-07 19:54 ` John Fastabend @ 2017-08-07 23:21 ` Cong Wang 2017-08-09 12:40 ` Jamal Hadi Salim 0 siblings, 1 reply; 7+ messages in thread From: Cong Wang @ 2017-08-07 23:21 UTC (permalink / raw) To: John Fastabend Cc: Jiri Pirko, Jamal Hadi Salim, David Miller, Linux Kernel Network Developers, mlxsw On Mon, Aug 7, 2017 at 12:54 PM, John Fastabend <john.fastabend@gmail.com> wrote: > On 08/07/2017 12:06 PM, Jiri Pirko wrote: >> Mon, Aug 07, 2017 at 07:47:14PM CEST, john.fastabend@gmail.com wrote: >>> On 08/07/2017 09:41 AM, Jiri Pirko wrote: >>>> Hi Jamal/Cong/David/all. >>>> >>>> Digging in the u32 code deeper now. I need to get rid of tp->q for shared >>>> blocks, but I found out about this: >>>> >>>> struct Qdisc { >>>> ...... >>>> void *u32_node; >>>> ...... >>>> }; >>>> >>>> Yeah, ugly. u32 uses it to store some shared data, tp_c. It actually >>>> stores a linked list of all hashtables added to one qdiscs. >>>> >>>> So basically what you have is, you have 1 root ht per prio/pref. Then >>>> you can have multiple hts, linked from any other ht, does not matter in >>>> which prio/pref they are. >>>> >>> >>> We can create arbitrary hash tables here independent of prio/pref via >>> TCA_U32_DIVISOR. Then these can be linked to other hash tables via >>> TCA_U32_LINK commands. >> >> Yeah, that's what I thought. >> >> >>> >>> prio/pref does not really play any part here from my reading, except as >>> a further specifier in the walk callbacks. Making it a useful filter on >>> dump operations. >> >> Not correct. prio/pref is one level up priority, independent on specific >> cls implementation. You can have cls_u32 instance on prio 10 and >> cls_flower instance on prio 20. Both work. > > ah right, lets make sure I got this right then (its been awhile since I've > read this code). So the tcf_ctl_tfilter hook walks classifiers, inserting the > classifier by prio. Then tcf_classify walks the list of classifiers looking > for any matches, specifically any return codes it recognizes or a return code > greater than zero. u32 though has this link notion that allows users to jump > to other u32 classifiers that are in this list, because it has a global hash > table list. So the per prio classifier isolation is not true in u32 case. u32 filter supports multiple hash tables within a qdisc, struct tc_u_common is supposed to link them together. This has to be per qdisc because all of these hash tables belong to one qdisc and their ID's are unique within the qdisc. I dislike it too, and I actually tried to improve it in the past, unfortunately didn't make any real progress. I think we can definitely make it less ugly, but I don't think we can totally get rid of it because of the design of u32. Similar for tp->data. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Qdisc->u32_node - licence to kill 2017-08-07 23:21 ` Cong Wang @ 2017-08-09 12:40 ` Jamal Hadi Salim 2017-08-09 12:48 ` Jiri Pirko 0 siblings, 1 reply; 7+ messages in thread From: Jamal Hadi Salim @ 2017-08-09 12:40 UTC (permalink / raw) To: Cong Wang, John Fastabend Cc: Jiri Pirko, David Miller, Linux Kernel Network Developers, mlxsw On 17-08-07 07:21 PM, Cong Wang wrote: > On Mon, Aug 7, 2017 at 12:54 PM, John Fastabend > <john.fastabend@gmail.com> wrote: >> On 08/07/2017 12:06 PM, Jiri Pirko wrote: >>> Mon, Aug 07, 2017 at 07:47:14PM CEST, john.fastabend@gmail.com wrote: >>>> On 08/07/2017 09:41 AM, Jiri Pirko wrote: >>>>> Hi Jamal/Cong/David/all. >>>>> = >>> >>> Not correct. prio/pref is one level up priority, independent on specific >>> cls implementation. You can have cls_u32 instance on prio 10 and >>> cls_flower instance on prio 20. Both work. >> >> ah right, lets make sure I got this right then (its been awhile since I've >> read this code). So the tcf_ctl_tfilter hook walks classifiers, inserting the >> classifier by prio. Then tcf_classify walks the list of classifiers looking >> for any matches, specifically any return codes it recognizes or a return code >> greater than zero. u32 though has this link notion that allows users to jump >> to other u32 classifiers that are in this list, because it has a global hash >> table list. So the per prio classifier isolation is not true in u32 case. > > u32 filter supports multiple hash tables within a qdisc, struct > tc_u_common is supposed to link them together. This has to be > per qdisc because all of these hash tables belong to one qdisc > and their ID's are unique within the qdisc. > I think historically this used to sit within the u32 code as a static linked list; i cant recall why it got attached to the qdisc. Indeed, the idea is that hash tables can be added independently without linking and then linked afterwards. They have to be held somewhere in transient. And they have priorities (at least the prios are used in the dump) > I dislike it too, and I actually tried to improve it in the past, > unfortunately didn't make any real progress. I think we can > definitely make it less ugly, but I don't think we can totally > get rid of it because of the design of u32. > > Similar for tp->data. > tp->q maybe harder to deal with. I agree with getting rid of this dependency. Could this info be stored in the block instead? cheers, jamal ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Qdisc->u32_node - licence to kill 2017-08-09 12:40 ` Jamal Hadi Salim @ 2017-08-09 12:48 ` Jiri Pirko 0 siblings, 0 replies; 7+ messages in thread From: Jiri Pirko @ 2017-08-09 12:48 UTC (permalink / raw) To: Jamal Hadi Salim Cc: Cong Wang, John Fastabend, David Miller, Linux Kernel Network Developers, mlxsw Wed, Aug 09, 2017 at 02:40:30PM CEST, jhs@mojatatu.com wrote: >On 17-08-07 07:21 PM, Cong Wang wrote: >> On Mon, Aug 7, 2017 at 12:54 PM, John Fastabend >> <john.fastabend@gmail.com> wrote: >> > On 08/07/2017 12:06 PM, Jiri Pirko wrote: >> > > Mon, Aug 07, 2017 at 07:47:14PM CEST, john.fastabend@gmail.com wrote: >> > > > On 08/07/2017 09:41 AM, Jiri Pirko wrote: >> > > > > Hi Jamal/Cong/David/all. >> > > > > >= >> > > >> > > Not correct. prio/pref is one level up priority, independent on specific >> > > cls implementation. You can have cls_u32 instance on prio 10 and >> > > cls_flower instance on prio 20. Both work. >> > >> > ah right, lets make sure I got this right then (its been awhile since I've >> > read this code). So the tcf_ctl_tfilter hook walks classifiers, inserting the >> > classifier by prio. Then tcf_classify walks the list of classifiers looking >> > for any matches, specifically any return codes it recognizes or a return code >> > greater than zero. u32 though has this link notion that allows users to jump >> > to other u32 classifiers that are in this list, because it has a global hash >> > table list. So the per prio classifier isolation is not true in u32 case. >> >> u32 filter supports multiple hash tables within a qdisc, struct >> tc_u_common is supposed to link them together. This has to be >> per qdisc because all of these hash tables belong to one qdisc >> and their ID's are unique within the qdisc. >> > >I think historically this used to sit within the u32 code as a >static linked list; i cant recall why it got attached to the >qdisc. >Indeed, the idea is that hash tables can be added independently >without linking and then linked afterwards. They have to be held >somewhere in transient. And they have priorities (at least the >prios are used in the dump) > >> I dislike it too, and I actually tried to improve it in the past, >> unfortunately didn't make any real progress. I think we can >> definitely make it less ugly, but I don't think we can totally >> get rid of it because of the design of u32. >> >> Similar for tp->data. >> > >tp->q maybe harder to deal with. I agree with getting rid of >this dependency. Could this info be stored in the block instead? Yeah, I will have to do that. I just wanted to get rid of it if possible :/ > >cheers, >jamal ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2017-08-09 12:48 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-08-07 16:41 Qdisc->u32_node - licence to kill Jiri Pirko 2017-08-07 17:47 ` John Fastabend 2017-08-07 19:06 ` Jiri Pirko 2017-08-07 19:54 ` John Fastabend 2017-08-07 23:21 ` Cong Wang 2017-08-09 12:40 ` Jamal Hadi Salim 2017-08-09 12:48 ` Jiri Pirko
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).