* [RFC] batched tc to improve change throughput
From: Thomas Graf @ 2005-01-17 15:23 UTC
To: Jamal Hadi Salim, Patrick McHardy, Stephen Hemminger; +Cc: netdev

While collecting performance numbers for the ematch changes
I realized that the throughput of changes per second is
limited almost entirely by the cost of starting the tc binary
over and over. To improve this, commands need to be batched.
My plan is quite simple: introduce a new flag -f which puts
tc into batched mode and makes it read commands from stdin.
A bison-based parser would split the input into commands;
the grammar would be quite simple:
INPUT ::= { /* empty */ | CMDS }
CMDS ::= { CMD | CMD ';' CMDS }
CMD ::= ARGS
ARGS ::= { STRING | STRING ARGS }
The lexer can be made to ignore C-style and
shell-style comments, e.g.:
---
#!/sbin/tc -f
/* some comments here */
qdisc add ..
class ...
# shell like comments also possible
filter add ... basic match ...
---
Of course this loses the ability to use shell features like
variables and loops, and it's probably not worth trying
to emulate them. One can always generate these tc scripts
with other tools such as m4, you name it.
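
To make the idea concrete, here is a small, purely illustrative sketch in
plain C of what such a batched-mode reader could do; none of this exists
in tc today, and run_cmd() is a made-up stand-in for dispatching into
tc's existing command handlers. A real implementation would more likely
use a flex/bison pair for the grammar above; this sketch only mirrors its
behaviour (comment stripping, ';'-separated commands, whitespace-separated
arguments):

/* Hypothetical sketch only -- tc has no such built-in batched mode yet.
 * Reads commands from stdin, skips C-style and shell-style comments and
 * splits the input into ';'-separated commands of whitespace-separated
 * arguments, mirroring the grammar above. */
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ARGS 64

/* Stand-in for handing the command over to tc's real handlers. */
static void run_cmd(int argc, char **argv)
{
        int i;

        printf("would execute:");
        for (i = 0; i < argc; i++)
                printf(" %s", argv[i]);
        printf("\n");

        while (argc > 0)
                free(argv[--argc]);
}

int main(void)
{
        char *argv[MAX_ARGS];
        char word[256];
        int argc = 0, n = 0, c;

        while ((c = getchar()) != EOF) {
                if (c == '#') {                 /* shell-style comment */
                        while ((c = getchar()) != EOF && c != '\n')
                                ;
                        c = ' ';                /* acts as a separator */
                } else if (c == '/') {
                        int d = getchar();

                        if (d == '*') {         /* C-style comment */
                                int prev = 0;

                                while ((c = getchar()) != EOF &&
                                       !(prev == '*' && c == '/'))
                                        prev = c;
                                c = ' ';        /* acts as a separator */
                        } else if (d != EOF) {
                                ungetc(d, stdin);
                        }
                }

                if (c == EOF || c == ';' || isspace(c)) {
                        if (n > 0 && argc < MAX_ARGS) { /* end of a word */
                                word[n] = '\0';
                                argv[argc++] = strdup(word);
                                n = 0;
                        }
                        if (c == ';' && argc > 0) {     /* end of a command */
                                run_cmd(argc, argv);
                                argc = 0;
                        }
                } else if (n < (int) sizeof(word) - 1) {
                        word[n++] = c;
                }
        }

        if (n > 0 && argc < MAX_ARGS) {                 /* flush last word */
                word[n] = '\0';
                argv[argc++] = strdup(word);
        }
        if (argc > 0)                                   /* flush last command */
                run_cmd(argc, argv);

        return 0;
}

Note that with the grammar as given, newlines are just whitespace, which
is what lets a single command span multiple lines; it also means commands
have to be terminated by ';'.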
This could also be applied to ip of course.
Thoughts?

* Re: [RFC] batched tc to improve change throughput
From: jamal @ 2005-01-17 15:45 UTC
To: Thomas Graf; +Cc: Patrick McHardy, Stephen Hemminger, netdev

You don't like the -batch option to tc? ;->

cheers,
jamal

* Re: [RFC] batched tc to improve change throughput
From: Thomas Graf @ 2005-01-17 16:05 UTC
To: jamal; +Cc: Patrick McHardy, Stephen Hemminger, netdev

* jamal <1105976711.1078.1.camel@jzny.localdomain> 2005-01-17 10:45
> You don't like the -batch option to tc? ;->

No, because:

- it duplicates logic
- it doesn't allow any commenting
- it doesn't get along with my more complicated ematch parsing

* Re: [RFC] batched tc to improve change throughput
From: jamal @ 2005-01-17 16:36 UTC
To: Thomas Graf; +Cc: Patrick McHardy, Stephen Hemminger, netdev

On Mon, 2005-01-17 at 11:05, Thomas Graf wrote:
> * jamal <1105976711.1078.1.camel@jzny.localdomain> 2005-01-17 10:45
> > You don't like the -batch option to tc? ;->
>
> No, because:
>
> - it duplicates logic

Didn't follow this - it uses the same code as the command line. What
logic gets duplicated?

> - it doesn't allow any commenting

Trivial thing you can fix in about 33.5 seconds ;->

> - it doesn't get along with my more complicated ematch parsing

Example?

cheers,
jamal

* Re: [RFC] batched tc to improve change throughput
From: Thomas Graf @ 2005-01-17 16:56 UTC
To: jamal; +Cc: Patrick McHardy, Stephen Hemminger, netdev

* jamal <1105979807.1078.16.camel@jzny.localdomain> 2005-01-17 11:36
> On Mon, 2005-01-17 at 11:05, Thomas Graf wrote:
> > * jamal <1105976711.1078.1.camel@jzny.localdomain> 2005-01-17 10:45
> > > You don't like the -batch option to tc? ;->
> >
> > No, because:
> >
> > - it duplicates logic
>
> Didn't follow this - it uses the same code as the command line. What
> logic gets duplicated?

The parsing of top level nodes.

> > - it doesn't allow any commenting
>
> Trivial thing you can fix in about 33.5 seconds ;->

Simple full-line comments yes, mid-line comments no. -batch is also not
able to split a command across multiple lines.

I want my scripts to look like this:

/**
 * filter dla_fp
 * match DLA traffic at lower watermark
 */
tc filter add
    dev %DEV
    parent 1:12
    prio 40
    protocol all
    basic match meta(nfmark eq %LOW_WATERMARK)
    and (
        nbyte("\x0\x1\x2\x3\x4" at 4 layer 2) /* 00 01 02 03 04 (dla fp) */
        or u32(ip src 10.0.0.0/8)
    )
    flowid 1:20

> > - it doesn't get along with my more complicated ematch parsing
>
> Example?

Stuff like nbyte, kmp or regexp relies on quoted strings, and those
would be destroyed if they contained whitespace.

* Re: [RFC] batched tc to improve change throughput
From: jamal @ 2005-01-17 22:49 UTC
To: Thomas Graf; +Cc: Patrick McHardy, Stephen Hemminger, netdev

On Mon, 2005-01-17 at 11:56, Thomas Graf wrote:
> * jamal <1105979807.1078.16.camel@jzny.localdomain> 2005-01-17 11:36
[..]
> > Didn't follow this - it uses the same code as the command line. What
> > logic gets duplicated?
>
> The parsing of top level nodes.

Probably not a very big deal - it will just force you to type the
commands in full over and over in your batch file.

> > > - it doesn't allow any commenting
> >
> > Trivial thing you can fix in about 33.5 seconds ;->
>
> Simple full-line comments yes, mid-line comments no. -batch is also not
> able to split a command across multiple lines.
>
> I want my scripts to look like this:
>
> /**
>  * filter dla_fp
>  * match DLA traffic at lower watermark
>  */
> tc filter add
>     dev %DEV
>     parent 1:12
>     prio 40
>     protocol all
>     basic match meta(nfmark eq %LOW_WATERMARK)
>     and (
>         nbyte("\x0\x1\x2\x3\x4" at 4 layer 2) /* 00 01 02 03 04 (dla fp) */
>         or u32(ip src 10.0.0.0/8)
>     )
>     flowid 1:20

It does look clean - btw, look at Werner's approach in tcng as well.

> > > - it doesn't get along with my more complicated ematch parsing
> >
> > Example?
>
> Stuff like nbyte, kmp or regexp relies on quoted strings, and those
> would be destroyed if they contained whitespace.

Sounds reasonable.

Another thing that would be really neat is to have an ISO-like CLI
(something like what zebra has) so you can go down the parse tree to,
say, the ematch level and just start typing away these commands.
Should probably be easy to rip the vtysh stuff off zebra or use libio
or something along those lines to do this.

cheers,
jamal

* Re: [RFC] batched tc to improve change throughput
From: Thomas Graf @ 2005-01-18 13:44 UTC
To: jamal; +Cc: Patrick McHardy, Stephen Hemminger, netdev

* jamal <1106002197.1046.19.camel@jzny.localdomain> 2005-01-17 17:49
> On Mon, 2005-01-17 at 11:56, Thomas Graf wrote:
> > /**
> >  * filter dla_fp
> >  * match DLA traffic at lower watermark
> >  */
> > tc filter add
> >     dev %DEV
> >     parent 1:12
> >     prio 40
> >     protocol all
> >     basic match meta(nfmark eq %LOW_WATERMARK)
> >     and (
> >         nbyte("\x0\x1\x2\x3\x4" at 4 layer 2) /* 00 01 02 03 04 (dla fp) */
> >         or u32(ip src 10.0.0.0/8)
> >     )
> >     flowid 1:20
>
> It does look clean - btw, look at Werner's approach in tcng as well.

I'm aware of it, but naturally it always lags behind a bit, keeping it
up to date requires quite some work, and I already have problems
finding the time for my own changes ;->

> Another thing that would be really neat is to have an ISO-like CLI
> (something like what zebra has) so you can go down the parse tree to,
> say, the ematch level and just start typing away these commands.
> Should probably be easy to rip the vtysh stuff off zebra or use libio
> or something along those lines to do this.

I wouldn't call it easy, but it's doable. I'm not sure if
entering/leaving subsystem contexts makes any sense. I find context
help by pressing '?' and normal completion most useful. It's not that
I dislike your idea, but I think it's not worth it. Actually, I've been
working on such a thing (called netsh), a frontend to iproute2 + tc + ...
with 3 modes:

- batched mode (-f)
- interactive shell supporting context help + completion
- call over arguments

It includes a quite easy to use API to define the grammar, which can be
used by readline to do the completion and print context-aware help. It
could be easily ported to iproute2 but every module needs to be changed;
luckily this can happen step by step. I will port it to iproute2,
transform one of the easier modules like neighbour, and then we can see
if we like it.

* Re: [RFC] batched tc to improve change throughput
From: jamal @ 2005-01-18 14:29 UTC
To: Thomas Graf
Cc: Patrick McHardy, Stephen Hemminger, netdev, Werner Almesberger

On Tue, 2005-01-18 at 08:44, Thomas Graf wrote:
> * jamal <1106002197.1046.19.camel@jzny.localdomain> 2005-01-17 17:49
> > On Mon, 2005-01-17 at 11:56, Thomas Graf wrote:
> > > /**
> > >  * filter dla_fp
> > >  * match DLA traffic at lower watermark
> > >  */
> > > tc filter add
> > >     dev %DEV
> > >     parent 1:12
> > >     prio 40
> > >     protocol all
> > >     basic match meta(nfmark eq %LOW_WATERMARK)
> > >     and (
> > >         nbyte("\x0\x1\x2\x3\x4" at 4 layer 2) /* 00 01 02 03 04 (dla fp) */
> > >         or u32(ip src 10.0.0.0/8)
> > >     )
> > >     flowid 1:20
> >
> > It does look clean - btw, look at Werner's approach in tcng as well.
>
> I'm aware of it, but naturally it always lags behind a bit, keeping it
> up to date requires quite some work, and I already have problems
> finding the time for my own changes ;->

I think it is worth getting Mr. Almesberger's view. CCed him.

> > Another thing that would be really neat is to have an ISO-like CLI
> > (something like what zebra has) so you can go down the parse tree to,
> > say, the ematch level and just start typing away these commands.
> > Should probably be easy to rip the vtysh stuff off zebra or use libio
> > or something along those lines to do this.
>
> I wouldn't call it easy, but it's doable.

I don't think it's hard at all. It would take me, cycles pending, not
more than a day to whip something off libio.

> I'm not sure if entering/leaving subsystem contexts makes any sense.
> I find context help by pressing '?' and normal completion most useful.
> It's not that I dislike your idea, but I think it's not worth it.

What doesn't make sense or is not worth it?

Two problems have to be solved - whatever the solution is, it needs to
address them:

a) usability.
   i) I don't need to remember how the parse tree looks or where I am
      on the parse tree. I go:
      tc <enter>
      tc> ?
      and I get some help on the next levels.
   ii) I should be able to ssh to this thing from some remote location.
      This way I can write some scripts to automate things.

b) extraneous typing on the command line. I go to the filter level:

   u32> ?
   gives me help
   u32> context
   filter dev lo parent ffff: protocol ip prio 10
   u32> add
   u32> match ip src 10.0.0.21/32 flowid 1:16 action drop
   u32> match ip src 0/0 flowid 1:16 action ok
   u32> commit
   filters submitted ..
   u32> ..          // takes you up one
   u32> ls
   listing here of filter dev lo parent ffff: protocol ip prio 10
   ..
   ..
   u32> /qdisc/dev/eth0
   now into the qdisc level context for eth0

> Actually, I've been working on such a thing (called netsh), a frontend
> to iproute2 + tc + ... with 3 modes:
>
> - batched mode (-f)

this is useful.

> - interactive shell supporting context help + completion

MUST

> - call over arguments

Don't understand this.

> It includes a quite easy to use API to define the grammar, which can be
> used by readline to do the completion and print context-aware help.

what does readline provide you again?

> It could be easily ported to iproute2 but every module needs to be
> changed; luckily this can happen step by step. I will port it to
> iproute2, transform one of the easier modules like neighbour, and then
> we can see if we like it.

I think iproute2 should stay as is - I don't want to break someone's
scripts or make it fatter than it is already. Any app providing the
above should be standalone.

cheers,
jamal

* Re: [RFC] batched tc to improve change throughput
From: Lennert Buytenhek @ 2005-01-18 14:36 UTC
To: jamal
Cc: Thomas Graf, Patrick McHardy, Stephen Hemminger, netdev, Werner Almesberger

On Tue, Jan 18, 2005 at 09:29:52AM -0500, jamal wrote:
> > > Another thing that would be really neat is to have an ISO-like CLI
> > > (something like what zebra has) so you can go down the parse tree to,
> > > say, the ematch level and just start typing away these commands.
> > > Should probably be easy to rip the vtysh stuff off zebra or use libio
> > > or something along those lines to do this.
> >
> > I wouldn't call it easy, but it's doable.
>
> I don't think it's hard at all. It would take me, cycles pending, not
> more than a day to whip something off libio.

If you do this, please consider using Juniper config syntax instead
of doing it the Cisco/quagga way.

cheers,
Lennert

* Re: [RFC] batched tc to improve change throughput
From: jamal @ 2005-01-18 14:43 UTC
To: Lennert Buytenhek
Cc: Thomas Graf, Patrick McHardy, Stephen Hemminger, netdev, Werner Almesberger

On Tue, 2005-01-18 at 09:36, Lennert Buytenhek wrote:
> On Tue, Jan 18, 2005 at 09:29:52AM -0500, jamal wrote:
> If you do this, please consider using Juniper config syntax instead
> of doing it the Cisco/quagga way.

Juniper is XML-driven config files?
[I am hoping Thomas will do it, btw ;-> Only if we strongly disagree
will I be tempted to provide an alternative.]

btw, libio uses libevent; I recall you said you had some alternative
to it.

cheers,
jamal

* Re: [RFC] batched tc to improve change throughput
From: Thomas Graf @ 2005-01-18 15:07 UTC
To: jamal
Cc: Lennert Buytenhek, Patrick McHardy, Stephen Hemminger, netdev, Werner Almesberger

* jamal <1106059431.1035.101.camel@jzny.localdomain> 2005-01-18 09:43
> On Tue, 2005-01-18 at 09:36, Lennert Buytenhek wrote:
> > If you do this, please consider using Juniper config syntax instead
> > of doing it the Cisco/quagga way.
>
> Juniper is XML-driven config files?
> [I am hoping Thomas will do it, btw ;-> Only if we strongly disagree
> will I be tempted to provide an alternative.]

I'm sure we can find something everyone gets along with just fine.
If we do the XML thing we might want to try to stick to the IETF
netconf ideas.

> btw, libio uses libevent; I recall you said you had some alternative
> to it.

libio could be put underneath libreadline but it doesn't make much
sense; I think remote shells do the job just fine. I'd favour an XML
protocol like netconf if we want to follow the remote configuration
path. Endianness issues will hit us quite hard though.

* Re: [RFC] batched tc to improve change throughput
From: Lennert Buytenhek @ 2005-01-18 15:20 UTC
To: jamal
Cc: Thomas Graf, Patrick McHardy, Stephen Hemminger, netdev, Werner Almesberger

On Tue, Jan 18, 2005 at 09:43:51AM -0500, jamal wrote:
> > If you do this, please consider using Juniper config syntax instead
> > of doing it the Cisco/quagga way.
>
> Juniper is XML-driven config files?

Uhm, no :) The main advantage over Cisco (IMHO, the last thing I want
is to start a holy war) is the hierarchical config syntax, and the
ability to commit/rollback all your changes in one go. It's
unfortunately rather more verbose, though. Example at the bottom.

> btw, libio uses libevent; I recall you said you had some alternative
> to it.

Yeah, ivykis (http://libivykis.sourceforge.net/) has been happily used
in-house at my last job for years but never really caught on anywhere
else, which is perhaps because I never did much lobbying for it. The
way it works also requires all code written for it to be fully async
(you can't use blocking code anywhere), which I think is an advantage
but everyone else thinks is a disadvantage.

cheers,
Lennert


This is an example of a Juniper policy-statement, which is basically
just a prefix filter. This particular filter controls which non-OSPF
prefixes are exported to OSPF.

buytenh@asd-tc2-m20core1> show configuration policy-options policy-statement ospf-export
term accept-default {
    from {
        route-filter 0.0.0.0/0 exact;
    }
    then {
        external {
            type 1;
        }
        accept;
    }
}
term accept-peering {
    from {
        protocol direct;
        route-filter 0.0.0.0/0 prefix-length-range /30-/30;
    }
    then {
        external {
            type 1;
        }
        accept;
    }
}
term accept-ams-ix {
    from {
        protocol direct;
        route-filter 195.69.144.0/23 exact;
    }
    then {
        external {
            type 1;
        }
        accept;
    }
}
term reject-rest {
    then reject;
}

buytenh@asd-tc2-m20core1>

* Re: [RFC] batched tc to improve change throughput
From: jamal @ 2005-01-19 14:24 UTC
To: Lennert Buytenhek
Cc: Thomas Graf, Patrick McHardy, Stephen Hemminger, netdev, Werner Almesberger

On Tue, 2005-01-18 at 10:20, Lennert Buytenhek wrote:
> On Tue, Jan 18, 2005 at 09:43:51AM -0500, jamal wrote:
> > > If you do this, please consider using Juniper config syntax instead
> > > of doing it the Cisco/quagga way.
> >
> > Juniper is XML-driven config files?
>
> Uhm, no :) The main advantage over Cisco (IMHO, the last thing I want
> is to start a holy war) is the hierarchical config syntax, and the
> ability to commit/rollback all your changes in one go. It's
> unfortunately rather more verbose, though.

As long as a machine (as opposed to a human) is worrying about the
verbosity, it should be fine ;->

> Example at the bottom.

It has a well-defined structure. Do you know if they publish the BNF
for it? Do you think these guys are gonna come after us if we
replicate it?

> > btw, libio uses libevent; I recall you said you had some alternative
> > to it.
>
> Yeah, ivykis (http://libivykis.sourceforge.net/) has been happily used
> in-house at my last job for years but never really caught on anywhere
> else, which is perhaps because I never did much lobbying for it. The
> way it works also requires all code written for it to be fully async
> (you can't use blocking code anywhere), which I think is an advantage
> but everyone else thinks is a disadvantage.

Very similar to libevent. I am gonna start using yours because then I
could send you emails which start with "Dear sir, .. i have doubts
because it doesnt work" ;->

cheers,
jamal

* Re: [RFC] batched tc to improve change throughput
From: Thomas Graf @ 2005-01-18 14:58 UTC
To: jamal; +Cc: Patrick McHardy, Stephen Hemminger, netdev, Werner Almesberger

* jamal <1106058592.1035.95.camel@jzny.localdomain> 2005-01-18 09:29
> On Tue, 2005-01-18 at 08:44, Thomas Graf wrote:
> > I'm not sure if entering/leaving subsystem contexts makes any sense.
> > I find context help by pressing '?' and normal completion most
> > useful. It's not that I dislike your idea, but I think it's not
> > worth it.
>
> What doesn't make sense or is not worth it?

My very personal opinion is that it's not worth it.

> a) usability.
>    i) I don't need to remember how the parse tree looks or where I am
>       on the parse tree. I go:
>       tc <enter>
>       tc> ?
>       and I get some help on the next levels.
>    ii) I should be able to ssh to this thing from some remote location.
>       This way I can write some scripts to automate things.
>
> b) extraneous typing on the command line. I go to the filter level:
>
>    u32> ?
>    gives me help
>    u32> context
>    filter dev lo parent ffff: protocol ip prio 10
>    u32> add
>    u32> match ip src 10.0.0.21/32 flowid 1:16 action drop
>    u32> match ip src 0/0 flowid 1:16 action ok
>    u32> commit
>    filters submitted ..

What do you do if there is an error? To what kind of context do you
go? Let's say the kernel reports -EINVAL.

>    u32> ..          // takes you up one
>    u32> ls
>    listing here of filter dev lo parent ffff: protocol ip prio 10
>    ..
>    ..
>    u32> /qdisc/dev/eth0
>    now into the qdisc level context for eth0

That's what I have:

tgr:axs ~/dev/netconfig/src ./netconfig
...
axs# ?
Next level commands:
  link       ...  Link (interface) configuration
  neighbour  ...  Neighbour (ARP) configuration
  warranty        Show warranty
  exit            Quit application
axs# n?
Backtrace:
 ->neighbour - Neighbour (ARP) configuration
Description:
   Module to view and modify the neighbour tables.

   The neighbour table establishes bindings between protocol
   addresses and link layer addresses for hosts sharing the same
   physical link. This module allows you to view the content of
   these tables and to manipulate their content.
Next level commands:
  add <ADDR>     ...  Add a neighbour
  modify <ADDR>  ...  Modify a neighbour
  delete <ADDR>  ...  Delete a neighbour
  list           ...  List neighbour attributes
axs# neighbour l?
Backtrace:
 ->neighbour - Neighbour (ARP) configuration
 ->list - List neighbour attributes
Next level commands:
  <cr>            Command can be executed at this point.
  stats      ...  Verbose listing (all attributes/statistics)
  where      ...  Only dump neighbours matching a filter
axs# neighbour list w?
Backtrace:
 ->neighbour - Neighbour (ARP) configuration
 ->list - List neighbour attributes
 ->where - Only dump neighbours matching a filter
Attributes of this node:
  lladdr <LLADDR>  Link layer address
  dst <ADDR>       Destination address
  dev <DEV>        Link the neighbour is on
Next level commands:
  <cr>            Command can be executed at this point.
  flags      ...
axs# ...

Again, it's not that I dislike contexts, but in the end it all comes
down to making error correction as easy as possible. Every time you
request a completion or context help, the command gets parsed and a
very verbose message including the possibilities you have is printed,
so you can correct your error.

It's more typing work, I know, but usually one only types the first
1..3 chars of the commands. I think something like this should be the
base and contexts can be built upon it.

> > - call over arguments
>
> Don't understand this.

The way it is now, configuration over program arguments.

> > It includes a quite easy to use API to define the grammar, which can be
> > used by readline to do the completion and print context-aware help.
>
> what does readline provide you again?

It basically takes over the reading of a line and allows manipulation
of the input buffer. It implements all the useful line editing like in
bash and helps with completion. You can bind the '?' key to a help
function so that '?' will not be printed on the screen; instead the
help text is printed and you get back your original line untouched.

It also gives the user a chance to bind keys to certain actions so
everyone can keep the bindings they like, with the additional
possibility to export functions so one could for example bind C-N
to "list-neighbours".

> I think iproute2 should stay as is - I don't want to break someone's
> scripts or make it fatter than it is already. Any app providing the
> above should be standalone.

Well, you mean like generating iproute2 input? That means we'd have to
implement the logic twice, and handling errors from iproute2 gets
really hard. It's not a problem to keep the old way of iproute2 as-is.
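
[Editorial aside: a tiny stand-alone illustration of the readline
mechanism described above. The help text itself is a placeholder (a
real implementation would consult the grammar tree), but rl_bind_key(),
rl_on_new_line() and rl_redisplay() are the standard readline calls
used for exactly this; build with -lreadline.]

/* readline '?' help sketch -- illustrative only. */
#include <stdio.h>
#include <stdlib.h>
#include <readline/readline.h>
#include <readline/history.h>

/* Bound to '?': print context help for the current input buffer and
 * then redraw the prompt with the original line untouched. */
static int show_help(int count, int key)
{
        (void) count;
        (void) key;

        printf("\nHelp for \"%s\":\n", rl_line_buffer);
        printf("  (here the grammar tree would be consulted for the\n"
               "   commands/arguments possible at this point)\n");

        rl_on_new_line();       /* tell readline we moved to a new line */
        rl_redisplay();         /* redraw prompt + untouched input buffer */
        return 0;
}

int main(void)
{
        char *line;

        rl_bind_key('?', show_help);

        while ((line = readline("axs# ")) != NULL) {
                if (*line)
                        add_history(line);
                /* a real shell would now run the line through the parser */
                free(line);
        }
        return 0;
}

The same hook is what keeps the user's partially typed line intact
after printing the context help, as in the axs# transcripts above.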

* Re: [RFC] batched tc to improve change throughput
From: Lennert Buytenhek @ 2005-01-18 15:23 UTC
To: Thomas Graf
Cc: jamal, Patrick McHardy, Stephen Hemminger, netdev, Werner Almesberger

On Tue, Jan 18, 2005 at 03:58:30PM +0100, Thomas Graf wrote:
> >    u32> ?
> >    gives me help
> >    u32> context
> >    filter dev lo parent ffff: protocol ip prio 10
> >    u32> add
> >    u32> match ip src 10.0.0.21/32 flowid 1:16 action drop
> >    u32> match ip src 0/0 flowid 1:16 action ok
> >    u32> commit
> >    filters submitted ..
>
> What do you do if there is an error? To what kind of context do you
> go? Let's say the kernel reports -EINVAL.

You just refuse the commit and stay where you are:

buytenh@asd-tc2-m20core1> configure exclusive
Entering configuration mode

[edit]
buytenh@asd-tc2-m20core1# edit policy-options policy-statement test

[edit policy-options policy-statement test]
buytenh@asd-tc2-m20core1# set from prefix-list test

[edit policy-options policy-statement test]
buytenh@asd-tc2-m20core1# set then accept

[edit policy-options policy-statement test]
buytenh@asd-tc2-m20core1# show
from {
    prefix-list test;
}
then accept;

[edit policy-options policy-statement test]
buytenh@asd-tc2-m20core1# commit
[edit]
  'policy-options'
    Policy error: test prefix-list referenced but not defined
error: configuration check-out failed

[edit policy-options policy-statement test]
buytenh@asd-tc2-m20core1# top

[edit]
buytenh@asd-tc2-m20core1# rollback
load complete

[edit]
buytenh@asd-tc2-m20core1#

* Re: [RFC] batched tc to improve change throughput
From: jamal @ 2005-01-19 14:13 UTC
To: Thomas Graf
Cc: Patrick McHardy, Stephen Hemminger, netdev, Werner Almesberger

On Tue, 2005-01-18 at 09:58, Thomas Graf wrote:
> * jamal <1106058592.1035.95.camel@jzny.localdomain> 2005-01-18 09:29
> > On Tue, 2005-01-18 at 08:44, Thomas Graf wrote:
> > > I'm not sure if entering/leaving subsystem contexts makes any
> > > sense. I find context help by pressing '?' and normal completion
> > > most useful. It's not that I dislike your idea, but I think it's
> > > not worth it.
> >
> > What doesn't make sense or is not worth it?
>
> My very personal opinion is that it's not worth it.

Ok, let's discuss some more and see if we can change that ;->

> > a) usability.
> >    i) I don't need to remember how the parse tree looks or where I am
> >       on the parse tree. I go:
> >       tc <enter>
> >       tc> ?
> >       and I get some help on the next levels.
> >    ii) I should be able to ssh to this thing from some remote location.
> >       This way I can write some scripts to automate things.
> >
> > b) extraneous typing on the command line. I go to the filter level:
> >
> >    u32> ?
> >    gives me help
> >    u32> context
> >    filter dev lo parent ffff: protocol ip prio 10
> >    u32> add
> >    u32> match ip src 10.0.0.21/32 flowid 1:16 action drop
> >    u32> match ip src 0/0 flowid 1:16 action ok
> >    u32> commit
> >    filters submitted ..
>
> What do you do if there is an error? To what kind of context do you
> go? Let's say the kernel reports -EINVAL.

As Lennert was saying, puke whatever the kernel said and allow for
rollback, i.e. undo the first one if it succeeded, for example.

> tgr:axs ~/dev/netconfig/src ./netconfig

[some really neat stuff deleted for brevity]

> Again, it's not that I dislike contexts, but in the end it all comes
> down to making error correction as easy as possible. Every time you
> request a completion or context help, the command gets parsed and a
> very verbose message including the possibilities you have is printed,
> so you can correct your error.
>
> It's more typing work, I know, but usually one only types the first
> 1..3 chars of the commands.

Same as in what I showed. Probably not a very big deal.

The one neat thing about the context approach is that it allows you to
remember state easily. As an example, after committing those two u32
filters, when you want to undo, you then remember what it is that you
can undo.

What you have is really nice because you could use standard tools such
as "| grep forsomething | formatit etc" to manipulate things further.
This would be hard to do in the case of what I described earlier. The
best solution we can have is a mix of the two approaches IMO.

> > > It includes a quite easy to use API to define the grammar, which can be
> > > used by readline to do the completion and print context-aware help.
> >
> > what does readline provide you again?
>
> It basically takes over the reading of a line and allows manipulation
> of the input buffer. It implements all the useful line editing like in
> bash and helps with completion. You can bind the '?' key to a help
> function so that '?' will not be printed on the screen; instead the
> help text is printed and you get back your original line untouched.
>
> It also gives the user a chance to bind keys to certain actions so
> everyone can keep the bindings they like, with the additional
> possibility to export functions so one could for example bind C-N
> to "list-neighbours".

These are all very nice features to have.

> > I think iproute2 should stay as is - I don't want to break someone's
> > scripts or make it fatter than it is already. Any app providing the
> > above should be standalone.
>
> Well, you mean like generating iproute2 input? That means we'd have to
> implement the logic twice, and handling errors from iproute2 gets
> really hard. It's not a problem to keep the old way of iproute2 as-is.

What I mean is that we should probably leave the iproute2 code alone so
that people can run old scripts etc. with it, i.e. the netsh tool should
either reuse libnetlink and add anything it needs to it, or create a
brand new library.

cheers,
jamal

* Re: [RFC] batched tc to improve change throughput
From: Thomas Graf @ 2005-01-19 14:36 UTC
To: jamal; +Cc: Patrick McHardy, Stephen Hemminger, netdev, Werner Almesberger

* jamal <1106144009.1047.989.camel@jzny.localdomain> 2005-01-19 09:13
> On Tue, 2005-01-18 at 09:58, Thomas Graf wrote:
> > What do you do if there is an error? To what kind of context do you
> > go? Let's say the kernel reports -EINVAL.
>
> As Lennert was saying, puke whatever the kernel said and allow for
> rollback, i.e. undo the first one if it succeeded, for example.

Yes, but this is not dependent on a context, is it? It is possible, I
have it working for most netlink messages, but it is non-trivial.

> > Again, it's not that I dislike contexts, but in the end it all comes
> > down to making error correction as easy as possible. Every time you
> > request a completion or context help, the command gets parsed and a
> > very verbose message including the possibilities you have is printed,
> > so you can correct your error.
> >
> > It's more typing work, I know, but usually one only types the first
> > 1..3 chars of the commands.
>
> Same as in what I showed. Probably not a very big deal.
>
> The one neat thing about the context approach is that it allows you to
> remember state easily. As an example, after committing those two u32
> filters, when you want to undo, you then remember what it is that you
> can undo.

So it seems you really want to do commit/rollback at context level.
Can you maybe explain this more verbosely, maybe it's easier than I
think. Commit/rollback is only useful for groupings of requests such
as complete filtering configurations; the consistency of a single
addition should be handled properly by the kernel. The changes in my
tcf_exts patches solved most of that except for the indev stuff.

> > > I think iproute2 should stay as is - I don't want to break someone's
> > > scripts or make it fatter than it is already. Any app providing the
> > > above should be standalone.
> >
> > Well, you mean like generating iproute2 input? That means we'd have to
> > implement the logic twice, and handling errors from iproute2 gets
> > really hard. It's not a problem to keep the old way of iproute2 as-is.
>
> What I mean is that we should probably leave the iproute2 code alone so
> that people can run old scripts etc. with it, i.e. the netsh tool should
> either reuse libnetlink and add anything it needs to it, or create a
> brand new library.

Well, libnetlink contains 1% of the code required, 99% is in the
modules and that's the hard work. Remember libnl? netsh is based on it
and it took me 2 weeks to get the neighbour and link code finished and
working; it is more powerful than ip now, though. The basic features of
ip and tc aren't very difficult, but there's quite a lot more in
iproute2 which would need to be mostly rewritten to fit into a library.

* Re: [RFC] batched tc to improve change throughput
From: Werner Almesberger @ 2005-01-19 16:45 UTC
To: jamal; +Cc: Thomas Graf, Patrick McHardy, Stephen Hemminger, netdev

jamal wrote:
> As Lennert was saying, puke whatever the kernel said and allow for
> rollback, i.e. undo the first one if it succeeded, for example.

How about this: you have a shadow configuration tree on each ingress
and egress point (i.e. two per interface). An update generates the
shadow tree. When done, you commit the shadow tree to become the new
config, and then you can quietly dismantle the old config. If you
didn't like the changes, you kill the shadow tree without committing
it.

To preserve queue content and element state (e.g. policer state),
you'd also have to have a means to tell where to take things from,
e.g. "shadow" qdisc X inherits the packets from "real" qdisc Y. All
this could be checked for consistency at setup time.

There are of course limitations, e.g. when you merge or split flows.

As an optimization, you could have some "lazy copy": instead of
creating a new qdisc "inheriting" from some other, you just move it
from the old to the new tree.

- Werner

--
 _________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina      werner@almesberger.net /
/_http://www.almesberger.net/_____________________________________________/

* Re: [RFC] batched tc to improve change throughput
From: Thomas Graf @ 2005-01-19 16:54 UTC
To: jamal; +Cc: Patrick McHardy, Stephen Hemminger, netdev, Werner Almesberger

* jamal <1106144009.1047.989.camel@jzny.localdomain> 2005-01-19 09:13
> What I mean is that we should probably leave the iproute2 code alone so
> that people can run old scripts etc. with it, i.e. the netsh tool should
> either reuse libnetlink and add anything it needs to it, or create a
> brand new library.

I inspected some more code and have already finished more than I
thought. The architecture currently allows specifying the grammar with
macros like this:

NODELIST(neigh_modify_dev)
        NODE(dev)
                CALLBACK(set_dev)
                FOLLOW(neigh_modify)
                ARG(GA_TEXT, &storage.dev, CACHE_MGEN_FUNC(ifname_dst), "<DEV>")
                DESC("Link")
        END_NODE
END_NODELIST

NODELIST(neigh_ops)
        NODE(add)
                FOLLOW(neigh_add_dev)
                CALLBACK(do_neigh_add)
                ARG(GA_TEXT, &storage.dst, CACHE_MGEN_FUNC(dst), "<ADDR>")
                DESC("Add a neighbour")
        END_NODE
        NODE(modify)
                FOLLOW(neigh_modify_dev)
                CALLBACK(do_neigh_modify)
                ARG(GA_TEXT, &storage.dst, CACHE_MGEN_FUNC(dst), "<ADDR>")
                DESC("Modify a neighbour")
        END_NODE
        NODE(delete)
                FOLLOW(neigh_del_dev)
                CALLBACK(do_neigh_del)
                ARG(GA_TEXT, &storage.dst, CACHE_MGEN_FUNC(dst), "<ADDR>")
                DESC("Delete a neighbour")
        END_NODE
        NODE(list)
                FOLLOW(neigh_list_attrs)
                CALLBACK(do_neigh_list)
                DESC("List neighbour attributes")
        END_NODE
END_NODELIST

TOPNODE(ng, neighbour)
        FOLLOW(neigh_ops)
        DESC("Neighbour (ARP) configuration")
        LONG_DESC(
        "   Module to view and modify the neighbour tables.\n"
        "\n" \
        "   The neighbour table establishes bindings between protocol\n" \
        "   addresses and link layer addresses for hosts sharing the same\n" \
        "   physical link. This module allows you to view the content of\n" \
        "   these tables and to manipulate their content.\n")
END_TOPNODE

It looks a bit complicated but is actually quite easy; you can do it
the Linux way. This gets you full completion and context help for your
grammar and also completion of arguments like link names, addresses in
the neighbour cache, etc. All you have to do is specify a function
returning a list of possibilities. It allows you to build recursive
grammars and multiple end points in the automaton. The above results
in this:

axs# neigh ?
Backtrace:
 ->neighbour - Neighbour (ARP) configuration
Description:
   Module to view and modify the neighbour tables.

   The neighbour table establishes bindings between protocol
   addresses and link layer addresses for hosts sharing the same
   physical link. This module allows you to view the content of
   these tables and to manipulate their content.
Next level commands:
  add <ADDR>     ...  Add a neighbour
  modify <ADDR>  ...  Modify a neighbour
  delete <ADDR>  ...  Delete a neighbour
  list           ...  List neighbour attributes
axs# neigh modify ?
Backtrace:
 ->neighbour - Neighbour (ARP) configuration
 ->modify - Modify a neighbour
Expecting argument: <ADDR>
axs# neigh modify <TAB>
192.168.23.12  192.168.23.13
axs# neigh modify 192.168.23.1

Note: the <TAB> above will look up the addresses in the neighbour table
and list the possible entries; since all share a common prefix, it is
automatically filled in. There is quite some thinking in these
completion functions, e.g. the link name completion in the neighbour
context will only show links which actually have neighbour entries.

The status of the whole thing: link and neighbour are finished, the
core architecture is finished as well, route is half done, addresses
are half done (both easy to finish). libnl has net/sched/ finished but
is still missing code for a lot of modules. Session management
(commit/rollback) was once in but was too unstable; it needs a partial
rewrite (design flaws) but should fit in quite easily because libnl was
designed to support it. It will basically look like:

nl_session_start();
... any high level operations ...
if (nl_session_commit() < 0)
        nl_session_rollback();

Problems? Keeping the cache valid (multiple netlink programs). The
final update just before the commit and the commit itself must be
atomic.

Solutions:
- Use the ATOMIC flag (dangerous)
- A seq counter in netlink, increased every time a netlink message
  gets processed and returned in the ack. A netlink request may carry
  a flag and the expected sequence number, and the request only gets
  processed if they match; otherwise the request fails. (my favourite)
- A lock file in userspace (how to enforce that everyone uses it?)
- Try to detect changes from third parties after the commit. Quite
  hard but possible; it reduces the race window but doesn't close it
  completely.

Background: I keep 2 caches. One cache represents the current state in
the kernel; it gets updated when required. The second cache contains
the local changes. The first cache gets merged into the second before
the update and the result then gets committed. In case of a failure,
the first cache is used to restore things.

Thoughts?
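
[Editorial aside: a toy sketch of how a grouped change would be wrapped
with the nl_session_*() calls described above. The library is
unreleased, so the stub bodies and the neigh_add() helper here are
invented purely for illustration of the control flow.]

#include <stdio.h>

/* --- invented stand-ins so the sketch is self-contained --- */
static void nl_session_start(void)    { printf("session start\n"); }
static int  nl_session_commit(void)   { printf("commit\n"); return 0; }
static void nl_session_rollback(void) { printf("rollback\n"); }

static int neigh_add(const char *dst, const char *lladdr, const char *dev)
{
        printf("queue: add neigh %s lladdr %s dev %s\n", dst, lladdr, dev);
        return 0;
}

int main(void)
{
        int err = 0;

        nl_session_start();

        /* queue up any number of high-level operations ... */
        err |= neigh_add("192.168.23.12", "00:11:22:33:44:55", "eth0");
        err |= neigh_add("192.168.23.13", "00:11:22:33:44:66", "eth0");

        /* ... and apply them as one unit: on failure, the cache holding
         * the kernel state would be used to restore what already changed */
        if (err || nl_session_commit() < 0)
                nl_session_rollback();

        return 0;
}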

* Re: [RFC] batched tc to improve change throughput
From: jamal @ 2005-01-20 14:42 UTC
To: Thomas Graf
Cc: Patrick McHardy, Stephen Hemminger, netdev, Werner Almesberger

On Wed, 2005-01-19 at 11:54, Thomas Graf wrote:
> * jamal <1106144009.1047.989.camel@jzny.localdomain> 2005-01-19 09:13
> > What I mean is that we should probably leave the iproute2 code alone so
> > that people can run old scripts etc. with it, i.e. the netsh tool should
> > either reuse libnetlink and add anything it needs to it, or create a
> > brand new library.
>
> I inspected some more code and have already finished more than I
> thought. The architecture currently allows specifying the grammar with
> macros like this:

[.. good stuff was here ..]

I like it, assuming we can have arbitrary hierarchies; you just show
one level, but that may be just the example at hand. Given that, it
should be able to meet the layout requirements that Lennert alluded to
earlier.

> It looks a bit complicated but is actually quite easy; you can do it
> the Linux way.

I get it, like it, and yes, TheLinuxWay is the OnlyWay ;->

[..]

> The status of the whole thing: link and neighbour are finished, the
> core architecture is finished as well, route is half done, addresses
> are half done (both easy to finish). libnl has net/sched/ finished but
> is still missing code for a lot of modules.

This is the part I am a little uncomfortable with. If you could make
that library part of iproute2 it would ease maintenance. Extend
libnetlink or add another layer on top of it. I know you have already
put in the effort, but consider this thought.

> Session management (commit/rollback) was once in but was too unstable;
> it needs a partial rewrite (design flaws) but should fit in quite
> easily because libnl was designed to support it. It will basically
> look like:
>
> nl_session_start();
> ... any high level operations ...
> if (nl_session_commit() < 0)
>         nl_session_rollback();

Looks right.

> Problems? Keeping the cache valid (multiple netlink programs). The
> final update just before the commit and the commit itself must be
> atomic.

Indeed.

> Solutions:
> - Use the ATOMIC flag (dangerous)

It would really need a kernel hack to do right. And .. it would slow
down traffic while you hold the "atomic lock".

> - A seq counter in netlink, increased every time a netlink message
>   gets processed and returned in the ack. A netlink request may carry
>   a flag and the expected sequence number, and the request only gets
>   processed if they match; otherwise the request fails. (my favourite)
> - A lock file in userspace (how to enforce that everyone uses it?)
> - Try to detect changes from third parties after the commit. Quite
>   hard but possible; it reduces the race window but doesn't close it
>   completely.

Other apps changing things will screw you. If that gets handled then we
are set. I actually did start working on a netlink redirect (hook) for
a very different reason, but it should serve this purpose. Essentially
you register to be the proxy for netlink and all messages go via you.
You can then munge them, etc. before issuing the response or allowing
them to go on to configure things. With this, your "lock" would be to
ask for certain things to be redirected to you during an update phase.
Ok, maybe I will put more effort into it over the weekend (Sunday).

> Background: I keep 2 caches. One cache represents the current state in
> the kernel; it gets updated when required. The second cache contains
> the local changes. The first cache gets merged into the second before
> the update and the result then gets committed. In case of a failure,
> the first cache is used to restore things.

Looks like the right way forward.

cheers,
jamal

* Re: [RFC] batched tc to improve change throughput
From: Thomas Graf @ 2005-01-20 15:35 UTC
To: jamal; +Cc: Patrick McHardy, Stephen Hemminger, netdev, Werner Almesberger

* jamal <1106232168.1041.125.camel@jzny.localdomain> 2005-01-20 09:42
> I like it, assuming we can have arbitrary hierarchies; you just show
> one level, but that may be just the example at hand. Given that, it
> should be able to meet the layout requirements that Lennert alluded
> to earlier.

It doesn't include any context code, the BNF:

PARSER := TOPNODE*
TOPNODE := NODELIST DESC LONG_DESC
NODELIST := NODE*
NODE := DESC [ NODELIST ] [ ARGUMENT ] [ ATTRS ] [ END_POINT ]
END_POINT := possible end of command
ATTRS := ATTR*
ATTR := KEY [ VALUE ]
ARGUMENT := VALUE [ DESC ]

Not sure if this helps, I attached a complete module below.

> > The status of the whole thing: link and neighbour are finished, the
> > core architecture is finished as well, route is half done, addresses
> > are half done (both easy to finish). libnl has net/sched/ finished but
> > is still missing code for a lot of modules.
>
> This is the part I am a little uncomfortable with. If you could make
> that library part of iproute2 it would ease maintenance. Extend
> libnetlink or add another layer on top of it. I know you have already
> put in the effort, but consider this thought.

We can move it into iproute2 but the code really differs from iproute2
and code sharing is almost impossible. We can make iproute2 use it at
some point but that doesn't make much sense to me.

> > - A seq counter in netlink, increased every time a netlink message
> >   gets processed and returned in the ack. A netlink request may carry
> >   a flag and the expected sequence number, and the request only gets
> >   processed if they match; otherwise the request fails. (my favourite)

Do you have any objections to this?

> > - A lock file in userspace (how to enforce that everyone uses it?)
> > - Try to detect changes from third parties after the commit. Quite
> >   hard but possible; it reduces the race window but doesn't close it
> >   completely.
>
> Other apps changing things will screw you. If that gets handled then we
> are set. I actually did start working on a netlink redirect (hook) for
> a very different reason, but it should serve this purpose. Essentially
> you register to be the proxy for netlink and all messages go via you.
> You can then munge them, etc. before issuing the response or allowing
> them to go on to configure things. With this, your "lock" would be to
> ask for certain things to be redirected to you during an update phase.
> Ok, maybe I will put more effort into it over the weekend (Sunday).

Indeed, that would serve me well and we can avoid the userspace daemon.
It doesn't even have to be a proxy; a simple callback hook capable of
returning an action would be enough for my purpose.

NOTE: Read bottom-up:

/*
 * neigh.c              linux net config utility
 *
 * $Id$
 *
 * Copyright (c) 2004 Thomas Graf <tgraf@suug.ch>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 */

#include <nc/config.h>
#include <nc/parse.h>
#include <nc/utils.h>
#include <nc/link.h>

static struct nl_cache neigh_cache = RTNL_INIT_NEIGH_CACHE();
static int dump_type = NL_DUMP_BRIEF;
static struct rtnl_neigh filter = RTNL_INIT_NEIGH();

static struct {
        char *lladdr, *dev, *dst, *proxy, *router, *incomplete;
        char *reachable, *stale, *delay, *probe, *failed, *noarp;
        char *perm;
} storage;

static int set_dump_type(struct grammar_node *g)
{
        dump_type = (int) g->gn_data;
        return 0;
}

static int set_dev(struct grammar_node *g)
{
        int err;
        struct nl_cache *c = nl_cache_lookup(RTNL_LINK);

        BUG_ON(!g);

        err = update_link_cache();
        if (err < 0)
                return err;

        return rtnl_neigh_set_ifindex_name(&filter, c, gr_arg_val(g));
}

static int set_lladdr(struct grammar_node *g)
{
        BUG_ON(!g);
        return rtnl_neigh_set_lladdr(&filter, gr_arg_val(g));
}

static int set_dst(struct grammar_node *g)
{
        BUG_ON(!g);
        return rtnl_neigh_set_dst(&filter, gr_arg_val(g));
}

static int set_state2(struct grammar_node *g)
{
        BUG_ON(!g);
        rtnl_neigh_set_state(&filter, (int) g->gn_data);
        return 0;
}

static int set_state(struct grammar_node *g)
{
        BUG_ON(!g);

        if (gr_is_enabled(g))
                rtnl_neigh_set_state(&filter, (int) g->gn_data);
        else if (gr_is_disabled(g))
                rtnl_neigh_unset_state(&filter, (int) g->gn_data);
        else {
                put_err("Invalid toggle value '%s', must be {on|off}\n",
                        gr_arg_val(g));
                return -1;
        }

        return 0;
}

static int set_flag(struct grammar_node *g)
{
        BUG_ON(!g);

        if (gr_is_enabled(g))
                rtnl_neigh_set_flag(&filter, (int) g->gn_data);
        else if (gr_is_disabled(g))
                rtnl_neigh_unset_flag(&filter, (int) g->gn_data);
        else {
                put_err("Invalid toggle value '%s', must be {on|off}\n",
                        gr_arg_val(g));
                return -1;
        }

        return 0;
}

static inline struct rtnl_neigh * get_neigh(int i)
{
        return (struct rtnl_neigh *) nl_cache_get(&neigh_cache, i);
}

CACHE_MGEN(lladdr, &nlh_route, &neigh_cache)
{
        return nl_addr2str_r(&(get_neigh(i)->n_lladdr), buf, len);
}

CACHE_MGEN(dst, &nlh_route, &neigh_cache)
{
        return nl_addr2str_r(&(get_neigh(i)->n_dst), buf, len);
}

CACHE_MGEN(ifname_dst, &nlh_route, &neigh_cache)
{
        struct nl_cache *c = nl_cache_lookup(RTNL_LINK);
        struct rtnl_neigh *n = get_neigh(i);

        if (update_link_cache() < 0)
                return NULL;

        if (storage.dst) {
                struct nl_addr f;

                if (nl_str2addr(storage.dst, &f) < 0)
                        goto fallback;

                if (n->n_dst.a_len == f.a_len &&
                    !memcmp(n->n_dst.a_addr, f.a_addr, n->n_dst.a_len))
                        return (char *) rtnl_link_i2name(c, n->n_ifindex);
                else
                        return NULL;
        }

fallback:
        return (char *) rtnl_link_i2name(c, n->n_ifindex);
}

static inline void reset_filter(void)
{
        memset(&filter, 0, sizeof(filter));
}

static int update_neigh_cache(void)
{
        if (nl_cache_update(&nlh_route, &neigh_cache) < 0) {
                put_err("%s\n", nl_geterror());
                return -1;
        }

        return 0;
}

static int do_neigh_list(struct grammar_node *g)
{
        int err;

        BUG_ON(!g);

        err = update_link_cache();
        if (err < 0)
                goto out;

        err = update_neigh_cache();
        if (err < 0)
                goto out;

        nl_cache_dump_filter(dump_type, &neigh_cache,
                             (struct nl_common *) &filter, fd_out);
        err = 0;
out:
        dump_type = NL_DUMP_BRIEF;
        reset_filter();
        return err;
}

static int do_neigh_add(struct grammar_node *g)
{
        int err = -1;

        BUG_ON(!g);

        filter.n_family = filter.n_lladdr.a_family;

        err = rtnl_neigh_set_dst(&filter, gr_arg_val(g));
        if (err < 0)
                goto out;

        err = rtnl_neigh_add(&nlh_route, &filter);
        if (err < 0)
                goto out;

        err = 0;
out:
        reset_filter();
        return err;
}

static int do_neigh_del(struct grammar_node *g)
{
        int err = -1;

        BUG_ON(!g);

        err = rtnl_neigh_set_dst(&filter, gr_arg_val(g));
        if (err < 0)
                goto out;

        err = rtnl_neigh_delete(&nlh_route, &filter);
        if (err < 0)
                goto out;

        err = 0;
out:
        reset_filter();
        return err;
}

static int do_neigh_modify(struct grammar_node *g)
{
        int err;

        BUG_ON(!g);

        if (filter.n_mask == 0)
                return 0;

        err = rtnl_neigh_set_dst(&filter, gr_arg_val(g));
        if (err < 0)
                goto out;

        err = rtnl_neigh_change(&nlh_route, &filter, &filter);
        if (err < 0)
                goto out;

        err = 0;
out:
        reset_filter();
        return err;
}

ATTRLIST(neigh_flags_attrs)
        ATTR_FLAG(proxy, NTF_PROXY, &storage.dst, set_flag, "Proxy")
        ATTR_FLAG(router, NTF_ROUTER, &storage.router, set_flag, "Router")
        ATTR_FLAG(incomplete, NUD_INCOMPLETE, &storage.incomplete, set_state,
                  "Lookup is incomplete")
        ATTR_FLAG(reachable, NUD_REACHABLE, &storage.reachable, set_state,
                  "Reachable")
        ATTR_FLAG(stale, NUD_STALE, &storage.stale, set_state, "Stale entry")
        ATTR_FLAG(delay, NUD_DELAY, &storage.delay, set_state, "Delayed")
        ATTR_FLAG(probe, NUD_PROBE, &storage.probe, set_state, "Probe")
        ATTR_FLAG(failed, NUD_FAILED, &storage.failed, set_state, "Failed")
        ATTR_FLAG(noarp, NUD_NOARP, &storage.noarp, set_state, "No ARP")
        ATTR_FLAG(permanent, NUD_PERMANENT, &storage.perm, set_state,
                  "Permanent entry")
END_ATTRLIST

NODELIST(neigh_flags)
        END_POINT
        NODE(flags)
                ATTRS(neigh_flags_attrs)
        END_NODE
END_NODELIST

ATTRLIST(neigh_filter)
        ATTR(lladdr)
                CALLBACK(set_lladdr)
                ARG(GA_TEXT, &storage.lladdr, CACHE_MGEN_FUNC(lladdr), "<LLADDR>")
                DESC("Link layer address")
        END_ATTR
        ATTR(dst)
                CALLBACK(set_dst)
                ARG(GA_TEXT, &storage.dst, CACHE_MGEN_FUNC(dst), "<ADDR>")
                DESC("Destination address")
        END_ATTR
        ATTR(dev)
                CALLBACK(set_dev)
                ARG(GA_TEXT, &storage.dev, CACHE_MGEN_FUNC(ifname), "<DEV>")
                DESC("Link the neighbour is on")
        END_ATTR
END_ATTRLIST

NODELIST(neigh_where)
        END_POINT
        NODE(where)
                ATTRS(neigh_filter)
                FOLLOW(neigh_flags)
                DESC("Only dump neighbours matching a filter")
        END_NODE
END_NODELIST

NODELIST(neigh_list_attrs)
        END_POINT
        NODE(brief)
                DATA(NL_DUMP_BRIEF)
                FOLLOW(neigh_where)
                CALLBACK(set_dump_type)
                DESC("Brief listing of attributes")
        END_NODE
        NODE(full)
                DATA(NL_DUMP_FULL)
                FOLLOW(neigh_where)
                CALLBACK(set_dump_type)
                DESC("Verbose listing (all attributes)")
        END_NODE
        NODE(stats)
                DATA(NL_DUMP_STATS)
                FOLLOW(neigh_where)
                CALLBACK(set_dump_type)
                DESC("Verbose listing (all attributes/statistics)")
        END_NODE
        NODE(where)
                ATTRS(neigh_filter)
                FOLLOW(neigh_flags)
                DESC("Only dump neighbours matching a filter")
        END_NODE
END_NODELIST

NODELIST(neigh_add_state)
        NODE(permanent)
                CALLBACK(set_state2)
                DATA(NUD_PERMANENT)
                DESC("Permanent entry")
        END_NODE
        NODE(stale)
                CALLBACK(set_state2)
                DATA(NUD_STALE)
                DESC("Stale entry")
        END_NODE
        NODE(noarp)
                CALLBACK(set_state2)
                DATA(NUD_NOARP)
                DESC("No ARP")
        END_NODE
        NODE(reachable)
                CALLBACK(set_state2)
                DATA(NUD_REACHABLE)
                DESC("Reachable")
        END_NODE
        NODE(failed)
                CALLBACK(set_state2)
                DATA(NUD_FAILED)
                DESC("Failed")
        END_NODE
END_NODELIST

NODELIST(neigh_add_lladdr)
        NODE(lladdr)
                CALLBACK(set_lladdr)
                FOLLOW(neigh_add_state)
                ARG(GA_TEXT, &storage.lladdr, CACHE_MGEN_FUNC(lladdr), "<LLADDR>")
                DESC("Link layer address")
        END_NODE
END_NODELIST

NODELIST(neigh_add_dev)
        NODE(dev)
                CALLBACK(set_dev)
                FOLLOW(neigh_add_lladdr)
                ARG(GA_TEXT, &storage.dev, CACHE_MGEN_FUNC(ifname), "<DEV>")
                DESC("Link")
        END_NODE
END_NODELIST

NODELIST(neigh_del_dev)
        NODE(dev)
                CALLBACK(set_dev)
                ARG(GA_TEXT, &storage.dev, CACHE_MGEN_FUNC(ifname_dst), "<DEV>")
                DESC("Link")
        END_NODE
END_NODELIST

ATTRLIST(neigh_set_attrs)
        ATTR(lladdr)
                CALLBACK(set_lladdr)
                ARG(GA_TEXT, &storage.lladdr, CACHE_MGEN_FUNC(lladdr), "<LLADDR>")
                DESC("Link layer address")
        END_ATTR
        ATTR_FLAG(proxy, NTF_PROXY, &storage.proxy, set_flag, "Proxy")
        ATTR_FLAG(router, NTF_ROUTER, &storage.router, set_flag, "Router")
        ATTR_FLAG(incomplete, NUD_INCOMPLETE, &storage.incomplete, set_state,
                  "Incomplete lookup")
        ATTR_FLAG(reachable, NUD_REACHABLE, &storage.reachable, set_state,
                  "Reachable")
        ATTR_FLAG(stale, NUD_STALE, &storage.stale, set_state, "Stale entry")
        ATTR_FLAG(delay, NUD_DELAY, &storage.delay, set_state, "Delayed")
        ATTR_FLAG(probe, NUD_PROBE, &storage.probe, set_state, "Probe")
        ATTR_FLAG(failed, NUD_FAILED, &storage.failed, set_state, "Failed")
        ATTR_FLAG(noarp, NUD_NOARP, &storage.noarp, set_state, "No ARP")
        ATTR_FLAG(permanent, NUD_PERMANENT, &storage.perm, set_state,
                  "Permanent entry")
END_ATTRLIST

NODELIST(neigh_modify)
        NODE(set)
                ATTRS(neigh_set_attrs)
        END_NODE
END_NODELIST

NODELIST(neigh_modify_dev)
        NODE(dev)
                CALLBACK(set_dev)
                FOLLOW(neigh_modify)
                ARG(GA_TEXT, &storage.dev, CACHE_MGEN_FUNC(ifname_dst), "<DEV>")
                DESC("Link")
        END_NODE
END_NODELIST

NODELIST(neigh_ops)
        NODE(add)
                FOLLOW(neigh_add_dev)
                CALLBACK(do_neigh_add)
                ARG(GA_TEXT, &storage.dst, CACHE_MGEN_FUNC(dst), "<ADDR>")
                DESC("Add a neighbour")
        END_NODE
        NODE(modify)
                FOLLOW(neigh_modify_dev)
                CALLBACK(do_neigh_modify)
                ARG(GA_TEXT, &storage.dst, CACHE_MGEN_FUNC(dst), "<ADDR>")
                DESC("Modify a neighbour")
        END_NODE
        NODE(delete)
                FOLLOW(neigh_del_dev)
                CALLBACK(do_neigh_del)
                ARG(GA_TEXT, &storage.dst, CACHE_MGEN_FUNC(dst), "<ADDR>")
                DESC("Delete a neighbour")
        END_NODE
        NODE(list)
                FOLLOW(neigh_list_attrs)
                CALLBACK(do_neigh_list)
                DESC("List neighbour attributes")
        END_NODE
END_NODELIST

TOPNODE(ng, neighbour)
        FOLLOW(neigh_ops)
        DESC("Neighbour (ARP) configuration")
        LONG_DESC(
        "   Module to view and modify the neighbour tables.\n"
        "\n" \
        "   The neighbour table establishes bindings between protocol\n" \
        "   addresses and link layer addresses for hosts sharing the same\n" \
        "   physical link. This module allows you to view the content of\n" \
        "   these tables and to manipulate their content.\n")
END_TOPNODE

static void __init neigh_init(void)
{
        MAKE_LIST(neigh_ops);
        MAKE_LIST(neigh_list_attrs);
        MAKE_LIST(neigh_where);
        MAKE_LIST(neigh_filter);
        MAKE_LIST(neigh_add_dev);
        MAKE_LIST(neigh_add_lladdr);
        MAKE_LIST(neigh_add_state);
        MAKE_LIST(neigh_del_dev);
        MAKE_LIST(neigh_modify_dev);
        MAKE_LIST(neigh_modify);
        MAKE_LIST(neigh_set_attrs);
        MAKE_LIST(neigh_flags);
        MAKE_LIST(neigh_flags_attrs);

        register_top_node(&ng);
}
* Re: [RFC] batched tc to improve change throughput 2005-01-20 15:35 ` Thomas Graf @ 2005-01-20 17:06 ` Stephen Hemminger 2005-01-20 17:19 ` Thomas Graf 2005-01-24 14:13 ` jamal 1 sibling, 1 reply; 44+ messages in thread From: Stephen Hemminger @ 2005-01-20 17:06 UTC (permalink / raw) To: Thomas Graf; +Cc: jamal, Patrick McHardy, netdev, Werner Almesberger On Thu, 20 Jan 2005 16:35:59 +0100 Thomas Graf <tgraf@suug.ch> wrote: > * jamal <1106232168.1041.125.camel@jzny.localdomain> 2005-01-20 09:42 > > I like it. Assuming we can have arbitrary hierachies; you just show one > > level - but that may be just the example at hand. Given that should be > > able to meet the layout requirements that Lennert alluded to earlier. > > It doesn't include any context code, the BNF: > > PARSER := TOPNODE* > TOPNODE := NODELIST DESC LONG_DESC > NODELIST := NODE* > NODE := DESC [ NODELIST ] [ ARGUMENT ] [ ATTRS ] [ END_POINT ] > END_POINT := possible end of command > ATTRS := ATTR* > ATTR := KEY [ VALUE ] > ARGUMENT := VALUE [ DESC ] > > Not sure if this helps, I attached a complete module below. > Go for it! A couple additional suggestions. It would be great to get a useful API for 'tc' that is one step above the actual low-level netlink stuff. And it would be great to reuse some existing scripting language grammar and parsing library infrastructure. Don't feel constrained to C on this. If using C++ or even something like Python or Ruby would be better, go ahead; but please no Perl. -- Stephen Hemminger <shemminger@osdl.org> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput 2005-01-20 17:06 ` Stephen Hemminger @ 2005-01-20 17:19 ` Thomas Graf 0 siblings, 0 replies; 44+ messages in thread From: Thomas Graf @ 2005-01-20 17:19 UTC (permalink / raw) To: Stephen Hemminger; +Cc: jamal, Patrick McHardy, netdev, Werner Almesberger * Stephen Hemminger <20050120090628.29205d59@dxpl.pdx.osdl.net> 2005-01-20 09:06 > On Thu, 20 Jan 2005 16:35:59 +0100 > Thomas Graf <tgraf@suug.ch> wrote: > > > * jamal <1106232168.1041.125.camel@jzny.localdomain> 2005-01-20 09:42 > > > I like it. Assuming we can have arbitrary hierachies; you just show one > > > level - but that may be just the example at hand. Given that should be > > > able to meet the layout requirements that Lennert alluded to earlier. > > > > It doesn't include any context code, the BNF: > > > > PARSER := TOPNODE* > > TOPNODE := NODELIST DESC LONG_DESC > > NODELIST := NODE* > > NODE := DESC [ NODELIST ] [ ARGUMENT ] [ ATTRS ] [ END_POINT ] > > END_POINT := possible end of command > > ATTRS := ATTR* > > ATTR := KEY [ VALUE ] > > ARGUMENT := VALUE [ DESC ] > > > > Not sure if this helps, I attached a complete module below. > > > > A couple additional suggestions. It would be great to get a useful API > to for 'tc' that is one step above actual low level netlink stuff. Planned. Trying to reuse an existing grammar but I haven't found one that suits well enough yet. > And it would be great to reuse some existing scripting language grammar > and parsing library infrastructure. Tried very hard to do so. I'd really like to build upon readline and its completion method but most parser generators are not made to get along with readline + completion. A C++ hack exists but doesn't really work with completion. That's why I wrote my own grammar definition thing. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput 2005-01-20 15:35 ` Thomas Graf 2005-01-20 17:06 ` Stephen Hemminger @ 2005-01-24 14:13 ` jamal 2005-01-24 15:06 ` Thomas Graf 1 sibling, 1 reply; 44+ messages in thread From: jamal @ 2005-01-24 14:13 UTC (permalink / raw) To: Thomas Graf Cc: Patrick McHardy, Stephen Hemminger, netdev, Werner Almesberger On Thu, 2005-01-20 at 10:35, Thomas Graf wrote: > * jamal <1106232168.1041.125.camel@jzny.localdomain> 2005-01-20 09:42 > > I like it. Assuming we can have arbitrary hierachies; you just show one > > level - but that may be just the example at hand. Given that should be > > able to meet the layout requirements that Lennert alluded to earlier. > > It doesn't include any context code, the BNF: > > PARSER := TOPNODE* > TOPNODE := NODELIST DESC LONG_DESC > NODELIST := NODE* > NODE := DESC [ NODELIST ] [ ARGUMENT ] [ ATTRS ] [ END_POINT ] > END_POINT := possible end of command > ATTRS := ATTR* > ATTR := KEY [ VALUE ] > ARGUMENT := VALUE [ DESC ] > > Not sure if this helps, I attached a complete module below. Theres a few holes in the BNF, but i dont think that matters that much. When the time comes it can be cleaned up. Should be noted that all iproute2 apps already have a very concise BNF. > > This is the part i am a little uncomfortable with. If you can make that > > library maybe part of iproute2 it would ease maintanance. Extend > > libnetlink or have another layer on top of it. > > I know you have already put the effort, but consider this thought. > > We can move it into iproute2 but the code really differs from iproute2 > and code sharing is almost impossible. We can make iproute2 use it > at some point but that doesn't make much sense for me. > I am not sure which path would be better .. > > > > - Seq counter in netlink, increased evertime a netlink message > > > gets processed and returned in ack. A netlink request may contain > > > a flag and the expected sequence number and the request gets only > > > processed if they match, otherwise the request fails. (my favourite) > > Do you have any objections on this? A seq number in netlink is in fact a transaction identifier (as opposed to a message identifier). We also have a window of 1 for a very good reason - simplicity. If what you are saying is we muck around with seq numbers, i think its a bad idea. > Indeed, that would serve me well and we can avoid the userspace daemon. > It doesn't even have to be a proxy, a simple callback hook capable of > returning an action would be enough for my purpose. > Sorry did not get an iota of time to work more on this over the weekend (kid decided to own me over the weekend). One thing to note is you cant have multiple apps requesting for RTM_XXXACTION redirects, i.e. you can only have one. And if this one app disappears, it is a DOS. So a daemon may be necessary just so we can check from the kernel if the app is still alive (after some timeout) and if not we can cleanup state for other apps to reuse the redirect. > NOTE: Read bottom-up: Looks very good. My thoughts now are you need to build on top of libnetlink - another library. Example, to administratively bring up a netdevice, one would call something like

	admin_up("eth0");

This is not to say you cant build a competing library to libnetlink, i am just not sure it is worth the effort of having two competing libraries doing almost the same thing (that need maintenance). cheers, jamal ^ permalink raw reply [flat|nested] 44+ messages in thread
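As a rough illustration of the admin_up() idea, here is what such a wrapper might look like written directly on top of rtnetlink rather than any particular library (ACK parsing and detailed error reporting are left out):

---
#include <string.h>
#include <unistd.h>
#include <net/if.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Administratively bring up a network device without exposing any
 * netlink details to the caller. */
int admin_up(const char *ifname)
{
	struct {
		struct nlmsghdr  nlh;
		struct ifinfomsg ifi;
	} req;
	int fd, ret;

	memset(&req, 0, sizeof(req));
	req.nlh.nlmsg_len   = NLMSG_LENGTH(sizeof(req.ifi));
	req.nlh.nlmsg_type  = RTM_NEWLINK;
	req.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
	req.ifi.ifi_family  = AF_UNSPEC;
	req.ifi.ifi_index   = if_nametoindex(ifname);
	req.ifi.ifi_flags   = IFF_UP;
	req.ifi.ifi_change  = IFF_UP;	/* only touch the UP bit */

	if (req.ifi.ifi_index == 0)
		return -1;

	fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
	if (fd < 0)
		return -1;

	ret = send(fd, &req, req.nlh.nlmsg_len, 0);
	/* a real library would read and check the netlink ACK here */
	close(fd);
	return ret < 0 ? -1 : 0;
}
---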
* Re: [RFC] batched tc to improve change throughput 2005-01-24 14:13 ` jamal @ 2005-01-24 15:06 ` Thomas Graf 2005-01-26 13:48 ` jamal 0 siblings, 1 reply; 44+ messages in thread From: Thomas Graf @ 2005-01-24 15:06 UTC (permalink / raw) To: jamal; +Cc: Patrick McHardy, Stephen Hemminger, netdev, Werner Almesberger * jamal <1106576005.1652.1292.camel@jzny.localdomain> 2005-01-24 09:13 > On Thu, 2005-01-20 at 10:35, Thomas Graf wrote: > > > > - Seq counter in netlink, increased evertime a netlink message > > > > gets processed and returned in ack. A netlink request may contain > > > > a flag and the expected sequence number and the request gets only > > > > processed if they match, otherwise the request fails. (my favourite) > > > > Do you have any objections on this? > > A seq number in netlink is infact a transaction identifier (as opposed > to a message identifier). We also have a window of 1 for a very good > reason - simplicty. > If what you are saying is we muck around seq numbers, i think its a bad > idea. I'm not talking of the nlmsg_seq but rather a sequence number with global or nl_family scope. It gets increased whenever a netlink message of that family is processed and is returned with the ack. If a userspace application wants to enforce atomicity between two requests which cannot be batched because an answer is expected in between then it could provide the expected sequence number and the request is only fulfilled if this is true. Example:

	--> RTM_NEWLINK
	<-- answer
	<-- ACK (seq = 222)
	--> RTM_SETLINK (expect = 222)
	<-- ACK

Now if another netlink app interferes:

	--> RTM_NEWLINK
	<-- answer
	<-- ACK (seq = 222)

	-- other app --
	--> RTM_SETLINK
	<-- ACK (seq = 223)

	-- back to first app --
	--> RTM_SETLINK (expect = 222)
	<-- ERROR

The application can then retry its operation a few times and finally give up. The main problem I see is to extend nlmsghdr in a way it stays compatible. > My thoughts now are you need to build on top of libnetlink - another > library. Example, to administratively bring up a netdevice, one would > call something like > > admin_up("eth0"); > > This is not to say you cant build a competing library to libnetlink, i > am just not sure it is worth the effort of having two competing > libraries doing almost the same thing (that need maintanance). I think it is, the feedback is overwhelming and people are already contributing to support more netlink users. As I said, 95% of the functionality is in iproute2 itself and not libnetlink. It is vital to have some kind of library to abstract the low level netlink functionality in a simple form to other applications. Currently it's quite hard to for example access the tc class tree, one can use libnetlink to do the request and parse the answer but everyone needs to write their own TLV parsing routines. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput 2005-01-24 15:06 ` Thomas Graf @ 2005-01-26 13:48 ` jamal 2005-01-26 14:35 ` Thomas Graf 2005-02-11 15:07 ` Dan Siemon 0 siblings, 2 replies; 44+ messages in thread From: jamal @ 2005-01-26 13:48 UTC (permalink / raw) To: Thomas Graf Cc: Patrick McHardy, Stephen Hemminger, netdev, Werner Almesberger On Mon, 2005-01-24 at 10:06, Thomas Graf wrote: > I'm not talking of the nlmsg_seq but rather a a sequence number with > global or nl_family scope. It gets increased whenever a netlink > message of that family is processed and is returned with the ack. If > a userspace application wants to enforce atomicy between two requests > which cannot be batched because a answer is expected in between then > it could provide the expected sequence number and the request is only > fullfilled if this is true. Example: > > --> RTM_NEWLINK > <-- answer > <-- ACK (seq = 222) > --> RTM_SETLINK (expect = 222) > <-- ACK > > Now if another netlink app interfers: > > --> RTM_NEWLINK > <-- answer > <-- ACK (seq = 222) > > -- other app -- > --> RTM_SETLINK > <-- ACK (seq = 223) > > -- back to first app -- > --> RTM_SETLINK (expect = 222) > <-- ERROR > > The application can then retry it's operation a few times and > finally give up. The main problem I see is to extend nlmsghdr > in a way it stays compatible. The best thing you could get out of this is a warning that something changed under you i.e doesnt really solve the synchronization issue. [And a lot more complexity is introduced - if you say you want to change the netlink header and maintain state in the kernel]. > > My thoughts now are you need to build on top of libnetlink - another > > library. Example, to administratively bring up a netdevice, one would > > call something like > > > > admin_up("eth0"); > > > > This is not to say you cant build a competing library to libnetlink, i > > am just not sure it is worth the effort of having two competing > > libraries doing almost the same thing (that need maintanance). > > I think it is, the feedback is overwhelming and people are already > contributing to support more netlink users. As I said, 95% of the > functionality is in iproute2 itself and not libnetlink. It is vital > to have some kind of library to abstract the low level netlink > functionality in a simple form to other applications. Currently it's > quite hard to for example access the tc class tree, one can use > libnetlink to do the request and parse the answer but everyone needs > to write their own TLV parsing routines. Your call really - you are the one who is going to maintain it;-> As for ease of use and avoiding users from knowing details of how tlvs are put together etc - i think it doesnt matter how thats done underneath the hood; it is still doable on top of current libnetlink. In other words whats required, IMO, is something that hides netlink totaly so that the programmer/user doesnt even get to see TLVs. cheers, jamal ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput 2005-01-26 13:48 ` jamal @ 2005-01-26 14:35 ` Thomas Graf 0 siblings, 0 replies; 44+ messages in thread From: Thomas Graf @ 2005-01-26 14:35 UTC (permalink / raw) To: jamal; +Cc: Patrick McHardy, Stephen Hemminger, netdev, Werner Almesberger * jamal <1106747313.1107.7.camel@jzny.localdomain> 2005-01-26 08:48 > On Mon, 2005-01-24 at 10:06, Thomas Graf wrote: > > > I'm not talking of the nlmsg_seq but rather a a sequence number with > > global or nl_family scope. It gets increased whenever a netlink > > message of that family is processed and is returned with the ack. If > > a userspace application wants to enforce atomicy between two requests > > which cannot be batched because a answer is expected in between then > > it could provide the expected sequence number and the request is only > > fullfilled if this is true. Example: > > > > --> RTM_NEWLINK > > <-- answer > > <-- ACK (seq = 222) > > --> RTM_SETLINK (expect = 222) > > <-- ACK > > > > Now if another netlink app interfers: > > > > --> RTM_NEWLINK > > <-- answer > > <-- ACK (seq = 222) > > > > -- other app -- > > --> RTM_SETLINK > > <-- ACK (seq = 223) > > > > -- back to first app -- > > --> RTM_SETLINK (expect = 222) > > <-- ERROR > > > > The application can then retry it's operation a few times and > > finally give up. The main problem I see is to extend nlmsghdr > > in a way it stays compatible. > > The best thing you could get out of this is a warning that something > changed under you i.e doesnt really solve the synchronization issue. Why? If we do the check with regard to the rtnl sem we can guarantee atomicity. The comparison of the expected seq and the current seq must be done before any action and within the rtnl semaphore. It is very unlikely that someone interferes so strict locking is pretty inefficient.

	rtnl_send_atomic(msg, expect_seq)
		retries := 10;
	retry:
		res := send_msg(msg, expect_seq);
		if res = -ERETRY and --retries then
			goto retry;
		endif
		if retries = 0 then
			err "Timeout while trying to achieve atomic operation"
		endif

and in the kernel:

	rtnl_lock();
	if expect_seq != seq then
		rtnl_unlock()
		return -ERETRY;
	endif
	... atomic action can take place here ...

Of course this only works if netlink requests themselves are synchronized in the relevant netlink family. > [And a lot more complexity is introduced - if you say you want to change > the netlink header and maintain state in the kernel]. This is the big problem, there is no padding gap common to all rtnl users. What we can do is to set a flag in nlmsghdr stating that a u32 block of data follows the nlmsg header before the netlink user specific header, i.e.

	+---------------------------------+
	| nlmsghdr flags |= NLM_F_EXP_SEQ |
	+---------------------------------+
	| expected_seq (u32)              |
	+---------------------------------+
	| netlink user specific data      |
	+---------------------------------+

I'd even go one step further and define a header options chain like in IPv6 so we can add more header attributes later on, like:

	+--------------------------------+
	| nlmsghdr flags |= NLM_F_OPTS   |
	+--------------------------------+
	| size=4, type=expt_seq, next=0  |
	+- - - - - - - - - - - - - - - - +
	| expected sequence              |
	+--------------------------------+
	| netlink user specific data     |
	+--------------------------------+

Thoughts?
> Your call really - you are the one who is going to maintain it;-> > As for ease of use and avoiding users from knowing details of how > tlvs are put together etc - i think it doesnt matter how thats done > underneath the hood; it is still doable on top of current libnetlink. In > other words whats required, IMO, is something that hides netlink totaly > so that the programmer/user doesnt even get to see TLVs. Agreed, I even hide the structs exported to userspace to avoid breakage, i.e. I don't export tc_stats directly for example. ^ permalink raw reply [flat|nested] 44+ messages in thread
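A minimal sketch of the userspace side of the expected-sequence idea discussed above; NLM_F_EXP_SEQ, ERETRY and the send_and_wait_ack() helper are hypothetical and exist only in this thread, they are not part of the kernel:

---
#include <errno.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/types.h>

#define NLM_F_EXP_SEQ	0x8000		/* hypothetical flag */
#define ERETRY		1024		/* hypothetical error code */

/* assumed low-level helper: send one message, wait for its ACK/error */
extern int send_and_wait_ack(int fd, struct nlmsghdr *nlh);

/* The caller is assumed to have reserved a u32 slot right after the
 * netlink header, before the family specific header, as proposed. */
int rtnl_send_atomic(int fd, struct nlmsghdr *nlh, __u32 expect_seq)
{
	int retries = 10;

	nlh->nlmsg_flags |= NLM_F_EXP_SEQ;
	memcpy(NLMSG_DATA(nlh), &expect_seq, sizeof(expect_seq));

	while (retries--) {
		int err = send_and_wait_ack(fd, nlh);
		if (err != -ERETRY)
			return err;	/* success or a real error */
	}
	return -ETIMEDOUT;		/* lost the race too many times */
}
---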
* Re: [RFC] batched tc to improve change throughput 2005-01-26 13:48 ` jamal 2005-01-26 14:35 ` Thomas Graf @ 2005-02-11 15:07 ` Dan Siemon 2005-02-12 13:45 ` jamal 1 sibling, 1 reply; 44+ messages in thread From: Dan Siemon @ 2005-02-11 15:07 UTC (permalink / raw) To: netdev; +Cc: hadi On Wed, 2005-26-01 at 08:48 -0500, jamal wrote: > Your call really - you are the one who is going to maintain it;-> > As for ease of use and avoiding users from knowing details of how > tlvs are put together etc - i think it doesnt matter how thats done > underneath the hood; it is still doable on top of current libnetlink. In > other words whats required, IMO, is something that hides netlink totaly > so that the programmer/user doesnt even get to see TLVs. (Sorry to join this thread so late.) I'd like to make a little plug for my Linux QoS Library (LQL) [1] project. LQL provides an abstraction of the kernel QoS features. Full API documentation is available from the website. I already have working bindings for one high level language (C# on Mono) [2]. Creating bindings for Python, Perl etc should be quite easy. Someone recently emailed me about starting work on Perl bindings. There is still lots of work to do. Support for more QDiscs and classifiers is needed and the socket handling code needs a rewrite to better handle errors. Work continues and these things will be fixed. [1] - http://www.coverfire.com/lql/ [2] - http://www.coverfire.com/lql-sharp/ -- OpenPGP key: http://www.coverfire.com/files/pubkey.txt Key fingerprint: FB0A 2D8A A1E9 11B6 6CA3 0C53 742A 9EA8 891C BD98 ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput 2005-02-11 15:07 ` Dan Siemon @ 2005-02-12 13:45 ` jamal 2005-02-12 14:29 ` Thomas Graf 2005-02-12 22:07 ` Dan Siemon 0 siblings, 2 replies; 44+ messages in thread From: jamal @ 2005-02-12 13:45 UTC (permalink / raw) To: Dan Siemon; +Cc: netdev On first impression, this looks very nice - I think you got the object hierarchy figured etc; i will look closely later. What would be really interesting is to see (gulp) a SOAP/xml interface on top of this. Is this something you can do with those "bindings"? It seems to me from a library perspective, you and Thomas may have to sync. And restricting to just QoS may be a limitation (netlink is the friend you are looking for; so from networking perspective, give them GUI clickers ways to set routes, static IPSEC SAs etc). Oh and it would be interesting to have events (see that link come up down etc) cheers, jamal On Fri, 2005-02-11 at 10:07, Dan Siemon wrote: > On Wed, 2005-26-01 at 08:48 -0500, jamal wrote: > > Your call really - you are the one who is going to maintain it;-> > > As for ease of use and avoiding users from knowing details of how > > tlvs are put together etc - i think it doesnt matter how thats done > > underneath the hood; it is still doable on top of current libnetlink. In > > other words whats required, IMO, is something that hides netlink totaly > > so that the programmer/user doesnt even get to see TLVs. > > (Sorry to join this thread so late.) > > I'd like to make a little plug for my Linux QoS Library (LQL) [1] > project. LQL provides an abstraction of the kernel QoS features. Full > API documentation is available from the website. > > I already have working bindings for one high level language (C# on Mono) > [2]. Creating bindings for Python, Perl etc should be quite easy. > Someone recently emailed me about starting work on Perl bindings. > > There is still lots of work to do. Support for more QDiscs and > classifiers is needed and the socket handling code needs a rewrite to > better handle errors. Work continues and these things will be fixed. > > [1] - http://www.coverfire.com/lql/ > [2] - http://www.coverfire.com/lql-sharp/ ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput 2005-02-12 13:45 ` jamal @ 2005-02-12 14:29 ` Thomas Graf 0 siblings, 0 replies; 44+ messages in thread From: Thomas Graf @ 2005-02-12 14:29 UTC (permalink / raw) To: jamal; +Cc: Dan Siemon, netdev > > I'd like to make a little plug for my Linux QoS Library (LQL) [1] > > project. LQL provides an abstraction of the kernel QoS features. Full > > API documentation is available from the website. I've been looking at this before and I do like your approach. The license prevents me from really using it but that's not your problem. What I really like about it are the bindings to other languages, that's definitely a good thing. * jamal <1108215923.1126.132.camel@jzny.localdomain> 2005-02-12 08:45 > What would be really interesting is to see (gulp) a SOAP/xml interface > on top of this. Is this something you can do with those "bindings"? Indeed, an xml interface would be really nice to have. I've been implementing some xml bits for links and neighbour in libnl so far but I'm not yet sure about the exact format until I'm sure there is no existing format that would fit us. > It seems to me from a library perspective, youa nd Thomas may have to > sync. Sure, I'm willing to sync as long as the result stays LGPL. The architectures are quite different so I think the only code we can share or convert are the actual qdisc/class/filter modules to parse the netlink messages and do various translations of types and flags etc. > And restricting to just QoS maybe be a limitation (netlink is the > friend you are looking for; so from networking perspective, give them > GUI clikers ways to set routes, static IPSEC SAs etc). Oh and it would > be interesting to have events (see that link come up down etc) Although it's possible to receive multicast group messages I haven't added a specific API into it and you have to do the demuxing of the various message types yourself for now (via valid message callback). Maybe a quick status report about what's done:

- core netlink API: 80%, lacks better support for non-blocking sockets and for multicast grouping messages
- link: 100%
- neighbour: 100%
- route: 70%, msg parser and attribute setting done, still lacks a good dumping procedure and message building
- address: partial patch received which works more or less, someone has started working on it again
- rule: 0%
- tc: 50%, still lacks implementations for various qdiscs and classifiers

I haven't touched any other netlink users such as xfrm but those should be easier to do actually, rtnetlink is the biggest ;-> I've been spending some time on documenting the existing API and about 40% is done by now. You can have a look at the current progress of the documentation, the neighbour and link documentation is nearly finished and gives you the best impression. http://people.suug.ch/~tgr/libnl/doc/modules.html ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput 2005-02-12 13:45 ` jamal 2005-02-12 14:29 ` Thomas Graf @ 2005-02-12 22:07 ` Dan Siemon 2005-02-12 22:32 ` Thomas Graf 1 sibling, 1 reply; 44+ messages in thread From: Dan Siemon @ 2005-02-12 22:07 UTC (permalink / raw) To: hadi; +Cc: netdev On Sat, 2005-12-02 at 08:45 -0500, jamal wrote: > On first impression, this looks very nice - I think you got the object > hierachy figured etc; i will look closely later. > What would be really interesting is to see (gulp) a SOAP/xml interface > on top of this. Is this something you can do with those "bindings"? Yes, a SOAP/XML-RPC interface should be quite possible. This is one of the main reasons I went to the trouble of creating the Mono bindings. I need to create some sort of XML interface to LQL in the next few weeks. > It seems to me from a library perspective, youa nd Thomas may have to > sync. And restricting to just QoS maybe be a limitation (netlink is the > friend you are looking for; so from networking perspective, give them > GUI clikers ways to set routes, static IPSEC SAs etc). Oh and it would > be interesting to have events (see that link come up down etc) I named the project Linux QoS Library before I realized that interface parameters etc could be manipulated via Netlink. I have no intention of limiting LQL to just QoS features. Eventually I'll probably stop referring to it as Linux QoS Library and just use LQL. As for combining my work with Thomas, I'm certainly willing to discuss it. -- OpenPGP key: http://www.coverfire.com/files/pubkey.txt Key fingerprint: FB0A 2D8A A1E9 11B6 6CA3 0C53 742A 9EA8 891C BD98 ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput 2005-02-12 22:07 ` Dan Siemon @ 2005-02-12 22:32 ` Thomas Graf 2005-02-14 0:23 ` Dan Siemon 0 siblings, 1 reply; 44+ messages in thread From: Thomas Graf @ 2005-02-12 22:32 UTC (permalink / raw) To: Dan Siemon; +Cc: hadi, netdev * Dan Siemon <1108246033.7554.18.camel@ganymede> 2005-02-12 17:07 > On Sat, 2005-12-02 at 08:45 -0500, jamal wrote: > > On first impression, this looks very nice - I think you got the object > > hierachy figured etc; i will look closely later. > > What would be really interesting is to see (gulp) a SOAP/xml interface > > on top of this. Is this something you can do with those "bindings"? > > Yes, a SOAP/XML-RPC interface should be quite possible. This is one of > the main reasons I went to the trouble of creating the Mono bindings. I > need to create some sort of XML interface to LQL in the next few weeks. Before you go ahead, please consider its possible usages. If possible it should conform to an existing format allowing for distributed configuration of network nodes. If no such thing exists and you design your own format, please consider the following requirements, because it would be sad if you waste effort that needs to be redone later on.

- all components of the networking configuration must be configurable. This includes links, neighbours, routes, routing rules, traffic control but also configuration parameters currently only accessible via ethtool.
- The whole interface must take care of the byte order issues. This is the most tricky part.
- It must be possible to extend it without breaking backward compatibility.

> As for combining my work with Thomas, I'm certainly willing to discuss > it. So let's discuss it, from what I can see your library only consists of basic netlink connection abilities and message parsers/builders on a per netlink user basis. You do not provide any ways to customize it, if the user of your library wants to send his own messages he's pretty much on his own because the whole process of constructing the message, sending it and waiting for the ack is hidden behind one single function. The per object API is quite similar, you also let the user set the attributes on an object and then commit that object to the kernel. Honestly speaking, your API doesn't fit my needs and the changes to make it suitable would be rather big so I'm not sure whether a merge of my code into yours would make much sense and the only thing that could be merged from your side to mine would be the additional support for two or three qdiscs such as htb. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput 2005-02-12 22:32 ` Thomas Graf @ 2005-02-14 0:23 ` Dan Siemon 2005-02-14 14:27 ` Thomas Graf 0 siblings, 1 reply; 44+ messages in thread From: Dan Siemon @ 2005-02-14 0:23 UTC (permalink / raw) To: Thomas Graf; +Cc: hadi, netdev On Sat, 2005-12-02 at 23:32 +0100, Thomas Graf wrote: > * Dan Siemon <1108246033.7554.18.camel@ganymede> 2005-02-12 17:07 > > Yes, a SOAP/XML-RPC interface should be quite possible. This is one of > > the main reasons I went to the trouble of creating the Mono bindings. I > > need to create some sort of XML interface to LQL in the next few weeks. > > Before you go ahead, please consider its possible usages. If possible > it should conform to an existing format allowing for distributed > configuration of network nodes. If no such thing exist and you design > your own format please consider the following requirements, because it > would be sad if you waste effort that needs to be redone later on. The initial implementation will be very specific to LQLs methods. I need this for a prototype application. > - The whole interface must take care of the byte order issues. This is > the most tricky part. I don't see how byte order issues are a problem when using SOAP. Example? > > As for combining my work with Thomas, I'm certainly willing to discuss > > it. > > So let's discuss it, from what I can see your library only consists of > basic netlink connection abilities and message parsers/builders on a > per netlink user basis. You do not provide any ways to customize it, > if the user of your library wants to send its own messages he's pretty > much on its own because the whole process of constructing the message, > sending it and waiting for the ack is hidden behind one single function. My main design goal for LQL is a nice C library for the existing QoS elements (and later link and friends). As such public functions that allow the user of the library to construct their own nlmsg packets is not my main interest. The functions in the LQL namespace attempt to hide all aspects of Netlink and the underlying communication to the kernel. However, I do have functions for manipulating raw messages. These functions are all in the NL namespace (nl.c and nl.h). They are quite purposely hidden from the public API documentation. Perhaps these functions should be documented publicly; although for 99% of the people using the library the last thing they want to do is build a netlink message. Examples: gboolean nl_tcpacket_add_rtattr(TcPacket *pkt, unsigned short type, unsigned short len, void *data); This function adds a new rtattr to the message. There is a similar function that adds a nested rtattr. nl_tcpacket_do_command() will send the message and wait for an ACK. Usage examples can be found in lql_qdisc_htb_helper.c. > Honestly speaking, your API doesn't fit my needs and the changes to > make it suiteable would be rather big so I'm not sure whether a merge > of my code into yours would make much sense and the only that could be > merged from your side to mine would be the additional support for two > or three qdiscs such as htb. I'm curious exactly what your needs are. It does appear you are aiming for a somewhat more low level library than I am. Whether or not that precludes some kind of merger I don't know. -- OpenPGP key: http://www.coverfire.com/files/pubkey.txt Key fingerprint: FB0A 2D8A A1E9 11B6 6CA3 0C53 742A 9EA8 891C BD98 ^ permalink raw reply [flat|nested] 44+ messages in thread
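For reference, appending a routing attribute to a request message (roughly what the nl_tcpacket_add_rtattr() helper described above presumably does internally) follows the standard rtnetlink pattern; buffer length checks are omitted here:

---
#include <string.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

static void add_rtattr(struct nlmsghdr *nlh, unsigned short type,
		       const void *data, unsigned short len)
{
	/* the new attribute starts at the aligned end of the message */
	struct rtattr *rta = (struct rtattr *)
		((char *)nlh + NLMSG_ALIGN(nlh->nlmsg_len));

	rta->rta_type = type;
	rta->rta_len  = RTA_LENGTH(len);
	memcpy(RTA_DATA(rta), data, len);

	nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + RTA_ALIGN(rta->rta_len);
}
---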
* Re: [RFC] batched tc to improve change throughput 2005-02-14 0:23 ` Dan Siemon @ 2005-02-14 14:27 ` Thomas Graf 2005-02-15 20:28 ` Dan Siemon 0 siblings, 1 reply; 44+ messages in thread From: Thomas Graf @ 2005-02-14 14:27 UTC (permalink / raw) To: Dan Siemon; +Cc: hadi, netdev > > - The whole interface must take care of the byte order issues. This is > > the most tricky part. > > I don't see how byte order issues are a problem when using SOAP. > Example? It depends on whether you outline every qdisc/filter in the protocol. If you do so it's not a problem but you have to extend your protocol every time a new qdisc is introduced or an existing one changes. A generic partly binary based protocol will have byte order issues. My current idea, given I can't find an existing protocol, is to let every netlink user describe its own format with a generic grammar so the protocol can stay stable. One of the candidates is the netconf specification which basically does what we need but is still in early development. > I'm curious exactly what your needs are. Basically I need to be able to change the behaviour of the message parser to for example overwrite the sequence number checking in order to do message multiplexing. It's not like I would be representative though. > It does appear you are aiming for a somewhat more low level library than > I am. Whether or not that precludes some kind of merger I don't know. Yes, it seems so. It's a pity that we waste effort by doing nearly the same work but I really need the low level API and the possibility to customize the parsing and sending code. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput 2005-02-14 14:27 ` Thomas Graf @ 2005-02-15 20:28 ` Dan Siemon 2005-02-15 20:47 ` Thomas Graf 0 siblings, 1 reply; 44+ messages in thread From: Dan Siemon @ 2005-02-15 20:28 UTC (permalink / raw) To: Thomas Graf; +Cc: hadi, netdev On Mon, 2005-14-02 at 15:27 +0100, Thomas Graf wrote: > > I'm curious exactly what your needs are. > > Basically I need to be able to change the beavhiour of the message > parser to for example overwrite the sequence number checking in order > to do message multiplexing. It's not like I would be represenative > though. > > > It does appear you are aiming for a somewhat more low level library than > > I am. Whether or not that precludes some kind of merger I don't know. > > Yes, it seems so. It's a pitty that we waste effort by doing the same > nearly work but I really need the low level API and the possibility to > customize the parsing and sending code. Perhaps we could agree on a single API for the low-level message parsing and netlink message construction. At least then we would not be duplicating bug-fixes in our netlink code. Whether or not this sharing would be useful probably depends on if you would continue to maintain your own non-GObject APIs for the various QDiscs and classifiers. GObject makes the creation and maintenance of the language bindings much easier so its basically necessary for my goals. I'm willing to switch the underlying implementation of LQL to use your more featureful NL implementation if that means there won't be two competing C APIs to the individual QDiscs etc. -- OpenPGP key: http://www.coverfire.com/files/pubkey.txt Key fingerprint: FB0A 2D8A A1E9 11B6 6CA3 0C53 742A 9EA8 891C BD98 ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput 2005-02-15 20:28 ` Dan Siemon @ 2005-02-15 20:47 ` Thomas Graf 2005-02-22 21:40 ` Dan Siemon 0 siblings, 1 reply; 44+ messages in thread From: Thomas Graf @ 2005-02-15 20:47 UTC (permalink / raw) To: Dan Siemon; +Cc: hadi, netdev > Perhaps we could agree on a single API for the low-level message parsing > and netlink message construction. At least then we would not be > duplicating bug-fixes in our netlink code. Sure, I think they're quite similar. I abstracted the netlink message and routing attributes building a bit and added some bits for simplification. http://people.suug.ch/~tgr/libnl/doc/group__msg.html http://people.suug.ch/~tgr/libnl/doc/group__rtattr.html I do not care what to use though as long as it is easy to use. > Whether or not this sharing would be useful probably depends on if you > would continue to maintain your own non-GObject APIs for the various > QDiscs and classifiers. GObject makes the creation and maintenance of > the language bindings much easier so its basically necessary for my > goals. I see, well I can extend my objects, I'm even willing to change the architecture if needed. The only requirements from my side are to keep the generic caching header to allow putting these objects into generic caches and keep it simple to re-add commit/rollback extensions later on. What exactly is required to make it GObject aware? I've never worked with GObject so far. Basically a qdisc looks like this at the moment:

	struct rtnl_qdisc
	{
		NLHDR_COMMON;      /* common fields required by cache */
		NL_TCA_GENERIC(q); /* generic tc fields (parent, handle, ifindex ...) */
		void *opts;        /* qdisc specific options (e.g. rtnl_sch_fifo) */
	};

The NLHDR_COMMON must stay first, the ordering of the others doesn't matter. > I'm willing to switch the underlying implementation of LQL to use your > more featureful NL implementation if that means there won't be two > competing C APIs to the individual QDiscs etc. It would be nice if we find a way to integrate both without losing the features of any side. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput 2005-02-15 20:47 ` Thomas Graf @ 2005-02-22 21:40 ` Dan Siemon 2005-02-22 23:15 ` Thomas Graf 0 siblings, 1 reply; 44+ messages in thread From: Dan Siemon @ 2005-02-22 21:40 UTC (permalink / raw) To: Thomas Graf; +Cc: hadi, netdev Sorry, for the tardy response. On Tue, 2005-15-02 at 21:47 +0100, Thomas Graf wrote: > I see, well I can extend my objects, I'm even willing to change the > architecture if needed. The only requirements from my side is to > keep the generic caching header to allow putting these objects into > generic caches and keep it simple to readd commit/rollback extesions > later on. > > What is exactly required to make it GObject aware? I've never worked > with GOBject so far. Basically a qdisc looks like this at the moment: > > struct rtnl_qdisc > { > NLHDR_COMMON; /* common fields required by cache */ > NL_TCA_GENERIC(q); /* generic tc fields (parent, handle, ifindex ...) */ > void *opts; /* qdisc specific options (e.g. rtnl_sch_fifo) */ > }; > > The NLHDR_COMMON must stay first, the ordering of the others doesn't > matter. That could be a problem. The GObject struct must be at the start so that all sub-classes can be operated on with the g_object_ functions. The only way to make these objects work with your caching scheme would be to make a sub-class of GObject with the caching information. This would have the benefit of adding ref counting etc. The following URL will give you a bit of background on GObject. http://www.le-hacker.org/papers/gobject/ -- OpenPGP key: http://www.coverfire.com/files/pubkey.txt Key fingerprint: FB0A 2D8A A1E9 11B6 6CA3 0C53 742A 9EA8 891C BD98 ^ permalink raw reply [flat|nested] 44+ messages in thread
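As an illustration only (all names here are invented), a sub-class layout that satisfies both constraints would keep the GObject as the very first member and let the cache bookkeeping follow it:

---
#include <stdint.h>
#include <glib-object.h>

typedef struct {
	GObject parent_instance;	/* must stay first so g_object_*() casts work */

	/* roughly the role NLHDR_COMMON plays: cache linkage and refcounting */
	void	*ce_next;
	int	 ce_msgtype;
	int	 ce_refcnt;

	/* generic tc fields plus qdisc specific options */
	uint32_t handle;
	uint32_t parent;
	void	*opts;
} LqlQDisc;
---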
* Re: [RFC] batched tc to improve change throughput 2005-02-22 21:40 ` Dan Siemon @ 2005-02-22 23:15 ` Thomas Graf 0 siblings, 0 replies; 44+ messages in thread From: Thomas Graf @ 2005-02-22 23:15 UTC (permalink / raw) To: Dan Siemon; +Cc: hadi, netdev > > The NLHDR_COMMON must stay first, the ordering of the others doesn't > > matter. > > That could be a problem. The GObject struct must be at the start so > that all sub-classes can be operated on with the g_object_ functions. > The only way to make these objects work with your caching scheme would > be to make a sub-class of GObject with the caching information. This > would have the benefit of adding ref counting etc. It's not a problem, as you note we can put the gobject information into NLHDR_COMMON. I'm not focusing on such bindings but if you want to reuse my code, feel free. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput 2005-01-18 14:29 ` jamal 2005-01-18 14:36 ` Lennert Buytenhek 2005-01-18 14:58 ` Thomas Graf @ 2005-01-18 15:07 ` Werner Almesberger 2005-01-19 14:08 ` Thomas Graf 2 siblings, 1 reply; 44+ messages in thread From: Werner Almesberger @ 2005-01-18 15:07 UTC (permalink / raw) To: jamal; +Cc: Thomas Graf, Patrick McHardy, Stephen Hemminger, netdev jamal wrote: > On Tue, 2005-01-18 at 08:44, Thomas Graf wrote: > > I'm aware of [tcng] but naturally it always lags behind a bit and keeping > > it up to date requires quite some work and I already have problems > > finding the time for my own changes ;-> Sigh, yes, I don't have all that much time for it myself, and my focus of interest has shifted, too. Unfortunately, everybody I've tried to talk in to taking over its maintenance so far was wise enough to politely pass up the offer :-( There's also the issue of classifier construction: while I think that the language for this is near-perfect, the internal processing is scary at best, and doesn't produce very nice results. I dream of a new classifier works on a state machine constructed from single-bit classification decisions, but the graph theory required for ordering them properly is a bit above me. (Construction of an unordered and redundant FSM is almost trivial - tcng can already do this.) > > - interactive shell supporting context help + completion > > MUST I'm not so sure about interactive use of "tc". In general, a single configuration line has no meaning. You almost always need a lot more context to understand what it does. Think of the interactive BASIC systems on ancient PCs. There, you would enter/edit/remove lines by their number. Now, would you want to use something like this for C ? Me, I prefer a free-format text editor :-) An interactive help system that could be called from an editor, e.g. when editing tcng configurations, would certainly be a nice touch. But that's an orthogonal issue. A set of man, info, etc. pages would serve nicely, too. - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina werner@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput 2005-01-18 15:07 ` Werner Almesberger @ 2005-01-19 14:08 ` Thomas Graf 2005-01-19 16:33 ` Werner Almesberger 0 siblings, 1 reply; 44+ messages in thread From: Thomas Graf @ 2005-01-19 14:08 UTC (permalink / raw) To: Werner Almesberger; +Cc: jamal, Patrick McHardy, Stephen Hemminger, netdev * Werner Almesberger <20050118120737.I15303@almesberger.net> 2005-01-18 12:07 > There's also the issue of classifier construction: while I think > that the language for this is near-perfect, the internal > processing is scary at best, and doesn't produce very nice > results. I dream of a new classifier works on a state machine > constructed from single-bit classification decisions, but the > graph theory required for ordering them properly is a bit above > me. (Construction of an unordered and redundant FSM is almost > trivial - tcng can already do this.) I did some experiments in this direction where one could write code in C-like syntax which is transformed into an instruction set understood by the state machine in the kernel. Similar to BPF but not focused on parsing but rather on classification. The main disadvantage is speed, the idea isn't new and has been implemented various times already. It didn't perform well enough and everyone switched to the current way of packet filtering. The mistake they made was to completely rely on the state machine for even the most simple packet classification problems. Now that we have specialized classifiers for often used filtering problems we can pick up the idea again and add it _additionally_ for all these cases that the specialized classifiers do not cover yet. Basically to get a state machine capable of solving almost every problem the following parts must be provided:

- a small instruction set for basic operations to implement arithmetic, branching, and loops.
- some abstract way to access data from various sources, be it packet data, constant values, registers, or meta data.
- an advanced instruction set to improve the performance of the state machine for often used patterns, e.g. find-byte, classify, byte order transformations, header length calculation shortcuts, find-next-ipv6-opt, etc.
- a good optimizer able to transform multiple simple instructions into a larger instruction, because the main bottleneck in a software state machine is the average number of instructions needed to process to get to a result.
- stack frames to allow building libraries for often used problems not worth making a single instruction out of it.

> I'm not so sure about interactive use of "tc". In general, a > single configuration line has no meaning. You almost always > need a lot more context to understand what it does. I think the interactive mode is very useful for maintenance. I agree with you that for the initial script a higher language at the level of the big picture is more appropriate. However, we're moving slowly towards new aspects with the actions bits and also ematches, things get more complicated and fewer rules are required. Therefore in my opinion it would be nice to have an interactive shell assisting you with the initial construction. It also heavily depends on the usage of tc et al, the normal dscp, port, and address classification schemas perfectly fit into tcng and the big picture is most important. OTOH, as soon as we get to more complex classification and the more classification possibilities we get the more important it is to have some way to interactively construct single filters. 
So in my opinion the whole problem needs to be divided into two parts, the logical big picture part best solved with tcng where logic groupings count more than single bits in the packet and the interactive shell with context help to assist in creating complex filters and do the maintenance jobs. Combined together the result is a more usable interface. > Think of the interactive BASIC systems on ancient PCs. There, > you would enter/edit/remove lines by their number. Now, would > you want to use something like this for C ? Me, I prefer a > free-format text editor :-) Yes but I think you also like context based assistance tools such as cscope where you can get help based on your current context, e.g. symbols, types, references. > An interactive help system that could be called from an > editor, e.g. when editing tcng configurations, would certainly > be a nice touch. But that's an orthogonal issue. A set of man, > info, etc. pages would serve nicely, too. To make it really useful the help system needs to parse your current line to find out the context. Assuming we implement an interactive shell like I described in an earlier post, we could add a parameter to put iproute2 into explanation mode not doing anything but parse the input and print a help text based on it. It shouldn't be too hard to tell your editor to call it. Thoughts? ^ permalink raw reply [flat|nested] 44+ messages in thread
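To make the instruction-set idea above a bit more concrete, here is a toy interpreter with a handful of invented opcodes; it corresponds to no existing kernel code and leaves out the advanced instructions, shared registers, and the optimizer entirely:

---
#include <stdint.h>
#include <stddef.h>

enum { OP_LD_U8, OP_LD_U16, OP_AND, OP_JEQ, OP_RET };

struct insn {
	uint8_t  op;	/* opcode */
	uint8_t  reg;	/* register operand */
	uint16_t off;	/* packet offset or jump target */
	uint32_t imm;	/* immediate operand */
};

static int run(const struct insn *prog, size_t n,
	       const uint8_t *pkt, size_t len)
{
	uint32_t reg[4] = { 0 };
	size_t pc = 0;

	while (pc < n) {
		const struct insn *i = &prog[pc++];

		switch (i->op) {
		case OP_LD_U8:		/* load a byte from the packet */
			if (i->off >= len)
				return -1;
			reg[i->reg & 3] = pkt[i->off];
			break;
		case OP_LD_U16:		/* load a big endian 16 bit word */
			if ((size_t)i->off + 1 >= len)
				return -1;
			reg[i->reg & 3] = (pkt[i->off] << 8) | pkt[i->off + 1];
			break;
		case OP_AND:		/* mask a register */
			reg[i->reg & 3] &= i->imm;
			break;
		case OP_JEQ:		/* branch if register equals immediate */
			if (reg[i->reg & 3] == i->imm)
				pc = i->off;
			break;
		case OP_RET:		/* classification result, e.g. a class id */
			return (int)i->imm;
		}
	}
	return 0;			/* fell off the program: no match */
}
---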
* Re: [RFC] batched tc to improve change throughput 2005-01-19 14:08 ` Thomas Graf @ 2005-01-19 16:33 ` Werner Almesberger 2005-01-19 17:22 ` Thomas Graf 0 siblings, 1 reply; 44+ messages in thread From: Werner Almesberger @ 2005-01-19 16:33 UTC (permalink / raw) To: Thomas Graf; +Cc: jamal, Patrick McHardy, Stephen Hemminger, netdev Thomas Graf wrote: > filtering. The mistake they did was to completely rely on the > state machine for even the most simple packet classification > problems. I don't see much of a performance problem: once you have a nice FSM with single-bit decisions, you can quite easily construct various efficient matcher stages. You can even prepare (or compile on the fly) suitable specialized matchers. If doing the matching in hardware, you may even use just the FSM. > Basically to get a state machine capable solving almost every > problem the following parts must be provided: > > - a small instruction set for basic operations to implement > arithmetic, branching, and loops. You need arithmetic only for pointers, and there it's basically mask and shift. You can do surprisingly well without loops. E.g. tcng doesn't have loops. (Although they would be a nice addition, particularly if you move more in the direction of firewalls.) > - some abstract way to access data from various sources, be it > packet data, constant values, registers, or meta data. You can just define some "magic" offsets, e.g. negative ones. > - an advanced instruction set to improve the performance > of the state machine for often used patterns, e.g. > find-byte, classify, byte order transformations, header > length calculation shortcuts, find-next-ipv6-opt, etc. This can be nicely separated and put into post-processing stages. Most of the time, you probably don't notice a difference anyway. > - a good optimizer able to transform multiple simple > instructions into a larger instruction, because the main > bottleneck in a software state machine is aver number of > instructions needed to process to get to a result. Yes, that would be part of the post-processing: combine things, detect patterns, and emit the right high-level construct. > - stack frames to allow building libraries for often used > problem not worth making a single instruction out of it. Huh ? Probably too complex already. Also, if you're in software, you may very well compile your own helper modules on the fly. tcng has this as a proof-of-concept with the "C" target. > I think the interactive mode is very useful for maintaince. Hmm, I kind of doubt it. You're quicker with your editor, just changing that line. What you need is a nice way for updating the in-kernel configuration without loss of state. You also need some "handles" where you can attach automated rule generation and/or modification. That's something tcng doesn't support very well. > It also heavly depends on the usage of tc et al, the normal > dscp, port, and address classification schemas perfectly fit > into tcng and the big picture is most important. Ah, but you know that that first thing tcng does when it sees an "if" is that it rips the expression apart and then works on "anonymous" fields or even single bits ? > soon as we get to more complex classifiction and the more > classification possibilities we get the more important it is > to have some way to interactively construct single filters. I think the contrary is the case :-) If things get complex enough, you'll want to dry-run them in tcsim or such. 
It's really not very different from programming - if I want to change some complicated expression, I just edit it. It wouldn't occur to me to tweak assembler instructions, no matter how convenient the assembler is. - Werner -- _________________________________________________________________________ / Werner Almesberger, Buenos Aires, Argentina werner@almesberger.net / /_http://www.almesberger.net/____________________________________________/ ^ permalink raw reply [flat|nested] 44+ messages in thread
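The single-bit decision machine Werner describes can be pictured as a plain table of test-and-branch nodes; the structure below is an invented illustration, not tcng output:

---
#include <stdint.h>
#include <stddef.h>

struct bit_node {
	uint16_t bit;		/* absolute bit offset into the packet */
	int16_t  on, off;	/* next node index, or -(class id)-1 for a leaf */
};

static int classify(const struct bit_node *fsm,
		    const uint8_t *pkt, size_t len)
{
	int s = 0;

	for (;;) {
		const struct bit_node *n = &fsm[s];
		size_t byte = n->bit >> 3;
		int set = byte < len && ((pkt[byte] >> (7 - (n->bit & 7))) & 1);
		int next = set ? n->on : n->off;

		if (next < 0)
			return -next - 1;	/* reached a leaf: the class id */
		s = next;
	}
}
---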
* Re: [RFC] batched tc to improve change throughput 2005-01-19 16:33 ` Werner Almesberger @ 2005-01-19 17:22 ` Thomas Graf 0 siblings, 0 replies; 44+ messages in thread From: Thomas Graf @ 2005-01-19 17:22 UTC (permalink / raw) To: Werner Almesberger; +Cc: jamal, Patrick McHardy, Stephen Hemminger, netdev * Werner Almesberger <20050119133353.H15303@almesberger.net> 2005-01-19 13:33 > Thomas Graf wrote: > > filtering. The mistake they did was to completely rely on the > > state machine for even the most simple packet classification > > problems. > > I don't see much of a performance problem: once you have a nice > FSM with single-bit decisions, you can quite easily construct > various efficient matcher stages. You can even prepare (or > compile on the fly) suitable specialized matchers. If doing the > matching in hardware, you may even use just the FSM. I guess we're speaking of slightly different FSMs. I assume yours is 100% hardcoded and only works for static patterns? Those you could also implement in FPGAs quite easily. Assuming one needs 2K rules for classifying on top of some kind of dynamic header, assuming IPv4 and IPv6 for now. The static FSM needs to parse the headers every time which isn't much of a problem with IPv4 but gets really expensive with IPv6 even with specialized instructions to parse through the options. Sure, one can put packets into classes and build filters on top of it, but once split one can never merge them together again. I've seen many static FSMs based on BPF, which I assume is what you're talking about. _All_ of them break in practical situations, no longer matching the theory. The only way I see to solve this is to put more logic into the state machine. Give the state machine the chance to share data, that's why you need local and global registers. To really make use of it you want to be able to modify the data and thus need arithmetic instructions. I know, I'm pretty lonely on this path but I think this is the way to go. > You need arithmetic only for pointers, and there it's basically > mask and shift. You can do surprisingly well without loops. E.g. > tcng doesn't have loops. (Although they would be a nice addition, > particularly if you move more in the direction of firewalls.) How do you parse IPv6 options? Specialized classifiers? Mask and shift might be enough to transform IHL but it won't be sufficient for higher protocols. > Huh ? Probably too complex already. Also, if you're in software, > you may very well compile your own helper modules on the fly. > tcng has this as a proof-of-concept with the "C" target. Of course stack frames are only necessary if you introduce variables and they may only exist in user space and then be eliminated for the kernel state machine. > > I think the interactive mode is very useful for maintaince. > > Hmm, I kind of doubt it. You're quicker with your editor, just > changing that line. What you need is a nice way for updating the > in-kernel configuration without loss of state. I'm talking about maintenance stuff in terms of checking statistics, inspecting your filter tree, etc. I fully agree that the construction of the configuration belongs in a text editor. > You also need some "handles" where you can attach automated > rule generation and/or modification. That's something tcng doesn't > support very well. I don't get this. > Ah, but you know that that first thing tcng does when it sees an > "if" is that it rips the expression apart and then works on > "anonymous" fields or even single bits ? 
Sure, this is the way to go. > I think the contrary is the case :-) If things get complex > enough, you'll want to dry-run them in tcsim or such. It's > really not very different from programming - if I want to > change some complicated expression, I just edit it. It wouldn't > occur to me to tweak assembler instructions, no matter how > convenient the assembler is. Agreed, I'm solely talking about getting help while constructing a filter. With all the new extensions coming in, the construction will get more complex with logic expressions, various small classification tools and one needs to look up their parameters etc. I think tcng will never be able to 100% support all of them, because of their easy usage many will write their own so we might see dozens of new ones like in netfilter. ^ permalink raw reply [flat|nested] 44+ messages in thread
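On the IPv6 options question: walking the extension header chain in software looks roughly like the sketch below (simplified, it only skips hop-by-hop, routing and destination option headers and gives up on anything it cannot bound-check):

---
#include <stdint.h>
#include <stddef.h>
#include <netinet/in.h>

/* Return the upper layer protocol of an IPv6 packet and the offset where
 * it starts, or -1 if the header chain runs past the buffer. */
static int ipv6_upper_proto(const uint8_t *pkt, size_t len, size_t *poff)
{
	size_t off = 40;		/* fixed IPv6 header size */
	int nexthdr;

	if (len < 40)
		return -1;
	nexthdr = pkt[6];		/* next header field */

	while (nexthdr == IPPROTO_HOPOPTS ||
	       nexthdr == IPPROTO_ROUTING ||
	       nexthdr == IPPROTO_DSTOPTS) {
		size_t hdrlen;

		if (off + 2 > len)
			return -1;
		/* hdr ext len is in 8 octet units, excluding the first 8 */
		hdrlen = ((size_t)pkt[off + 1] + 1) * 8;
		nexthdr = pkt[off];
		off += hdrlen;
		if (off > len)
			return -1;
	}

	*poff = off;
	return nexthdr;
}
---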
* Re: [RFC] batched tc to improve change throughput 2005-01-17 15:23 [RFC] batched tc to improve change throughput Thomas Graf 2005-01-17 15:45 ` jamal @ 2005-01-17 18:00 ` Stephen Hemminger 2005-01-17 18:02 ` Stephen Hemminger 2 siblings, 0 replies; 44+ messages in thread From: Stephen Hemminger @ 2005-01-17 18:00 UTC (permalink / raw) To: Thomas Graf; +Cc: Jamal Hadi Salim, Patrick McHardy, netdev On Mon, 17 Jan 2005 16:23:12 +0100 Thomas Graf <tgraf@suug.ch> wrote: > While collecting performance numbers for the ematch changes > I realized that the throughput of changes per second is > almost only limited by the cost of starting the tc binary > over and over. In order to improve this, batching of commands > is required. My plan to do so is quite simple, introduce > a new flag -f which puts tc into batched mode and makes > it read commands from stdin. A bison based parser splits > things into tokens, the grammer would be quite easy: > > INPUT ::= { /* empty */ | CMDS } > CMDS ::= { CMD | CMD ';' CMDS } > CMD ::= ARGS > ARGS ::= { STRING | STRING ARGS } > > The lexical part can be made to ignore c-syle and > shell-style comments, i.e. > > --- > #!/sbin/tc -f > > /* some comments here */ > qdisc add .. > class ... > > # shell like comments also possible > filter add ... basic match ... > --- > > Of course this loses ability to use shell features like > variables and loops and it's probably not worth trying > to emulate things. One can always generate these tc scripts > with the help of other tools like m4, you name it. > > This could also be applied to ip of course. > > Thoughts? I have no problem with -f input, but don't turn it into a full blown interpreter. There are enuf messy scripting languages already. -- Stephen Hemminger <shemminger@osdl.org> ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput
  2005-01-17 15:23 [RFC] batched tc to improve change throughput Thomas Graf
  2005-01-17 15:45 ` jamal
  2005-01-17 18:00 ` Stephen Hemminger
@ 2005-01-17 18:02 ` Stephen Hemminger
  2 siblings, 0 replies; 44+ messages in thread
From: Stephen Hemminger @ 2005-01-17 18:02 UTC (permalink / raw)
To: Thomas Graf; +Cc: Jamal Hadi Salim, Patrick McHardy, netdev

On Mon, 17 Jan 2005 16:23:12 +0100
Thomas Graf <tgraf@suug.ch> wrote:

> [...]

The tc command line processing might leak memory now, or rely on
variables still holding their freshly initialized values. You may want
to run it with valgrind or other tools to check for that.

-- 
Stephen Hemminger <shemminger@osdl.org>

^ permalink raw reply	[flat|nested] 44+ messages in thread
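A hypothetical C fragment (not iproute2 code; handle_cmd() and the global are
invented) showing the two classes of problem hinted at above, both invisible when
the binary exits after one command but visible under valgrind once many commands
run in a single batched process:

---
#include <stdlib.h>
#include <string.h>

static char *pending_dev;   /* zeroed at program start, never reset */

static int handle_cmd(int argc, char **argv)
{
	/* 1. Per-command allocation without a matching free(): harmless for
	 *    a run-once binary, an unbounded leak in batch mode.
	 *    valgrind --leak-check=full reports it either way. */
	char *verb = strdup(argv[0]);
	if (!verb)
		return -1;

	/* 2. Reliance on freshly initialized globals: the second batched
	 *    command inherits whatever the first one left behind unless
	 *    the state is explicitly reset between commands. */
	if (argc > 2 && strcmp(argv[1], "dev") == 0)
		pending_dev = strdup(argv[2]);

	/* ... actual parsing would go here ... */
	(void)verb;             /* intentionally leaked for the example */
	return 0;
}
---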
end of thread, other threads:[~2005-02-22 23:15 UTC | newest]

Thread overview: 44+ messages
  2005-01-17 15:23 [RFC] batched tc to improve change throughput Thomas Graf
  2005-01-17 15:45 ` jamal
  2005-01-17 16:05 ` Thomas Graf
  2005-01-17 16:36 ` jamal
  2005-01-17 16:56 ` Thomas Graf
  2005-01-17 22:49 ` jamal
  2005-01-18 13:44 ` Thomas Graf
  2005-01-18 14:29 ` jamal
  2005-01-18 14:36 ` Lennert Buytenhek
  2005-01-18 14:43 ` jamal
  2005-01-18 15:07 ` Thomas Graf
  2005-01-18 15:20 ` Lennert Buytenhek
  2005-01-19 14:24 ` jamal
  2005-01-18 14:58 ` Thomas Graf
  2005-01-18 15:23 ` Lennert Buytenhek
  2005-01-19 14:13 ` jamal
  2005-01-19 14:36 ` Thomas Graf
  2005-01-19 16:45 ` Werner Almesberger
  2005-01-19 16:54 ` Thomas Graf
  2005-01-20 14:42 ` jamal
  2005-01-20 15:35 ` Thomas Graf
  2005-01-20 17:06 ` Stephen Hemminger
  2005-01-20 17:19 ` Thomas Graf
  2005-01-24 14:13 ` jamal
  2005-01-24 15:06 ` Thomas Graf
  2005-01-26 13:48 ` jamal
  2005-01-26 14:35 ` Thomas Graf
  2005-02-11 15:07 ` Dan Siemon
  2005-02-12 13:45 ` jamal
  2005-02-12 14:29 ` Thomas Graf
  2005-02-12 22:07 ` Dan Siemon
  2005-02-12 22:32 ` Thomas Graf
  2005-02-14  0:23 ` Dan Siemon
  2005-02-14 14:27 ` Thomas Graf
  2005-02-15 20:28 ` Dan Siemon
  2005-02-15 20:47 ` Thomas Graf
  2005-02-22 21:40 ` Dan Siemon
  2005-02-22 23:15 ` Thomas Graf
  2005-01-18 15:07 ` Werner Almesberger
  2005-01-19 14:08 ` Thomas Graf
  2005-01-19 16:33 ` Werner Almesberger
  2005-01-19 17:22 ` Thomas Graf
  2005-01-17 18:00 ` Stephen Hemminger
  2005-01-17 18:02 ` Stephen Hemminger