* [RFC] batched tc to improve change throughput
@ 2005-01-17 15:23 Thomas Graf
2005-01-17 15:45 ` jamal
` (2 more replies)
0 siblings, 3 replies; 44+ messages in thread
From: Thomas Graf @ 2005-01-17 15:23 UTC (permalink / raw)
To: Jamal Hadi Salim, Patrick McHardy, Stephen Hemminger; +Cc: netdev
While collecting performance numbers for the ematch changes
I realized that the number of changes per second is limited
almost entirely by the cost of starting the tc binary over
and over. To improve this, commands need to be batched. My
plan is quite simple: introduce a new flag -f which puts tc
into batched mode and makes it read commands from stdin. A
bison-based parser splits the input into tokens; the grammar
would be quite easy:
INPUT ::= { /* empty */ | CMDS }
CMDS ::= { CMD | CMD ';' CMDS }
CMD ::= ARGS
ARGS ::= { STRING | STRING ARGS }
The lexical part can be made to ignore C-style and
shell-style comments, e.g.
---
#!/sbin/tc -f
/* some comments here */
qdisc add ..
class ...
# shell like comments also possible
filter add ... basic match ...
---
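A minimal sketch of the tokenizing described above, written in Python
rather than bison/flex. It assumes that ';' separates commands as in the
grammar and that double-quoted strings are kept as single tokens. The
names TOKEN_RE and parse_batch are made up for illustration; this is not
the proposed tc implementation. Unlike the grammar, it also tolerates
empty commands and a trailing ';'.

```python
import re

# One alternative per token kind; comments are recognised but dropped.
TOKEN_RE = re.compile(
    r'''
    (?P<ccomment>/\*.*?\*/)              # C-style comment, may span lines
  | (?P<shcomment>\#[^\n]*)              # shell-style comment to end of line
  | (?P<string>"(?:\\.|[^"\\])*")        # double-quoted string, kept whole
  | (?P<sep>;)                           # command separator
  | (?P<word>(?:(?!/\*)[^\s;"\#])+)      # bare word, stops before a comment
    ''',
    re.VERBOSE | re.DOTALL,
)


def parse_batch(text):
    """Return a list of commands, each a list of argument strings."""
    commands, current = [], []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN_RE.match(text, pos)
        if match is None:
            # Typically an unterminated string or comment.
            raise ValueError(f"unexpected input at offset {pos}")
        pos = match.end()
        kind = match.lastgroup
        if kind in ("ccomment", "shcomment"):
            continue
        if kind == "sep":
            if current:
                commands.append(current)
                current = []
            continue
        current.append(match.group())
    if current:
        commands.append(current)
    return commands
```

Each returned list could then be handed to the existing argument
handling the same way argv is today, which is the point of reusing one
parser for both modes.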
Of course this loses the ability to use shell features like
variables and loops, and it's probably not worth trying
to emulate them. One can always generate these tc scripts
with the help of other tools such as m4.
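As a rough illustration of generating such scripts with an external
tool, here is a small Python sketch that expands placeholders in a
template and loops over devices. The %NAME placeholder convention, the
function names and the template text are assumptions made for the
example only; m4 or any other macro processor would serve equally well.

```python
import re

# Hypothetical placeholder convention: '%' followed by an upper-case name.
PLACEHOLDER_RE = re.compile(r"%([A-Z_][A-Z0-9_]*)")

# Illustrative template only; one command, terminated by ';'.
TEMPLATE = (
    "filter add dev %DEV parent 1:12 prio 40 protocol all "
    "basic match meta(nfmark eq %LOW_WATERMARK) flowid 1:20;\n"
)


def expand(template, variables):
    """Replace every %NAME in template with variables["NAME"]."""
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"undefined placeholder %{name}")
        return str(variables[name])
    return PLACEHOLDER_RE.sub(substitute, template)


def generate(template, devices, **common):
    """Emit one expanded copy of the template per device."""
    return "".join(expand(template, {**common, "DEV": dev}) for dev in devices)


if __name__ == "__main__":
    # The loop and the variables live in the generator, not in tc.
    print(generate(TEMPLATE, ["eth0", "eth1"], LOW_WATERMARK=10), end="")
```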
This could also be applied to ip of course.
Thoughts?
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput
2005-01-17 15:23 [RFC] batched tc to improve change throughput Thomas Graf
@ 2005-01-17 15:45 ` jamal
2005-01-17 16:05 ` Thomas Graf
2005-01-17 18:00 ` Stephen Hemminger
2005-01-17 18:02 ` Stephen Hemminger
2 siblings, 1 reply; 44+ messages in thread
From: jamal @ 2005-01-17 15:45 UTC (permalink / raw)
To: Thomas Graf; +Cc: Patrick McHardy, Stephen Hemminger, netdev
You don't like the -batch option to tc? ;->
cheers,
jamal
On Mon, 2005-01-17 at 10:23, Thomas Graf wrote:
> While collecting performance numbers for the ematch changes
> I realized that the throughput of changes per second is
> almost only limited by the cost of starting the tc binary
> over and over. In order to improve this, batching of commands
> is required. My plan to do so is quite simple, introduce
> a new flag -f which puts tc into batched mode and makes
> it read commands from stdin. A bison based parser splits
> things into tokens, the grammar would be quite easy:
>
> INPUT ::= { /* empty */ | CMDS }
> CMDS ::= { CMD | CMD ';' CMDS }
> CMD ::= ARGS
> ARGS ::= { STRING | STRING ARGS }
>
> The lexical part can be made to ignore C-style and
> shell-style comments, i.e.
>
> ---
> #!/sbin/tc -f
>
> /* some comments here */
> qdisc add ..
> class ...
>
> # shell like comments also possible
> filter add ... basic match ...
> ---
>
> Of course this loses ability to use shell features like
> variables and loops and it's probably not worth trying
> to emulate things. One can always generate these tc scripts
> with the help of other tools like m4, you name it.
>
> This could also be applied to ip of course.
>
> Thoughts?
>
>
* Re: [RFC] batched tc to improve change throughput
2005-01-17 15:45 ` jamal
@ 2005-01-17 16:05 ` Thomas Graf
2005-01-17 16:36 ` jamal
0 siblings, 1 reply; 44+ messages in thread
From: Thomas Graf @ 2005-01-17 16:05 UTC (permalink / raw)
To: jamal; +Cc: Patrick McHardy, Stephen Hemminger, netdev
* jamal <1105976711.1078.1.camel@jzny.localdomain> 2005-01-17 10:45
> You dont like the -batch option to tc? ;->
No, because:
- it duplicates logic
- it doesn't allow any commenting
- it doesn't get along with my more complicated ematch parsing
* Re: [RFC] batched tc to improve change throughput
2005-01-17 16:05 ` Thomas Graf
@ 2005-01-17 16:36 ` jamal
2005-01-17 16:56 ` Thomas Graf
0 siblings, 1 reply; 44+ messages in thread
From: jamal @ 2005-01-17 16:36 UTC (permalink / raw)
To: Thomas Graf; +Cc: Patrick McHardy, Stephen Hemminger, netdev
On Mon, 2005-01-17 at 11:05, Thomas Graf wrote:
> * jamal <1105976711.1078.1.camel@jzny.localdomain> 2005-01-17 10:45
> > You dont like the -batch option to tc? ;->
>
> No, because:
>
> - it duplicates logic
Didn't follow this: -batch uses the same code as the command line. What
logic gets duplicated?
> - it doesn't allow any commenting
Trivial thing you can fix in about 33.5 seconds ;->
> - it doesn't get along with my more complicated ematch parsing
Example?
cheers,
jamal
* Re: [RFC] batched tc to improve change throughput
2005-01-17 16:36 ` jamal
@ 2005-01-17 16:56 ` Thomas Graf
2005-01-17 22:49 ` jamal
0 siblings, 1 reply; 44+ messages in thread
From: Thomas Graf @ 2005-01-17 16:56 UTC (permalink / raw)
To: jamal; +Cc: Patrick McHardy, Stephen Hemminger, netdev
* jamal <1105979807.1078.16.camel@jzny.localdomain> 2005-01-17 11:36
> On Mon, 2005-01-17 at 11:05, Thomas Graf wrote:
> > * jamal <1105976711.1078.1.camel@jzny.localdomain> 2005-01-17 10:45
> > > You dont like the -batch option to tc? ;->
> >
> > No, because:
> >
> > - it duplicates logic
>
> Didnt follow this - uses the same code as command line. What logic gets
> duplicated?
The parsing of top level nodes.
> > - it doesn't allow any commenting
>
> Trivial thing you can fix in about 33.5 seconds ;->
Simple full-line comments yes, mid-line comments no. -batch
is also not able to split things across multiple lines.
I want my scripts to look like this:
/**
* filter dla_fp
* match DLA traffic at lower watermark
*/
tc filter add
dev %DEV
parent 1:12
prio 40
protocol all
basic match meta(nfmark eq %LOW_WATERMARK)
and (
nbyte("\x0\x1\x2\x3\x4" at 4 layer 2) /* 00 01 02 03 04 (dla fp) */
or u32(ip src 10.0.0.0/8)
)
flowid 1:20
> > - it doesn't get along with my more complicated ematch parsing
>
> Example?
Things like nbyte, kmp or regexp rely on quoted strings, and those
would be destroyed if they contained whitespace.
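To make the problem concrete, a short Python sketch comparing plain
whitespace splitting with a quote-aware split. The regular expression
and function names are illustrative assumptions, not how -batch or the
ematch parser actually work, and unterminated quotes are not diagnosed
here.

```python
import re

# A double-quoted string (with backslash escapes) or a run of other
# non-blank characters.
ARG_RE = re.compile(r'"(?:\\.|[^"\\])*"|[^\s"]+')


def split_naive(line):
    """Split on whitespace only, as a simple line-based reader might."""
    return line.split()


def split_quoted(line):
    """Split on whitespace but keep quoted strings in one piece."""
    return ARG_RE.findall(line)


if __name__ == "__main__":
    line = 'nbyte("a b" at 4 layer 2)'
    print(split_naive(line))   # the quoted pattern is torn in two
    print(split_quoted(line))  # the quoted pattern survives as one token
```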
* Re: [RFC] batched tc to improve change throughput
2005-01-17 15:23 [RFC] batched tc to improve change throughput Thomas Graf
2005-01-17 15:45 ` jamal
@ 2005-01-17 18:00 ` Stephen Hemminger
2005-01-17 18:02 ` Stephen Hemminger
2 siblings, 0 replies; 44+ messages in thread
From: Stephen Hemminger @ 2005-01-17 18:00 UTC (permalink / raw)
To: Thomas Graf; +Cc: Jamal Hadi Salim, Patrick McHardy, netdev
On Mon, 17 Jan 2005 16:23:12 +0100
Thomas Graf <tgraf@suug.ch> wrote:
> While collecting performance numbers for the ematch changes
> I realized that the throughput of changes per second is
> almost only limited by the cost of starting the tc binary
> over and over. In order to improve this, batching of commands
> is required. My plan to do so is quite simple, introduce
> a new flag -f which puts tc into batched mode and makes
> it read commands from stdin. A bison based parser splits
> things into tokens, the grammar would be quite easy:
>
> INPUT ::= { /* empty */ | CMDS }
> CMDS ::= { CMD | CMD ';' CMDS }
> CMD ::= ARGS
> ARGS ::= { STRING | STRING ARGS }
>
> The lexical part can be made to ignore C-style and
> shell-style comments, i.e.
>
> ---
> #!/sbin/tc -f
>
> /* some comments here */
> qdisc add ..
> class ...
>
> # shell like comments also possible
> filter add ... basic match ...
> ---
>
> Of course this loses ability to use shell features like
> variables and loops and it's probably not worth trying
> to emulate things. One can always generate these tc scripts
> with the help of other tools like m4, you name it.
>
> This could also be applied to ip of course.
>
> Thoughts?
I have no problem with -f input, but don't turn it into a full-blown
interpreter. There are enough messy scripting languages already.
--
Stephen Hemminger <shemminger@osdl.org>
* Re: [RFC] batched tc to improve change throughput
2005-01-17 15:23 [RFC] batched tc to improve change throughput Thomas Graf
2005-01-17 15:45 ` jamal
2005-01-17 18:00 ` Stephen Hemminger
@ 2005-01-17 18:02 ` Stephen Hemminger
2 siblings, 0 replies; 44+ messages in thread
From: Stephen Hemminger @ 2005-01-17 18:02 UTC (permalink / raw)
To: Thomas Graf; +Cc: Jamal Hadi Salim, Patrick McHardy, netdev
On Mon, 17 Jan 2005 16:23:12 +0100
Thomas Graf <tgraf@suug.ch> wrote:
> While collecting performance numbers for the ematch changes
> I realized that the throughput of changes per second is
> almost only limited by the cost of starting the tc binary
> over and over. In order to improve this, batching of commands
> is required. My plan to do so is quite simple, introduce
> a new flag -f which puts tc into batched mode and makes
> it read commands from stdin. A bison based parser splits
> things into tokens, the grammar would be quite easy:
>
> INPUT ::= { /* empty */ | CMDS }
> CMDS ::= { CMD | CMD ';' CMDS }
> CMD ::= ARGS
> ARGS ::= { STRING | STRING ARGS }
>
> The lexical part can be made to ignore C-style and
> shell-style comments, i.e.
>
> ---
> #!/sbin/tc -f
>
> /* some comments here */
> qdisc add ..
> class ...
>
> # shell like comments also possible
> filter add ... basic match ...
> ---
>
> Of course this loses ability to use shell features like
> variables and loops and it's probably not worth trying
> to emulate things. One can always generate these tc scripts
> with the help of other tools like m4, you name it.
>
> This could also be applied to ip of course.
>
> Thoughts?
The tc command line processing might leak memory now, or depend on
variables still holding their initial values, which no longer holds
once several commands run in one process. You may want to run it
with valgrind or other tools to check for that.
--
Stephen Hemminger <shemminger@osdl.org>
* Re: [RFC] batched tc to improve change throughput
2005-01-17 16:56 ` Thomas Graf
@ 2005-01-17 22:49 ` jamal
2005-01-18 13:44 ` Thomas Graf
0 siblings, 1 reply; 44+ messages in thread
From: jamal @ 2005-01-17 22:49 UTC (permalink / raw)
To: Thomas Graf; +Cc: Patrick McHardy, Stephen Hemminger, netdev
On Mon, 2005-01-17 at 11:56, Thomas Graf wrote:
> * jamal <1105979807.1078.16.camel@jzny.localdomain> 2005-01-17 11:36
[..]
> > Didnt follow this - uses the same code as command line. What logic gets
> > duplicated?
>
> The parsing of top level nodes.
Probably not a very big deal - it will just force you to type
the commands in full over and over in your batch file.
> > > - it doesn't allow any commenting
> >
> > Trivial thing you can fix in about 33.5 seconds ;->
>
> Simple full-line comments yes, mid-line comments no. -batch
> is also not able to split things across multiple lines.
>
> I want my scripts to look like this:
>
> /**
> * filter dla_fp
> * match DLA traffic at lower watermark
> */
> tc filter add
> dev %DEV
> parent 1:12
> prio 40
> protocol all
> basic match meta(nfmark eq %LOW_WATERMARK)
> and (
> nbyte("\x0\x1\x2\x3\x4" at 4 layer 2) /* 00 01 02 03 04 (dla fp) */
> or u32(ip src 10.0.0.0/8)
> )
> flowid 1:20
>
>
It does look clean. Btw, look at Werner's approach in tcng as well.
> > > - it doesn't get along with my more complicated ematch parsing
> >
> > Example?
>
> stuff like nbyte, kmp or regexp rely on quoted strings and those
> would be destroyed if they'd contain whitespaces.
sounds reasonable.
Another thing that would be really neat is to have an IOS-like CLI
(something like what zebra has) so you can go down the parse tree to,
say, the ematch level and just start typing away these commands.
It should probably be easy to rip the vtysh stuff out of zebra or
use libio or something along those lines to do this.
cheers,
jamal
* Re: [RFC] batched tc to improve change throughput
2005-01-17 22:49 ` jamal
@ 2005-01-18 13:44 ` Thomas Graf
2005-01-18 14:29 ` jamal
0 siblings, 1 reply; 44+ messages in thread
From: Thomas Graf @ 2005-01-18 13:44 UTC (permalink / raw)
To: jamal; +Cc: Patrick McHardy, Stephen Hemminger, netdev
* jamal <1106002197.1046.19.camel@jzny.localdomain> 2005-01-17 17:49
> On Mon, 2005-01-17 at 11:56, Thomas Graf wrote:
> > /**
> > * filter dla_fp
> > * match DLA traffic at lower watermark
> > */
> > tc filter add
> > dev %DEV
> > parent 1:12
> > prio 40
> > protocol all
> > basic match meta(nfmark eq %LOW_WATERMARK)
> > and (
> > nbyte("\x0\x1\x2\x3\x4" at 4 layer 2) /* 00 01 02 03 04 (dla fp) */
> > or u32(ip src 10.0.0.0/8)
> > )
> > flowid 1:20
> >
> >
>
> It does look clean. - btw look at Werners approach on tcng as well.
I'm aware of it but naturally it always lags behind a bit and keeping
it up to date requires quite some work and I already have problems
finding the time for my own changes ;->
> Another thing that would be really neat is to have an IOS-like CLI
> (something like what zebra has) so you can go down the parse tree to
> say the ematch level and just start typing away these commands.
> Should probably be easy to rip off the vtysh stuff off zebra or
> use libio or something along those lines to do this.
I wouldn't call it easy but it's doable. I'm not sure if
entering/leaving subsystem features makes any sense. I find a context
help by pressing '?' and normal completion most useful. It's not
that I dislike your idea but I think it's not worth it. Actually,
I've been working on such a thing (called netsh) being a frontend
to iproute2 + tc + ... with 3 modes:
- batched mode (-f)
- interactive shell supporting context help + completion
- call over arguments
It includes a quite easy to use API to define the grammar which
can be used by readline to do the completion and print context
aware help. It could be easily ported to iproute2 but every
module needs to be changed; luckily this can happen step by
step. I will port it to iproute2 and transform one of the
easier modules like neighbour to it and we can see if we like
it.
* Re: [RFC] batched tc to improve change throughput
2005-01-18 13:44 ` Thomas Graf
@ 2005-01-18 14:29 ` jamal
2005-01-18 14:36 ` Lennert Buytenhek
` (2 more replies)
0 siblings, 3 replies; 44+ messages in thread
From: jamal @ 2005-01-18 14:29 UTC (permalink / raw)
To: Thomas Graf
Cc: Patrick McHardy, Stephen Hemminger, netdev, Werner Almesberger
On Tue, 2005-01-18 at 08:44, Thomas Graf wrote:
> * jamal <1106002197.1046.19.camel@jzny.localdomain> 2005-01-17 17:49
> > On Mon, 2005-01-17 at 11:56, Thomas Graf wrote:
> > > /**
> > > * filter dla_fp
> > > * match DLA traffic at lower watermark
> > > */
> > > tc filter add
> > > dev %DEV
> > > parent 1:12
> > > prio 40
> > > protocol all
> > > basic match meta(nfmark eq %LOW_WATERMARK)
> > > and (
> > > nbyte("\x0\x1\x2\x3\x4" at 4 layer 2) /* 00 01 02 03 04 (dla fp) */
> > > or u32(ip src 10.0.0.0/8)
> > > )
> > > flowid 1:20
> > >
> > >
> >
> > It does look clean. - btw look at Werners approach on tcng as well.
>
> I'm aware of it but naturally it always lags behind a bit and keeping
> it up to date requires quite some work and I already have problems
> finding the time for my own changes ;->
>
I think it is worth getting Mr. Almesberger's view. CCed him.
> > Another thing that would be really neat is to have an IOS-like CLI
> > (something like what zebra has) so you can go down the parse tree to
> > say the ematch level and just start typing away these commands.
> > Should probably be easy to rip off the vtysh stuff off zebra or
> > use libio or something along those lines to do this.
>
> I wouldn't call it easy but it's doable.
I don't think it's hard at all. It would take me, cycles permitting, no
more than a day to whip something up on top of libio.
> I'm not sure if
> entering/leaving subsystem features makes any sense. I find a context
> help by pressing '?' and normal completion most useful. It's not
> that I dislike your idea but I think it's not worth it.
What doesn't make sense or is not worth it?
There are two problems to be solved, and whatever the solution is, it
needs to address them:
a) usability.
i) I shouldn't need to remember what the parse tree looks like or where
I am on the parse tree.
I go:
tc <enter>
tc> ?
i get some help on the next levels.
ii) I should be able to ssh to this thing from some remote location.
This way i can write some scripts to automate things
b) extraneous typing on command line.
I go to the filter level
u32> ?
gives me help
u32> context
filter dev lo parent ffff: protocol ip prio 10
u32> add
u32> match ip src 10.0.0.21/32 flowid 1:16 action drop
u32> match ip src 0/0 flowid 1:16 action ok
u32> commit
filters submitted ..
u32> .. //takes you up one
u32> ls
listing here of filter dev lo parent ffff: protocol ip prio 10
..
..
u32> /qdisc/dev/eth0
now into the qdisc level context for eth0
> Actually,
> I've been working on such a thing (called netsh) being a frontend
> to iproute2 + tc + ... with 3 modes:
> - batched mode (-f)
this is useful.
> - interactive shell supporting context help + completion
MUST
> - call over arguments
Dont understand this.
> It includes a quite easy to use API to define the grammar which
> can be used by readline to do the completion and print context
> aware help.
what does readline provide you again?
> It could be easily ported to iproute2 but every
> module needs to be changed, luckily this can happen step by
> step. I will port it to iproute2 and transform one of the
> easier modules like neighbour to it and we can see if we like
> it.
I think iproute2 should stay as is - I don't want to break someone's
scripts or make it fatter than it is already. Any app providing the
above should be standalone.
cheers,
jamal
* Re: [RFC] batched tc to improve change throughput
2005-01-18 14:29 ` jamal
@ 2005-01-18 14:36 ` Lennert Buytenhek
2005-01-18 14:43 ` jamal
2005-01-18 14:58 ` Thomas Graf
2005-01-18 15:07 ` Werner Almesberger
2 siblings, 1 reply; 44+ messages in thread
From: Lennert Buytenhek @ 2005-01-18 14:36 UTC (permalink / raw)
To: jamal
Cc: Thomas Graf, Patrick McHardy, Stephen Hemminger, netdev,
Werner Almesberger
On Tue, Jan 18, 2005 at 09:29:52AM -0500, jamal wrote:
> > > Another thing that would be really neat is to have an IOS-like CLI
> > > (something like what zebra has) so you can go down the parse tree to
> > > say the ematch level and just start typing away these commands.
> > > Should probably be easy to rip off the vtysh stuff off zebra or
> > > use libio or something along those lines to do this.
> >
> > I wouldn't call it easy but it's doable.
>
> I dont think its hard at all. It would take me, cycles pending, not more
> than a day to whip something off libio.
If you do this, please consider using Juniper config syntax instead
of doing it the Cisco/quagga way.
cheers,
Lennert
* Re: [RFC] batched tc to improve change throughput
2005-01-18 14:36 ` Lennert Buytenhek
@ 2005-01-18 14:43 ` jamal
2005-01-18 15:07 ` Thomas Graf
2005-01-18 15:20 ` Lennert Buytenhek
0 siblings, 2 replies; 44+ messages in thread
From: jamal @ 2005-01-18 14:43 UTC (permalink / raw)
To: Lennert Buytenhek
Cc: Thomas Graf, Patrick McHardy, Stephen Hemminger, netdev,
Werner Almesberger
On Tue, 2005-01-18 at 09:36, Lennert Buytenhek wrote:
> On Tue, Jan 18, 2005 at 09:29:52AM -0500, jamal wrote:
> If you do this, please consider using Juniper config syntax instead
> of doing it the Cisco/quagga way.
>
Juniper is XML driven config files?
[I am hoping Thomas would do it, btw;-> only if we strongly disagree
then i will be tempted to provide an alternative].
btw, libio uses libevent; i recall you said you had some alternative to
it.
cheers,
jamal
* Re: [RFC] batched tc to improve change throughput
2005-01-18 14:29 ` jamal
2005-01-18 14:36 ` Lennert Buytenhek
@ 2005-01-18 14:58 ` Thomas Graf
2005-01-18 15:23 ` Lennert Buytenhek
2005-01-19 14:13 ` jamal
2005-01-18 15:07 ` Werner Almesberger
2 siblings, 2 replies; 44+ messages in thread
From: Thomas Graf @ 2005-01-18 14:58 UTC (permalink / raw)
To: jamal; +Cc: Patrick McHardy, Stephen Hemminger, netdev, Werner Almesberger
* jamal <1106058592.1035.95.camel@jzny.localdomain> 2005-01-18 09:29
> On Tue, 2005-01-18 at 08:44, Thomas Graf wrote:
> > I'm not sure if
> > entering/leaving subsystem features makes any sense. I find a context
> > help by pressing '?' and normal completion most useful. It's not
> > that I dislike your idea but I think it's not worth it.
>
> What doesnt make sense or is not worth it?
My very personal opinion is that it's not worth it.
> a) usability.
> i) I dont need to remember how the parse tree looks like or where i am
> on the parse tree.
> I go:
> tc <enter>
> tc> ?
> i get some help on the next levels.
> ii) I should be able to ssh to this thing from some remote location.
> This way i can write some scripts to automate things
>
> b) extraneous typing on command line.
> I go to the filter level
>
> u32> ?
> gives me help
> u32> context
> filter dev lo parent ffff: protocol ip prio 10
> u32> add
> u32> match ip src 10.0.0.21/32 flowid 1:16 action drop
> u32> match ip src 0/0 flowid 1:16 action ok
> u32> commit
> filters submitted ..
What do you do if there is an error? To what kind of context do you
go? Let's say the kernel reports -EINVAL.
> u32> .. //takes you up one
> u32> ls
> listing here of filter dev lo parent ffff: protocol ip prio 10
> ..
> ..
> u32> /qdisc/dev/eth0
> now into the qdisc level context for eth0
That's what I have:
tgr:axs ~/dev/netconfig/src ./netconfig
...
axs# ?
Next level commands:
link ... Link (interface) configuration
neighbour ... Neighbour (ARP) configuration
warranty Show warranty
exit Quit application
axs# n?
Backtrace:
->neighbour - Neighbour (ARP) configuration
Description:
Module to view and modify the neighbour tables.
The neighbour table establishes bindings between protocol
addresses and link layer addresses for hosts sharing the same
physical link. This module allows you to view the content of
these tables and to manipulate their content.
Next level commands:
add <ADDR> ... Add a neighbour
modify <ADDR> ... Modify a neighbour
delete <ADDR> ... Delete a neighbour
list ... List neighbour attributes
axs# neighbour l?
Backtrace:
->neighbour - Neighbour (ARP) configuration
->list - List neighbour attributes
Next level commands:
<cr> Command can be executed at this point.
stats ... Verbose listing (all attributes/statistics)
where ... Only dump neighbours matching a filter
axs# neighbour list w?
Backtrace:
->neighbour - Neighbour (ARP) configuration
->list - List neighbour attributes
->where - Only dump neighbours matching a filter
Attributes of this node:
lladdr <LLADDR> Link layer address
dst <ADDR> Destination address
dev <DEV> Link the neighbour is on
Next level commands:
<cr> Command can be executed at this point.
flags ...
axs# ...
Again, it's not that I dislike contexts, but in the end it all
comes down to making error correction as easy as possible. Every time
you request a completion or context help, the command gets
parsed and a very verbose message listing the possibilities you
have is printed, so you can correct your error.
It's more typing work, I know, but usually one only types the first
1..3 chars of each command.
I think something like this should be the base, and contexts can
be built upon it.
> > - call over arguments
>
> Dont understand this.
The way it is now: configuration via program arguments.
> > It includes a quite easy to use API to define the grammar which
> > can be used by readline to do the completion and print context
> > aware help.
>
> what does readline provide you again?
It basically takes over the reading of a line and allows manipulation
of the input buffer. It implements all the useful line editing like
in bash and helps with completion. You can bind the '?' key to a
help function so that '?' will not be printed on the screen but instead
the help text is printed and you get back your original line untouched.
It also gives the user a chance to bind keys to certain actions so
everyone can keep the bindings they like with the additional
possibilities to export functions so one could for example bind C-N
to "list-neighbours".
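A minimal sketch of grammar-driven completion, using Python's readline
module instead of the C API. The tree below copies the keywords from
the session shown earlier; the names TREE, candidates and install are
invented for the example and say nothing about the real netsh API.
Binding '?' to a help function is not shown because Python's readline
wrapper does not expose custom key functions, and the "tab: complete"
binding assumes GNU readline (libedit uses a different syntax).

```python
# Keyword tree taken from the example session: each key is a command
# word, each value the sub-tree of words that may follow it.
TREE = {
    "link": {},
    "neighbour": {
        "add": {},
        "modify": {},
        "delete": {},
        "list": {"stats": {}, "where": {}},
    },
    "warranty": {},
    "exit": {},
}


def candidates(tree, line):
    """Return the keywords that can complete the last word of line."""
    words = line.split()
    # Empty input or a trailing blank means a new word is being started.
    if not line or line[-1].isspace():
        words.append("")
    node = tree
    for word in words[:-1]:
        if word in node:                 # exact keyword
            node = node[word]
            continue
        matches = [key for key in node if key.startswith(word)]
        if len(matches) != 1:            # unknown or ambiguous abbreviation
            return []
        node = node[matches[0]]
    return sorted(key for key in node if key.startswith(words[-1]))


def install(tree):
    """Hook the tree into readline so that TAB completes keywords."""
    import readline  # standard library, not available on every platform

    def completer(text, state):
        options = candidates(tree, readline.get_line_buffer())
        return options[state] if state < len(options) else None

    readline.set_completer(completer)
    readline.parse_and_bind("tab: complete")
```

Because abbreviations are resolved while walking the tree, "n l w" and
"neighbour list w" both end up at the same node, which matches the
habit of typing only the first few characters of each command.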
> I think iproute2 should stay as is - dont wanna break someones scripts
> or make it fatter than it is already. Any app to provide the above
> should be standalone.
Well, you mean like generating iproute2 input? This means we'd have to
implement the logic twice, and handling errors from iproute2 gets
really hard. It's not a problem to keep the old way of using iproute2 as-is.
* Re: [RFC] batched tc to improve change throughput
2005-01-18 14:43 ` jamal
@ 2005-01-18 15:07 ` Thomas Graf
2005-01-18 15:20 ` Lennert Buytenhek
1 sibling, 0 replies; 44+ messages in thread
From: Thomas Graf @ 2005-01-18 15:07 UTC (permalink / raw)
To: jamal
Cc: Lennert Buytenhek, Patrick McHardy, Stephen Hemminger, netdev,
Werner Almesberger
* jamal <1106059431.1035.101.camel@jzny.localdomain> 2005-01-18 09:43
> On Tue, 2005-01-18 at 09:36, Lennert Buytenhek wrote:
> > On Tue, Jan 18, 2005 at 09:29:52AM -0500, jamal wrote:
>
> > If you do this, please consider using Juniper config syntax instead
> > of doing it the Cisco/quagga way.
> >
>
> Juniper is XML driven config files?
> [I am hoping Thomas would do it, btw;-> only if we strongly disagree
> then i will be tempted to provide an alternative].
I'm sure we can find something everyone gets along with just fine.
Iff we do the XML thing we might want to try to stick to the IETF
netconf ideas.
> btw, libio uses libevent; i recall you said you had some alternative to
> it.
libio could be put underneath libreadline but it doesn't make much sense;
I think remote shells do the job just fine. I'd favour an XML protocol
like netconf if we want to follow the remote configuration path.
Endianness issues will hit us quite hard though.
* Re: [RFC] batched tc to improve change throughput
2005-01-18 14:29 ` jamal
2005-01-18 14:36 ` Lennert Buytenhek
2005-01-18 14:58 ` Thomas Graf
@ 2005-01-18 15:07 ` Werner Almesberger
2005-01-19 14:08 ` Thomas Graf
2 siblings, 1 reply; 44+ messages in thread
From: Werner Almesberger @ 2005-01-18 15:07 UTC (permalink / raw)
To: jamal; +Cc: Thomas Graf, Patrick McHardy, Stephen Hemminger, netdev
jamal wrote:
> On Tue, 2005-01-18 at 08:44, Thomas Graf wrote:
> > I'm aware of [tcng] but naturally it always lags behind a bit and keeping
> > it up to date requires quite some work and I already have problems
> > finding the time for my own changes ;->
Sigh, yes, I don't have all that much time for it myself, and
my focus of interest has shifted, too. Unfortunately, everybody
I've tried to talk into taking over its maintenance so far was
wise enough to politely pass up the offer :-(
There's also the issue of classifier construction: while I think
that the language for this is near-perfect, the internal
processing is scary at best, and doesn't produce very nice
results. I dream of a new classifier that works on a state machine
constructed from single-bit classification decisions, but the
graph theory required for ordering them properly is a bit above
me. (Construction of an unordered and redundant FSM is almost
trivial - tcng can already do this.)
> > - interactive shell supporting context help + completion
>
> MUST
I'm not so sure about interactive use of "tc". In general, a
single configuration line has no meaning. You almost always
need a lot more context to understand what it does.
Think of the interactive BASIC systems on ancient PCs. There,
you would enter/edit/remove lines by their number. Now, would
you want to use something like this for C ? Me, I prefer a
free-format text editor :-)
An interactive help system that could be called from an
editor, e.g. when editing tcng configurations, would certainly
be a nice touch. But that's an orthogonal issue. A set of man,
info, etc. pages would serve nicely, too.
- Werner
--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina werner@almesberger.net /
/_http://www.almesberger.net/____________________________________________/
* Re: [RFC] batched tc to improve change throughput
2005-01-18 14:43 ` jamal
2005-01-18 15:07 ` Thomas Graf
@ 2005-01-18 15:20 ` Lennert Buytenhek
2005-01-19 14:24 ` jamal
1 sibling, 1 reply; 44+ messages in thread
From: Lennert Buytenhek @ 2005-01-18 15:20 UTC (permalink / raw)
To: jamal
Cc: Thomas Graf, Patrick McHardy, Stephen Hemminger, netdev,
Werner Almesberger
On Tue, Jan 18, 2005 at 09:43:51AM -0500, jamal wrote:
> > If you do this, please consider using Juniper config syntax instead
> > of doing it the Cisco/quagga way.
>
> Juniper is XML driven config files?
Uhm, no :) The main advantage over Cisco (IMHO, the last thing I want
is to start a holy war) is hierarchical config syntax, and the ability
to commit/rollback all your changes in one go. It's unfortunately
rather more verbose, though. Example at the bottom.
> btw, libio uses libevent; i recall you said you had some alternative to
> it.
Yeah, ivykis (http://libivykis.sourceforge.net/) has been happily used
in-house at my last job for years but never really caught on anywhere
else, which is perhaps because I never did much lobbying for it. The
way it works also requires all code written for it to be fully async
(can't use blocking code anywhere), which I think is an advantage but
everyone else thinks is a disadvantage.
cheers,
Lennert
This is an example of a Juniper policy-statement, which is basically
just a prefix filter. This particular filter controls which non-OSPF
prefixes are exported to OSPF.
buytenh@asd-tc2-m20core1> show configuration policy-options policy-statement ospf-export
term accept-default {
from {
route-filter 0.0.0.0/0 exact;
}
then {
external {
type 1;
}
accept;
}
}
term accept-peering {
from {
protocol direct;
route-filter 0.0.0.0/0 prefix-length-range /30-/30;
}
then {
external {
type 1;
}
accept;
}
}
term accept-ams-ix {
from {
protocol direct;
route-filter 195.69.144.0/23 exact;
}
then {
external {
type 1;
}
accept;
}
}
term reject-rest {
then reject;
}
buytenh@asd-tc2-m20core1>
* Re: [RFC] batched tc to improve change throughput
2005-01-18 14:58 ` Thomas Graf
@ 2005-01-18 15:23 ` Lennert Buytenhek
2005-01-19 14:13 ` jamal
1 sibling, 0 replies; 44+ messages in thread
From: Lennert Buytenhek @ 2005-01-18 15:23 UTC (permalink / raw)
To: Thomas Graf
Cc: jamal, Patrick McHardy, Stephen Hemminger, netdev,
Werner Almesberger
On Tue, Jan 18, 2005 at 03:58:30PM +0100, Thomas Graf wrote:
> > u32> ?
> > gives me help
> > u32> context
> > filter dev lo parent ffff: protocol ip prio 10
> > u32> add
> > u32> match ip src 10.0.0.21/32 flowid 1:16 action drop
> > u32> match ip src 0/0 flowid 1:16 action ok
> > u32> commit
> > filters submitted ..
>
> What do you do if there is an error? To what kind of context do you
> go? Let's say the kernel reports -EINVAL.
You just refuse the commit and stay where you are:
buytenh@asd-tc2-m20core1> configure exclusive
Entering configuration mode
[edit]
buytenh@asd-tc2-m20core1# edit policy-options policy-statement test
[edit policy-options policy-statement test]
buytenh@asd-tc2-m20core1# set from prefix-list test
[edit policy-options policy-statement test]
buytenh@asd-tc2-m20core1# set then accept
[edit policy-options policy-statement test]
buytenh@asd-tc2-m20core1# show
from {
prefix-list test;
}
then accept;
[edit policy-options policy-statement test]
buytenh@asd-tc2-m20core1# commit
[edit]
'policy-options'
Policy error: test prefix-list referenced but not defined
error: configuration check-out failed
[edit policy-options policy-statement test]
buytenh@asd-tc2-m20core1# top
[edit]
buytenh@asd-tc2-m20core1# rollback
load complete
[edit]
buytenh@asd-tc2-m20core1#
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput
2005-01-18 15:07 ` Werner Almesberger
@ 2005-01-19 14:08 ` Thomas Graf
2005-01-19 16:33 ` Werner Almesberger
0 siblings, 1 reply; 44+ messages in thread
From: Thomas Graf @ 2005-01-19 14:08 UTC (permalink / raw)
To: Werner Almesberger; +Cc: jamal, Patrick McHardy, Stephen Hemminger, netdev
* Werner Almesberger <20050118120737.I15303@almesberger.net> 2005-01-18 12:07
> There's also the issue of classifier construction: while I think
> that the language for this is near-perfect, the internal
> processing is scary at best, and doesn't produce very nice
> results. I dream of a new classifier that works on a state machine
> constructed from single-bit classification decisions, but the
> graph theory required for ordering them properly is a bit above
> me. (Construction of an unordered and redundant FSM is almost
> trivial - tcng can already do this.)
I did some experiments in this direction where one could write
code in a C-like syntax which is transformed into an instruction
set understood by the state machine in the kernel. Similar to
BPF but not focused on parsing but rather on classification.
The main disadvantage is speed; the idea isn't new and has been
implemented various times already. It didn't perform well
enough and everyone switched to the current way of packet
filtering. The mistake they made was to completely rely on the
state machine for even the most simple packet classification
problems. Now that we have specialized classifiers for often
used filtering problems we can pick up the idea again and add
it _additionally_ for all those cases that the specialized
classifiers do not cover yet.
Basically, to get a state machine capable of solving almost every
problem the following parts must be provided:
- a small instruction set for basic operations to implement
arithmetic, branching, and loops.
- some abstract way to access data from various sources, be it
packet data, constant values, registers, or meta data.
- an advanced instruction set to improve the performance
of the state machine for often used patterns, e.g.
find-byte, classify, byte order transformations, header
length calculation shortcuts, find-next-ipv6-opt, etc.
- a good optimizer able to transform multiple simple
instructions into a larger instruction, because the main
bottleneck in a software state machine is the average number
of instructions that must be processed to get to a result.
- stack frames to allow building libraries for often used
problems that are not worth making a single instruction of.
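To make the first two items of the list more concrete, here is a minimal
sketch in C of such a state machine. Everything in it is invented for
illustration (the opcodes, struct cls_insn and cls_run are hypothetical,
not taken from any kernel or libnl code), and it only allows forward
jumps, so it has branching but no loops:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes: a very small "basic" instruction set. */
enum cls_op {
    CLS_LDB,  /* acc = pkt[arg]            (data access) */
    CLS_AND,  /* acc &= arg                (arithmetic)  */
    CLS_SHR,  /* acc >>= arg               (arithmetic)  */
    CLS_JEQ,  /* if (acc == arg) skip jmp  (branching)   */
    CLS_RET   /* stop, class id is arg                   */
};

struct cls_insn {
    enum cls_op op;
    uint32_t arg;
    uint8_t jmp;  /* number of instructions to skip, forward only */
};

/*
 * Run the program against one packet. Returns the class id given by
 * the first CLS_RET reached, or -1 on an out-of-range load or when
 * the program ends without CLS_RET. Forward-only jumps guarantee
 * termination.
 */
int cls_run(const struct cls_insn *prog, size_t len,
            const uint8_t *pkt, size_t pkt_len)
{
    uint32_t acc = 0;
    size_t pc = 0;

    assert(prog != NULL && pkt != NULL);

    while (pc < len) {
        const struct cls_insn *in = &prog[pc++];

        switch (in->op) {
        case CLS_LDB:
            if (in->arg >= pkt_len)
                return -1;
            acc = pkt[in->arg];
            break;
        case CLS_AND:
            acc &= in->arg;
            break;
        case CLS_SHR:
            acc >>= (in->arg & 31);
            break;
        case CLS_JEQ:
            if (acc == in->arg)
                pc += in->jmp;
            break;
        case CLS_RET:
            return (int)in->arg;
        }
    }
    return -1;
}
```

A four-instruction program (load byte 9, compare with 6, return 2,
return 1) would put IPv4/TCP packets into class 1 and everything else
into class 2. The "advanced" instructions and the optimizer from the
list would sit on top of something like this.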
> I'm not so sure about interactive use of "tc". In general, a
> single configuration line has no meaning. You almost always
> need a lot more context to understand what it does.
I think the interactive mode is very useful for maintenance. I
agree with you that for the initial script a higher language at
the level of the big picture is more appropriate. However, we're
moving slowly towards new aspects with the actions bits and
also ematches; things get more complicated and fewer rules are
required. Therefore in my opinion it would be nice to have
an interactive shell assisting you with the initial construction.
It also heavily depends on the usage of tc et al; the normal
dscp, port, and address classification schemas perfectly fit
into tcng and the big picture is most important. OTOH, the
more complex classification gets and the more classification
possibilities we get, the more important it is to have some
way to interactively construct single filters.
So in my opinion the whole problem needs to be divided into
two parts: the logical big picture part, best solved with
tcng, where logical groupings count more than single bits in the
packet, and the interactive shell with context help to assist
in creating complex filters and doing the maintenance jobs.
Combined together the result is a more usable interface.
> Think of the interactive BASIC systems on ancient PCs. There,
> you would enter/edit/remove lines by their number. Now, would
> you want to use something like this for C ? Me, I prefer a
> free-format text editor :-)
Yes but I think you also like context based assistance tools
such as cscope where you can get help based on your current
context, e.g. symbols, types, references.
> An interactive help system that could be called from an
> editor, e.g. when editing tcng configurations, would certainly
> be a nice touch. But that's an orthogonal issue. A set of man,
> info, etc. pages would serve nicely, too.
To make it really useful the help system needs to parse your
current line to find out the context. Assuming we implement
an interactive shell like I described in an earlier post, we could
add a parameter to put iproute2 into explanation mode, not
doing anything but parsing the input and printing a help text
based on it. It shouldn't be too hard to tell your editor to call
it.
Thoughts?
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput
2005-01-18 14:58 ` Thomas Graf
2005-01-18 15:23 ` Lennert Buytenhek
@ 2005-01-19 14:13 ` jamal
2005-01-19 14:36 ` Thomas Graf
` (2 more replies)
1 sibling, 3 replies; 44+ messages in thread
From: jamal @ 2005-01-19 14:13 UTC (permalink / raw)
To: Thomas Graf
Cc: Patrick McHardy, Stephen Hemminger, netdev, Werner Almesberger
On Tue, 2005-01-18 at 09:58, Thomas Graf wrote:
> * jamal <1106058592.1035.95.camel@jzny.localdomain> 2005-01-18 09:29
> > On Tue, 2005-01-18 at 08:44, Thomas Graf wrote:
> > > I'm not sure if
> > > entering/leaving subsystem features makes any sense. I find a context
> > > help by pressing '?' and normal completion most useful. It's not
> > > that I dislike your idea but I think it's not worth it.
> >
> > What doesnt make sense or is not worth it?
>
> My very personal opinion is that it's not worth it.
Ok, lets discuss more see if we can change that ;->
> > a) usability.
> > i) I dont need to remember how the parse tree looks like or where i am
> > on the parse tree.
> > I go:
> > tc <enter>
> > tc> ?
> > i get some help on the next levels.
> > ii) I should be able to ssh to this thing from some remote location.
> > This way i can write some scripts to automate things
> >
> > b) extraneous typing on command line.
> > I go to the filter level
> >
> > u32> ?
> > gives me help
> > u32> context
> > filter dev lo parent ffff: protocol ip prio 10
> > u32> add
> > u32> match ip src 10.0.0.21/32 flowid 1:16 action drop
> > u32> match ip src 0/0 flowid 1:16 action ok
> > u32> commit
> > filters submitted ..
>
> What do you do if there is an error? To what kind of context do you
> go? Let's say the kernel reports -EINVAL.
>
As Lennert was saying, puke whatever the kernel said and allow for
rollback. i.e undo the first one if it succeeded for example.
> tgr:axs ~/dev/netconfig/src ./netconfig
[some really neat stuff deleted for brevity]
> Again, It's not that I dislike contexts but in the end it all
> gets down to make error correction as easy as possible. Everytime
> you request a completion or context help the command will get
> parsed and a very verbose message including the possibilities you
> have will be printed and you can correct your error.
>
> It's more typing work I know, but usually one only types the first
> 1..3 chars of the commands.
>
Same as in what i showed. Probably not a very big deal.
The one neat thing about the context approach is it allows you to
remember state easily. As an example, after committing those two u32
filters and you want to undo, you then remember what it is that you can
undo.
What you have is really nice because you could use standard tools such
as "| grep forsomething | formatit etc" to manipulate further. This
would be hard to do in the case of what i described earlier.
The best solution we can have is to have a mix of the two approaches
IMO.
> > > It includes a quite easy to use API to define the grammar which
> > > can be used by readline to do the completion and print context
> > > aware help.
> >
> > what does readline provide you again?
>
> It basically takes over the reading of a line and allows manipulation
> of the input buffer. It implements all the useful line editing like
> in bash and helps with completion. You can bind the '?' key to a
> help function so that '?' will not be printed on the screen but instead
> the help text is printed and you get back your original line untouched.
> It also gives the user a chance to bind keys to certain actions so
> everyone can keep the bindings they like with the additional
> possibilities to export functions so one could for example bind C-N
> to "list-neighbours".
These are all very nice features to have.
> > I think iproute2 should stay as is - dont wanna break someones scripts
> > or make it fatter than it is already. Any app to provide the above
> > should be standalone.
>
> Well, you mean like generating iproute2 input? This means we'd have to
> reimplement the logic twice and handling errors from iproute2 gets
> really hard. It's not a problem to keep the old way of iproute2 as-is.
What i mean is that we should probably leave iproute2 code alone so that
people can run old scripts etc with it. i.e the netsh tool should just
either reuse libnetlink and add any things to it or create a brand new
library.
cheers,
jamal
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput
2005-01-18 15:20 ` Lennert Buytenhek
@ 2005-01-19 14:24 ` jamal
0 siblings, 0 replies; 44+ messages in thread
From: jamal @ 2005-01-19 14:24 UTC (permalink / raw)
To: Lennert Buytenhek
Cc: Thomas Graf, Patrick McHardy, Stephen Hemminger, netdev,
Werner Almesberger
On Tue, 2005-01-18 at 10:20, Lennert Buytenhek wrote:
> On Tue, Jan 18, 2005 at 09:43:51AM -0500, jamal wrote:
>
> > > If you do this, please consider using Juniper config syntax instead
> > > of doing it the Cisco/quagga way.
> >
> > Juniper is XML driven config files?
>
> Uhm, no :) The main advantage over Cisco (IMHO, the last thing I want
> is to start a holy war) is hierarchical config syntax, and the ability
> to commit/rollback all your changes in one go. It's unfortunately
> rather more verbose, though.
As long as a machine(as opposed to a human) is worrying about the
verbosity - should be fine ;->
> Example at the bottom.
It has a well defined structure. Do you know if they publish the BNF for
it? Are these guys gonna come after us if we replicate it you think?
>
> > btw, libio uses libevent; i recall you said you had some alternative to
> > it.
>
> Yeah, ivykis (http://libivykis.sourceforge.net/) has been happily used
> in-house at my last job for years but never really caught on anywhere
> else, which is perhaps because I never did much lobbying for it. The
> way it works also requires all code written for it to be fully async
> (can't use blocking code anywhere), which I think is an advantage but
> everyone else thinks is a disadvantage.
very similar to libevent. I am gonna start using yours because then i
could send you emails which start with "Dear sir, .. i have doubts
because it doesnt work" ;->
cheers,
jamal
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput
2005-01-19 14:13 ` jamal
@ 2005-01-19 14:36 ` Thomas Graf
2005-01-19 16:45 ` Werner Almesberger
2005-01-19 16:54 ` Thomas Graf
2 siblings, 0 replies; 44+ messages in thread
From: Thomas Graf @ 2005-01-19 14:36 UTC (permalink / raw)
To: jamal; +Cc: Patrick McHardy, Stephen Hemminger, netdev, Werner Almesberger
* jamal <1106144009.1047.989.camel@jzny.localdomain> 2005-01-19 09:13
> On Tue, 2005-01-18 at 09:58, Thomas Graf wrote:
> > What do you do if there is an error? To what kind of context do you
> > go? Let's say the kernel reports -EINVAL.
> >
>
> As Lennert was saying, puke whatever the kernel said and allow for
> rollback. i.e undo the first one if it succeeded for example.
Yes but this is not dependent on a context, is it? It is possible, I
have it working for most netlink messages but it is non-trivial.
> > Again, It's not that I dislike contexts but in the end it all
> > gets down to make error correction as easy as possible. Everytime
> > you request a completion or context help the command will get
> > parsed and a very verbose message including the possibilities you
> > have will be printed and you can correct your error.
> >
> > It's more typing work I know, but usually one only types the first
> > 1..3 chars of the commands.
> >
>
> Same as in what i showed. Probably not a very big deal.
> The one neat thing about the context approach is it allows you to
> remember state easily. As an example, after committing those two u32
> filters and you want to undo, you then remember what it is that you can
> undo.
So it seems you really want to do commit/rollback on context level.
Can you maybe explain this more verbosely, maybe it's easier than
I think.
The commit/rollback is only useful for groupings of requests such
as complete filtering configurations, the consistency of a single
addition should be handled properly by the kernel. The changes
in my tcf_exts patches solved most of them except for the indev
stuff.
> > > I think iproute2 should stay as is - dont wanna break someones scripts
> > > or make it fatter than it is already. Any app to provide the above
> > > should be standalone.
> >
> > Well, you mean like generating iproute2 input? This means we'd have to
> > reimplement the logic twice and handling errors from iproute2 gets
> > really hard. It's not a problem to keep the old way of iproute2 as-is.
>
> What i mean is that we should probably leave iproute2 code alone so that
> people can run old scripts etc with it. i.e the netsh tool should just
> either reuse libnetlink and add any things to it or create a brand new
> library.
Well, libnetlink contains 1% of the code required, 99% are in the
modules and that's the hard work. Remember libnl? netsh is based on it
and it took me 2 weeks to get neighbour and link code finished and
working, it is more powerful than ip now though. The basic features of
ip and tc aren't very difficult but there's quite a lot more in iproute2
which needs to be mostly rewritten to fit into a library.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput
2005-01-19 14:08 ` Thomas Graf
@ 2005-01-19 16:33 ` Werner Almesberger
2005-01-19 17:22 ` Thomas Graf
0 siblings, 1 reply; 44+ messages in thread
From: Werner Almesberger @ 2005-01-19 16:33 UTC (permalink / raw)
To: Thomas Graf; +Cc: jamal, Patrick McHardy, Stephen Hemminger, netdev
Thomas Graf wrote:
> filtering. The mistake they did was to completely rely on the
> state machine for even the most simple packet classification
> problems.
I don't see much of a performance problem: once you have a nice
FSM with single-bit decisions, you can quite easily construct
various efficient matcher stages. You can even prepare (or
compile on the fly) suitable specialized matchers. If doing the
matching in hardware, you may even use just the FSM.
> Basically to get a state machine capable solving almost every
> problem the following parts must be provided:
>
> - a small instruction set for basic operations to implement
> arithmetic, branching, and loops.
You need arithmetic only for pointers, and there it's basically
mask and shift. You can do surprisingly well without loops. E.g.
tcng doesn't have loops. (Although they would be a nice addition,
particularly if you move more in the direction of firewalls.)
> - some abstract way to access data from various sources, be it
> packet data, constant values, registers, or meta data.
You can just define some "magic" offsets, e.g. negative ones.
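A minimal sketch in C of what such "magic" offsets could look like,
assuming one load function serves both packet data and meta data. The
struct layout, the OFF_* constants and pkt_load are hypothetical names
made up for this illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-packet meta data, not a real kernel structure. */
struct pkt_meta {
    uint8_t ifindex;
    uint8_t priority;
    uint8_t mark;
};

/* Negative offsets select meta data instead of packet bytes. */
enum { OFF_IFINDEX = -1, OFF_PRIORITY = -2, OFF_MARK = -3 };

struct pkt {
    const uint8_t *data;
    size_t len;
    struct pkt_meta meta;
};

/*
 * Load one byte. Offsets >= 0 address packet data, the negative
 * "magic" offsets address meta data. Returns 0 on success and -1
 * if the offset is out of range or unknown.
 */
int pkt_load(const struct pkt *p, int off, uint8_t *out)
{
    assert(p != NULL && out != NULL);

    if (off >= 0) {
        if ((size_t)off >= p->len)
            return -1;
        *out = p->data[off];
        return 0;
    }

    switch (off) {
    case OFF_IFINDEX:
        *out = p->meta.ifindex;
        return 0;
    case OFF_PRIORITY:
        *out = p->meta.priority;
        return 0;
    case OFF_MARK:
        *out = p->meta.mark;
        return 0;
    default:
        return -1;
    }
}
```

With this, a matcher needs only one kind of load instruction; whether
the operand is a header byte or the packet mark is decided by the sign
of the offset.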
> - an advanced instruction set to improve the performance
> of the state machine for often used patterns, e.g.
> find-byte, classify, byte order transformations, header
> length calculation shortcuts, find-next-ipv6-opt, etc.
This can be nicely separated and put into post-processing stages.
Most of the time, you probably don't notice a difference anyway.
> - a good optimizer able to transform multiple simple
> instructions into a larger instruction, because the main
> bottleneck in a software state machine is the average number of
> instructions needed to process to get to a result.
Yes, that would be part of the post-processing: combine things,
detect patterns, and emit the right high-level construct.
> - stack frames to allow building libraries for often used
> problem not worth making a single instruction out of it.
Huh ? Probably too complex already. Also, if you're in software,
you may very well compile your own helper modules on the fly.
tcng has this as a proof-of-concept with the "C" target.
> I think the interactive mode is very useful for maintenance.
Hmm, I kind of doubt it. You're quicker with your editor, just
changing that line. What you need is a nice way for updating the
in-kernel configuration without loss of state.
You also need some "handles" where you can attach automated
rule generation and/or modification. That's something tcng doesn't
support very well.
> It also heavily depends on the usage of tc et al, the normal
> dscp, port, and address classification schemas perfectly fit
> into tcng and the big picture is most important.
Ah, but you know that the first thing tcng does when it sees an
"if" is that it rips the expression apart and then works on
"anonymous" fields or even single bits ?
> soon as we get to more complex classification and the more
> classification possibilities we get the more important it is
> to have some way to interactively construct single filters.
I think the contrary is the case :-) If things get complex
enough, you'll want to dry-run them in tcsim or such. It's
really not very different from programming - if I want to
change some complicated expression, I just edit it. It wouldn't
occur to me to tweak assembler instructions, no matter how
convenient the assembler is.
- Werner
--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina werner@almesberger.net /
/_http://www.almesberger.net/____________________________________________/
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput
2005-01-19 14:13 ` jamal
2005-01-19 14:36 ` Thomas Graf
@ 2005-01-19 16:45 ` Werner Almesberger
2005-01-19 16:54 ` Thomas Graf
2 siblings, 0 replies; 44+ messages in thread
From: Werner Almesberger @ 2005-01-19 16:45 UTC (permalink / raw)
To: jamal; +Cc: Thomas Graf, Patrick McHardy, Stephen Hemminger, netdev
jamal wrote:
> As Lennert was saying, puke whatever the kernel said and allow for
> rollback. i.e undo the first one if it succeeded for example.
How about this: you have a shadow configuration tree on each ingress
and egress point (i.e. two per interface). An update generates the
shadow tree. If done, you commit the shadow tree to become the new
config, and then you can quietly dismantle the old config. If you
didn't like the changes, you kill the shadow tree, without committing
it.
To preserve queue content and element state (e.g. policer state),
you'd also have to have a means to tell where to take things from,
e.g. "shadow" qdisc X inherits the packets from "real" qdisc Y. All
this could be checked for consistency at setup time.
There are of course limitations, e.g. when you merge or split flows.
As an optimization, you could have some "lazy copy": instead of
creating new qdisc "inheriting" from some other, you just move it
from the old to the new tree.
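The commit/kill part of this can be sketched in a few lines of C. This
is a userspace illustration only: struct cfg_tree, struct attach_point
and the shadow_* functions are hypothetical names, the tree is reduced
to a flat array of qdisc handles, and the inheritance of queue content
and policer state is left out entirely:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified configuration "tree". */
struct cfg_tree {
    int nqdiscs;
    int handles[8];
};

/* One ingress or egress point: the live config plus a shadow. */
struct attach_point {
    struct cfg_tree *active;
    struct cfg_tree *shadow;
};

/* Start an update: the shadow begins as a copy of the active tree. */
int shadow_begin(struct attach_point *ap)
{
    assert(ap != NULL);
    if (ap->shadow != NULL)
        return -1;              /* an update is already pending */
    ap->shadow = calloc(1, sizeof(*ap->shadow));
    if (ap->shadow == NULL)
        return -1;
    if (ap->active != NULL)
        memcpy(ap->shadow, ap->active, sizeof(*ap->shadow));
    return 0;
}

/* Kill the shadow tree without committing it. */
void shadow_abort(struct attach_point *ap)
{
    assert(ap != NULL);
    free(ap->shadow);
    ap->shadow = NULL;
}

/* Make the shadow the new config, then dismantle the old one. */
int shadow_commit(struct attach_point *ap)
{
    struct cfg_tree *old;

    assert(ap != NULL);
    if (ap->shadow == NULL)
        return -1;
    old = ap->active;
    ap->active = ap->shadow;    /* the actual switch is one pointer swap */
    ap->shadow = NULL;
    free(old);                  /* a kernel would have to wait for readers */
    return 0;
}
```

The point of the design is that all consistency checks can run against
the shadow while the old tree keeps classifying; only the pointer swap
has to be atomic.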
- Werner
--
_________________________________________________________________________
/ Werner Almesberger, Buenos Aires, Argentina werner@almesberger.net /
/_http://www.almesberger.net/____________________________________________/
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput
2005-01-19 14:13 ` jamal
2005-01-19 14:36 ` Thomas Graf
2005-01-19 16:45 ` Werner Almesberger
@ 2005-01-19 16:54 ` Thomas Graf
2005-01-20 14:42 ` jamal
2 siblings, 1 reply; 44+ messages in thread
From: Thomas Graf @ 2005-01-19 16:54 UTC (permalink / raw)
To: jamal; +Cc: Patrick McHardy, Stephen Hemminger, netdev, Werner Almesberger
* jamal <1106144009.1047.989.camel@jzny.localdomain> 2005-01-19 09:13
> What i mean is that we should probably leave iproute2 code alone so that
> people can run old scripts etc with it. i.e the netsh tool should just
> either reuse libnetlink and add any things to it or create a brand new
> library.
Inspected some more code and I've already finished more than I thought.
The architecture currently allows specifying the grammar with macros
like this:
NODELIST(neigh_modify_dev)
NODE(dev)
CALLBACK(set_dev)
FOLLOW(neigh_modify)
ARG(GA_TEXT, &storage.dev, CACHE_MGEN_FUNC(ifname_dst), "<DEV>")
DESC("Link")
END_NODE
END_NODELIST
NODELIST(neigh_ops)
NODE(add)
FOLLOW(neigh_add_dev)
CALLBACK(do_neigh_add)
ARG(GA_TEXT, &storage.dst, CACHE_MGEN_FUNC(dst), "<ADDR>")
DESC("Add a neighbour")
END_NODE
NODE(modify)
FOLLOW(neigh_modify_dev)
CALLBACK(do_neigh_modify)
ARG(GA_TEXT, &storage.dst, CACHE_MGEN_FUNC(dst), "<ADDR>")
DESC("Modify a neighbour")
END_NODE
NODE(delete)
FOLLOW(neigh_del_dev)
CALLBACK(do_neigh_del)
ARG(GA_TEXT, &storage.dst, CACHE_MGEN_FUNC(dst), "<ADDR>")
DESC("Delete a neighbour")
END_NODE
NODE(list)
FOLLOW(neigh_list_attrs)
CALLBACK(do_neigh_list)
DESC("List neighbour attributes")
END_NODE
END_NODELIST
TOPNODE(ng, neighbour)
FOLLOW(neigh_ops)
DESC("Neighbour (ARP) configuration")
LONG_DESC(
" Module to view and modify the neighbour tables.\n"
" \n" \
" The neighbour table establishes bindings between protocol\n" \
" addresses and link layer addresses for hosts sharing the same\n" \
" physical link. This module allows you to view the content of\n" \
" these tables and to manipulate their content.\n")
END_TOPNODE
Looks a bit complicated but is actually quite easy, you can do it the
linux way. This will get you full completion and context help for
your grammar and also completion of arguments like link names,
addresses in neighbour cache, etc. All you have to do is specify
a function returning a list of possibilities. It allows you to
build recursive grammars and multiple end points in the automaton.
The above results in this:
axs# neigh ?
Backtrace:
->neighbour - Neighbour (ARP) configuration
Description:
Module to view and modify the neighbour tables.
The neighbour table establishes bindings between protocol
addresses and link layer addresses for hosts sharing the same
physical link. This module allows you to view the content of
these tables and to manipulate their content.
Next level commands:
add <ADDR> ... Add a neighbour
modify <ADDR> ... Modify a neighbour
delete <ADDR> ... Delete a neighbour
list ... List neighbour attributes
axs# neigh modify ?
Backtrace:
->neighbour - Neighbour (ARP) configuration
->modify - Modify a neighbour
Expecting argument: <ADDR>
axs# neigh modify <TAB>
192.168.23.12 192.168.23.13
axs# neigh modify 192.168.23.1
Note: the <TAB> above will look up the addresses in the neighbour
table and list the possible entries; since all share a common
prefix it is automatically filled in. There is quite some
thinking in these completion functions: the link name completion
in neighbour context will only show links which actually have
neighbour entries.
The status of the whole thing: link and neighbour are finished,
core architecture finished as well, route is half done, addresses
are half done (both easy to finish). libnl has net/sched/
finished but is still missing code for a lot of modules. Session
management (commit/rollback) was once in but was too unstable,
needs a partial rewrite (design flaws) but should fit in quite
easily because libnl was designed to support it. It will basically
look like:
    nl_session_start();
    ... any high level operations ..
    if (nl_session_commit() < 0)
        nl_session_rollback();
Problems? Keeping the cache valid (multiple netlink programs).
The final update just before the commit and the commit itself
must be atomic.
Solutions:
- Use ATOMIC flag (dangerous)
- Seq counter in netlink, increased every time a netlink message
gets processed and returned in ack. A netlink request may contain
a flag and the expected sequence number and the request gets only
processed if they match, otherwise the request fails. (my favourite)
- Lock file in userspace (how to force everyone to use it?)
- Try to detect changes from third party after commit. Quite
hard but possible, reduces race window but doesn't close it
completely.
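The sequence counter variant is essentially a compare-and-set on the
kernel side. A minimal sketch in C follows, under loud assumptions:
none of this is existing netlink API, the flag REQ_F_CHECK_SEQ, the
structs and table_process are invented for illustration, and the sketch
assumes that a rejected request does not bump the counter:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request flag: only process if the sequence matches. */
#define REQ_F_CHECK_SEQ 0x1u

/* Stand-in for some kernel table guarded by a sequence counter. */
struct nl_table {
    uint32_t seq;    /* bumped for every processed request */
    int value;       /* the "configuration" */
};

struct nl_req {
    unsigned int flags;
    uint32_t expected_seq;   /* only used with REQ_F_CHECK_SEQ */
    int new_value;
};

/*
 * Process one request. The current sequence number is always
 * returned in *ack_seq, as the ack would carry it. Returns 0 on
 * success and -1 if the caller's view of the table was stale.
 */
int table_process(struct nl_table *t, const struct nl_req *req,
                  uint32_t *ack_seq)
{
    assert(t != NULL && req != NULL && ack_seq != NULL);

    if ((req->flags & REQ_F_CHECK_SEQ) && req->expected_seq != t->seq) {
        *ack_seq = t->seq;   /* tell the caller what to resync to */
        return -1;
    }

    t->value = req->new_value;
    t->seq++;
    *ack_seq = t->seq;
    return 0;
}
```

A program that committed with a stale sequence number gets the failure,
refreshes its cache and retries; programs that do not set the flag keep
working as before.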
Background: I keep 2 caches, 1 cache represents the current state
in the kernel, it gets updated when required. The second cache
contains the local changes. The first cache gets merged into the
second before the update and then gets committed. In case of a
failure the first cache is used to restore things.
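The two-cache scheme can be sketched in C as follows. This is not libnl
code: the session_* functions, struct cache and the fake kernel that
rejects negative values are all assumptions made for this illustration,
and the race between the refresh and the commit (the atomicity problem
above) is deliberately left open:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NENT 4

/* A cache is just a fixed table of values in this sketch. */
struct cache {
    int val[NENT];
};

struct session {
    struct cache kernel_view;   /* cache 1: current kernel state */
    struct cache local;         /* cache 2: state we want to reach */
    unsigned char dirty[NENT];  /* which entries were changed locally */
};

/* Stand-in for a netlink request; rejects negative values. */
static int fake_kernel_set(struct cache *kernel, int i, int v)
{
    if (v < 0)
        return -1;
    kernel->val[i] = v;
    return 0;
}

void session_start(struct session *s, const struct cache *kernel)
{
    assert(s != NULL && kernel != NULL);
    s->kernel_view = *kernel;
    s->local = *kernel;
    memset(s->dirty, 0, sizeof(s->dirty));
}

void session_set(struct session *s, int i, int v)
{
    assert(s != NULL && i >= 0 && i < NENT);
    s->local.val[i] = v;
    s->dirty[i] = 1;
}

/* Returns 0 on success; on failure the kernel state is restored. */
int session_commit(struct session *s, struct cache *kernel)
{
    int i;

    assert(s != NULL && kernel != NULL);

    /* Final update: refresh cache 1 and merge it into cache 2. */
    s->kernel_view = *kernel;
    for (i = 0; i < NENT; i++)
        if (!s->dirty[i])
            s->local.val[i] = s->kernel_view.val[i];

    for (i = 0; i < NENT; i++) {
        if (!s->dirty[i])
            continue;
        if (fake_kernel_set(kernel, i, s->local.val[i]) < 0) {
            *kernel = s->kernel_view;   /* rollback from cache 1 */
            return -1;
        }
    }
    return 0;
}
```

Because the rollback restores from the refreshed first cache, changes
made by a third party before the commit survive a failed session.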
Thoughts?
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput
2005-01-19 16:33 ` Werner Almesberger
@ 2005-01-19 17:22 ` Thomas Graf
0 siblings, 0 replies; 44+ messages in thread
From: Thomas Graf @ 2005-01-19 17:22 UTC (permalink / raw)
To: Werner Almesberger; +Cc: jamal, Patrick McHardy, Stephen Hemminger, netdev
* Werner Almesberger <20050119133353.H15303@almesberger.net> 2005-01-19 13:33
> Thomas Graf wrote:
> > filtering. The mistake they did was to completely rely on the
> > state machine for even the most simple packet classification
> > problems.
>
> I don't see much of a performance problem: once you have a nice
> FSM with single-bit decisions, you can quite easily construct
> various efficient matcher stages. You can even prepare (or
> compile on the fly) suitable specialized matchers. If doing the
> matching in hardware, you may even use just the FSM.
I guess we're speaking of slightly different FSMs. I assume yours
is 100% hardcoded and only works for static patterns? Those you
could also implement in FPGAs quite easily.
Assume one needs 2K rules for classifying on top of some kind of
dynamic header, say IPv4 and IPv6 for now. The static FSM
needs to parse the headers every time, which isn't much of a
problem with IPv4 but gets really expensive with IPv6 even with
specialized instructions to parse through the options. Sure,
one can put packets into classes and build filters on top of it,
but once split one can never merge them together again.
I've seen many static FSMs based on BPF, which I assume is
what you're talking about. _All_ of them break in practical
situations no longer matching the theory.
The only way I see to solve this is to put more logic into the
state machine. Give the state machine the chance to share
data, that's why you need local and global registers. To really
make use of it you want to be able to modify the data and thus
need arithmetic instructions. I know, I'm pretty lonely on this
path but I think this is the way to go.
> You need arithmetic only for pointers, and there it's basically
> mask and shift. You can do surprisingly well without loops. E.g.
> tcng doesn't have loops. (Although they would be a nice addition,
> particularly if you move more in the direction of firewalls.)
How do you parse IPv6 options? Specialized classifiers? Mask and
shift might be enough to transform IHL but it won't be sufficient
for higher protocols.
> Huh ? Probably too complex already. Also, if you're in software,
> you may very well compile your own helper modules on the fly.
> tcng has this as a proof-of-concept with the "C" target.
Of course stack frames are only necessary if you introduce variables,
and they may exist only in user space and then be eliminated for
the kernel state machine.
> > I think the interactive mode is very useful for maintenance.
>
> Hmm, I kind of doubt it. You're quicker with your editor, just
> changing that line. What you need is a nice way for updating the
> in-kernel configuration without loss of state.
I'm talking about maintenance stuff in terms of checking statistics,
inspecting your filter tree, etc. I fully agree that the construction
of the configuration belongs in a text editor.
> You also need some "handles" where you can attach automated
> rule generation and/or modification. That's something tcng doesn't
> support very well.
I don't get this.
> Ah, but you know that that first thing tcng does when it sees an
> "if" is that it rips the expression apart and then works on
> "anonymous" fields or even single bits ?
Sure, this is the way to go.
> I think the contrary is the case :-) If things get complex
> enough, you'll want to dry-run them in tcsim or such. It's
> really not very different from programming - if I want to
> change some complicated expression, I just edit it. It wouldn't
> occur to me to tweak assembler instructions, no matter how
> convenient the assembler is.
I agree, I'm solely talking about getting help while
constructing a filter. With all the new extensions coming in
the construction will get more complex with logic expressions,
various small classification tools and one needs to look up
their parameters etc. I think tcng will never be able to 100%
support all of them, because of their easy usage many will
write their own so we might see dozens of new ones like in
netfilter.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput
2005-01-19 16:54 ` Thomas Graf
@ 2005-01-20 14:42 ` jamal
2005-01-20 15:35 ` Thomas Graf
0 siblings, 1 reply; 44+ messages in thread
From: jamal @ 2005-01-20 14:42 UTC (permalink / raw)
To: Thomas Graf
Cc: Patrick McHardy, Stephen Hemminger, netdev, Werner Almesberger
On Wed, 2005-01-19 at 11:54, Thomas Graf wrote:
> * jamal <1106144009.1047.989.camel@jzny.localdomain> 2005-01-19 09:13
> > What i mean is that we should probably leave iproute2 code alone so that
> > people can run old scripts etc with it. i.e the netsh tool should just
> > either reuse libnetlink and add any things to it or create a brand new
> > library.
>
> Inspected some more code and I've finished already more than I thought.
> The architecture currently allows specifying the grammar with macros
> like this:
[.. good stuff was here ..]
I like it. Assuming we can have arbitrary hierarchies; you just show one
level - but that may be just the example at hand. Given that, it should be
able to meet the layout requirements that Lennert alluded to earlier.
> Looks a bit complicated but is actually quite easy, you can do it the
> linux way.
I get it, like it and yes TheLinuxWay is the OnlyWay ;->
[..]
> The status of the whole thing: link and neighbour are finished,
> core architecture finished as well, route is half done, addresses
> are half done (both easy to finish). libnl has net/sched/
> finished but is still missing code for a lot of modules.
This is the part i am a little uncomfortable with. If you can make that
library maybe part of iproute2 it would ease maintenance. Extend
libnetlink or have another layer on top of it.
I know you have already put the effort, but consider this thought.
> Session
> management (commit/rollback) was once in but was too unstable,
> needs a partial rewrite (design flaws) but should fit in quite
> easily because libnl was designed to support it. It will basically
> look like: nl_session_start(); ... any high level operations ..
> if (nl_session_commit() < 0) nl_session_rollback();
>
Looks right.
> Problems? Keeping the cache valid (multiple netlink programs).
> The final update just before the commit and the commit itself
> must be atomic.
>
Indeed.
> Solutions:
> - Use ATOMIC flag (dangerous)
Would really need a kernel hack to do right. And .. would slow down
traffic while you hold the "atomic lock".
> - Seq counter in netlink, increased every time a netlink message
> gets processed and returned in ack. A netlink request may contain
> a flag and the expected sequence number and the request gets only
> processed if they match, otherwise the request fails. (my favourite)
> - Lock file in userspace (how to enforce everyone to use it?)
> - Try to detect changes from third party after commit. Quite
> hard but possible, reduces race window but doesn't close it
> completely.
>
Other apps changing things will screw you. If that gets handled then we
are set. I actually did start working on a netlink redirect(hook) for a
very different reason, but it should serve this purpose. Essentially you
register to be the proxy for netlink and all messages go via you. You
can then munge them, etc before issuing the response or allowing it to
go on to configure things. With this your "lock" would be to ask for
certain things to be redirected to you during an update phase.
Ok, maybe i will put more effort on it over the weekend (Sunday).
> Background: I keep 2 caches, 1 cache represents the current state
> in the kernel, it gets updated when required. The second cache
> contains the local caches. The first cache gets merged into the
> second before the update and then gets committed. In case of a
> failure the first cache is used to restore things.
>
Looks like the right way forward.
cheers,
jamal
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput
2005-01-20 14:42 ` jamal
@ 2005-01-20 15:35 ` Thomas Graf
2005-01-20 17:06 ` Stephen Hemminger
2005-01-24 14:13 ` jamal
0 siblings, 2 replies; 44+ messages in thread
From: Thomas Graf @ 2005-01-20 15:35 UTC (permalink / raw)
To: jamal; +Cc: Patrick McHardy, Stephen Hemminger, netdev, Werner Almesberger
* jamal <1106232168.1041.125.camel@jzny.localdomain> 2005-01-20 09:42
> I like it. Assuming we can have arbitrary hierarchies; you just show one
> level - but that may be just the example at hand. Given that should be
> able to meet the layout requirements that Lennert alluded to earlier.
It doesn't include any context code, the BNF:
PARSER := TOPNODE*
TOPNODE := NODELIST DESC LONG_DESC
NODELIST := NODE*
NODE := DESC [ NODELIST ] [ ARGUMENT ] [ ATTRS ] [ END_POINT ]
END_POINT := possible end of command
ATTRS := ATTR*
ATTR := KEY [ VALUE ]
ARGUMENT := VALUE [ DESC ]
Not sure if this helps, I attached a complete module below.
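
To make the BNF a bit more concrete, here is a minimal sketch of how such a node tree could be walked to validate a command line. It is not the actual parser: struct gnode, gr_match() and the per-node end_point flag are names invented for this illustration, and attribute lists (ATTRS) are left out entirely.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified grammar node (not the real struct grammar_node). */
struct gnode {
	const char *name;		/* keyword; NULL terminates a NODELIST */
	const struct gnode *follow;	/* NODELIST that may follow, or NULL */
	int takes_arg;			/* ARGUMENT: node consumes one value */
	int end_point;			/* END_POINT: command may end here */
};

/*
 * Walk argv along the node lists. Returns 0 if every token matched a
 * node and the last matched node is a valid end point, -1 otherwise.
 */
static int gr_match(const struct gnode *list, char **argv, int argc)
{
	int i = 0, may_end = 0;

	while (i < argc) {
		const struct gnode *n;

		if (!list)
			return -1;	/* tokens left but nothing may follow */

		for (n = list; n->name; n++)
			if (!strcmp(n->name, argv[i]))
				break;
		if (!n->name)
			return -1;	/* unknown keyword at this level */
		i++;

		if (n->takes_arg) {
			if (i >= argc)
				return -1;	/* argument value missing */
			i++;			/* skip the value */
		}

		may_end = n->end_point;
		list = n->follow;
	}

	return may_end ? 0 : -1;
}
```

With a two-level grammar ("add <ADDR>" followed by "dev <DEV>") this accepts "add 10.0.0.1 dev eth0" and rejects the truncated "add 10.0.0.1", which is the nesting behaviour asked about above.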
> > The status of the whole thing: link and neighbour are finished,
> > core architecture finished as well, route is half done, addresses
> > are half done (both easy to finish). libnl has net/sched/
> > finished but is still missing code for a lot of modules.
>
> This is the part i am a little uncomfortable with. If you can make that
> library maybe part of iproute2 it would ease maintenance. Extend
> libnetlink or have another layer on top of it.
> I know you have already put the effort, but consider this thought.
We can move it into iproute2 but the code really differs from iproute2
and code sharing is almost impossible. We can make iproute2 use it
at some point but that doesn't make much sense to me.
> > - Seq counter in netlink, increased every time a netlink message
> > gets processed and returned in ack. A netlink request may contain
> > a flag and the expected sequence number and the request gets only
> > processed if they match, otherwise the request fails. (my favourite)
Do you have any objections to this?
> > - Lock file in userspace (how to enforce everyone to use it?)
> > - Try to detect changes from third party after commit. Quite
> > hard but possible, reduces race window but doesn't close it
> > completely.
> >
>
> Other apps changing things will screw you. If that gets handled then we
> are set. I actually did start working on a netlink redirect(hook) for a
> very different reason, but it should serve this purpose. Essentially you
> register to be the proxy for netlink and all messages go via you. You
> can then munge them, etc before issuing the response or allowing it to
> go on to configure things. With this your "lock" would be to ask for
> certain things to be redirected to you during an update phase.
> Ok, maybe i will put more effort on it over the weekend (Sunday).
Indeed, that would serve me well and we can avoid the userspace daemon.
It doesn't even have to be a proxy, a simple callback hook capable of
returning an action would be enough for my purpose.
NOTE: Read bottom-up:
/*
* neigh.c linux net config utility
*
* $Id$
*
* Copyright (c) 2004 Thomas Graf <tgraf@suug.ch>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
#include <nc/config.h>
#include <nc/parse.h>
#include <nc/utils.h>
#include <nc/link.h>
static struct nl_cache neigh_cache = RTNL_INIT_NEIGH_CACHE();
static int dump_type = NL_DUMP_BRIEF;
static struct rtnl_neigh filter = RTNL_INIT_NEIGH();
static struct {
char *lladdr, *dev, *dst, *proxy, *router, *incomplete;
char *reachable, *stale, *delay, *probe, *failed, *noarp;
char *perm;
} storage;
static int set_dump_type(struct grammar_node *g)
{
dump_type = (int) g->gn_data;
return 0;
}
static int set_dev(struct grammar_node *g)
{
int err;
struct nl_cache *c = nl_cache_lookup(RTNL_LINK);
BUG_ON(!g);
err = update_link_cache();
if (err < 0)
return err;
return rtnl_neigh_set_ifindex_name(&filter, c, gr_arg_val(g));
}
static int set_lladdr(struct grammar_node *g)
{
BUG_ON(!g);
return rtnl_neigh_set_lladdr(&filter, gr_arg_val(g));
}
static int set_dst(struct grammar_node *g)
{
BUG_ON(!g);
return rtnl_neigh_set_dst(&filter, gr_arg_val(g));
}
static int set_state2(struct grammar_node *g)
{
BUG_ON(!g);
rtnl_neigh_set_state(&filter, (int) g->gn_data);
return 0;
}
static int set_state(struct grammar_node *g)
{
BUG_ON(!g);
if (gr_is_enabled(g))
rtnl_neigh_set_state(&filter, (int) g->gn_data);
else if (gr_is_disabled(g))
rtnl_neigh_unset_state(&filter, (int) g->gn_data);
else {
put_err("Invalid toggle value '%s', must be {on|off}\n",
gr_arg_val(g));
return -1;
}
return 0;
}
static int set_flag(struct grammar_node *g)
{
BUG_ON(!g);
if (gr_is_enabled(g))
rtnl_neigh_set_flag(&filter, (int) g->gn_data);
else if (gr_is_disabled(g))
rtnl_neigh_unset_flag(&filter, (int) g->gn_data);
else {
put_err("Invalid toggle value '%s', must be {on|off}\n",
gr_arg_val(g));
return -1;
}
return 0;
}
static inline struct rtnl_neigh * get_neigh(int i)
{
return (struct rtnl_neigh *) nl_cache_get(&neigh_cache, i);
}
CACHE_MGEN(lladdr, &nlh_route, &neigh_cache)
{
return nl_addr2str_r(&(get_neigh(i)->n_lladdr), buf, len);
}
CACHE_MGEN(dst, &nlh_route, &neigh_cache)
{
return nl_addr2str_r(&(get_neigh(i)->n_dst), buf, len);
}
CACHE_MGEN(ifname_dst, &nlh_route, &neigh_cache)
{
struct nl_cache *c = nl_cache_lookup(RTNL_LINK);
struct rtnl_neigh *n = get_neigh(i);
if (update_link_cache() < 0)
return NULL;
if (storage.dst) {
struct nl_addr f;
if (nl_str2addr(storage.dst, &f) < 0)
goto fallback;
if (n->n_dst.a_len == f.a_len &&
!memcmp(n->n_dst.a_addr, f.a_addr, n->n_dst.a_len))
return (char *) rtnl_link_i2name(c, n->n_ifindex);
else
return NULL;
}
fallback:
return (char *) rtnl_link_i2name(c, n->n_ifindex);
}
static inline void reset_filter(void)
{
memset(&filter, 0, sizeof(filter));
}
static int update_neigh_cache(void)
{
if (nl_cache_update(&nlh_route, &neigh_cache) < 0) {
put_err("%s\n", nl_geterror());
return -1;
}
return 0;
}
static int do_neigh_list(struct grammar_node *g)
{
int err;
BUG_ON(!g);
err = update_link_cache();
if (err < 0)
goto out;
err = update_neigh_cache();
if (err < 0)
goto out;
nl_cache_dump_filter(dump_type, &neigh_cache,
(struct nl_common *) &filter, fd_out);
err = 0;
out:
dump_type = NL_DUMP_BRIEF;
reset_filter();
return err;
}
static int do_neigh_add(struct grammar_node *g)
{
int err = -1;
BUG_ON(!g);
filter.n_family = filter.n_lladdr.a_family;
err = rtnl_neigh_set_dst(&filter, gr_arg_val(g));
if (err < 0)
goto out;
err = rtnl_neigh_add(&nlh_route, &filter);
if (err < 0)
goto out;
err = 0;
out:
reset_filter();
return err;
}
static int do_neigh_del(struct grammar_node *g)
{
int err = -1;
BUG_ON(!g);
err = rtnl_neigh_set_dst(&filter, gr_arg_val(g));
if (err < 0)
goto out;
err = rtnl_neigh_delete(&nlh_route, &filter);
if (err < 0)
goto out;
err = 0;
out:
reset_filter();
return err;
}
static int do_neigh_modify(struct grammar_node *g)
{
int err;
BUG_ON(!g);
if (filter.n_mask == 0)
return 0;
err = rtnl_neigh_set_dst(&filter, gr_arg_val(g));
if (err < 0)
goto out;
err = rtnl_neigh_change(&nlh_route, &filter, &filter);
if (err < 0)
goto out;
err = 0;
out:
reset_filter();
return err;
}
ATTRLIST(neigh_flags_attrs)
ATTR_FLAG(proxy, NTF_PROXY, &storage.proxy, set_flag, "Proxy")
ATTR_FLAG(router, NTF_ROUTER, &storage.router, set_flag, "Router")
ATTR_FLAG(incomplete, NUD_INCOMPLETE, &storage.incomplete, set_state,
"Lookup is incomplete")
ATTR_FLAG(reachable, NUD_REACHABLE, &storage.reachable, set_state,
"Reachable")
ATTR_FLAG(stale, NUD_STALE, &storage.stale, set_state, "Stale entry")
ATTR_FLAG(delay, NUD_DELAY, &storage.delay, set_state, "Delayed")
ATTR_FLAG(probe, NUD_PROBE, &storage.probe, set_state, "Probe")
ATTR_FLAG(failed, NUD_FAILED, &storage.failed, set_state, "Failed")
ATTR_FLAG(noarp, NUD_NOARP, &storage.noarp, set_state, "No ARP")
ATTR_FLAG(permanent, NUD_PERMANENT, &storage.perm, set_state,
"Permanent entry")
END_ATTRLIST
NODELIST(neigh_flags)
END_POINT
NODE(flags)
ATTRS(neigh_flags_attrs)
END_NODE
END_NODELIST
ATTRLIST(neigh_filter)
ATTR(lladdr)
CALLBACK(set_lladdr)
ARG(GA_TEXT, &storage.lladdr,CACHE_MGEN_FUNC(lladdr),"<LLADDR>")
DESC("Link layer address")
END_ATTR
ATTR(dst)
CALLBACK(set_dst)
ARG(GA_TEXT, &storage.dst, CACHE_MGEN_FUNC(dst), "<ADDR>")
DESC("Destination address")
END_ATTR
ATTR(dev)
CALLBACK(set_dev)
ARG(GA_TEXT, &storage.dev, CACHE_MGEN_FUNC(ifname), "<DEV>")
DESC("Link the neighbour is on")
END_ATTR
END_ATTRLIST
NODELIST(neigh_where)
END_POINT
NODE(where)
ATTRS(neigh_filter)
FOLLOW(neigh_flags)
DESC("Only dump neighbours matching a filter")
END_NODE
END_NODELIST
NODELIST(neigh_list_attrs)
END_POINT
NODE(brief)
DATA(NL_DUMP_BRIEF)
FOLLOW(neigh_where)
CALLBACK(set_dump_type)
DESC("Brief listing of attributes")
END_NODE
NODE(full)
DATA(NL_DUMP_FULL)
FOLLOW(neigh_where)
CALLBACK(set_dump_type)
DESC("Verbose listing (all attributes)")
END_NODE
NODE(stats)
DATA(NL_DUMP_STATS)
FOLLOW(neigh_where)
CALLBACK(set_dump_type)
DESC("Verbose listing (all attributes/statistics)")
END_NODE
NODE(where)
ATTRS(neigh_filter)
FOLLOW(neigh_flags)
DESC("Only dump neighbours matching a filter")
END_NODE
END_NODELIST
NODELIST(neigh_add_state)
NODE(permanent)
CALLBACK(set_state2)
DATA(NUD_PERMANENT)
DESC("Permanent entry")
END_NODE
NODE(stale)
CALLBACK(set_state2)
DATA(NUD_STALE)
DESC("Stale entry")
END_NODE
NODE(noarp)
CALLBACK(set_state2)
DATA(NUD_NOARP)
DESC("No ARP")
END_NODE
NODE(reachable)
CALLBACK(set_state2)
DATA(NUD_REACHABLE)
DESC("Reachable")
END_NODE
NODE(failed)
CALLBACK(set_state2)
DATA(NUD_FAILED)
DESC("Failed")
END_NODE
END_NODELIST
NODELIST(neigh_add_lladdr)
NODE(lladdr)
CALLBACK(set_lladdr)
FOLLOW(neigh_add_state)
ARG(GA_TEXT, &storage.lladdr,CACHE_MGEN_FUNC(lladdr),"<LLADDR>")
DESC("Link layer address")
END_NODE
END_NODELIST
NODELIST(neigh_add_dev)
NODE(dev)
CALLBACK(set_dev)
FOLLOW(neigh_add_lladdr)
ARG(GA_TEXT, &storage.dev, CACHE_MGEN_FUNC(ifname), "<DEV>")
DESC("Link")
END_NODE
END_NODELIST
NODELIST(neigh_del_dev)
NODE(dev)
CALLBACK(set_dev)
ARG(GA_TEXT, &storage.dev, CACHE_MGEN_FUNC(ifname_dst), "<DEV>")
DESC("Link")
END_NODE
END_NODELIST
ATTRLIST(neigh_set_attrs)
ATTR(lladdr)
CALLBACK(set_lladdr)
ARG(GA_TEXT, &storage.lladdr,CACHE_MGEN_FUNC(lladdr),"<LLADDR>")
DESC("Link layer address")
END_ATTR
ATTR_FLAG(proxy, NTF_PROXY, &storage.proxy, set_flag, "Proxy")
ATTR_FLAG(router, NTF_ROUTER, &storage.router, set_flag, "Router")
ATTR_FLAG(incomplete, NUD_INCOMPLETE, &storage.incomplete, set_state,
"Incomplete lookup")
ATTR_FLAG(reachable, NUD_REACHABLE, &storage.reachable, set_state,
"Reachable")
ATTR_FLAG(stale, NUD_STALE, &storage.stale, set_state, "Stale entry")
ATTR_FLAG(delay, NUD_DELAY, &storage.delay, set_state, "Delayed")
ATTR_FLAG(probe, NUD_PROBE, &storage.probe, set_state, "Probe")
ATTR_FLAG(failed, NUD_FAILED, &storage.failed, set_state, "Failed")
ATTR_FLAG(noarp, NUD_NOARP, &storage.noarp, set_state, "No ARP")
ATTR_FLAG(permanent, NUD_PERMANENT, &storage.perm, set_state,
"Permanent entry")
END_ATTRLIST
NODELIST(neigh_modify)
NODE(set)
ATTRS(neigh_set_attrs)
END_NODE
END_NODELIST
NODELIST(neigh_modify_dev)
NODE(dev)
CALLBACK(set_dev)
FOLLOW(neigh_modify)
ARG(GA_TEXT, &storage.dev, CACHE_MGEN_FUNC(ifname_dst), "<DEV>")
DESC("Link")
END_NODE
END_NODELIST
NODELIST(neigh_ops)
NODE(add)
FOLLOW(neigh_add_dev)
CALLBACK(do_neigh_add)
ARG(GA_TEXT, &storage.dst, CACHE_MGEN_FUNC(dst), "<ADDR>")
DESC("Add a neighbour")
END_NODE
NODE(modify)
FOLLOW(neigh_modify_dev)
CALLBACK(do_neigh_modify)
ARG(GA_TEXT, &storage.dst, CACHE_MGEN_FUNC(dst), "<ADDR>")
DESC("Modify a neighbour")
END_NODE
NODE(delete)
FOLLOW(neigh_del_dev)
CALLBACK(do_neigh_del)
ARG(GA_TEXT, &storage.dst, CACHE_MGEN_FUNC(dst), "<ADDR>")
DESC("Delete a neighbour")
END_NODE
NODE(list)
FOLLOW(neigh_list_attrs)
CALLBACK(do_neigh_list)
DESC("List neighbour attributes")
END_NODE
END_NODELIST
TOPNODE(ng, neighbour)
FOLLOW(neigh_ops)
DESC("Neighbour (ARP) configuration")
LONG_DESC(
" Module to view and modify the neighbour tables.\n"
" \n" \
" The neighbour table establishes bindings between protocol\n" \
" addresses and link layer addresses for hosts sharing the same\n" \
" physical link. This module allows you to view the content of\n" \
" these tables and to manipulate their content.\n")
END_TOPNODE
static void __init neigh_init(void)
{
MAKE_LIST(neigh_ops);
MAKE_LIST(neigh_list_attrs);
MAKE_LIST(neigh_where);
MAKE_LIST(neigh_filter);
MAKE_LIST(neigh_add_dev);
MAKE_LIST(neigh_add_lladdr);
MAKE_LIST(neigh_add_state);
MAKE_LIST(neigh_del_dev);
MAKE_LIST(neigh_modify_dev);
MAKE_LIST(neigh_modify);
MAKE_LIST(neigh_set_attrs);
MAKE_LIST(neigh_flags);
MAKE_LIST(neigh_flags_attrs);
register_top_node(&ng);
}
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput
2005-01-20 15:35 ` Thomas Graf
@ 2005-01-20 17:06 ` Stephen Hemminger
2005-01-20 17:19 ` Thomas Graf
2005-01-24 14:13 ` jamal
1 sibling, 1 reply; 44+ messages in thread
From: Stephen Hemminger @ 2005-01-20 17:06 UTC (permalink / raw)
To: Thomas Graf; +Cc: jamal, Patrick McHardy, netdev, Werner Almesberger
On Thu, 20 Jan 2005 16:35:59 +0100
Thomas Graf <tgraf@suug.ch> wrote:
> * jamal <1106232168.1041.125.camel@jzny.localdomain> 2005-01-20 09:42
> > I like it. Assuming we can have arbitrary hierarchies; you just show one
> > level - but that may be just the example at hand. Given that should be
> > able to meet the layout requirements that Lennert alluded to earlier.
>
> It doesn't include any context code, the BNF:
>
> PARSER := TOPNODE*
> TOPNODE := NODELIST DESC LONG_DESC
> NODELIST := NODE*
> NODE := DESC [ NODELIST ] [ ARGUMENT ] [ ATTRS ] [ END_POINT ]
> END_POINT := possible end of command
> ATTRS := ATTR*
> ATTR := KEY [ VALUE ]
> ARGUMENT := VALUE [ DESC ]
>
> Not sure if this helps, I attached a complete module below.
>
Go for it!
A couple additional suggestions. It would be great to get a useful API
for 'tc' that is one step above actual low level netlink stuff.
And it would be great to reuse some existing scripting language grammar
and parsing library infrastructure.
Don't feel constrained to C on this. If using C++ or even something like
Python or Ruby would be better go ahead; but please no Perl.
--
Stephen Hemminger <shemminger@osdl.org>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput
2005-01-20 17:06 ` Stephen Hemminger
@ 2005-01-20 17:19 ` Thomas Graf
0 siblings, 0 replies; 44+ messages in thread
From: Thomas Graf @ 2005-01-20 17:19 UTC (permalink / raw)
To: Stephen Hemminger; +Cc: jamal, Patrick McHardy, netdev, Werner Almesberger
* Stephen Hemminger <20050120090628.29205d59@dxpl.pdx.osdl.net> 2005-01-20 09:06
> On Thu, 20 Jan 2005 16:35:59 +0100
> Thomas Graf <tgraf@suug.ch> wrote:
>
> > * jamal <1106232168.1041.125.camel@jzny.localdomain> 2005-01-20 09:42
> > > I like it. Assuming we can have arbitrary hierarchies; you just show one
> > > level - but that may be just the example at hand. Given that should be
> > > able to meet the layout requirements that Lennert alluded to earlier.
> >
> > It doesn't include any context code, the BNF:
> >
> > PARSER := TOPNODE*
> > TOPNODE := NODELIST DESC LONG_DESC
> > NODELIST := NODE*
> > NODE := DESC [ NODELIST ] [ ARGUMENT ] [ ATTRS ] [ END_POINT ]
> > END_POINT := possible end of command
> > ATTRS := ATTR*
> > ATTR := KEY [ VALUE ]
> > ARGUMENT := VALUE [ DESC ]
> >
> > Not sure if this helps, I attached a complete module below.
> >
>
> A couple additional suggestions. It would be great to get a useful API
> for 'tc' that is one step above actual low level netlink stuff.
Planned. I tried to reuse an existing grammar but haven't found one that
suits well enough yet.
> And it would be great to reuse some existing scripting language grammar
> and parsing library infrastructure.
Tried very hard to do so. I'd really like to build upon readline and
its completion method but most parser generators are not made to get along
with readline + completion. A C++ hack exists but doesn't really work
with completion. That's why I wrote my own grammar definition thing.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput
2005-01-20 15:35 ` Thomas Graf
2005-01-20 17:06 ` Stephen Hemminger
@ 2005-01-24 14:13 ` jamal
2005-01-24 15:06 ` Thomas Graf
1 sibling, 1 reply; 44+ messages in thread
From: jamal @ 2005-01-24 14:13 UTC (permalink / raw)
To: Thomas Graf
Cc: Patrick McHardy, Stephen Hemminger, netdev, Werner Almesberger
On Thu, 2005-01-20 at 10:35, Thomas Graf wrote:
> * jamal <1106232168.1041.125.camel@jzny.localdomain> 2005-01-20 09:42
> > I like it. Assuming we can have arbitrary hierarchies; you just show one
> > level - but that may be just the example at hand. Given that should be
> > able to meet the layout requirements that Lennert alluded to earlier.
>
> It doesn't include any context code, the BNF:
>
> PARSER := TOPNODE*
> TOPNODE := NODELIST DESC LONG_DESC
> NODELIST := NODE*
> NODE := DESC [ NODELIST ] [ ARGUMENT ] [ ATTRS ] [ END_POINT ]
> END_POINT := possible end of command
> ATTRS := ATTR*
> ATTR := KEY [ VALUE ]
> ARGUMENT := VALUE [ DESC ]
>
> Not sure if this helps, I attached a complete module below.
Theres a few holes in the BNF, but i dont think that matters that much.
When the time comes it can be cleaned up.
Should be noted that all iproute2 apps already have a very concise BNF.
> > This is the part i am a little uncomfortable with. If you can make that
> > library maybe part of iproute2 it would ease maintenance. Extend
> > libnetlink or have another layer on top of it.
> > I know you have already put the effort, but consider this thought.
>
> We can move it into iproute2 but the code really differs from iproute2
> and code sharing is almost impossible. We can make iproute2 use it
> at some point but that doesn't make much sense for me.
>
I am not sure which path would be better ..
>
> > > - Seq counter in netlink, increased every time a netlink message
> > > gets processed and returned in ack. A netlink request may contain
> > > a flag and the expected sequence number and the request gets only
> > > processed if they match, otherwise the request fails. (my favourite)
>
> Do you have any objections on this?
A seq number in netlink is in fact a transaction identifier (as opposed
to a message identifier). We also have a window of 1 for a very good
reason - simplicity.
If what you are saying is we muck around seq numbers, i think its a bad
idea.
> Indeed, that would serve me well and we can avoid the userspace daemon.
> It doesn't even have to be a proxy, a simple callback hook capable of
> returning an action would be enough for my purpose.
>
Sorry did not get an iota of time to work more on this over the weekend
(kid decided to own me over the weekend).
One thing to note is that you can't have multiple apps requesting
RTM_XXXACTION redirects, i.e. you can only have one. And if this one app
disappears, it is a DoS. So a daemon may be necessary just so we can
check from the kernel if the app is still alive (after some timeout) and
if not we can clean up state for other apps to reuse the redirect.
> NOTE: Read bottom-up:
Looks very good.
My thoughts now are you need to build on top of libnetlink - another
library. Example, to administratively bring up a netdevice, one would
call something like
admin_up("eth0");
This is not to say you cant build a competing library to libnetlink, i
am just not sure it is worth the effort of having two competing
libraries doing almost the same thing (that need maintenance).
cheers,
jamal
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput
2005-01-24 14:13 ` jamal
@ 2005-01-24 15:06 ` Thomas Graf
2005-01-26 13:48 ` jamal
0 siblings, 1 reply; 44+ messages in thread
From: Thomas Graf @ 2005-01-24 15:06 UTC (permalink / raw)
To: jamal; +Cc: Patrick McHardy, Stephen Hemminger, netdev, Werner Almesberger
* jamal <1106576005.1652.1292.camel@jzny.localdomain> 2005-01-24 09:13
> On Thu, 2005-01-20 at 10:35, Thomas Graf wrote:
> > > > - Seq counter in netlink, increased every time a netlink message
> > > > gets processed and returned in ack. A netlink request may contain
> > > > a flag and the expected sequence number and the request gets only
> > > > processed if they match, otherwise the request fails. (my favourite)
> >
> > Do you have any objections on this?
>
> A seq number in netlink is in fact a transaction identifier (as opposed
> to a message identifier). We also have a window of 1 for a very good
> reason - simplicity.
> If what you are saying is we muck around seq numbers, i think its a bad
> idea.
I'm not talking about nlmsg_seq but rather a sequence number with
global or nl_family scope. It gets increased whenever a netlink
message of that family is processed and is returned with the ack. If
a userspace application wants to enforce atomicity between two requests
which cannot be batched because an answer is expected in between, then
it could provide the expected sequence number and the request is only
fulfilled if it matches the current one. Example:
--> RTM_NEWLINK
<-- answer
<-- ACK (seq = 222)
--> RTM_SETLINK (expect = 222)
<-- ACK
Now if another netlink app interferes:
--> RTM_NEWLINK
<-- answer
<-- ACK (seq = 222)
-- other app --
--> RTM_SETLINK
<-- ACK (seq = 223)
-- back to first app --
--> RTM_SETLINK (expect = 222)
<-- ERROR
The application can then retry its operation a few times and
finally give up. The main problem I see is to extend nlmsghdr
in a way it stays compatible.
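
A small userspace model of the check may make the two traces above easier to follow. This is only a sketch of the idea, nothing like it exists in the kernel: struct nl_family_state and family_process() are invented names, and -EAGAIN stands in for whatever error the real thing would return.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-family state: one counter shared by all users. */
struct nl_family_state {
	uint32_t seq;	/* bumped once for every processed request */
};

/*
 * Process one request. 'expect' is NULL for ordinary requests; if it
 * is set, the request is only carried out when *expect equals the
 * current family sequence. On success the new sequence is stored in
 * *ack_seq (the value that would travel back in the ACK).
 */
static int family_process(struct nl_family_state *fs,
			  const uint32_t *expect, uint32_t *ack_seq)
{
	assert(fs != NULL && ack_seq != NULL);

	if (expect && *expect != fs->seq)
		return -EAGAIN;	/* somebody else got in between */

	/* ... the actual configuration change would happen here ... */

	*ack_seq = ++fs->seq;
	return 0;
}
```

Starting from seq = 221, an unconditional request is acked with 222 and a follow-up expecting 222 succeeds. If a second application slips in an unconditional request first, the counter moves to 223 and the request expecting 222 is refused without touching any state.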
> My thoughts now are you need to build on top of libnetlink - another
> library. Example, to administratively bring up a netdevice, one would
> call something like
>
> admin_up("eth0");
>
> This is not to say you cant build a competing library to libnetlink, i
> am just not sure it is worth the effort of having two competing
> libraries doing almost the same thing (that need maintenance).
I think it is, the feedback is overwhelming and people are already
contributing to support more netlink users. As I said, 95% of the
functionality is in iproute2 itself and not libnetlink. It is vital
to have some kind of library to abstract the low level netlink
functionality in a simple form to other applications. Currently it's
quite hard to, for example, access the tc class tree; one can use
libnetlink to do the request and parse the answer but everyone needs
to write their own TLV parsing routines.
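
To illustrate what "write their own TLV parsing routines" means in practice, here is roughly the kind of walker every application ends up carrying. It is a self-contained sketch: struct tlv mirrors the length/type header layout of struct rtattr, but tlv_find() and TLV_ALIGN() are made-up names and not part of libnetlink or libnl.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same shape as struct rtattr: total length (header included), then type. */
struct tlv {
	uint16_t len;
	uint16_t type;
};

/* Attributes are padded to 4 byte boundaries. */
#define TLV_ALIGN(n) (((n) + 3U) & ~3U)

/*
 * Find the first attribute of the given type in buf[0..len).
 * Returns a pointer to its payload and stores the payload length in
 * *payload_len, or returns NULL if it is absent or the stream is
 * malformed.
 */
static const void *tlv_find(const void *buf, size_t len, uint16_t type,
			    size_t *payload_len)
{
	const uint8_t *p = buf;

	while (len >= sizeof(struct tlv)) {
		struct tlv h;
		size_t step;

		memcpy(&h, p, sizeof(h));	/* avoid unaligned access */
		if (h.len < sizeof(h) || h.len > len)
			return NULL;		/* truncated or corrupt */

		if (h.type == type) {
			*payload_len = h.len - sizeof(h);
			return p + sizeof(h);
		}

		step = TLV_ALIGN(h.len);
		if (step > len)
			break;			/* padding runs past the end */
		p += step;
		len -= step;
	}

	return NULL;
}
```

Multiply this by nested attributes and per-qdisc option blocks and it becomes clear why a library that hands out ready-made objects is worth having.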
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput
2005-01-24 15:06 ` Thomas Graf
@ 2005-01-26 13:48 ` jamal
2005-01-26 14:35 ` Thomas Graf
2005-02-11 15:07 ` Dan Siemon
0 siblings, 2 replies; 44+ messages in thread
From: jamal @ 2005-01-26 13:48 UTC (permalink / raw)
To: Thomas Graf
Cc: Patrick McHardy, Stephen Hemminger, netdev, Werner Almesberger
On Mon, 2005-01-24 at 10:06, Thomas Graf wrote:
> I'm not talking about nlmsg_seq but rather a sequence number with
> global or nl_family scope. It gets increased whenever a netlink
> message of that family is processed and is returned with the ack. If
> a userspace application wants to enforce atomicity between two requests
> which cannot be batched because an answer is expected in between, then
> it could provide the expected sequence number and the request is only
> fulfilled if it matches the current one. Example:
>
> --> RTM_NEWLINK
> <-- answer
> <-- ACK (seq = 222)
> --> RTM_SETLINK (expect = 222)
> <-- ACK
>
> Now if another netlink app interferes:
>
> --> RTM_NEWLINK
> <-- answer
> <-- ACK (seq = 222)
>
> -- other app --
> --> RTM_SETLINK
> <-- ACK (seq = 223)
>
> -- back to first app --
> --> RTM_SETLINK (expect = 222)
> <-- ERROR
>
> The application can then retry its operation a few times and
> finally give up. The main problem I see is to extend nlmsghdr
> in a way it stays compatible.
The best thing you could get out of this is a warning that something
changed under you i.e doesnt really solve the synchronization issue.
[And a lot more complexity is introduced - if you say you want to change
the netlink header and maintain state in the kernel].
> > My thoughts now are you need to build on top of libnetlink - another
> > library. Example, to administratively bring up a netdevice, one would
> > call something like
> >
> > admin_up("eth0");
> >
> > This is not to say you cant build a competing library to libnetlink, i
> > am just not sure it is worth the effort of having two competing
> > libraries doing almost the same thing (that need maintenance).
>
> I think it is, the feedback is overwhelming and people are already
> contributing to support more netlink users. As I said, 95% of the
> functionality is in iproute2 itself and not libnetlink. It is vital
> to have some kind of library to abstract the low level netlink
> functionality in a simple form to other applications. Currently it's
> quite hard to, for example, access the tc class tree; one can use
> libnetlink to do the request and parse the answer but everyone needs
> to write their own TLV parsing routines.
Your call really - you are the one who is going to maintain it;->
As for ease of use and avoiding users from knowing details of how
tlvs are put together etc - i think it doesnt matter how thats done
underneath the hood; it is still doable on top of current libnetlink. In
other words whats required, IMO, is something that hides netlink totally
so that the programmer/user doesnt even get to see TLVs.
cheers,
jamal
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput
2005-01-26 13:48 ` jamal
@ 2005-01-26 14:35 ` Thomas Graf
2005-02-11 15:07 ` Dan Siemon
1 sibling, 0 replies; 44+ messages in thread
From: Thomas Graf @ 2005-01-26 14:35 UTC (permalink / raw)
To: jamal; +Cc: Patrick McHardy, Stephen Hemminger, netdev, Werner Almesberger
* jamal <1106747313.1107.7.camel@jzny.localdomain> 2005-01-26 08:48
> On Mon, 2005-01-24 at 10:06, Thomas Graf wrote:
>
> > I'm not talking about nlmsg_seq but rather a sequence number with
> > global or nl_family scope. It gets increased whenever a netlink
> > message of that family is processed and is returned with the ack. If
> > a userspace application wants to enforce atomicity between two requests
> > which cannot be batched because an answer is expected in between, then
> > it could provide the expected sequence number and the request is only
> > fulfilled if it matches the current one. Example:
> >
> > --> RTM_NEWLINK
> > <-- answer
> > <-- ACK (seq = 222)
> > --> RTM_SETLINK (expect = 222)
> > <-- ACK
> >
> > Now if another netlink app interferes:
> >
> > --> RTM_NEWLINK
> > <-- answer
> > <-- ACK (seq = 222)
> >
> > -- other app --
> > --> RTM_SETLINK
> > <-- ACK (seq = 223)
> >
> > -- back to first app --
> > --> RTM_SETLINK (expect = 222)
> > <-- ERROR
> >
> > The application can then retry its operation a few times and
> > finally give up. The main problem I see is to extend nlmsghdr
> > in a way it stays compatible.
>
> The best thing you could get out of this is a warning that something
> changed under you i.e doesnt really solve the synchronization issue.
Why? If we do the check with regard to the rtnl sem we can guarantee
atomicity. The comparison of the expected seq and the current seq must
be done before any action and within the rtnl semaphore. It is very
unlikely that someone interferes, so strict locking is pretty inefficient.
rtnl_send_atomic(msg, expect_seq)
retries := 10;
retry:
res := send_msg(msg, expect_seq);
if res = -ERETRY and --retries then
goto retry;
endif
if retries = 0 then
err "Timeout while trying to achieve atomic operation"
endif
and in the kernel:
rtnl_lock();
if expect_seq != seq then
rtnl_unlock()
return -ERETRY;
endif
... atomic action can take place here ...
Of course this only works if netlink requests themselves are
synchronized in the relevant netlink family.
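
The userspace half can be sketched in plain C as follows. This goes one step beyond the pseudocode: it assumes a failed attempt reports the current sequence so the next try can expect that value (a real caller would also have to refresh its cache in between). send_atomic(), send_fn and the fake_kernel test double are all invented for the illustration, and -EAGAIN / -ETIMEDOUT stand in for the -ERETRY and timeout error above.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical transport hook. Returns 0 on success or -EAGAIN on a
 * sequence mismatch; in both cases *cur_seq is set to the family
 * sequence as seen by the "kernel" afterwards.
 */
typedef int (*send_fn)(void *ctx, uint32_t expect_seq, uint32_t *cur_seq);

/* Optimistic retry loop: no lock is held between attempts. */
static int send_atomic(send_fn send, void *ctx, uint32_t expect_seq,
		       int max_tries)
{
	uint32_t cur = expect_seq;

	while (max_tries-- > 0) {
		int res = send(ctx, cur, &cur);

		if (res != -EAGAIN)
			return res;	/* success or a hard error */
		/* cur now holds the fresh sequence; try again with it */
	}

	return -ETIMEDOUT;	/* could not achieve an atomic operation */
}

/* Test double: another application interferes a given number of times. */
struct fake_kernel {
	uint32_t seq;
	int interferences;
};

static int fake_send(void *ctx, uint32_t expect_seq, uint32_t *cur_seq)
{
	struct fake_kernel *k = ctx;

	if (k->interferences > 0) {	/* someone else got processed first */
		k->interferences--;
		k->seq++;
	}
	if (expect_seq != k->seq) {
		*cur_seq = k->seq;
		return -EAGAIN;
	}
	*cur_seq = ++k->seq;		/* our request was processed */
	return 0;
}
```

With two interfering requests the third attempt goes through; with more interference than max_tries allows, the caller gets the timeout error, which matches the "retry a few times and finally give up" behaviour.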
> [And a lot more complexity is introduced - if you say you want to change
> the netlink header and maintain state in the kernel].
This is the big problem, there is no padding gap common to all rtnl users.
What we can do is to set a flag in nlmsghdr stating that a u32 block of
data follows the nlmsg header before the netlink user specific header,
i.e.
+---------------------------------+
| nlmsghdr flags |= NLM_F_EXP_SEQ |
+---------------------------------+
| expected_seq (u32) |
+---------------------------------+
| netlink user specific data |
+---------------------------------+
I'd even go one step further and define a header options chain like in
IPv6 so we can add more header attributes later on, like:
+--------------------------------+
| nlmsghdr flags |= NLM_F_OPTS |
+--------------------------------+
| size=4, type=expt_seq, next=0 |
+- - - - - - - - - - - - - - - -+
| expected sequence |
+--------------------------------+
| netlink user specific data |
+--------------------------------+
Thoughts?
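
As a rough illustration of the option chain, the following sketch packs and looks up options in a flat buffer. Everything here is an assumption made for the example: the 4 byte option header, the meaning of 'next' (non-zero if another option follows), the NLOPT_EXPECT_SEQ value and the opt_put()/opt_find() helpers. None of it is an existing netlink interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NLOPT_EXPECT_SEQ 1	/* hypothetical option type */

/* Hypothetical option header placed between nlmsghdr and the payload. */
struct nl_opt_hdr {
	uint8_t size;	/* payload bytes following this header */
	uint8_t type;	/* option type */
	uint8_t next;	/* non-zero if another option follows */
	uint8_t pad;
};

/* Append one option at offset 'off'; returns the new offset, 0 on error. */
static size_t opt_put(uint8_t *buf, size_t cap, size_t off, uint8_t type,
		      const void *data, uint8_t size, int more)
{
	struct nl_opt_hdr h = { size, type, (uint8_t)(more ? 1 : 0), 0 };

	if (off > cap || cap - off < sizeof(h) + size)
		return 0;
	memcpy(buf + off, &h, sizeof(h));
	memcpy(buf + off + sizeof(h), data, size);
	return off + sizeof(h) + size;
}

/* Copy the payload of the first option of 'type' into out; 0 if found. */
static int opt_find(const uint8_t *buf, size_t len, uint8_t type,
		    void *out, uint8_t out_size)
{
	size_t off = 0;

	for (;;) {
		struct nl_opt_hdr h;

		if (len - off < sizeof(h))
			return -1;		/* chain is truncated */
		memcpy(&h, buf + off, sizeof(h));
		off += sizeof(h);
		if (len - off < h.size)
			return -1;		/* payload is truncated */

		if (h.type == type) {
			if (h.size != out_size)
				return -1;	/* unexpected payload size */
			memcpy(out, buf + off, h.size);
			return 0;
		}
		if (!h.next)
			return -1;		/* end of chain, not found */
		off += h.size;
	}
}
```

An older kernel that does not know NLM_F_OPTS would of course misread such a message, so the flag still has to be negotiated or rejected cleanly; the sketch only covers the layout.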
> Your call really - you are the one who is going to maintain it;->
> As for ease of use and avoiding users from knowing details of how
> tlvs are put together etc - i think it doesnt matter how thats done
> underneath the hood; it is still doable on top of current libnetlink. In
> other words whats required, IMO, is something that hides netlink totally
> so that the programmer/user doesnt even get to see TLVs.
Agreed, I even hide the structs exported to userspace to avoid breakage,
i.e. i don't export tc_stats directly for example.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput
2005-01-26 13:48 ` jamal
2005-01-26 14:35 ` Thomas Graf
@ 2005-02-11 15:07 ` Dan Siemon
2005-02-12 13:45 ` jamal
1 sibling, 1 reply; 44+ messages in thread
From: Dan Siemon @ 2005-02-11 15:07 UTC (permalink / raw)
To: netdev; +Cc: hadi
On Wed, 2005-26-01 at 08:48 -0500, jamal wrote:
> Your call really - you are the one who is going to maintain it;->
> As for ease of use and avoiding users from knowing details of how
> tlvs are put together etc - i think it doesnt matter how thats done
> underneath the hood; it is still doable on top of current libnetlink. In
> other words whats required, IMO, is something that hides netlink totally
> so that the programmer/user doesnt even get to see TLVs.
(Sorry to join this thread so late.)
I'd like to make a little plug for my Linux QoS Library (LQL) [1]
project. LQL provides an abstraction of the kernel QoS features. Full
API documentation is available from the website.
I already have working bindings for one high level language (C# on Mono)
[2]. Creating bindings for Python, Perl etc should be quite easy.
Someone recently emailed me about starting work on Perl bindings.
There is still lots of work to do. Support for more QDiscs and
classifiers is needed and the socket handling code needs a rewrite to
better handle errors. Work continues and these things will be fixed.
[1] - http://www.coverfire.com/lql/
[2] - http://www.coverfire.com/lql-sharp/
--
OpenPGP key: http://www.coverfire.com/files/pubkey.txt
Key fingerprint: FB0A 2D8A A1E9 11B6 6CA3 0C53 742A 9EA8 891C BD98
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput
2005-02-11 15:07 ` Dan Siemon
@ 2005-02-12 13:45 ` jamal
2005-02-12 14:29 ` Thomas Graf
2005-02-12 22:07 ` Dan Siemon
0 siblings, 2 replies; 44+ messages in thread
From: jamal @ 2005-02-12 13:45 UTC (permalink / raw)
To: Dan Siemon; +Cc: netdev
On first impression, this looks very nice - I think you got the object
hierarchy figured out etc; I will look closely later.
What would be really interesting is to see (gulp) a SOAP/xml interface
on top of this. Is this something you can do with those "bindings"?
It seems to me from a library perspective, you and Thomas may have to
sync. And restricting to just QoS may be a limitation (netlink is the
friend you are looking for; so from a networking perspective, give the
GUI clickers ways to set routes, static IPSEC SAs etc). Oh and it would
be interesting to have events (see that link come up/down etc)
cheers,
jamal
On Fri, 2005-02-11 at 10:07, Dan Siemon wrote:
> On Wed, 2005-26-01 at 08:48 -0500, jamal wrote:
> > Your call really - you are the one who is going to maintain it;->
> > As for ease of use and keeping users from having to know the details of
> > how TLVs are put together etc - I think it doesn't matter how that's done
> > under the hood; it is still doable on top of current libnetlink. In
> > other words what's required, IMO, is something that hides netlink totally
> > so that the programmer/user doesn't even get to see TLVs.
>
> (Sorry to join this thread so late.)
>
> I'd like to make a little plug for my Linux QoS Library (LQL) [1]
> project. LQL provides an abstraction of the kernel QoS features. Full
> API documentation is available from the website.
>
> I already have working bindings for one high level language (C# on Mono)
> [2]. Creating bindings for Python, Perl etc should be quite easy.
> Someone recently emailed me about starting work on Perl bindings.
>
> There is still lots of work to do. Support for more QDiscs and
> classifiers is needed and the socket handling code needs a rewrite to
> better handle errors. Work continues and these things will be fixed.
>
> [1] - http://www.coverfire.com/lql/
> [2] - http://www.coverfire.com/lql-sharp/
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput
2005-02-12 13:45 ` jamal
@ 2005-02-12 14:29 ` Thomas Graf
2005-02-12 22:07 ` Dan Siemon
1 sibling, 0 replies; 44+ messages in thread
From: Thomas Graf @ 2005-02-12 14:29 UTC (permalink / raw)
To: jamal; +Cc: Dan Siemon, netdev
> > I'd like to make a little plug for my Linux QoS Library (LQL) [1]
> > project. LQL provides an abstraction of the kernel QoS features. Full
> > API documentation is available from the website.
I've looked at this before and I do like your approach. The
license prevents me from really using it but that's not your problem.
What I really like about it are the bindings to other languages, that's
definitely a good thing.
* jamal <1108215923.1126.132.camel@jzny.localdomain> 2005-02-12 08:45
> What would be really interesting is to see (gulp) a SOAP/xml interface
> on top of this. Is this something you can do with those "bindings"?
Indeed, an xml interface would be really nice to have. I've been
implementing some xml bits for links and neighbours in libnl so far,
but I won't settle on the exact format until I'm sure there is
no existing format that would fit us.
> It seems to me from a library perspective, you and Thomas may have to
> sync.
Sure, I'm willing to sync as long as the result stays LGPL. The
architectures are quite different so I think the only code we can
share or convert are the actual qdisc/class/filter modules to parse
the netlink messages and do various translations of types and flags etc.
> And restricting to just QoS may be a limitation (netlink is the
> friend you are looking for; so from a networking perspective, give the
> GUI clickers ways to set routes, static IPSEC SAs etc). Oh and it would
> be interesting to have events (see that link come up/down etc)
Although it's possible to receive multicast group messages, I haven't
added a specific API for it, so for now you have to do the demuxing
of the various message types yourself (via the valid message
callback).
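As a rough illustration of that demuxing, a callback could switch on nlmsg_type from the netlink header. The sketch below assumes Linux's <linux/rtnetlink.h> message types; enum event_kind and classify_msg() are hypothetical names and not part of libnl.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical event classes a callback might dispatch on. */
enum event_kind { EV_LINK, EV_NEIGH, EV_ROUTE, EV_OTHER };

/* Map an rtnetlink message type to a coarse event class. */
static enum event_kind classify_msg(const struct nlmsghdr *nlh)
{
    assert(nlh != NULL);
    switch (nlh->nlmsg_type) {
    case RTM_NEWLINK:
    case RTM_DELLINK:
        return EV_LINK;     /* link came up, went down or changed */
    case RTM_NEWNEIGH:
    case RTM_DELNEIGH:
        return EV_NEIGH;
    case RTM_NEWROUTE:
    case RTM_DELROUTE:
        return EV_ROUTE;
    default:
        return EV_OTHER;    /* leave anything else to the caller */
    }
}
```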
Maybe a quick status report about what's done:
- core netlink API: 80%, lacks better support for non-blocking sockets
and for multicast grouping messages
- link: 100%
- neighbour: 100%
- route: 70%, msg parser and attribute setting done, still lacks a good
dumping procedure and message building
- address: partial patch received which works more or less, someone has
started working on it again
- rule: 0%
- tc: 50%, still lacks implementations for various qdiscs and classifiers
I haven't touched any other netlink users such as xfrm but those should
be easier to do actually, rtnetlink is the biggest ;->
I've been spending some time on documenting the existing API and
about 40% is done by now. You can have a look at the current
progress of the documentation, the neighbour and link documentation
is nearly finished and gives you the best impression.
http://people.suug.ch/~tgr/libnl/doc/modules.html
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput
2005-02-12 13:45 ` jamal
2005-02-12 14:29 ` Thomas Graf
@ 2005-02-12 22:07 ` Dan Siemon
2005-02-12 22:32 ` Thomas Graf
1 sibling, 1 reply; 44+ messages in thread
From: Dan Siemon @ 2005-02-12 22:07 UTC (permalink / raw)
To: hadi; +Cc: netdev
On Sat, 2005-12-02 at 08:45 -0500, jamal wrote:
> On first impression, this looks very nice - I think you got the object
> hierarchy figured out etc; I will look closely later.
> What would be really interesting is to see (gulp) a SOAP/xml interface
> on top of this. Is this something you can do with those "bindings"?
Yes, a SOAP/XML-RPC interface should be quite possible. This is one of
the main reasons I went to the trouble of creating the Mono bindings. I
need to create some sort of XML interface to LQL in the next few weeks.
> It seems to me from a library perspective, you and Thomas may have to
> sync. And restricting to just QoS may be a limitation (netlink is the
> friend you are looking for; so from a networking perspective, give the
> GUI clickers ways to set routes, static IPSEC SAs etc). Oh and it would
> be interesting to have events (see that link come up/down etc)
I named the project Linux QoS Library before I realized that interface
parameters etc could be manipulated via Netlink. I have no intention of
limiting LQL to just QoS features. Eventually I'll probably stop
referring to it as Linux QoS Library and just use LQL.
As for combining my work with Thomas, I'm certainly willing to discuss
it.
--
OpenPGP key: http://www.coverfire.com/files/pubkey.txt
Key fingerprint: FB0A 2D8A A1E9 11B6 6CA3 0C53 742A 9EA8 891C BD98
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput
2005-02-12 22:07 ` Dan Siemon
@ 2005-02-12 22:32 ` Thomas Graf
2005-02-14 0:23 ` Dan Siemon
0 siblings, 1 reply; 44+ messages in thread
From: Thomas Graf @ 2005-02-12 22:32 UTC (permalink / raw)
To: Dan Siemon; +Cc: hadi, netdev
* Dan Siemon <1108246033.7554.18.camel@ganymede> 2005-02-12 17:07
> On Sat, 2005-12-02 at 08:45 -0500, jamal wrote:
> > On first impression, this looks very nice - I think you got the object
> > hierarchy figured out etc; I will look closely later.
> > What would be really interesting is to see (gulp) a SOAP/xml interface
> > on top of this. Is this something you can do with those "bindings"?
>
> Yes, a SOAP/XML-RPC interface should be quite possible. This is one of
> the main reasons I went to the trouble of creating the Mono bindings. I
> need to create some sort of XML interface to LQL in the next few weeks.
Before you go ahead, please consider its possible usages. If possible
it should conform to an existing format allowing for distributed
configuration of network nodes. If no such thing exists and you design
your own format please consider the following requirements, because it
would be sad if you waste effort that needs to be redone later on.
- all components of the networking configuration must be configurable.
This includes links, neighbours, routes, routing rules, traffic
control but also configuration parameters currently only accessible
via ethtool.
- The whole interface must take care of the byte order issues. This is
the most tricky part.
- It must be possible to extend it without breaking backward
compatibility.
> As for combining my work with Thomas, I'm certainly willing to discuss
> it.
So let's discuss it. From what I can see, your library only consists of
basic netlink connection abilities and message parsers/builders on a
per netlink user basis. You do not provide any way to customize it;
if the user of your library wants to send his own messages he's pretty
much on his own, because the whole process of constructing the message,
sending it and waiting for the ack is hidden behind one single function.
The per object API is quite similar: you also let the user set the
attributes on an object and then commit that object to the kernel.
Honestly speaking, your API doesn't fit my needs and the changes to
make it suitable would be rather big, so I'm not sure whether a merge
of my code into yours would make much sense, and the only thing that
could be merged from your side to mine would be the additional support
for two or three qdiscs such as htb.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput
2005-02-12 22:32 ` Thomas Graf
@ 2005-02-14 0:23 ` Dan Siemon
2005-02-14 14:27 ` Thomas Graf
0 siblings, 1 reply; 44+ messages in thread
From: Dan Siemon @ 2005-02-14 0:23 UTC (permalink / raw)
To: Thomas Graf; +Cc: hadi, netdev
On Sat, 2005-12-02 at 23:32 +0100, Thomas Graf wrote:
> * Dan Siemon <1108246033.7554.18.camel@ganymede> 2005-02-12 17:07
> > Yes, a SOAP/XML-RPC interface should be quite possible. This is one of
> > the main reasons I went to the trouble of creating the Mono bindings. I
> > need to create some sort of XML interface to LQL in the next few weeks.
>
> Before you go ahead, please consider its possible usages. If possible
> it should conform to an existing format allowing for distributed
> configuration of network nodes. If no such thing exists and you design
> your own format please consider the following requirements, because it
> would be sad if you waste effort that needs to be redone later on.
The initial implementation will be very specific to LQL's methods. I
need this for a prototype application.
> - The whole interface must take care of the byte order issues. This is
> the most tricky part.
I don't see how byte order issues are a problem when using SOAP.
Example?
> > As for combining my work with Thomas, I'm certainly willing to discuss
> > it.
>
> So let's discuss it. From what I can see, your library only consists of
> basic netlink connection abilities and message parsers/builders on a
> per netlink user basis. You do not provide any way to customize it;
> if the user of your library wants to send his own messages he's pretty
> much on his own, because the whole process of constructing the message,
> sending it and waiting for the ack is hidden behind one single function.
My main design goal for LQL is a nice C library for the existing QoS
elements (and later link and friends). As such, public functions that
allow the user of the library to construct their own nlmsg packets are
not my main interest. The functions in the LQL namespace attempt to
hide all aspects of Netlink and the underlying communication to the
kernel.
However, I do have functions for manipulating raw messages. These
functions are all in the NL namespace (nl.c and nl.h). They are quite
purposely hidden from the public API documentation. Perhaps these
functions should be documented publicly; although for 99% of the people
using the library the last thing they want to do is build a netlink
message.
Examples:
gboolean nl_tcpacket_add_rtattr(TcPacket *pkt, unsigned short type,
unsigned short len, void *data);
This function adds a new rtattr to the message. There is a similar
function that adds a nested rtattr.
nl_tcpacket_do_command() will send the message and wait for an ACK.
Usage examples can be found in lql_qdisc_htb_helper.c.
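For readers who have not built netlink messages by hand, the following is a minimal sketch of what appending a routing attribute to a message involves. It is not LQL's implementation: msg_add_rtattr() is a hypothetical helper built only on the RTA_* and NLMSG_* macros from the Linux headers, and it assumes the message lives at the start of a caller-supplied buffer of maxlen bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/*
 * Append one attribute (type, len bytes of data) at the aligned tail of
 * the message and grow nlmsg_len accordingly. Returns 0 on success or
 * -1 if the attribute is too large or does not fit into the buffer.
 */
static int msg_add_rtattr(struct nlmsghdr *nlh, size_t maxlen,
                          unsigned short type, const void *data,
                          unsigned short len)
{
    size_t rta_len = RTA_LENGTH(len);          /* attribute header + payload */
    size_t off = NLMSG_ALIGN(nlh->nlmsg_len);  /* aligned end of the message */
    struct rtattr *rta;

    assert(nlh != NULL);
    if (rta_len > 0xffff || off + RTA_ALIGN(rta_len) > maxlen)
        return -1;

    rta = (struct rtattr *)((char *)nlh + off);
    rta->rta_type = type;
    rta->rta_len = (unsigned short)rta_len;
    if (len)
        memcpy(RTA_DATA(rta), data, len);
    nlh->nlmsg_len = (unsigned int)(off + RTA_ALIGN(rta_len));
    return 0;
}
```

A nested attribute follows the same pattern: the outer attribute is added with an empty payload and its rta_len is patched once the inner attributes have been appended.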
> Honestly speaking, your API doesn't fit my needs and the changes to
> make it suitable would be rather big, so I'm not sure whether a merge
> of my code into yours would make much sense, and the only thing that
> could be merged from your side to mine would be the additional support
> for two or three qdiscs such as htb.
I'm curious exactly what your needs are.
It does appear you are aiming for a somewhat more low level library than
I am. Whether or not that precludes some kind of merger I don't know.
--
OpenPGP key: http://www.coverfire.com/files/pubkey.txt
Key fingerprint: FB0A 2D8A A1E9 11B6 6CA3 0C53 742A 9EA8 891C BD98
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput
2005-02-14 0:23 ` Dan Siemon
@ 2005-02-14 14:27 ` Thomas Graf
2005-02-15 20:28 ` Dan Siemon
0 siblings, 1 reply; 44+ messages in thread
From: Thomas Graf @ 2005-02-14 14:27 UTC (permalink / raw)
To: Dan Siemon; +Cc: hadi, netdev
> > - The whole interface must take care of the byte order issues. This is
> > the most tricky part.
>
> I don't see how byte order issues are a problem when using SOAP.
> Example?
It depends on whether you spell out every qdisc/filter in the protocol.
If you do so it's not a problem, but you have to extend your protocol
every time a new qdisc is introduced or an existing one changes. A
generic, partly binary protocol will have byte order issues.
My current idea, given I can't find an existing protocol, is to let
every netlink user describe its own format with a generic grammar so
the protocol can stay stable. One of the candidates is the netconf
specification which basically does what we need but is still in early
development.
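To make the byte order point concrete: any binary field that crosses hosts needs a fixed wire representation. The sketch below shows the usual approach with htonl()/ntohl(); put_u32() and get_u32() are hypothetical helper names and not part of either library.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value in network (big-endian) byte order. */
static void put_u32(unsigned char *dst, uint32_t host_val)
{
    uint32_t net = htonl(host_val);

    assert(dst != NULL);
    memcpy(dst, &net, sizeof(net));
}

/* Read a 32-bit network byte order value back into host order. */
static uint32_t get_u32(const unsigned char *src)
{
    uint32_t net;

    assert(src != NULL);
    memcpy(&net, src, sizeof(net));
    return ntohl(net);
}
```

A generic protocol that forwards opaque qdisc options cannot apply such conversions, because it does not know where the multi-byte fields are. That is why a per-user format description is needed.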
> I'm curious exactly what your needs are.
Basically I need to be able to change the behaviour of the message
parser, for example to override the sequence number checking in order
to do message multiplexing. It's not like I would be representative
though.
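One way to picture this kind of customization is a receive path whose sequence check can be replaced per handle. This is only a sketch with hypothetical names (struct handle, seq_check_fn, handle_recv()); it is not the libnl API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct msg {
    uint32_t seq;               /* sequence number of a received message */
};

/* Returns 0 to accept the message, -1 to reject it. */
typedef int (*seq_check_fn)(uint32_t expected, const struct msg *m);

struct handle {
    uint32_t expected_seq;
    seq_check_fn seq_check;     /* NULL selects the strict default */
};

/* Default behaviour: the reply must carry the expected sequence number. */
static int seq_check_strict(uint32_t expected, const struct msg *m)
{
    return m->seq == expected ? 0 : -1;
}

/* Override for multiplexing: accept replies in any order. */
static int seq_check_any(uint32_t expected, const struct msg *m)
{
    (void)expected;
    (void)m;
    return 0;
}

static int handle_recv(const struct handle *h, const struct msg *m)
{
    seq_check_fn check;

    assert(h != NULL && m != NULL);
    check = h->seq_check ? h->seq_check : seq_check_strict;
    if (check(h->expected_seq, m) < 0)
        return -1;
    /* message parsing would continue here */
    return 0;
}
```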
> It does appear you are aiming for a somewhat more low level library than
> I am. Whether or not that precludes some kind of merger I don't know.
Yes, it seems so. It's a pity that we waste effort by doing nearly the
same work, but I really need the low level API and the possibility to
customize the parsing and sending code.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput
2005-02-14 14:27 ` Thomas Graf
@ 2005-02-15 20:28 ` Dan Siemon
2005-02-15 20:47 ` Thomas Graf
0 siblings, 1 reply; 44+ messages in thread
From: Dan Siemon @ 2005-02-15 20:28 UTC (permalink / raw)
To: Thomas Graf; +Cc: hadi, netdev
On Mon, 2005-14-02 at 15:27 +0100, Thomas Graf wrote:
> > I'm curious exactly what your needs are.
>
> Basically I need to be able to change the behaviour of the message
> parser, for example to override the sequence number checking in order
> to do message multiplexing. It's not like I would be representative
> though.
>
> > It does appear you are aiming for a somewhat more low level library than
> > I am. Whether or not that precludes some kind of merger I don't know.
>
> Yes, it seems so. It's a pity that we waste effort by doing nearly the
> same work, but I really need the low level API and the possibility to
> customize the parsing and sending code.
Perhaps we could agree on a single API for the low-level message parsing
and netlink message construction. At least then we would not be
duplicating bug-fixes in our netlink code.
Whether or not this sharing would be useful probably depends on if you
would continue to maintain your own non-GObject APIs for the various
QDiscs and classifiers. GObject makes the creation and maintenance of
the language bindings much easier so it's basically necessary for my
goals.
I'm willing to switch the underlying implementation of LQL to use your
more featureful NL implementation if that means there won't be two
competing C APIs to the individual QDiscs etc.
--
OpenPGP key: http://www.coverfire.com/files/pubkey.txt
Key fingerprint: FB0A 2D8A A1E9 11B6 6CA3 0C53 742A 9EA8 891C BD98
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput
2005-02-15 20:28 ` Dan Siemon
@ 2005-02-15 20:47 ` Thomas Graf
2005-02-22 21:40 ` Dan Siemon
0 siblings, 1 reply; 44+ messages in thread
From: Thomas Graf @ 2005-02-15 20:47 UTC (permalink / raw)
To: Dan Siemon; +Cc: hadi, netdev
> Perhaps we could agree on a single API for the low-level message parsing
> and netlink message construction. At least then we would not be
> duplicating bug-fixes in our netlink code.
Sure, I think they're quite similar. I abstracted the netlink message
and routing attributes building a bit and added some bits for
simplification.
http://people.suug.ch/~tgr/libnl/doc/group__msg.html
http://people.suug.ch/~tgr/libnl/doc/group__rtattr.html
I do not care what to use though as long as it is easy to use.
> Whether or not this sharing would be useful probably depends on if you
> would continue to maintain your own non-GObject APIs for the various
> QDiscs and classifiers. GObject makes the creation and maintenance of
> the language bindings much easier so it's basically necessary for my
> goals.
I see, well I can extend my objects, I'm even willing to change the
architecture if needed. The only requirement from my side is to
keep the generic caching header, to allow putting these objects into
generic caches, and to keep it simple to re-add commit/rollback
extensions later on.
What exactly is required to make it GObject aware? I've never worked
with GObject so far. Basically a qdisc looks like this at the moment:
struct rtnl_qdisc
{
        NLHDR_COMMON;           /* common fields required by cache */
        NL_TCA_GENERIC(q);      /* generic tc fields (parent, handle, ifindex ...) */
        void *opts;             /* qdisc specific options (e.g. rtnl_sch_fifo) */
};
The NLHDR_COMMON must stay first, the ordering of the others doesn't
matter.
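The reason the common part has to come first is that a generic cache can then treat any object as its header. A minimal sketch of that idea follows; struct obj_common stands in for whatever NLHDR_COMMON expands to (its real fields are not shown in this thread), and struct demo_qdisc, struct demo_cache and cache_add() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields behind NLHDR_COMMON. */
struct obj_common {
    int type;                   /* which kind of object this is */
    struct obj_common *next;    /* linkage used by the generic cache */
};

struct demo_qdisc {
    struct obj_common common;   /* must stay first */
    unsigned int handle;
    unsigned int parent;
};

struct demo_cache {
    struct obj_common *head;
    size_t count;
};

/*
 * Insert any object whose first member is struct obj_common. A pointer
 * to a struct also points at its first member, so the cache never needs
 * to know the concrete object type.
 */
static void cache_add(struct demo_cache *c, void *obj)
{
    struct obj_common *o = obj;

    assert(c != NULL && o != NULL);
    o->next = c->head;
    c->head = o;
    c->count++;
}
```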
> I'm willing to switch the underlying implementation of LQL to use your
> more featureful NL implementation if that means there won't be two
> competing C APIs to the individual QDiscs etc.
It would be nice if we find a way to integrate both without losing the
features of any side.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput
2005-02-15 20:47 ` Thomas Graf
@ 2005-02-22 21:40 ` Dan Siemon
2005-02-22 23:15 ` Thomas Graf
0 siblings, 1 reply; 44+ messages in thread
From: Dan Siemon @ 2005-02-22 21:40 UTC (permalink / raw)
To: Thomas Graf; +Cc: hadi, netdev
Sorry for the tardy response.
On Tue, 2005-15-02 at 21:47 +0100, Thomas Graf wrote:
> I see, well I can extend my objects, I'm even willing to change the
> architecture if needed. The only requirement from my side is to
> keep the generic caching header, to allow putting these objects into
> generic caches, and to keep it simple to re-add commit/rollback
> extensions later on.
>
> What exactly is required to make it GObject aware? I've never worked
> with GObject so far. Basically a qdisc looks like this at the moment:
>
> struct rtnl_qdisc
> {
>         NLHDR_COMMON;           /* common fields required by cache */
>         NL_TCA_GENERIC(q);      /* generic tc fields (parent, handle, ifindex ...) */
>         void *opts;             /* qdisc specific options (e.g. rtnl_sch_fifo) */
> };
>
> The NLHDR_COMMON must stay first, the ordering of the others doesn't
> matter.
That could be a problem. The GObject struct must be at the start so
that all sub-classes can be operated on with the g_object_ functions.
The only way to make these objects work with your caching scheme would
be to make a sub-class of GObject with the caching information. This
would have the benefit of adding ref counting etc.
The following URL will give you a bit of background on GObject.
http://www.le-hacker.org/papers/gobject/
--
OpenPGP key: http://www.coverfire.com/files/pubkey.txt
Key fingerprint: FB0A 2D8A A1E9 11B6 6CA3 0C53 742A 9EA8 891C BD98
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [RFC] batched tc to improve change throughput
2005-02-22 21:40 ` Dan Siemon
@ 2005-02-22 23:15 ` Thomas Graf
0 siblings, 0 replies; 44+ messages in thread
From: Thomas Graf @ 2005-02-22 23:15 UTC (permalink / raw)
To: Dan Siemon; +Cc: hadi, netdev
> > The NLHDR_COMMON must stay first, the ordering of the others doesn't
> > matter.
>
> That could be a problem. The GObject struct must be at the start so
> that all sub-classes can be operated on with the g_object_ functions.
> The only way to make these objects work with your caching scheme would
> be to make a sub-class of GObject with the caching information. This
> would have the benefit of adding ref counting etc.
It's not a problem, as you note we can put the gobject information
into NLHDR_COMMON. I'm not focusing on such bindings but if you want
to reuse my code, feel free.
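A sketch of the layout being agreed on here, without depending on GLib: the object base sits at the very start of the common caching header, which in turn sits at the start of each concrete object. struct base_object is only a stand-in for GObject, and all names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for GObject: just a reference count. */
struct base_object {
    unsigned int ref_count;
};

/* Caching header that embeds the base object as its first member. */
struct cache_hdr {
    struct base_object base;    /* must be first */
    int cache_type;
};

struct cached_qdisc {
    struct cache_hdr hdr;       /* must be first */
    unsigned int handle;
};

/* Generic reference counting works on any object laid out this way. */
static void *base_ref(void *obj)
{
    struct base_object *b = obj;

    assert(b != NULL);
    b->ref_count++;
    return obj;
}

static unsigned int base_unref(void *obj)
{
    struct base_object *b = obj;

    assert(b != NULL && b->ref_count > 0);
    return --b->ref_count;
}

/* The cache can still reach its own fields through the same pointer. */
static int cache_type_of(const void *obj)
{
    const struct cache_hdr *h = obj;

    assert(h != NULL);
    return h->cache_type;
}
```

Both requirements hold at once because each "must be first" member starts at offset 0 of the struct that contains it.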
^ permalink raw reply [flat|nested] 44+ messages in thread