Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
       [not found] <200412270715.iBR7Fffe026855@hera.kernel.org>
@ 2004-12-27 12:16 ` Thomas Graf
  2004-12-28 13:20   ` jamal
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Graf @ 2004-12-27 12:16 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev

> ChangeSet 1.2055.37.1, 2004/11/17 16:08:01-08:00, util@deuroconsult.ro
> 
> 	[PKT_SCHED]: Allow using nfmark as key in U32 classifier.
> 	
> 	Signed-off-by: Catalin(ux aka Dino) BOIE <catab at umbrella.ro>
> 	Signed-off-by: David S. Miller <davem@davemloft.net>

I must have missed this one. This should have been implemented in the
metadata action module to make it available to all classifiers. We
should really stop to add more stuff to specific classifiers which have
to be removed once we have metamatch. I've made a proposal on paper
just need some more time to cook up a patch.

> +#ifdef CONFIG_CLS_U32_MARK                                                                                                                                             
> +	if (tb[TCA_U32_MARK-1]) {
> +		if (RTA_PAYLOAD(tb[TCA_U32_MARK-1]) < sizeof(struct tc_u32_mark))
> +			return -EINVAL;
> +		mark = RTA_DATA(tb[TCA_U32_MARK-1]);
> +		memcpy(&n->mark, mark, sizeof(struct tc_u32_mark));
> +		n->mark.success = 0;
> +	}                                                                                                                                                                
> +#endif                                                                                                                                                                 

This should have gone into u32_set_params

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2004-12-27 12:16 ` [PKT_SCHED]: Allow using nfmark as key in U32 classifier Thomas Graf
@ 2004-12-28 13:20   ` jamal
  2004-12-28 13:40     ` Thomas Graf
  0 siblings, 1 reply; 42+ messages in thread
From: jamal @ 2004-12-28 13:20 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David S. Miller, netdev

On Mon, 2004-12-27 at 07:16, Thomas Graf wrote:
> > ChangeSet 1.2055.37.1, 2004/11/17 16:08:01-08:00, util@deuroconsult.ro
> > 
> > 	[PKT_SCHED]: Allow using nfmark as key in U32 classifier.
> > 	
> > 	Signed-off-by: Catalin(ux aka Dino) BOIE <catab at umbrella.ro>
> > 	Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> I must have missed this one. This should have been implemented in the
> metadata action module to make it available to all classifiers.

You mean meta match. 

>  We
> should really stop to add more stuff to specific classifiers which have
> to be removed once we have metamatch. I've made a proposal on paper
> just need some more time to cook up a patch.

I have cycles now. Are you working on this or should i invest some of my
cycles in it? Lets fix this once and for all.

cheers,
jamal 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2004-12-28 13:20   ` jamal
@ 2004-12-28 13:40     ` Thomas Graf
  2004-12-28 13:59       ` jamal
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Graf @ 2004-12-28 13:40 UTC (permalink / raw)
  To: jamal; +Cc: David S. Miller, netdev

* jamal <1104240053.1100.53.camel@jzny.localdomain> 2004-12-28 08:20
> On Mon, 2004-12-27 at 07:16, Thomas Graf wrote:
> > > ChangeSet 1.2055.37.1, 2004/11/17 16:08:01-08:00, util@deuroconsult.ro
> > > 
> > > 	[PKT_SCHED]: Allow using nfmark as key in U32 classifier.
> > > 	
> > > 	Signed-off-by: Catalin(ux aka Dino) BOIE <catab at umbrella.ro>
> > > 	Signed-off-by: David S. Miller <davem@davemloft.net>
> > 
> > I must have missed this one. This should have been implemented in the
> > metadata action module to make it available to all classifiers.
> 
> You mean meta match. 

Yes, I was thinking about implementing it as action, any objections?

> >  We
> > should really stop to add more stuff to specific classifiers which have
> > to be removed once we have metamatch. I've made a proposal on paper
> > just need some more time to cook up a patch.
> 
> I have cycles now. Are you working on this or should i invest some of my
> cycles in it? Lets fix this once and for all.

I don't care, I'm still testing the patchset to make classifer changes
consistent again. A few big changes to tcindex/route classifier need
extensive testing.

My thoughts on meta match:

  A match consists of a header, lvalue, rvalue and stats. The header
  contains the handle, the requested operand (eq,lt,gt) and a flag to
  invert the meaning of the match. A value consists of a type and data
  where type specifies the actual metadata to be used. A few upper bits
  of the type specify the kind, i.e. wether it's numeric or a
  bytestring. Only values with matching upper bits can be compared.
  Additionally a mask and shift operator can be configured. Why so
  complicated? Comparing two kernel meta values can useful, realdev
  == indev when classyfing on tunnels, comparing backlogs, etc. Device
  matches should be made available as both numeric and bytestring match
  to have a fastpath when ifindex is stable and a slower path which is
  more flexible to changes.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2004-12-28 13:40     ` Thomas Graf
@ 2004-12-28 13:59       ` jamal
  2004-12-28 14:50         ` Thomas Graf
  2004-12-28 16:11         ` Thomas Graf
  0 siblings, 2 replies; 42+ messages in thread
From: jamal @ 2004-12-28 13:59 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David S. Miller, netdev

On Tue, 2004-12-28 at 08:40, Thomas Graf wrote:
> * jamal <1104240053.1100.53.camel@jzny.localdomain> 2004-12-28 08:20

> > You mean meta match. 
> 
> Yes, I was thinking about implementing it as action, any objections?
> 


If you implement the mothership match changes we discussed then it
should go as a match (as opposed to action). As an action its yet
another deferal for later cleanup.
So my preference is to get the changes we discussed then this meta
match.
I could whip out meta action for setting values if you are gonna work
on the match piece.

> > >  We
> > > should really stop to add more stuff to specific classifiers which have
> > > to be removed once we have metamatch. I've made a proposal on paper
> > > just need some more time to cook up a patch.
> > 
> > I have cycles now. Are you working on this or should i invest some of my
> > cycles in it? Lets fix this once and for all.
> 
> I don't care, I'm still testing the patchset to make classifer changes
> consistent again. A few big changes to tcindex/route classifier need
> extensive testing.
> 

If you have started working on it i would prefer you make the changes.
BTW, I am talking about the top level match changes we discussed.

> My thoughts on meta match:
>   
>   A match consists of a header, lvalue, rvalue and stats. The header
>   contains the handle, the requested operand (eq,lt,gt) and a flag to
>   invert the meaning of the match. A value consists of a type and data
>   where type specifies the actual metadata to be used. A few upper bits
>   of the type specify the kind, i.e. wether it's numeric or a
>   bytestring. Only values with matching upper bits can be compared.
>   Additionally a mask and shift operator can be configured. Why so
>   complicated? Comparing two kernel meta values can useful, realdev
>   == indev when classyfing on tunnels, comparing backlogs, etc. Device
>   matches should be made available as both numeric and bytestring match
>   to have a fastpath when ifindex is stable and a slower path which is
>   more flexible to changes.

Sounds reasonable at the high level. I am not sure i got the stats part.

Can you write out the BNF. Heres what i was thinking for meta action:

tc <OPERATION> action metaset [index ID] METAVARS
OPERATION:= ADD|DEL|GET|DUMP
METAVARS:= <[INDEV] [FWMARK] [TCINDEX] [PRIO] [CLASSID]>
INDEV = indev <devname>
FWMARK:= fwmark <value>
TCINDEX := tcindex <value>
PRIO := prio <value>
CLASSID := classid <class>

I notice you throw in some  mask and shift operator - not sure if i
could make use of it here.

cheers,
jamal

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2004-12-28 13:59       ` jamal
@ 2004-12-28 14:50         ` Thomas Graf
  2004-12-28 15:55           ` jamal
  2004-12-28 16:11         ` Thomas Graf
  1 sibling, 1 reply; 42+ messages in thread
From: Thomas Graf @ 2004-12-28 14:50 UTC (permalink / raw)
  To: jamal; +Cc: David S. Miller, netdev

* jamal <1104242397.1090.94.camel@jzny.localdomain> 2004-12-28 08:59
> On Tue, 2004-12-28 at 08:40, Thomas Graf wrote:
> > * jamal <1104240053.1100.53.camel@jzny.localdomain> 2004-12-28 08:20
> 
> > > You mean meta match. 
> > 
> > Yes, I was thinking about implementing it as action, any objections?
> > 
> 
> 
> If you implement the mothership match changes we discussed then it
> should go as a match (as opposed to action). As an action its yet
> another deferal for later cleanup.

I'm getting more and more careful because we already suffer from
various limitations by design of underlying layers. I agree that the
best way would be to make it a generic match but we will end up
implementing logic expressions code for every layer over and over.
I have to think a little more about it, here's an up-to-date brain dump:

Classifier extensions should no longer be configured over classifier
specific TLV types but rather be part of a nested TLV. The extesions
should be changeable directly without going through the classifier
changing code, i.e. RTM_NEWFEXT/RTM_DELFEXT or something alike. It
should be possible to create logic relations between extensions like
match indev = "eth0" or (nfmark gt 10 or nfmark lt 4).

Doesn't sound too bad but we're actually just implementing things
on top of classifiers that should actually be on the same level.

> So my preference is to get the changes we discussed then this meta
> match.
> I could whip out meta action for setting values if you are gonna work
> on the match piece.

I was thinking of combining these by simply introducing an ASSIGN
operator so we don't have redundant code. We could make a generic
metadata api so netfilter could make use of it.

> Sounds reasonable at the high level. I am not sure i got the stats part.

Simple hit/success counters per match to be returned as separate TLV.

> Can you write out the BNF. Heres what i was thinking for meta action:

tc <OPERATION> meta [NOT] VALUE OPERATOR VALUE
VALUE ::= { METAVARS | number | pattern }
OPERATOR ::= { = | > | < | assign }

where: typeof(METAVAR) for every value pair must be equal

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2004-12-28 14:50         ` Thomas Graf
@ 2004-12-28 15:55           ` jamal
  0 siblings, 0 replies; 42+ messages in thread
From: jamal @ 2004-12-28 15:55 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David S. Miller, netdev

On Tue, 2004-12-28 at 09:50, Thomas Graf wrote:
> * jamal <1104242397.1090.94.camel@jzny.localdomain> 2004-12-28 08:59
 
> > If you implement the mothership match changes we discussed then it
> > should go as a match (as opposed to action). As an action its yet
> > another deferal for later cleanup.
> 
> I'm getting more and more careful because we already suffer from
> various limitations by design of underlying layers. I agree that the
> best way would be to make it a generic match but we will end up
> implementing logic expressions code for every layer over and over.

Thats the point of this discussion ;-> We need to figure a way to do it
with minimal damage ;-> I actually dont think its such a bad idea to
change all classifiers this once as long as the backward compat applies.

> I have to think a little more about it, here's an up-to-date brain dump:
> Classifier extensions should no longer be configured over classifier
> specific TLV types but rather be part of a nested TLV. The extesions
> should be changeable directly without going through the classifier
> changing code, i.e. RTM_NEWFEXT/RTM_DELFEXT or something alike. 

Sensible. Also enforce that it gets configured when configuring the
filter - this way binding can happen at installation and not lookup
time.
Now we need to find a clever way to do little coding .. I will give it
some thinking in the background. 
Clearly this has to be interleaved with filtering as opposed to be at
the end of filtering as actions are.

> It
> should be possible to create logic relations between extensions like
> match indev = "eth0" or (nfmark gt 10 or nfmark lt 4).
> 

Ok, so several things from your text in requirements/desires department:
- Reusing existing classifiers is very valuable at that extension level.
We actually could do it today with continue/reclassify except it will be
faster if you could just point to the next classifier rule from the
current classification. Not sure if this makes sense..
- The idea of continue/reclassify is also valuable for match extensions.
- match extensions desire minimalist API. Need to write one page of code
approach ..
- I like the logical relationships you have.


> Doesn't sound too bad but we're actually just implementing things
> on top of classifiers that should actually be on the same level.
> 

Not in the case of the 2 page of code for extra match.

> > So my preference is to get the changes we discussed then this meta
> > match.
> > I could whip out meta action for setting values if you are gonna work
> > on the match piece.
> 
> I was thinking of combining these by simply introducing an ASSIGN
> operator so we don't have redundant code. 

That is still a hack. 

> We could make a generic
> metadata api so netfilter could make use of it.
> 

I can see tc using netfilter but not the other way round without
a lot of complexity.

> > Sounds reasonable at the high level. I am not sure i got the stats part.
> 
> Simple hit/success counters per match to be returned as separate TLV.
> 

Yes, but how do you use them to match?

> > Can you write out the BNF. Heres what i was thinking for meta action:
> 
> tc <OPERATION> meta [NOT] VALUE OPERATOR VALUE

hopefully within a filter since these are not really standalone.
as in filter ... extmatch meta ...

> VALUE ::= { METAVARS | number | pattern }
> OPERATOR ::= { = | > | < | assign }
> 
> where: typeof(METAVAR) for every value pair must be equal

so how would tcindex fit here?

cheers,
jamal

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2004-12-28 13:59       ` jamal
  2004-12-28 14:50         ` Thomas Graf
@ 2004-12-28 16:11         ` Thomas Graf
  2004-12-28 16:36           ` jamal
  1 sibling, 1 reply; 42+ messages in thread
From: Thomas Graf @ 2004-12-28 16:11 UTC (permalink / raw)
  To: jamal; +Cc: David S. Miller, netdev

> If you implement the mothership match changes we discussed then it
> should go as a match (as opposed to action). As an action its yet
> another deferal for later cleanup.

What about this...

We introduce a new API tcf_exts which holds all the things on top of
a classifier. There might be several instaces per classifier, i.e. one
per filter for u32 and fw and one per hash bucket for route/tcindex,
etc.

The classifier no longer knows about action/police/meta/... you name it
but rather forwards the TLV TCA_..._EXTS to the extensions API. Backward
compatibility is provided in the API (very simple).

The extension infrastructure builds a tree to implement logic
expressions where each node can be one of the supported types.

The action code would be transformed to an extension which would
mean that there could be multiple action chains. Example:

 cls -+ meta indev=ppp0
      |    \ <and> meta assign nfmark=2
      |             \ <and> action:gact redirect
 <or> |
      \ meta nfmark=0x20
           \ <and> action:mirred

Every extension node has a unique handle in the namespace of the tree.
Deletion of a node results in deletion of all sub nodes.

Configuration must go via change routine of classifier or via tp->get()
and some generic way to retrieve extension handle from classifier.

Thoughts?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2004-12-28 16:11         ` Thomas Graf
@ 2004-12-28 16:36           ` jamal
  2004-12-28 16:51             ` jamal
  2004-12-28 19:26             ` Thomas Graf
  0 siblings, 2 replies; 42+ messages in thread
From: jamal @ 2004-12-28 16:36 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David S. Miller, netdev

On Tue, 2004-12-28 at 11:11, Thomas Graf wrote:
> > If you implement the mothership match changes we discussed then it
> > should go as a match (as opposed to action). As an action its yet
> > another deferal for later cleanup.
> 
> What about this...
> 
> We introduce a new API tcf_exts which holds all the things on top of
> a classifier. There might be several instaces per classifier, i.e. one
> per filter for u32 and fw and one per hash bucket for route/tcindex,
> etc.
> 
> The classifier no longer knows about action/police/meta/... you name it
> but rather forwards the TLV TCA_..._EXTS to the extensions API. Backward
> compatibility is provided in the API (very simple).
> 

I think this sounds cleaner but is major surgery. The other issue i have
with it is semantically i see classification followed by actions.
Classification may have extended classification within it. Actions may
also have extended actions within them.
The majority of the surgery is going to be in ensuring that you can mix
and match actions, extended filters and extended actions.

> The extension infrastructure builds a tree to implement logic
> expressions where each node can be one of the supported types.
> 
> The action code would be transformed to an extension which would
> mean that there could be multiple action chains. Example:
> 
>  cls -+ meta indev=ppp0
>       |    \ <and> meta assign nfmark=2
>       |             \ <and> action:gact redirect
>  <or> |
>       \ meta nfmark=0x20
>            \ <and> action:mirred
> 
> Every extension node has a unique handle in the namespace of the tree.
> Deletion of a node results in deletion of all sub nodes.
> 
> Configuration must go via change routine of classifier or via tp->get()
> and some generic way to retrieve extension handle from classifier.
> 
> Thoughts?

Note my concerns above - we are talking major splicing!

Heres another approach:
The classifier is blind while executing those actions.
Data that needs to be embeded within the classifier is:
struct {extmatch type:extmatch void_data}.
extmatch_classify(extmatchdatastruct,skb) is a generic call which does a
lookup on the type and calls the proper callback. Callbacks return
standard classifier ret codes.
So an indev matcher will take the skb and compare against the indev data
stored in struct->void_data.
Only other call i can see needed is a registration function. extended
matchers register a callback and type.
user space stuff is easy.
Now with above i dont see how to fit your logical experssions - but its
a simple change and fits the requirement of writting the one page
extended matcher. The same thought could be extended to actions.
Sounds too easy unless i am intoxicated with the double-doubles i have
been conmsuming last few hours;->

thoughts?

cheers,
jamal

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2004-12-28 16:36           ` jamal
@ 2004-12-28 16:51             ` jamal
  2004-12-28 19:26             ` Thomas Graf
  1 sibling, 0 replies; 42+ messages in thread
From: jamal @ 2004-12-28 16:51 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David S. Miller, netdev

On Tue, 2004-12-28 at 11:36, jamal wrote:

I have to go out for a few hours, for completion sake:
- initialization happens like you said with extended matches TLV which
result in building one of those extended match datastruct bound on the
filter. The binding part is easy, the hard part is how you interleaf u32
matches for example vs indev. 

** Also i see your point that changing all the classifiers is painful,
but doing it this once so the next written classifier is easy is worth
the effort in my opinion.

cheers,
jamal

> Heres another approach:
> The classifier is blind while executing those actions.
> Data that needs to be embeded within the classifier is:
> struct {extmatch type:extmatch void_data}.
> extmatch_classify(extmatchdatastruct,skb) is a generic call which does a
> lookup on the type and calls the proper callback. Callbacks return
> standard classifier ret codes.
> So an indev matcher will take the skb and compare against the indev data
> stored in struct->void_data.
> Only other call i can see needed is a registration function. extended
> matchers register a callback and type.
> user space stuff is easy.
> Now with above i dont see how to fit your logical experssions - but its
> a simple change and fits the requirement of writting the one page
> extended matcher. The same thought could be extended to actions.
> Sounds too easy unless i am intoxicated with the double-doubles i have
> been conmsuming last few hours;->
> 
> thoughts?
> 
> cheers,
> jamal
> 
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2004-12-28 16:36           ` jamal
  2004-12-28 16:51             ` jamal
@ 2004-12-28 19:26             ` Thomas Graf
  2004-12-28 21:14               ` jamal
  1 sibling, 1 reply; 42+ messages in thread
From: Thomas Graf @ 2004-12-28 19:26 UTC (permalink / raw)
  To: jamal; +Cc: David S. Miller, netdev

* jamal <1104251817.1090.164.camel@jzny.localdomain> 2004-12-28 11:36
> I think this sounds cleaner but is major surgery. The other issue i have
> with it is semantically i see classification followed by actions.
> Classification may have extended classification within it. Actions may
> also have extended actions within them.

Hmm... right, let's just leave the action as-is, we don't gain much
anyway. Nevertheless, actions should be part of the extensions API
but we can safely put them at the end of the extensions match routine
so it always comes last. So basically we end up with

1) cls specific matches
2) generic matches including logical expressions
3) action

We can always change it later without touching a single classifier.

> The classifier is blind while executing those actions.
> Data that needs to be embeded within the classifier is:
> struct {extmatch type:extmatch void_data}.

No, I'd really like to avoid having this but instead put a tcf_exts
struct into it which holds all the data needed by the generic part.
This also includes the tc_action pointer. This way we can get rid
of all action/policer related ifdefs in the classifiers, reduce the
chances of mistakes and don't need to touch classifier when changing
something in the generic part.

> extmatch_classify(extmatchdatastruct,skb) is a generic call which does a
> lookup on the type and calls the proper callback. Callbacks return
> standard classifier ret codes.

Exactly, I called it tcf_exts_match with the following return definition:
<0: error/unmatched filter, classifier must stop matching and move on to
    next filter or return mismatch if at the end.
 0: Normal match, classifier must do whathever it would to if a filter matches
>0: Classifier return code (TC_ACT_*), classifier must stop immediately
    and pass on the return code.

So basically I export this API:

tcf_exts_validate
  validates the input data and initalizes the action, it must not
  change any attributes. Validated data is stored in a temporary
  tcf_exts structure provided by the classifier (local variable)

tcf_exts_change
  Applies the changes from tcf_exts_validate to the classifier, must
  not fail under any circumstance. Classifier provides both, the
  destination pointer (hold in classifier data) and the temporary
  buffer from tcf_exts_valiate.

tcf_exts_match
  Runs all the matches and applies action.

tcf_exts_destroy
  Destroys a complete extensions configuration

tcf_exts_dump
  Dumps extensions into given skb.

tcf_exts_dump_stats
  Dumps statistics into given skb.

tcf_exts_is_predicative
  Returns 1 if a predicative extension is present, i.e. an extension
  which might cause further actions and thus overrule the regular
  tcf_result. In other words, returns 1 if a TC_ACT_* my be returned.

tcf_exts_available
  Returns 1 if at least 1 extension is present.

How does the exntesions API know about the classifier specifc TLV
types? Every classifier maintains a map which holds the types, this
is used in _validate, _dump and _dump_stats.

> - initialization happens like you said with extended matches TLV which
> result in building one of those extended match datastruct bound on the
> filter.

This is the part I'm unsure about. I want to keep it simple but also
powerful. The action part is clear and will be untouched. The generic
match part is more difficult, we either make userspace transfer the
complete tree every time or we introduce commands like in the
classifier. Second is probably better but a little bit more work.

The binding part is easy, the hard part is how you interleaf u32
> matches for example vs indev. 

Value TLV:
 TCA_META_VALUE_TYPE, struct tcf_meta_type
 TCA_META_VALUE_DATA, variable

struct tcf_meta_type
{
	__u16 kind;
	__u16 len;
};

Where kind:

enum
{
	/* numberic types */
	TCF_META_I_NUMBER = 0x100,
	TCF_META_I_NFMARK = 0x101,
	TCF_META_I_TCINDEX = 0x102,

	/* variable length types */
	TCF_META_V_PATTERN = 0x200,
	TCF_META_V_INDEV = 0x201,
};

A matching of the upper 4 bits means the values can be
compared. Of course userspace should check for this so
we only have to put in a BUG_ON assertion.

> ** Also i see your point that changing all the classifiers is painful,
> but doing it this once so the next written classifier is easy is worth
> the effort in my opinion.

Truly agreed, I did this work already and I'm now testing it. We can
easly put the generic match on top of it and all we have to do is add
TCA_XXX_EXTS for every classifier and add it to the map.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2004-12-28 19:26             ` Thomas Graf
@ 2004-12-28 21:14               ` jamal
  2004-12-28 22:10                 ` Thomas Graf
  0 siblings, 1 reply; 42+ messages in thread
From: jamal @ 2004-12-28 21:14 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David S. Miller, netdev

On Tue, 2004-12-28 at 14:26, Thomas Graf wrote:
> * jamal <1104251817.1090.164.camel@jzny.localdomain> 2004-12-28 11:36

> Hmm... right, let's just leave the action as-is, we don't gain much
> anyway. Nevertheless, actions should be part of the extensions API
> but we can safely put them at the end of the extensions match routine
> so it always comes last. So basically we end up with
> 
> 1) cls specific matches
> 2) generic matches including logical expressions
> 3) action

Add 4) action extensions as well.

> We can always change it later without touching a single classifier.
> 
> > The classifier is blind while executing those actions.
> > Data that needs to be embeded within the classifier is:
> > struct {extmatch type:extmatch void_data}.
> 
> No, I'd really like to avoid having this but instead put a tcf_exts
> struct into it which holds all the data needed by the generic part.
> This also includes the tc_action pointer. This way we can get rid
> of all action/policer related ifdefs in the classifiers, reduce the
> chances of mistakes and don't need to touch classifier when changing
> something in the generic part.

Whatever you had before is fine for action/policer - with intent to kill
policer eventually.
What i objected to is the indev and any other thing that has to do with
classification helping - thats not where it should fit.
Take u32 for example: The fit for match extensions is really at the key
level not at a layer above.
We need a sel2 which has new keys (which is easy because thats
transported in a TLV).

> > extmatch_classify(extmatchdatastruct,skb) is a generic call which does a
> > lookup on the type and calls the proper callback. Callbacks return
> > standard classifier ret codes.
> 
> Exactly, I called it tcf_exts_match with the following return definition:
> <0: error/unmatched filter, classifier must stop matching and move on to
>     next filter or return mismatch if at the end.
>  0: Normal match, classifier must do whathever it would to if a filter matches
> >0: Classifier return code (TC_ACT_*), classifier must stop immediately
>     and pass on the return code.
> 

Why not reuse what already exists in terms of classifier/filter return
codes? They are pretty sufficient and cover all the cases.

> So basically I export this API:
> 
> tcf_exts_validate
>   validates the input data and initalizes the action, it must not
>   change any attributes. Validated data is stored in a temporary
>   tcf_exts structure provided by the classifier (local variable)
> 
> tcf_exts_change
>   Applies the changes from tcf_exts_validate to the classifier, must
>   not fail under any circumstance. Classifier provides both, the
>   destination pointer (hold in classifier data) and the temporary
>   buffer from tcf_exts_valiate.
> 
> tcf_exts_match
>   Runs all the matches and applies action.
> 
> tcf_exts_destroy
>   Destroys a complete extensions configuration
> 
> tcf_exts_dump
>   Dumps extensions into given skb.
> 
> tcf_exts_dump_stats
>   Dumps statistics into given skb.
> 
> tcf_exts_is_predicative
>   Returns 1 if a predicative extension is present, i.e. an extension
>   which might cause further actions and thus overrule the regular
>   tcf_result. In other words, returns 1 if a TC_ACT_* my be returned.
> 
> tcf_exts_available
>   Returns 1 if at least 1 extension is present.
> 
> How does the exntesions API know about the classifier specifc TLV
> types? Every classifier maintains a map which holds the types, this
> is used in _validate, _dump and _dump_stats.
> 

Hrm, so someone writting the one page extension now has to fill in all
these functions? If yes this defeats the purpose of the exercise to
create a single page of code. The user should really have to write only
a match function and call a registration function to hook up into the
framework. To be clear:

struct tcf_ematch_ops
{
        struct tcf_ematch_ops    *next;
        char    kind[IFNAMSIZ];
	u32	type;
        int     (*classify)(struct sk_buff*, struct tcf_ematch_data);
}

A global list of these is kept somewhere and a register/unregister
manipulation exists.

And the mythical one page code that needs writting:
---------------------------------------------------

int
myveryownmatch(struct sk_buff *skb, struct tc_action tcf_ematch_data *e)
{
	return TCF_OK;
}

struct tcf_ematch_ops_ops my_tcf_ematch_ops = {
        .next           =       NULL,
        .name           =       "mymatch",
        .type           =       TCF_MY_MATCHID,
        .walk           =       myveryownmatch
};


static int __init
mymatch_init_module(void)
{
	ret = tcf_register_ematch(&my_tcf_ematch_ops);
	// check here, ematch may already be registered etc
}

static void __exit
mymatch_cleanup_module(void)
{
        tcf_unregister_ematch(&my_tcf_ematch_ops);
}

module_init(mymatch_init_module);
module_exit(mymatch_cleanup_module);
----------------


If what you describe above is internal - accessible via classifier then
fine (other than tcf_exts_match) - lathough it looks excessive.
I dont see these things calling actions. They are interleaved between
matches. At completion of matches/filtering then you call the action
code.

> > - initialization happens like you said with extended matches TLV which
> > result in building one of those extended match datastruct bound on the
> > filter.
> 
> This is the part I'm unsure about. I want to keep it simple but also
> powerful. The action part is clear and will be untouched. The generic
> match part is more difficult, we either make userspace transfer the
> complete tree every time or we introduce commands like in the
> classifier. Second is probably better but a little bit more work.

Whats wrong with extended TLVs you mentioned earlier?

match u32 ..
ematch indev ...
match u32 ...
ematch meta tcindex ..

the ematches are essentially TLVs on their own and are owned by
the classifier. The classifier doesnt know whats in them. It just
calls generic code to execute them.

> > The binding part is easy, the hard part is how you interleaf u32
> > matches for example vs indev. 
> 
> Value TLV:
>  TCA_META_VALUE_TYPE, struct tcf_meta_type
>  TCA_META_VALUE_DATA, variable
> 
> struct tcf_meta_type
> {
> 	__u16 kind;
> 	__u16 len;
> };
> 
> Where kind:
> 
> enum
> {
> 	/* numberic types */
> 	TCF_META_I_NUMBER = 0x100,
> 	TCF_META_I_NFMARK = 0x101,
> 	TCF_META_I_TCINDEX = 0x102,
> 
> 	/* variable length types */
> 	TCF_META_V_PATTERN = 0x200,
> 	TCF_META_V_INDEV = 0x201,
> };
> 
> A matching of the upper 4 bits means the values can be
> compared. Of course userspace should check for this so
> we only have to put in a BUG_ON assertion.


I think you are only refering to one ematch kind above --> for metadata.
What i talked about is arbitrary (example i could put a quick hack to
grep strings without writting a full classifier). Essentially what you
have fits just fine - you may need two ids; one for IDing as meta match
and other as tcindex etc. The second one can be hidden.

> > ** Also i see your point that changing all the classifiers is painful,
> > but doing it this once so the next written classifier is easy is worth
> > the effort in my opinion.
> 
> Truly agreed, I did this work already and I'm now testing it. We can
> easly put the generic match on top of it and all we have to do is add
> TCA_XXX_EXTS for every classifier and add it to the map.
> 

Refer to what i talked about above. The extension are little helpers i.e
they cant live on their own - they need classifiers. Just review and 
sync essentially. The action extensions as well fit the same way.

cheers,
jamal

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2004-12-28 21:14               ` jamal
@ 2004-12-28 22:10                 ` Thomas Graf
  2004-12-28 23:06                   ` jamal
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Graf @ 2004-12-28 22:10 UTC (permalink / raw)
  To: jamal; +Cc: David S. Miller, netdev

* jamal <1104268498.1090.254.camel@jzny.localdomain> 2004-12-28 16:14
> Whatever you had before is fine for action/policer - with intent to kill
> policer eventually.

I left it in for now but I see no reason why to do so actually. Old
iproute2 binaries should do just fine with the action backward
compatibility code?

> What i objected to is the indev and any other thing that has to do with
> classification helping - thats not where it should fit.
> Take u32 for example: The fit for match extensions is really at the key
> level not at a layer above.
> We need a sel2 which has new keys (which is easy because thats
> transported in a TLV).

Take a look at http://people.suug.ch/~tgr/patches/queue/03_tcf_exts_u32.diff

The extensions are on the same level as the selector. The patchset still
has errors in the patches for route and tcindex since it's non-trivial
to adapt them to allow changing parameter on-the-fly. The rest is tested
and works perfectly fine. I can create a subset or we can just take the
first few patches for now and do the development on u32/fw and port it
later.

> Why not reuse what already exists in terms of classifier/filter return
> codes? They are pretty sufficient and cover all the cases.

I do reuse them. TC_ACT_* from include/linux/pkt_cls.h

> Hrm, so someone writting the one page extension now has to fill in all
> these functions?

No, that's just how the classifier accesses the extensions API.

> [ematch api]

Exactly, this would be API visible to the matches.

> If what you describe above is internal - accessible via classifier then
> fine (other than tcf_exts_match) - lathough it looks excessive.

The validate/change split is needed to implement consistent changes
in classifiers. The current way causes corruption in classifer data
whenever an action configuration fails.

> I dont see these things calling actions. They are interleaved between
> matches. At completion of matches/filtering then you call the action
> code.

Right, tcf_exts_match calls the generic matches and at the very end
the action.

> Whats wrong with extended TLVs you mentioned earlier?
> 
> match u32 ..
> ematch indev ...
> match u32 ...
> ematch meta tcindex ..
> 
> the ematches are essentially TLVs on their own and are owned by
> the classifier. The classifier doesnt know whats in them. It just
> calls generic code to execute them.

They should go into TCA_XXX_EXTS as embeded TLVs. The problem is
not how to do it but rather how far to go. Do we want userspace
to be able to delete a single generic match? Do we want to only
allow replacing all matches? We will hit the limit of skbs at
some point if we keep on encapsulating. ;->

> I think you are only refering to one ematch kind above --> for metadata.

Correct. This would be the generic match for metadata.

> What i talked about is arbitrary (example i could put a quick hack to
> grep strings without writting a full classifier). Essentially what you
> have fits just fine - you may need two ids; one for IDing as meta match
> and other as tcindex etc. The second one can be hidden.

I don't get this.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2004-12-28 22:10                 ` Thomas Graf
@ 2004-12-28 23:06                   ` jamal
  2004-12-28 23:19                     ` Thomas Graf
  0 siblings, 1 reply; 42+ messages in thread
From: jamal @ 2004-12-28 23:06 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David S. Miller, netdev

On Tue, 2004-12-28 at 17:10, Thomas Graf wrote:
> * jamal <1104268498.1090.254.camel@jzny.localdomain> 2004-12-28 16:14
> > Whatever you had before is fine for action/policer - with intent to kill
> > policer eventually.
> 
> I left it in for now but I see no reason why to do so actually. Old
> iproute2 binaries should do just fine with the action backward
> compatibility code?
> 

Its maintainance work. Nothing it provides isnt provided by
new policer. 

> > What i objected to is the indev and any other thing that has to do with
> > classification helping - thats not where it should fit.
> > Take u32 for example: The fit for match extensions is really at the key
> > level not at a layer above.
> > We need a sel2 which has new keys (which is easy because thats
> > transported in a TLV).
> 
> Take a look at http://people.suug.ch/~tgr/patches/queue/03_tcf_exts_u32.diff
> 
> The extensions are on the same level as the selector. 

Ok, i see we may be talking about two separate things:->
that patch is fine for policer/action (I noticed you removed indev -
good thing). 
It is not proper spot for the matches and infact
should go in as a separate patch altogether (relation is very minimal).

For the matches, the checks are going to be per key _not_ at the
selector level; i.e:

struct tc_newu32_key
{
        __u32           mask;
        __u32           val;
        int             off;
        int             offmask;
	pointer to extendedmatches here.
};

Since these keys are packed in a selector and the selector is what gets
transported from/to user space we need a selector2 which packs these new
keys instead. Makes sense? i.e need a TCA_U32_SEL2 where the extended
matches are stored. 
Backward compatibility: New TC transports them in addition to
TCA_U32_SEL and old kernels ignore them.
old TC doesnt send them so new kernels can handle them just fine. Beauty
of TLVs.

Your check in the classifier is 
if (matched) {
	if (NULL != key.pointertoematch) {
		ret = call the generic ematch function
		if ret == OK
			continue with next match
		else failed
	}
}

u32_change receives the extended matches and populates them accordingly.
There is no need for a dump fucntion to exist for them. They get shipped
back the same way they came in - user space knows how to dump them.
a  key.pointertoematch could be infact a llist of these items. So the
struct looks like:

struct tcf_ematch_info
{

	struct tcf_ematch_info *next;
        __u32           type
        void		*data;
        may need a datalen here for dumping back to user space.
};

Makes sense?

Back to what i said earlier i can now write a single page of code
to scan for word "Thomas" if i get a match on TCP port 25 for all IP
addresses... i.e metadata is a subset of all this.

cheers,
jamal

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2004-12-28 23:06                   ` jamal
@ 2004-12-28 23:19                     ` Thomas Graf
  2004-12-28 23:39                       ` jamal
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Graf @ 2004-12-28 23:19 UTC (permalink / raw)
  To: jamal; +Cc: David S. Miller, netdev

* jamal <1104275197.1100.276.camel@jzny.localdomain> 2004-12-28 18:06
> Its maintainance work. Nothing it provides isnt provided by
> new policer. 

I'll remove it.

> It is not proper spot for the matches and infact
> should go in as a separate patch altogether (relation is very minimal).
> 
> For the matches, the checks are going to be per key _not_ at the
> selector level; i.e:
> 
> struct tc_newu32_key
> {
>         __u32           mask;
>         __u32           val;
>         int             off;
>         int             offmask;
> 	pointer to extendedmatches here.
> };
> 
> Since these keys are packed in a selector and the selector is what gets
> transported from/to user space we need a selector2 which packs these new
> keys instead. Makes sense? i.e need a TCA_U32_SEL2 where the extended
> matches are stored. 

Why? I don't get that. Generic matches must only be considered if all
keys of u32 match. u32 keys are just ANDed matches if one fails we can
directly declare the classifier as unmatched. The only thing we would
gain is that we could add multiple generic matches but with lack of real
logical expressions. I'd rather implemnt some simple form of logical
expression in the generic part so all classifiers can benfit.

> Makes sense?

Not for me. ;->

> Back to what i said earlier i can now write a single page of code
> to scan for word "Thomas" if i get a match on TCP port 25 for all IP
> addresses... i.e metadata is a subset of all this.

Agreed, and smart as you are you simply take the Knuth-Morris-Pratt
code out of my EGP patch to get some performance. ;->

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2004-12-28 23:19                     ` Thomas Graf
@ 2004-12-28 23:39                       ` jamal
  2004-12-29  0:09                         ` Thomas Graf
  0 siblings, 1 reply; 42+ messages in thread
From: jamal @ 2004-12-28 23:39 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David S. Miller, netdev

On Tue, 2004-12-28 at 18:19, Thomas Graf wrote:
> * jamal <1104275197.1100.276.camel@jzny.localdomain> 2004-12-28 18:06
> > Its maintainance work. Nothing it provides isnt provided by
> > new policer. 
> 
> I'll remove it.
> 

Well one of your goals was to reduce the ifdef  cluter.
That goal is achieved with that goal. So I would keep it as is for now
but kill it in the future. I am hoping we are talking about the same
thing here - the old policer ;->


> > Since these keys are packed in a selector and the selector is what gets
> > transported from/to user space we need a selector2 which packs these new
> > keys instead. Makes sense? i.e need a TCA_U32_SEL2 where the extended
> > matches are stored. 
> 
> Why? I don't get that. Generic matches must only be considered if all
> keys of u32 match. u32 keys are just ANDed matches if one fails we can
> directly declare the classifier as unmatched. 

yes - and that still applies here but you can now interleaf - as i
mentioned earlier:

match u32 ..
ematch string "Thomas" ...
match u32 ...
ematch meta tcindex ..

I dont wanna go into details of whether we could actually make the new
keys do more than just strict AND from left to right - but you can see
the potential to "fix" this if we are defining a new key ;->

The same applies for interleafing of actions and eactions.

> The only thing we would
> gain is that we could add multiple generic matches but with lack of real
> logical expressions. I'd rather implemnt some simple form of logical
> expression in the generic part so all classifiers can benfit.

Ok, the logical expressions are the tricky part. But refer to what i am
saying above. You still need to be backward compatible. But for the new
keys you could go onto the adventorous side. I havent given the logical
expressions much thought but i will in the background

> > Makes sense?
> 
> Not for me. ;->
> 
> > Back to what i said earlier i can now write a single page of code
> > to scan for word "Thomas" if i get a match on TCP port 25 for all IP
> > addresses... i.e metadata is a subset of all this.
> 
> Agreed, and smart as you are you simply take the Knuth-Morris-Pratt
> code out of my EGP patch to get some performance. ;->

Of course;->  Ive worked long enough in Linux to appreciate 
the definition of TheLinuxWay(tm) ;-> I also get to inherit your bugs
this way.

cheers,
jamal

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2004-12-28 23:39                       ` jamal
@ 2004-12-29  0:09                         ` Thomas Graf
  2004-12-29  1:13                           ` jamal
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Graf @ 2004-12-29  0:09 UTC (permalink / raw)
  To: jamal; +Cc: David S. Miller, netdev

* jamal <1104277165.1100.291.camel@jzny.localdomain> 2004-12-28 18:39
> On Tue, 2004-12-28 at 18:19, Thomas Graf wrote:
> > Why? I don't get that. Generic matches must only be considered if all
> > keys of u32 match. u32 keys are just ANDed matches if one fails we can
> > directly declare the classifier as unmatched. 
> 
> yes - and that still applies here but you can now interleaf - as i
> mentioned earlier:
> 
> match u32 ..
> ematch string "Thomas" ...
> match u32 ...
> ematch meta tcindex ..

Yes but the only avantage of this is that a u32 match can be
made dependant on a ematch. Is this really worth special
handling? It requires special handling not needed for any
of the other classifiers.

I understand your point but don't agree at the moment. I
might change my mind tomorrow ;->

> I dont wanna go into details of whether we could actually make the new
> keys do more than just strict AND from left to right - but you can see
> the potential to "fix" this if we are defining a new key ;->

We should rather do it on cls_api level, unfortunantely it's not that
simple but the current status of having one classifier kind per prio and
no way to interconnect them must be changed somewhen. 

> Ok, the logical expressions are the tricky part. But refer to what i am
> saying above. You still need to be backward compatible. But for the new
> keys you could go onto the adventorous side. I havent given the logical
> expressions much thought but i will in the background

Implementing logical expressions directly into u32 would be bad but
we could have u32 hold a expression tree rather than the ematch
directly which means you could do

match u32 ..
(ematch meta nfmark .. or string "...")
match u32 ..

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2004-12-29  0:09                         ` Thomas Graf
@ 2004-12-29  1:13                           ` jamal
  2004-12-29 12:48                             ` Thomas Graf
  0 siblings, 1 reply; 42+ messages in thread
From: jamal @ 2004-12-29  1:13 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David S. Miller, netdev

On Tue, 2004-12-28 at 19:09, Thomas Graf wrote:
> * jamal <1104277165.1100.291.camel@jzny.localdomain> 2004-12-28 18:39
> > On Tue, 2004-12-28 at 18:19, Thomas Graf wrote:
> > match u32 ..
> > ematch string "Thomas" ...
> > match u32 ...
> > ematch meta tcindex ..
> 
> Yes but the only avantage of this is that a u32 match can be
> made dependant on a ematch. Is this really worth special
> handling? It requires special handling not needed for any
> of the other classifiers.
> 
> I understand your point but don't agree at the moment. I
> might change my mind tomorrow ;->

no problem ;-> I think the effort is the same as in doing it the other
way - only result is more sophisticated. I havent investigated the other
classifiers - u32 tends to be more complex, so solving it on u32 solves
all the others typically.

> > I dont wanna go into details of whether we could actually make the new
> > keys do more than just strict AND from left to right - but you can see
> > the potential to "fix" this if we are defining a new key ;->
> 
> We should rather do it on cls_api level, unfortunantely it's not that
> simple but the current status of having one classifier kind per prio and
> no way to interconnect them must be changed somewhen. 
> 

Remember two levels: 
1) the classifier logical expressions (u32 and rsvp for example) - those
belong to cls api.

if u32 match .. ... AND rsvp ... OR route ...
evaluation is left to right with some brute logical OR and ANDs via the
continue and reclassify codes.

2) This issue is at a the single classifier/filter level, so its fair to
push that into the classifier logic. an extended match cannot live by
itself. Its parasitic on a real classifier - so the scope MUST be
restricted to classifier as a matter of fact.

> > Ok, the logical expressions are the tricky part. But refer to what i am
> > saying above. You still need to be backward compatible. But for the new
> > keys you could go onto the adventorous side. I havent given the logical
> > expressions much thought but i will in the background
> 
> Implementing logical expressions directly into u32 would be bad but
> we could have u32 hold a expression tree rather than the ematch
> directly which means you could do
> 
> match u32 ..
> (ematch meta nfmark .. or string "...")
> match u32 ..
> 

Or you could have:
match u32 OR
(ematch meta nfmark .. or string "...")
match u32 ..

Recall, the opportunity to do more in terms of logical expressions within u32 
exists because we can introduce more interesting keys ...

Anyways, I am going to introduce extended actions. I need to write a few
stupid little actions not worth the whole API (such as a checksum
validator/recomputer which i need to follow the pedit action for
stateless NAT). It would use exactly teh same interleaving logic as what
weve discussed.
I dont know where you and Patrick are with the code but it think it
would be safe for me to work off 2610-bk1. Tommorow i am working on some
e1000 stuff - but day after i should be able to touch the action code. 

cheers,
jamal

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2004-12-29  1:13                           ` jamal
@ 2004-12-29 12:48                             ` Thomas Graf
  2004-12-29 14:20                               ` jamal
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Graf @ 2004-12-29 12:48 UTC (permalink / raw)
  To: jamal; +Cc: David S. Miller, netdev

* jamal <1104282811.1090.314.camel@jzny.localdomain> 2004-12-28 20:13
> no problem ;-> I think the effort is the same as in doing it the other
> way - only result is more sophisticated. I havent investigated the other
> classifiers - u32 tends to be more complex, so solving it on u32 solves
> all the others typically.

OK, I've changed my mind after some thinking. It's little bit more work
but it's worth it. The ematch TLV contains an array of ematches, every
ematch contains the logic relation to the next (2 bits) and a flag to
invert the meaning (1 bit). A special ematch containing an index exists
to implement precdence.

A AND (B1 OR B2) AND C AND D


              ------->-PUSH-------
    -->--    /         -->--      \   -->--
   /     \  /         /     \      \ /     \
+-------+-------+-------+-------+-------+--------+
| A AND | B AND | C AND | D END | B1 OR | B2 END |
+-------+-------+-------+-------+-------+--------+
                    \                      /
                     --------<-POP---------

A simple check that a jump index is never smaller than the current
index (excluding backward jumps via stack) catches endless loop
and avoids the use of a ttl.


> Remember two levels: 
> 1) the classifier logical expressions (u32 and rsvp for example) - those
> belong to cls api.
> 
> if u32 match .. ... AND rsvp ... OR route ...
> evaluation is left to right with some brute logical OR and ANDs via the
> continue and reclassify codes.
> 
> 2) This issue is at a the single classifier/filter level, so its fair to
> push that into the classifier logic. an extended match cannot live by
> itself. Its parasitic on a real classifier - so the scope MUST be
> restricted to classifier as a matter of fact.

Absolutely agreed, I did a context switch without prior warning. ;->

> Or you could have:
> match u32 OR
> (ematch meta nfmark .. or string "...")
> match u32 ..
> 
> Recall, the opportunity to do more in terms of logical expressions within u32 
> exists because we can introduce more interesting keys ...

We could reuse the 8 unused bits after nkeys in tc_u32sel too. iproute2 sets
them to 0 so we can simply use them without anyone noticing.

> I dont know where you and Patrick are with the code but it think it
> would be safe for me to work off 2610-bk1. Tommorow i am working on some
> e1000 stuff - but day after i should be able to touch the action code. 

I'm not touching the action bits but Patrick is as far as I know.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2004-12-29 12:48                             ` Thomas Graf
@ 2004-12-29 14:20                               ` jamal
  2004-12-29 15:01                                 ` Thomas Graf
  0 siblings, 1 reply; 42+ messages in thread
From: jamal @ 2004-12-29 14:20 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David S. Miller, netdev, Patrick McHardy

On Wed, 2004-12-29 at 07:48, Thomas Graf wrote:
> * jamal <1104282811.1090.314.camel@jzny.localdomain> 2004-12-28 20:13
..
> The ematch TLV contains an array of ematches, every
> ematch contains the logic relation to the next (2 bits) and a flag to
> invert the meaning (1 bit). A special ematch containing an index exists
> to implement precdence.
> 
> A AND (B1 OR B2) AND C AND D
> 
> 
>               ------->-PUSH-------
>     -->--    /         -->--      \   -->--
>    /     \  /         /     \      \ /     \
> +-------+-------+-------+-------+-------+--------+
> | A AND | B AND | C AND | D END | B1 OR | B2 END |
> +-------+-------+-------+-------+-------+--------+
>                     \                      /
>                      --------<-POP---------
> 
> A simple check that a jump index is never smaller than the current
> index (excluding backward jumps via stack) catches endless loop
> and avoids the use of a ttl.
> 

Sounds good.
Given the opportunity: I think we need to put those flags as well for
the u32 keys(and other classifiers) so we can have similar logic?
Also in the case of u32 (to take this opportunity) i would like to stash
state inot a 16 bit ID to help in pretty printing the matches.
So if we have an extra 32 bits for flags:ID probably 8 bits for your
need for flags, 16 bits for private Id and maybe another 8 bit for
something else like versioning...
Thoughts?


> > Recall, the opportunity to do more in terms of logical expressions within u32 
> > exists because we can introduce more interesting keys ...
> 
> We could reuse the 8 unused bits after nkeys in tc_u32sel too. iproute2 sets
> them to 0 so we can simply use them without anyone noticing.

I would recommend just introducing the extra 32 bits per key. So much
easier.

> > I dont know where you and Patrick are with the code but it think it
> > would be safe for me to work off 2610-bk1. Tommorow i am working on some
> > e1000 stuff - but day after i should be able to touch the action code. 
> 
> I'm not touching the action bits but Patrick is as far as I know.

Ok, dont wanna conflict with work hes doing - so i will wait until
tommorow to see what hes upto. CCed him.

cheers,
jamal

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2004-12-29 14:20                               ` jamal
@ 2004-12-29 15:01                                 ` Thomas Graf
  2004-12-29 15:53                                   ` jamal
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Graf @ 2004-12-29 15:01 UTC (permalink / raw)
  To: jamal; +Cc: David S. Miller, netdev, Patrick McHardy

* jamal <1104330054.1089.329.camel@jzny.localdomain> 2004-12-29 09:20
> Given the opportunity: I think we need to put those flags as well for
> the u32 keys(and other classifiers) so we can have similar logic?

Sounds reasonable and easy to do if we introduce a new selector TLV.
Speaking of the other classifiers:

fw: Currently a list of ORed matches, nfmark transported via handle.
    We could change it to transfer the nfmark via a TLV and implement
    the same logic as in u32 (simple). The problem is mainly how to
    guarantee backwards compatibility, the handle would no longer
    tell about the nfmark being matched. OTOH, fw is no longer needed
    once we have metadata match. Adding a "always-true" classifier with
    ematch extension will completely replace it (except for the old
    path with netfilter disabled).

tcindex: No changes required and partly replaced with metadata match
         but not completely. It would still be perfectly fine to map
         dscp values to classids.  

route4: Also partly replaced with metadata match but we would lose
        the exellent fast paths. We can leave it as-is and have metadata
        match for more complex matches (slow) and route4 for simple but
        fast uses.

rsvp: Could theoretically be replaced with new u32 and extensive use
      of continue/reclassify but that's quite difficult. It's very
      specialized (and currently quite vulnerable to pskbs) and the use
      of it is clearly towards fast flow redirection. No need to change
      this.

So, the conclusion for me is that we can focus on u32 and new
classifiers and eventually make fw obsolete in the future.

> Also in the case of u32 (to take this opportunity) i would like to stash
> state inot a 16 bit ID to help in pretty printing the matches.

Not sure what you mean. Which "state"?

> So if we have an extra 32 bits for flags:ID probably 8 bits for your
> need for flags, 16 bits for private Id and maybe another 8 bit for
> something else like versioning...

Basically what I need is 3 bits for logic relations and at least 8
for precedence index or 4 bits and reuse one of the already existing
fields unused when key is used as container for sub keys. So, 8 bits
will suit me very well.

+---+---+-----+
| I | C |  R  |
+---+---+-----+

I := 1 Match is inverted
     0 Match is straight

C := 1 Container Key
     0 Normal Key

R := 0 0 Last Key
     0 1 AND
     1 0 OR

While we're at it we should increase nkeys to 16bit.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2004-12-29 15:01                                 ` Thomas Graf
@ 2004-12-29 15:53                                   ` jamal
  2004-12-30 17:43                                     ` Thomas Graf
  0 siblings, 1 reply; 42+ messages in thread
From: jamal @ 2004-12-29 15:53 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David S. Miller, netdev

On Wed, 2004-12-29 at 10:01, Thomas Graf wrote:
> * jamal <1104330054.1089.329.camel@jzny.localdomain> 2004-12-29 09:20
> > Given the opportunity: I think we need to put those flags as well for
> > the u32 keys(and other classifiers) so we can have similar logic?
> 
> Sounds reasonable and easy to do if we introduce a new selector TLV.

Shouldnt be and issue and both backward and forward compatible.

> Speaking of the other classifiers:
> 
> fw: Currently a list of ORed matches, nfmark transported via handle.
>     We could change it to transfer the nfmark via a TLV and implement
>     the same logic as in u32 (simple). The problem is mainly how to
>     guarantee backwards compatibility, the handle would no longer
>     tell about the nfmark being matched. OTOH, fw is no longer needed
>     once we have metadata match. Adding a "always-true" classifier with
>     ematch extension will completely replace it (except for the old
>     path with netfilter disabled).
> 

I would suspect we would end killing fwmark or it will get deprecated
for lack of use. So probably safer to leave it alone.

> tcindex: No changes required and partly replaced with metadata match
>          but not completely. It would still be perfectly fine to map
>          dscp values to classids.  
> 

This is a tricky one since it has those speacial cases.

> route4: Also partly replaced with metadata match but we would lose
>         the exellent fast paths. We can leave it as-is and have metadata
>         match for more complex matches (slow) and route4 for simple but
>         fast uses.
> 

nod.

> rsvp: Could theoretically be replaced with new u32 and extensive use
>       of continue/reclassify but that's quite difficult. It's very
>       specialized (and currently quite vulnerable to pskbs) and the use
>       of it is clearly towards fast flow redirection. No need to change
>       this.
> 
> So, the conclusion for me is that we can focus on u32 and new
> classifiers and eventually make fw obsolete in the future.
> 

Geez, I should have read to the end first ;-> So i agree with you.

> > Also in the case of u32 (to take this opportunity) i would like to stash
> > state inot a 16 bit ID to help in pretty printing the matches.
> 
> Not sure what you mean. Which "state"?
> 

One example of what state you could store:
In the case where i enter something readable in english, the display
back is raw; 
example:
match ip src 10.0.0.210/32
gets displayed as: match 0a0000d2/ffffffff at 12
And a lot of times its tricky to find exactly what "at 12" means.

If i store some ID that would tell me "IP" when i dump then i can pretty
print it in english in user space using ip_print().

> > So if we have an extra 32 bits for flags:ID probably 8 bits for your
> > need for flags, 16 bits for private Id and maybe another 8 bit for
> > something else like versioning...
> 
> Basically what I need is 3 bits for logic relations and at least 8
> for precedence index or 4 bits and reuse one of the already existing
> fields unused when key is used as container for sub keys. So, 8 bits
> will suit me very well.
> 
> +---+---+-----+
> | I | C |  R  |
> +---+---+-----+
> 
> I := 1 Match is inverted
>      0 Match is straight
> 
> C := 1 Container Key
>      0 Normal Key
> 
> R := 0 0 Last Key
>      0 1 AND
>      1 0 OR
> 
> While we're at it we should increase nkeys to 16bit.
> 

Sounds good to me since we have a new sel.
It may endup being tricky to be both fast and backward compat; but we'll
see what fun awaits when you start coding.

cheers,
jamal

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2004-12-29 15:53                                   ` jamal
@ 2004-12-30 17:43                                     ` Thomas Graf
  2004-12-31  4:58                                       ` jamal
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Graf @ 2004-12-30 17:43 UTC (permalink / raw)
  To: jamal; +Cc: David S. Miller, netdev

* jamal <1104335620.1025.22.camel@jzny.localdomain> 2004-12-29 10:53
> > > Also in the case of u32 (to take this opportunity) i would like to stash
> > > state inot a 16 bit ID to help in pretty printing the matches.
> > 
> > Not sure what you mean. Which "state"?
> > 
> 
> One example of what state you could store:
> In the case where i enter something readable in english, the display
> back is raw; 
> example:
> match ip src 10.0.0.210/32
> gets displayed as: match 0a0000d2/ffffffff at 12
> And a lot of times its tricky to find exactly what "at 12" means.
> 
> If i store some ID that would tell me "IP" when i dump then i can pretty
> print it in english in user space using ip_print().

Understood, we could store a map in userspace mapping those IDs to
pretty english match descriptions. I think avoiding to hardcode those
ids but rather just hold it for userspace is the best thing. OTOH, if
we give unique ids to matches we can use it instead of a separate ID.
Unique in terms of parent classid + filter handle + match handle must
be unique per interface. Thoughts? 

> Sounds good to me since we have a new sel.
> It may endup being tricky to be both fast and backward compat; but we'll
> see what fun awaits when you start coding.

I started developing concrete thoughts. The current u32 match could be
made a generic match just like meta which would give us a u32 w/o hashtables
on a always-true classifier. The problem arises with exactly those
hashtables.  u32 uses the same selector for this and furthermore even defines
stuff like hoff and hmask for this purpose. I have to read up again to
understand the hashing in full detail but this is the only issue I see that
we might face.

What I have in mind is, something like u32 but much simpler, w/o the
overhead of creating additional filters for hashing etc. Basically
this can be the always-true classifier which just implements the
generic matches tree. And have the existing u32 with the generic
matches added when hashing is required.

I planned to write these 2 additional generic matches so far. It's
pretty simple because I can just take over the code from EGP.

 KMP: Knuth-Morris-Pratt text search algorithm
 NByte: Compares any number of bytes against a pattern, very useful
        for comparing IPv6 addresses instead of creating 4 ANDed
        u32 matches.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2004-12-30 17:43                                     ` Thomas Graf
@ 2004-12-31  4:58                                       ` jamal
  2004-12-31 11:08                                         ` Thomas Graf
  0 siblings, 1 reply; 42+ messages in thread
From: jamal @ 2004-12-31  4:58 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David S. Miller, netdev

On Thu, 2004-12-30 at 12:43, Thomas Graf wrote:
> * jamal <1104335620.1025.22.camel@jzny.localdomain> 2004-12-29 10:53

> > If i store some ID that would tell me "IP" when i dump then i can pretty
> > print it in english in user space using ip_print().
> 
> Understood, we could store a map in userspace mapping those IDs to
> pretty english match descriptions. I think avoiding to hardcode those
> ids but rather just hold it for userspace is the best thing. 

We may be in sync:
I was thinking of just teaching tc to stash something there that it
understands on how to translate. Thinking about it now, this may not
be sufficient: perhaps we need a few bits in the selector to identify
the owner who installed the rule to begin with. Then it would be safe to
interpret the meaning of the ID (by the same app). Did you say there
were some unused bits in the selector?

> OTOH, if
> we give unique ids to matches we can use it instead of a separate ID.

Note: The above id i was talking about is "opaque" i.e the real meaning
of it is only known to the user space app that installed it (unloess you
want to define things in kernel headers)

> Unique in terms of parent classid + filter handle + match handle must
> be unique per interface. Thoughts? 

I think all you need really is to say "this match starts at IP" i.e such
a definition is global.
handles per rule already exist - and you can actually specify them when
installing a rule. Are those insufficient?

> > Sounds good to me since we have a new sel.
> > It may endup being tricky to be both fast and backward compat; but we'll
> > see what fun awaits when you start coding.
> 
> I started developing concrete thoughts. The current u32 match could be
> made a generic match just like meta which would give us a u32 w/o hashtables
> on a always-true classifier. The problem arises with exactly those
> hashtables.  u32 uses the same selector for this and furthermore even defines
> stuff like hoff and hmask for this purpose. I have to read up again to
> understand the hashing in full detail but this is the only issue I see that
> we might face.
> 

Why not make the always-true to be an extended match? actually a u32
match of 0 0 is always true. Those hashes are quiet tricky/flexible;
i would rather we clone u32 and call it something else then speacilize
it.

> What I have in mind is, something like u32 but much simpler, w/o the
> overhead of creating additional filters for hashing etc. Basically
> this can be the always-true classifier which just implements the
> generic matches tree. And have the existing u32 with the generic
> matches added when hashing is required.
> 

Whip anothe classifier is my opinion. Or write extended matches.

> I planned to write these 2 additional generic matches so far. It's
> pretty simple because I can just take over the code from EGP.
> 
>  KMP: Knuth-Morris-Pratt text search algorithm
>  NByte: Compares any number of bytes against a pattern, very useful
>         for comparing IPv6 addresses instead of creating 4 ANDed
>         u32 matches.

Both sound very appealing. You plan to do them as extended matches,
correct? KMP can be used for something like virus scanning? does it
maintain state?

cheers,
jamal

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2004-12-31  4:58                                       ` jamal
@ 2004-12-31 11:08                                         ` Thomas Graf
  2004-12-31 14:59                                           ` jamal
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Graf @ 2004-12-31 11:08 UTC (permalink / raw)
  To: jamal; +Cc: David S. Miller, netdev

* jamal <1104469111.1049.219.camel@jzny.localdomain> 2004-12-30 23:58
> On Thu, 2004-12-30 at 12:43, Thomas Graf wrote:
> > * jamal <1104335620.1025.22.camel@jzny.localdomain> 2004-12-29 10:53
> 
> > > If i store some ID that would tell me "IP" when i dump then i can pretty
> > > print it in english in user space using ip_print().
> > 
> > Understood, we could store a map in userspace mapping those IDs to
> > pretty english match descriptions. I think avoiding to hardcode those
> > ids but rather just hold it for userspace is the best thing. 
> 
> We may be in sync:
> I was thinking of just teaching tc to stash something there that it
> understands on how to translate. Thinking about it now, this may not
> be sufficient: perhaps we need a few bits in the selector to identify
> the owner who installed the rule to begin with. Then it would be safe to
> interpret the meaning of the ID (by the same app). Did you say there
> were some unused bits in the selector?

Right, but why not do this in userspace by having a global map
somewhere in a file? A u32 config could have been modified by
multiple pids and it would get really messy to store a pid for
every possible changeable item.

> I think all you need really is to say "this match starts at IP" i.e such
> a definition is global.
> handles per rule already exist - and you can actually specify them when
> installing a rule. Are those insufficient?

Those are absolutely sufficient. I was thinking of giving a match a
16bit ID which can be used for both, identifying and mapping, i.e:

__u8 kind; /* match type, for lookup in matchers table */
__u8 flags; /* Invert Flag + Relations */
__u16 handle; /* must be unique per selector, may be autogenerated */

I want to have those matches be as small as possible, so no nested
TLVs but rather this u32 + matcher specific data form a TLV together.

A selector consists of a TLV array of such matches. The first TLV,
type=1 becomes a header with the possibility to transfer classifier
specific options (such as hash table configuration for u32).

> Why not make the always-true to be an extended match? actually a u32
> match of 0 0 is always true. Those hashes are quiet tricky/flexible;
> i would rather we clone u32 and call it something else then speacilize
> it.

Agreed, I don't want to change u32 but I want to introduce ematches
in u32 as well so we can benefit from the hashing but for those who
don't need hashing u32 is already bloat so we can do a simple
always-true classifier which does nothing more than evaluating the
ematches. I want to have the u32 match be a ematch as well so the
always-true classifier would become a u32 alternative but without
the hashing overhead.

> Both sound very appealing. You plan to do them as extended matches,
> correct?

Excatly.

> KMP can be used for something like virus scanning? does it
> maintain state?

It requires the following parameters:
 - start offset
 - end offset
 - pattern
 - prefix table

and then will simply start at `start` and scans until `end` looking
for pattern with the help of the prefix table. Again, I'm not sure what
you mean by state ;->

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2004-12-31 11:08                                         ` Thomas Graf
@ 2004-12-31 14:59                                           ` jamal
  2004-12-31 15:39                                             ` Thomas Graf
  0 siblings, 1 reply; 42+ messages in thread
From: jamal @ 2004-12-31 14:59 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David S. Miller, netdev

On Fri, 2004-12-31 at 06:08, Thomas Graf wrote:
> * jamal <1104469111.1049.219.camel@jzny.localdomain> 2004-12-30 23:58

> > perhaps we need a few bits in the selector to identify
> > the owner who installed the rule to begin with. Then it would be safe to
> > interpret the meaning of the ID (by the same app). Did you say there
> > were some unused bits in the selector?
> 
> Right, but why not do this in userspace by having a global map
> somewhere in a file? A u32 config could have been modified by
> multiple pids and it would get really messy to store a pid for
> every possible changeable item.

The "PID" is per app - same as it is for rtm where the different known
routing apps like zebra are given an ID and certain IDs are left blank
for general use. So if you run multiple tcs they will all have the same
ID. The way the id is used in the rtm historically has been for allowing
multiple route protocols to install route entries (this way for example 
OSPF does/nt announce RIP routes etc). In our case this would mean:
"this rule was installed by tc - only it knows what the opaque value
10 means". In that case tc would be responsible to decode 10 which would
mean to it "this is an ip match therefore use ip_print() to pretty
print" and 10 is global to tc only and in that case stored in some
header file belonging to tc. 

> > I think all you need really is to say "this match starts at IP" i.e such
> > a definition is global.
> > handles per rule already exist - and you can actually specify them when
> > installing a rule. Are those insufficient?
> 
> Those are absolutely sufficient. I was thinking of giving a match a
> 16bit ID which can be used for both, identifying and mapping, i.e:
> 
> __u8 kind; /* match type, for lookup in matchers table */

255 possible matchers max?

> __u8 flags; /* Invert Flag + Relations */
> __u16 handle; /* must be unique per selector, may be autogenerated */

Ok this is the one used to store the opaque IDs - unique per app
so may be the same across multiple selectors.
Probably steal a few bits from here and use them in nkind.

> I want to have those matches be as small as possible, so no nested
> TLVs but rather this u32 + matcher specific data form a TLV together.
> 
> A selector consists of a TLV array of such matches. The first TLV,
> type=1 becomes a header with the possibility to transfer classifier
> specific options (such as hash table configuration for u32).

Maybe i misunderstood you. You are going to have:

SEL2
  |
  |
  +--match1
  |    |
  |    + -- extended match1
  .    .
  .    .
  .    .
  .    +---- extnded matchn
  |
  +--- matchn

Why does the first one have to be speacial?

One of the mistakes in u32 is the tight packing of the matches. ie the
match1-n above are packet together. If they were put in TLVs probably
only new thing that will be needed is MATCH2 TLV.
No harm in transporting an extra 32 bits for a TLV - its not like you
are going to have a million matches and need to save space.
So i would suggest everything under a TLV (SEL2->MATCH(ES)->EMATCH(ES))

> > Why not make the always-true to be an extended match? actually a u32
> > match of 0 0 is always true. Those hashes are quiet tricky/flexible;
> > i would rather we clone u32 and call it something else then speacilize
> > it.
> 
> Agreed, I don't want to change u32 but I want to introduce ematches
> in u32 as well so we can benefit from the hashing but for those who
> don't need hashing u32 is already bloat so we can do a simple
> always-true classifier which does nothing more than evaluating the
> ematches. I want to have the u32 match be a ematch as well so the
> always-true classifier would become a u32 alternative but without
> the hashing overhead.

Sounds fine to me. so a u32 match 00 emathes here ..

> It requires the following parameters:
>  - start offset
>  - end offset
>  - pattern
>  - prefix table
> 
> and then will simply start at `start` and scans until `end` looking
> for pattern with the help of the prefix table. Again, I'm not sure what
> you mean by state ;->

A virus would span several packets. So the state/knowldge of whether
something is a virus is spread across several packets. That knowledge
typically needs to be accumulated before making a call. Is this thing
capable of remembering?

cheers,
jamal

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2004-12-31 14:59                                           ` jamal
@ 2004-12-31 15:39                                             ` Thomas Graf
  2004-12-31 16:44                                               ` jamal
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Graf @ 2004-12-31 15:39 UTC (permalink / raw)
  To: jamal; +Cc: David S. Miller, netdev

* jamal <1104505142.1048.262.camel@jzny.localdomain> 2004-12-31 09:59
> On Fri, 2004-12-31 at 06:08, Thomas Graf wrote:
> > * jamal <1104469111.1049.219.camel@jzny.localdomain> 2004-12-30 23:58
> 
> > > perhaps we need a few bits in the selector to identify
> > > the owner who installed the rule to begin with. Then it would be safe to
> > > interpret the meaning of the ID (by the same app). Did you say there
> > > were some unused bits in the selector?
> > 
> > Right, but why not do this in userspace by having a global map
> > somewhere in a file? A u32 config could have been modified by
> > multiple pids and it would get really messy to store a pid for
> > every possible changeable item.
> 
> The "PID" is per app - same as it is for rtm where the different known
> routing apps like zebra are given an ID and certain IDs are left blank
> for general use. So if you run multiple tcs they will all have the same
> ID. The way the id is used in the rtm historically has been for allowing
> multiple route protocols to install route entries (this way for example 
> OSPF does/nt announce RIP routes etc). In our case this would mean:
> "this rule was installed by tc - only it knows what the opaque value
> 10 means". In that case tc would be responsible to decode 10 which would
> mean to it "this is an ip match therefore use ip_print() to pretty
> print" and 10 is global to tc only and in that case stored in some
> header file belonging to tc. 

Exactly so we don't need any PIDs stored. /etc/iproute2/tc_matches.
> > __u8 kind; /* match type, for lookup in matchers table */
> 
> 255 possible matchers max?

256 not enough?

> > __u8 flags; /* Invert Flag + Relations */
> > __u16 handle; /* must be unique per selector, may be autogenerated */
> 
> Ok this is the one used to store the opaque IDs - unique per app
> so may be the same across multiple selectors.

No, it must be unique, we will return EINVAL if it isn't. Most
apps will set it to 0 which will autogenerate it with a
generate && !get just like in u32.

> Probably steal a few bits from here and use them in nkind.

Fair, we can also steal a few bits from flags although I'd like to
keep at least 3 empty.

> Maybe i misunderstood you. You are going to have:
> 
> SEL2
>   |
>   |
>   +--match1
>   |    |
>   |    + -- extended match1
>   .    .
>   .    .
>   .    .
>   .    +---- extnded matchn
>   |
>   +--- matchn
> 
> 
> Why does the first one have to be speacial?

No, I'm going to have everything be ematches.

> One of the mistakes in u32 is the tight packing of the matches. ie the
> match1-n above are packet together. If they were put in TLVs probably
> only new thing that will be needed is MATCH2 TLV.
> No harm in transporting an extra 32 bits for a TLV - its not like you
> are going to have a million matches and need to save space.
> So i would suggest everything under a TLV (SEL2->MATCH(ES)->EMATCH(ES))

SEL2:

  TLV (TCA_U32_SEL2)
  +-------------------+
  | Selector header   |
  +-------------------+
  | Match 1 TLV       |
  +-------------------+
  | ...               |
  +-------------------+
  | Match N TLV       |
  +-------------------+

where Match TLV:

  +--------------------+
  | Match Header (u32) |
  +- - - - - - - - - - +
  | Match Data         |
  +--------------------+

Where Header:
  u8 kind;
  u8 flags;
  u16 handle;

(or some more for kind and less for handle)

where

kind := { 0 | match type }
handle := { 0 | unique }

where kind == 0 means that the match is a container and data
contains a u32 pointing into the match array.

where kind != 0 means a match to be looked up in the matcher
ops table.

with the following matches:
 - u32: {u8|u16|u32} at offset match
 - meta
 - kmp
 - nbyte
 - ...

In case SEL1 TLV is provided we simply create a flat
index with no containers and all AND relations.

Which means we can do:

Sel2
 \__match u32 at 2 16 0xff
 |
 + and match meta nfmark 2
 |
 + and container
    \__match u32 at 4 11 0xf0
     |
     + or nbyte "::1" at 12

Thoughts?

> A virus would span several packets. So the state/knowldge of whether
> something is a virus is spread across several packets. That knowledge
> typically needs to be accumulated before making a call. Is this thing
> capable of remembering?

Not and I would rather do it outside in a separate ematch to match
stateful information. I have to think some more about this though.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2004-12-31 15:39                                             ` Thomas Graf
@ 2004-12-31 16:44                                               ` jamal
  2004-12-31 17:32                                                 ` jamal
  2004-12-31 18:11                                                 ` Thomas Graf
  0 siblings, 2 replies; 42+ messages in thread
From: jamal @ 2004-12-31 16:44 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David S. Miller, netdev

On Fri, 2004-12-31 at 10:39, Thomas Graf wrote:
> * jamal <1104505142.1048.262.camel@jzny.localdomain> 2004-12-31 09:59

> Exactly so we don't need any PIDs stored. /etc/iproute2/tc_matches.

Sure - that would be a fine place to store the IDs. Although i am not
sure you want to call it matches - rather its u32 program identifiers.

> > > __u8 kind; /* match type, for lookup in matchers table */
> > 
> > 255 possible matchers max?
> 
> 256 not enough?

I dont know, possible given how easy it would be to add a new match
that it wont be sufficient.

> > > __u8 flags; /* Invert Flag + Relations */
> > > __u16 handle; /* must be unique per selector, may be autogenerated */
> > 
> > Ok this is the one used to store the opaque IDs - unique per app
> > so may be the same across multiple selectors.
> 
> No, it must be unique, we will return EINVAL if it isn't. Most
> apps will set it to 0 which will autogenerate it with a
> generate && !get just like in u32.
> 

We may be talking about different things - please double check.
The misunderstanding seems to be on scoping: PID is global and 
the opaque ID is per match.
IOW, theres a hierachy: 
A program(such as tc) installs a filter rule - we need to be able to
identify the program - this is the PID. Unique across all Linux.
A filter rule constitutes one or more matches. Different programs may
install different u32 rules. For most users its a single program - tc.
Each program that installs a rule will need to be identified by the PID.
Main purpose is so that it can decode what the second level number
means. This second level number is what i said was opaque. Its meaning
is per app. 
So as an example PID 0x10 identifies application tc and opaque code
0x20 for tc translates to mean "thats an ip match, so no need to dump
raw data - just dump it in english using ip_print()".
0x10 belongs to the selector; 0x20 is per match. 0x21 could mean "this
is a TCP match with options" etc
The ematches on the hand are purely decodable via user space without
needing the opaque numbers - the kind/type serves these just fine.

> > Probably steal a few bits from here and use them in nkind.
> 
> Fair, we can also steal a few bits from flags although I'd like to
> keep at least 3 empty.

the 16 bits for match handle sounds a little too generous; still 4 bits
or so from there should be fine.

> SEL2:
> 
>   TLV (TCA_U32_SEL2)
>   +-------------------+
>   | Selector header   |
>   +-------------------+
>   | Match 1 TLV       |
>   +-------------------+
>   | ...               |
>   +-------------------+
>   | Match N TLV       |
>   +-------------------+

nod.

> where Match TLV:
> 
>   +--------------------+
>   | Match Header (u32) |
>   +- - - - - - - - - - +
>   | Match Data         |
>   +--------------------+
> 
> Where Header:
>   u8 kind;
>   u8 flags;
>   u16 handle;
> 
> (or some more for kind and less for handle)

nod.

> where
> 
> kind := { 0 | match type }
> handle := { 0 | unique }
> 
> where kind == 0 means that the match is a container and data
> contains a u32 pointing into the match array.
> 

So essentiall good old u32 here?

> where kind != 0 means a match to be looked up in the matcher
> ops table.
> 
> with the following matches:
>  - u32: {u8|u16|u32} at offset match
>  - meta
>  - kmp
>  - nbyte
>  - ...

Everything is almost the same as to what i was saying - except i see the
u32 again there; isnt kind = 0 covering this?

> In case SEL1 TLV is provided we simply create a flat
> index with no containers and all AND relations.

No choice there.

> Which means we can do:
> 
> Sel2
>  \__match u32 at 2 16 0xff
>  |
>  + and match meta nfmark 2
>  |
>  + and container
>     \__match u32 at 4 11 0xf0
>      |
>      + or nbyte "::1" at 12
> 
> Thoughts?
> 

Ok, I may have understood what you are talking about in the second u32
where kind !=0. 
I think we are in sync. Go nuts.

> > A virus would span several packets. So the state/knowldge of whether
> > something is a virus is spread across several packets. That knowledge
> > typically needs to be accumulated before making a call. Is this thing
> > capable of remembering?
> 
> Not and I would rather do it outside in a separate ematch to match
> stateful information. I have to think some more about this though.

Its a non-trivial problem. Its ok to defer it for now but keep it in the
back of your mind. 

cheers,
jamal

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2004-12-31 16:44                                               ` jamal
@ 2004-12-31 17:32                                                 ` jamal
  2004-12-31 18:11                                                 ` Thomas Graf
  1 sibling, 0 replies; 42+ messages in thread
From: jamal @ 2004-12-31 17:32 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David S. Miller, netdev

On Fri, 2004-12-31 at 11:44, jamal wrote:

> > with the following matches:

One thing i just remembered: You need to know the length of the matches
and their data in order to store them. This is why i was earlier
preaching putting them in TLVs. Some things dont need the datalen
like u32 - however i suspect most will.

So either need a length somewhere in the header or use TLVs for the
ematches in which you can make T=kind - still have 32 bit inside body
but reserve bits not used for flags for future use. Thoughts?.

BTW, for deleting - should not allow deleting one of N matches. Deletion
should be at selector level. Replace can be used to pretend a single
match was deleted.

cheers,
jamal

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2004-12-31 16:44                                               ` jamal
  2004-12-31 17:32                                                 ` jamal
@ 2004-12-31 18:11                                                 ` Thomas Graf
  2004-12-31 18:19                                                   ` Thomas Graf
  2004-12-31 20:51                                                   ` jamal
  1 sibling, 2 replies; 42+ messages in thread
From: Thomas Graf @ 2004-12-31 18:11 UTC (permalink / raw)
  To: jamal; +Cc: David S. Miller, netdev

* jamal <1104511494.1048.303.camel@jzny.localdomain> 2004-12-31 11:44
> We may be talking about different things - please double check.
> The misunderstanding seems to be on scoping: PID is global and 
> the opaque ID is per match.
> IOW, theres a hierachy: 
> A program(such as tc) installs a filter rule - we need to be able to
> identify the program - this is the PID. Unique across all Linux.
> A filter rule constitutes one or more matches. Different programs may
> install different u32 rules. For most users its a single program - tc.
> Each program that installs a rule will need to be identified by the PID.
> Main purpose is so that it can decode what the second level number
> means. This second level number is what i said was opaque. Its meaning
> is per app. 
> So as an example PID 0x10 identifies application tc and opaque code
> 0x20 for tc translates to mean "thats an ip match, so no need to dump
> raw data - just dump it in english using ip_print()".
> 0x10 belongs to the selector; 0x20 is per match. 0x21 could mean "this
> is a TCP match with options" etc
> The ematches on the hand are purely decodable via user space without
> needing the opaque numbers - the kind/type serves these just fine.

Agreed I just don't get the reason for the PID. tc is usually called as
a new process instance when dumping. For me there are two possible
options, we can either introduce a ID system where an ID is assigned
to a match string in either a config file or a header file or we can
have tc write id -> desc maps to a global file somewhere where id
means match id + u32 handle + parent + dev. The first is probably
the better way. We could extend the match header to 64bit:

u16 handle
u16 matchID
u16 kind
u8 flags
u8 pad

> Its a non-trivial problem. Its ok to defer it for now but keep it in the
> back of your mind. 

Agreed.

* jamal <1104514372.1047.326.camel@jzny.localdomain> 2004-12-31 12:32
> One thing i just remembered: You need to know the length of the matches
> and their data in order to store them. This is why i was earlier
> preaching putting them in TLVs. Some things dont need the datalen
> like u32 - however i suspect most will.

It might not have been obvious, but every match is indeed in its
own TLV. The part I don't want to use own TLVs is to separate the
match header and match data. Match header is always the same size
and match data can be aligned as well. We need len attributes for
things like meta indev match, nbyte and kmp though. A Nbyte config
TLV would look like:

TCA_EMATCH
+-------------------------+
| struct tcf_ematch_hdr   |
| - - - - - - - - - - - --|
| ematch data             |
+-------------------------+

where ematch data contains nested TLVs such as

TCA_EMATCH_NBYTE_HDR     header, contains length of pattern + possibily more
TCA_EMATCH_NBYTE_START   lower limit of searching range (offset)
TCA_EMATCH_NBYTE_END     upper limit of searching range (offset)
TCA_EMATCH_NBYTE_PATTERN searching pattern, u8 []

The length in the header is required because we can't use
L from TCA_EMATCH_NBYTE_PATTERN since it might be padded.

Same would go for meta:

TCA_EMATCH_META_HDR
TCA_EMATCH_META_LVALUE
TCA_EMATCH_MEtA_RVALUE

If needed we can put in match specific stats via a _STATS TLV.

> So either need a length somewhere in the header or use TLVs for the
> ematches in which you can make T=kind - still have 32 bit inside body
> but reserve bits not used for flags for future use. Thoughts?.

I thought about the following ordering in the selctor TLV:

 T=1 generic selector header
 T=2 classifier specific selector header (u32 hashsing stuff goes here)
 T=3 ematch 1
 T=N ematch N

> BTW, for deleting - should not allow deleting one of N matches. Deletion
> should be at selector level. Replace can be used to pretend a single
> match was deleted.

Agreed.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2004-12-31 18:11                                                 ` Thomas Graf
@ 2004-12-31 18:19                                                   ` Thomas Graf
  2004-12-31 20:51                                                   ` jamal
  1 sibling, 0 replies; 42+ messages in thread
From: Thomas Graf @ 2004-12-31 18:19 UTC (permalink / raw)
  To: jamal; +Cc: David S. Miller, netdev

* Thomas Graf <20041231181153.GP32419@postel.suug.ch> 2004-12-31 19:11
>  T=1 generic selector header
>  T=2 classifier specific selector header (u32 hashsing stuff goes here)
>  T=3 ematch 1
>  T=N ematch N

Of course this should have been:

T=N ematch N-2

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2004-12-31 18:11                                                 ` Thomas Graf
  2004-12-31 18:19                                                   ` Thomas Graf
@ 2004-12-31 20:51                                                   ` jamal
  2005-01-01 12:10                                                     ` Thomas Graf
  2005-01-01 18:32                                                     ` Thomas Graf
  1 sibling, 2 replies; 42+ messages in thread
From: jamal @ 2004-12-31 20:51 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David S. Miller, netdev

On Fri, 2004-12-31 at 13:11, Thomas Graf wrote:
> * jamal <1104511494.1048.303.camel@jzny.localdomain> 2004-12-31 11:44

> Agreed I just don't get the reason for the PID. tc is usually called as
> a new process instance when dumping. 

Indeed. A new instance of tc should be able to delete or understand
what an old instance with different process ID installed.
The P in pid here stands for "program" not "process". Looking at it from
another angle it is the "owner" of that rule.
I gave an example of routes as a comparison:

Example:
[root@jzny root]# ip r ls
10.1.0.25 via 10.0.0.90 dev eth0  proto zebra 
10.0.0.0/24 dev eth0  proto kernel  scope link  src 10.0.0.9 
127.0.0.0/8 dev lo  scope link 
default via 10.0.0.1 dev eth0 
[root@jzny root]# 

See the "proto" field? Same logic - if tc installed those rules
that should read "tc". Same thinking for new u32 display with sel2.
More impaortantly though: 
If the u32 rule was installed by tc, then it can understand what the
code/opaqueid _in the match_ means. If i knew how tc used those opaque
values then i too in my program could intepret them when i dump even
though i didnt install the rule.
 
>  For me there are two possible
> options, we can either introduce a ID system where an ID is assigned
> to a match string in either a config file or a header file or we can
> have tc write id -> desc maps to a global file somewhere where id
> means match id + u32 handle + parent + dev. The first is probably
> the better way. We could extend the match header to 64bit:
> 
> u16 handle
> u16 matchID
> u16 kind
> u8 flags
> u8 pad


We need to know who installed the rule so we can intepret what the ID in
the match is.
Unless you see a desire that, in order to understand all this, we need
to also know on which device and which parent adds towards reaching that
goal then I am afraid this will overcomplicate things. Theres probably
other things you could gain from as well by having all those fields; you
dont need them for this simple case.

> > Its a non-trivial problem. Its ok to defer it for now but keep it in the
> > back of your mind. 
> 
> Agreed.

Lets review at a future date though.
 
> * jamal <1104514372.1047.326.camel@jzny.localdomain> 2004-12-31 12:32
> > One thing i just remembered: You need to know the length of the matches
> > and their data in order to store them. This is why i was earlier
> > preaching putting them in TLVs. Some things dont need the datalen
> > like u32 - however i suspect most will.
> 
> It might not have been obvious, but every match is indeed in its
> own TLV.

Ok, cool.  To recall:

>   TLV (TCA_U32_SEL2)
>   +-------------------+
>   | Selector header   |
>   +-------------------+
>   | Match 1 TLV       |
>   +-------------------+
>   | ...               |
>   +-------------------+
>   | Match N TLV       |
>   +-------------------+

You may actually need those Ts enumerated as if they are array
indices. Look at the way i transfer actions using "order"

> The part I don't want to use own TLVs is to separate the
> match header and match data. Match header is always the same size
> and match data can be aligned as well. We need len attributes for
> things like meta indev match, nbyte and kmp though. A Nbyte config
> TLV would look like:
> 
> TCA_EMATCH
> +-------------------------+
> | struct tcf_ematch_hdr   |
> | - - - - - - - - - - - --|
> | ematch data             |
> +-------------------------+

I was more worried about the matches not being TLVs.
So this looks good.

> where ematch data contains nested TLVs such as
> 
> TCA_EMATCH_NBYTE_HDR     header, contains length of pattern + possibily more
> TCA_EMATCH_NBYTE_START   lower limit of searching range (offset)
> TCA_EMATCH_NBYTE_END     upper limit of searching range (offset)
> TCA_EMATCH_NBYTE_PATTERN searching pattern, u8 []
> 
> The length in the header is required because we can't use
> L from TCA_EMATCH_NBYTE_PATTERN since it might be padded.
> 

My view was length is also a common field. Theres also another reason
why you want length viewable in a dumb way:
--> you dont really wanna force people to write dumpers for these
ematchers (goal: keep this interface as simple as it can be); i.e dont
need any pretty formater in the kernel.
If you have a length then you can reconstruct the TCA_EMATCH easily
without caring about the content. This is the path i started taking in 
eactions. Refer to my notes i sent earlier on the mythical one page
ematch/eaction.
If someone wants funky stuff - write a classifier.

> Same would go for meta:
> 
> TCA_EMATCH_META_HDR
> TCA_EMATCH_META_LVALUE
> TCA_EMATCH_MEtA_RVALUE
> 
> If needed we can put in match specific stats via a _STATS TLV.

Stats are the other thing that adds complexity to the API. If you can
make it optional then that would be best - I was thinking to not even
have it in.

> > So either need a length somewhere in the header or use TLVs for the
> > ematches in which you can make T=kind - still have 32 bit inside body
> > but reserve bits not used for flags for future use. Thoughts?.
> 
> I thought about the following ordering in the selctor TLV:
> 
>  T=1 generic selector header
>
>  T=2 classifier specific selector header (u32 hashsing stuff goes here)
>  T=3 ematch 1
>  T=N ematch N

I thought we already agreed on the layout:
SEL2- which may nest E/MATCHEs TLVs. Sel2 not being very different from
original selector. May be i didnt follow. 

cheers,
jamal

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2004-12-31 20:51                                                   ` jamal
@ 2005-01-01 12:10                                                     ` Thomas Graf
  2005-01-01 23:29                                                       ` jamal
  2005-01-01 18:32                                                     ` Thomas Graf
  1 sibling, 1 reply; 42+ messages in thread
From: Thomas Graf @ 2005-01-01 12:10 UTC (permalink / raw)
  To: jamal; +Cc: David S. Miller, netdev

* jamal <1104526311.1047.379.camel@jzny.localdomain> 2004-12-31 15:51
> On Fri, 2004-12-31 at 13:11, Thomas Graf wrote:
> > * jamal <1104511494.1048.303.camel@jzny.localdomain> 2004-12-31 11:44
> 
> > Agreed I just don't get the reason for the PID. tc is usually called as
> > a new process instance when dumping. 
> 
> Indeed. A new instance of tc should be able to delete or understand
> what an old instance with different process ID installed.
> The P in pid here stands for "program" not "process". Looking at it from
> another angle it is the "owner" of that rule.

Ahh, now this makes sense. Sorry, I misunderstood you all the time.

> > u16 handle
> > u16 matchID
> > u16 kind
> > u8 flags
> > u8 pad
> 
> We need to know who installed the rule so we can intepret what the ID in
> the match is.

Yes but this should go into the selector header.

> You may actually need those Ts enumerated as if they are array
> indices. Look at the way i transfer actions using "order"

Right, order would be N-2. I don't see any reason for storing it, I
didn't had need for it in EGP and I used exactly the same techniques
as in the action code.

> My view was length is also a common field. Theres also another reason
> why you want length viewable in a dumb way:
> --> you dont really wanna force people to write dumpers for these
> ematchers (goal: keep this interface as simple as it can be); i.e dont
> need any pretty formater in the kernel.
> If you have a length then you can reconstruct the TCA_EMATCH easily
> without caring about the content. This is the path i started taking in 
> eactions. Refer to my notes i sent earlier on the mythical one page
> ematch/eaction.
> If someone wants funky stuff - write a classifier.

Very simple ematches will only require a struct for configuration
so the dumping is not more than 5 lines of code. The length can be
calculated via RTA_PAYLOAD(ematch_tlv) - sizeof(ematch_hdr). This
of course requires the struct to be aligned to RTA_ALIGN but
that's generally not a problem at all.

I understand your concern but I also want to allow a little bit
more complicated ematches such as KMP or later a very simple
regular expression implementation.

Here's some code from the simple_cmp key of EGP giving a good
idea how a simple ematch will look like:

static int
sc_match(struct egp_cls *cls, struct egp_key *k)
{
       struct egp_key_sc *sc = k->data;
       u32 lvalue = cls->ops->read(cls, &sc->left);
       u32 rvalue = cls->ops->read(cls, &sc->right);

       switch (sc->op) {
#define SC(a, b) case EGP_OP_##a: return b
               SC(NONE, 0);
               SC(EQ, lvalue == rvalue);
               SC(NE, lvalue != rvalue);
               SC(LT, lvalue <  rvalue);
               SC(LE, lvalue <= rvalue);
               SC(GT, lvalue >  rvalue);
               SC(GE, lvalue >= rvalue);
#undef  SC
       }

       BUG();
       return 0;
}

static int
sc_validate(struct egp_config *conf, struct egp_ops *ops, void *d, size_t len)
{
       int err;
       struct egp_key_sc *sc = d;

       if (len != sizeof(*sc) || sc->op > EGP_OP_MAX)
               return -EINVAL;
       return 0;
}

static int
sc_replace(struct egp_cls *cls, struct egp_key *k, void *d, size_t len)
{
       k->data = kmalloc(len, GFP_KERNEL);
       if (NULL == k->data)
               return -ENOBUFS;
       memcpy(k->data, d, len);
       return 0;
}

static int
sc_dump(struct egp_cls *cls, struct sk_buff *skb, struct egp_key *k)
{
       struct egp_key_sc *sc = k->data;
       RTA_PUT(skb, TCA_EGP_KEY_DATA, sizeof(*sc), sc);
       return 0;

rtattr_failure:
       return -1;
}

static void
sc_free_data(struct egp_cls *cls, struct egp_key *k)
{
       if (k->data)
               kfree(k->data);
}

OTOH, on more complex ematches data might be nested TLVs with
optional parameters. etc.

> Stats are the other thing that adds complexity to the API. If you can
> make it optional then that would be best - I was thinking to not even
> have it in.

It's probably enough to allow optional generic hits/success stats
per match.

> I thought we already agreed on the layout:
> SEL2- which may nest E/MATCHEs TLVs. Sel2 not being very different from
> original selector. May be i didnt follow. 

You did follow but I made the existing u32 match a ematch as well.
Things like the program ID goes into the selector header T=1 and
classifier specific selector bits such as the hashing bits goes
into T=2. Thinking of it it's probably cleaner to put things like
hmark, hoff into its own TLV. So the selector TLV contains the
selector header at T=1 and nested ematch TLVs at T=2..T=N.

I think we're mostly in sync so I'll start working on it and
we can review again.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2005-01-01 12:10                                                     ` Thomas Graf
@ 2005-01-01 23:29                                                       ` jamal
  2005-01-02  0:06                                                         ` Thomas Graf
  0 siblings, 1 reply; 42+ messages in thread
From: jamal @ 2005-01-01 23:29 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David S. Miller, netdev

On Sat, 2005-01-01 at 07:10, Thomas Graf wrote:
> * jamal <1104526311.1047.379.camel@jzny.localdomain> 2004-12-31 15:51

> > We need to know who installed the rule so we can intepret what the ID in
> > the match is.
> 
> Yes but this should go into the selector header.
> 

Agreed - PID needs to go into selector. The other ID is per match.

> > You may actually need those Ts enumerated as if they are array
> > indices. Look at the way i transfer actions using "order"
> 
> Right, order would be N-2. I don't see any reason for storing it, I
> didn't had need for it in EGP and I used exactly the same techniques
> as in the action code.
> 

If youve done it on EGP then go ahead and use that scheme since you are
comfortable with it.

> > My view was length is also a common field. Theres also another reason
> > why you want length viewable in a dumb way:
> > --> you dont really wanna force people to write dumpers for these
> > ematchers (goal: keep this interface as simple as it can be); i.e dont
> > need any pretty formater in the kernel.
> > If you have a length then you can reconstruct the TCA_EMATCH easily
> > without caring about the content. This is the path i started taking in 
> > eactions. Refer to my notes i sent earlier on the mythical one page
> > ematch/eaction.
> > If someone wants funky stuff - write a classifier.
> 
> Very simple ematches will only require a struct for configuration
> so the dumping is not more than 5 lines of code. The length can be
> calculated via RTA_PAYLOAD(ematch_tlv) - sizeof(ematch_hdr). This
> of course requires the struct to be aligned to RTA_ALIGN but
> that's generally not a problem at all.

Does the ematch API include a dump()? I dont think it should - thats the
point i was making. Should be simple.

> I understand your concern but I also want to allow a little bit
> more complicated ematches such as KMP or later a very simple
> regular expression implementation.
> 
> Here's some code from the simple_cmp key of EGP giving a good
> idea how a simple ematch will look like:
> 
> static int
> sc_match(struct egp_cls *cls, struct egp_key *k)
> {
>        struct egp_key_sc *sc = k->data;
>        u32 lvalue = cls->ops->read(cls, &sc->left);
>        u32 rvalue = cls->ops->read(cls, &sc->right);
> 
>        switch (sc->op) {
> #define SC(a, b) case EGP_OP_##a: return b
>                SC(NONE, 0);
>                SC(EQ, lvalue == rvalue);
>                SC(NE, lvalue != rvalue);
>                SC(LT, lvalue <  rvalue);
>                SC(LE, lvalue <= rvalue);
>                SC(GT, lvalue >  rvalue);
>                SC(GE, lvalue >= rvalue);
> #undef  SC
>        }
> 
>        BUG();
>        return 0;
> }
> 

nice and valid for API.

> static int
> sc_validate(struct egp_config *conf, struct egp_ops *ops, void *d, size_t len)
> {
>        int err;
>        struct egp_key_sc *sc = d;
> 
>        if (len != sizeof(*sc) || sc->op > EGP_OP_MAX)
>                return -EINVAL;
>        return 0;
> }
> 

Even this is too much for a simple ematch. Validation should happen in
user space or maybe at the mother clasifier. maybe a maxsize,minsize
attribute is needed in the ematch struct.

> static int
> sc_replace(struct egp_cls *cls, struct egp_key *k, void *d, size_t len)
> {
>        k->data = kmalloc(len, GFP_KERNEL);
>        if (NULL == k->data)
>                return -ENOBUFS;
>        memcpy(k->data, d, len);
>        return 0;
> }

Equivalent of this should be done by the mother classifier.
All it needs to know is the length (and no other details). And the
length is known from the L in TLV.

> static int
> sc_dump(struct egp_cls *cls, struct sk_buff *skb, struct egp_key *k)
> {
>        struct egp_key_sc *sc = k->data;
>        RTA_PUT(skb, TCA_EGP_KEY_DATA, sizeof(*sc), sc);
>        return 0;
> 
> rtattr_failure:
>        return -1;
> }
> 

Again if you store length, you dont need this. Mother classifier can do
it.

> static void
> sc_free_data(struct egp_cls *cls, struct egp_key *k)
> {
>        if (k->data)
>                kfree(k->data);
> }

Dont need this either; mother classifier can handle it.

> OTOH, on more complex ematches data might be nested TLVs with
> optional parameters. etc.
> 
> > Stats are the other thing that adds complexity to the API. If you can
> > make it optional then that would be best - I was thinking to not even
> > have it in.
> 
> It's probably enough to allow optional generic hits/success stats
> per match.

Even at the moment classifiers dont have stats. If you want stats
you could add a simple gact accept action. 
Note also: think of the 100K rules being added and amount of RAM used;

> > I thought we already agreed on the layout:
> > SEL2- which may nest E/MATCHEs TLVs. Sel2 not being very different from
> > original selector. May be i didnt follow. 
> 
> You did follow but I made the existing u32 match a ematch as well.
> Things like the program ID goes into the selector header T=1 and
> classifier specific selector bits such as the hashing bits goes
> into T=2. Thinking of it it's probably cleaner to put things like
> hmark, hoff into its own TLV. So the selector TLV contains the
> selector header at T=1 and nested ematch TLVs at T=2..T=N.
> 

Note that the ematch is supposed to be a very very simple thing...
Something a fireman who has implemented a iptables target can whip in an
hour. Keep in mind that design goal. Non trivial coding needed or poor
usability is the major problem with tc in general. Avoid that theme.
An ematch _needs_  a mother classifier such as u32. u32 has a very nice
and very flexible layout - it can be trained to build any sort of tree.
Maybe the first step should be to not even have u32 as an ematch ..

> I think we're mostly in sync so I'll start working on it and
> we can review again.

Maybe i will wait for the code. 

cheers,
jamal

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2005-01-01 23:29                                                       ` jamal
@ 2005-01-02  0:06                                                         ` Thomas Graf
  2005-01-03 14:36                                                           ` jamal
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Graf @ 2005-01-02  0:06 UTC (permalink / raw)
  To: jamal; +Cc: David S. Miller, netdev

* jamal <1104622164.1048.444.camel@jzny.localdomain> 2005-01-01 18:29
> Does the ematch API include a dump()? I dont think it should - thats the
> point i was making. Should be simple.

Yes, although simple ematches are not required to implement dump.

> > [validate]
> Even this is too much for a simple ematch. Validation should happen in
> user space or maybe at the mother clasifier. maybe a maxsize,minsize
> attribute is needed in the ematch struct.
> > [replace]
> Equivalent of this should be done by the mother classifier.
> All it needs to know is the length (and no other details). And the
> length is known from the L in TLV.

I merged validate/replace into change which takes a unsigned long
for storage and a data/len parameter. It's up to the ematch what he
does with it, data may contain a u32 directly and the ematch might
save it in the unsigned long or a ematch may allocate a structure.

Why are you focused on hiding so much? I'd rather try to make it
simple but still allow more complex ematches to exist.

Have a look at http://people.suug.ch/~tgr/tmp/ematch.diff

I broke the API down to:

change
match
destroy (optional, only for complex ematches)
dump (optional, only for complex ematches)

> Even at the moment classifiers dont have stats. If you want stats
> you could add a simple gact accept action. 
> Note also: think of the 100K rules being added and amount of RAM used;

Agreed, I didn't add them so far, it's up to the ematch whether
it wants to maintain stats or not.

> Note that the ematch is supposed to be a very very simple thing...
> Something a fireman who has implemented a iptables target can whip in an
> hour. Keep in mind that design goal. Non trivial coding needed or poor
> usability is the major problem with tc in general. Avoid that theme.
> An ematch _needs_  a mother classifier such as u32. u32 has a very nice
> and very flexible layout - it can be trained to build any sort of tree.
> Maybe the first step should be to not even have u32 as an ematch ..

I understand your point but I want to at least be able to implement
some more complex stuff. Hiding too much can be bad too. Having only 2
functions to implement is really easy, the rest can be done the LinuxWay(tm) ;->

Have a look at the code and tell me what you think.

Here's an example ematch, I find this quite simple already.

static in foo_change(struct tcf_proto *tp, void *data, int len, struct
			tcf_ematch *m)
{
	if (len != sizeof(u32))
		return -EINVAL;
	m->data = *(u32*)data;
	return 0;
}

static int foo_match(struct sk_buff *skb, struct tcf_ematch *m)
{
	return skb->security == m->data;
}

static struct tcf_ematch_ops foo_ops = {
	.kind	= 111,
	.change = foo_change,
	.match  = foo_match,
	.owner  = THIS_MODULE,
	.link	= LIST_HEAD_INIT(foo_ops.link)
}

static int __init foo_init(void)
{
	return tcf_ematch_register(&foo_ops);
}

static void __exit foo_exit(void)
{
	return tcf_ematch_unregister(&foo_ops);
}

module_init(init_foo);
module_exit(exit_foo);

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2005-01-02  0:06                                                         ` Thomas Graf
@ 2005-01-03 14:36                                                           ` jamal
  2005-01-03 15:02                                                             ` Thomas Graf
  0 siblings, 1 reply; 42+ messages in thread
From: jamal @ 2005-01-03 14:36 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David S. Miller, netdev

On Sat, 2005-01-01 at 19:06, Thomas Graf wrote:
> * jamal <1104622164.1048.444.camel@jzny.localdomain> 2005-01-01 18:29
> > Does the ematch API include a dump()? I dont think it should - thats the
> > point i was making. Should be simple.
> 
> Yes, although simple ematches are not required to implement dump.

ok. I realize its optional - but i wouldnt even give the writter of
ematch the opportunity to write one. Want something more complex? write
a classifier.

> > > [validate]
> > Even this is too much for a simple ematch. Validation should happen in
> > user space or maybe at the mother clasifier. maybe a maxsize,minsize
> > attribute is needed in the ematch struct.
> > > [replace]
> > Equivalent of this should be done by the mother classifier.
> > All it needs to know is the length (and no other details). And the
> > length is known from the L in TLV.
> 
> I merged validate/replace into change which takes a unsigned long
> for storage and a data/len parameter. It's up to the ematch what he
> does with it, data may contain a u32 directly and the ematch might
> save it in the unsigned long or a ematch may allocate a structure.

Again allowing for this may be overkill. Just send the same structure
the ematch needs in exactly the same form it needs it and you dont need
this. 

> Why are you focused on hiding so much? I'd rather try to make it
> simple but still allow more complex ematches to exist.
> 
> Have a look at http://people.suug.ch/~tgr/tmp/ematch.diff

Thanks, give me a few hours then i will look.

> I broke the API down to:
> 
> change
> match
> destroy (optional, only for complex ematches)
> dump (optional, only for complex ematches)
> 

Other than match, anything really should be optional for simplicity
sake.

> > Note that the ematch is supposed to be a very very simple thing...
> > Something a fireman who has implemented a iptables target can whip in an
> > hour. Keep in mind that design goal. Non trivial coding needed or poor
> > usability is the major problem with tc in general. Avoid that theme.
> > An ematch _needs_  a mother classifier such as u32. u32 has a very nice
> > and very flexible layout - it can be trained to build any sort of tree.
> > Maybe the first step should be to not even have u32 as an ematch ..
> 
> I understand your point but I want to at least be able to implement
> some more complex stuff. Hiding too much can be bad too. Having only 2
> functions to implement is really easy, the rest can be done the LinuxWay(tm) ;->
> 

True - but the goals of this ematch is to be simple ;->

> Have a look at the code and tell me what you think.
> 
> Here's an example ematch, I find this quite simple already.
> 
> static in foo_change(struct tcf_proto *tp, void *data, int len, struct
> 			tcf_ematch *m)
> {
> 	if (len != sizeof(u32))
> 		return -EINVAL;
> 	m->data = *(u32*)data;
> 	return 0;
> }
> 

I still say not needed ;->

> static int foo_match(struct sk_buff *skb, struct tcf_ematch *m)
> {
> 	return skb->security == m->data;
> }

makes sense.

> static struct tcf_ematch_ops foo_ops = {
> 	.kind	= 111,
> 	.change = foo_change,
> 	.match  = foo_match,
> 	.owner  = THIS_MODULE,
> 	.link	= LIST_HEAD_INIT(foo_ops.link)
> }

whats the .link for?

> static int __init foo_init(void)
> {
> 	return tcf_ematch_register(&foo_ops);
> }
> 
> static void __exit foo_exit(void)
> {
> 	return tcf_ematch_unregister(&foo_ops);
> }
> 
> module_init(init_foo);
> module_exit(exit_foo);

All looks good.  Give me a few hours, first day of working week will
slow me down.

cheers,
jamal

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2005-01-03 14:36                                                           ` jamal
@ 2005-01-03 15:02                                                             ` Thomas Graf
  2005-01-03 15:55                                                               ` jamal
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Graf @ 2005-01-03 15:02 UTC (permalink / raw)
  To: jamal; +Cc: David S. Miller, netdev

* jamal <1104763012.1047.524.camel@jzny.localdomain> 2005-01-03 09:36
> On Sat, 2005-01-01 at 19:06, Thomas Graf wrote:
> > * jamal <1104622164.1048.444.camel@jzny.localdomain> 2005-01-01 18:29
> > > Does the ematch API include a dump()? I dont think it should - thats the
> > > point i was making. Should be simple.
> > 
> > Yes, although simple ematches are not required to implement dump.
> 
> ok. I realize its optional - but i wouldnt even give the writter of
> ematch the opportunity to write one. Want something more complex? write
> a classifier.

A classifier is at least 300 lines and you lose the ability to use the
logic relations.

> Again allowing for this may be overkill. Just send the same structure
> the ematch needs in exactly the same form it needs it and you dont need
> this. 

Compromise: If change/dump is not provided the api allocates and
memcpy's itself resptively dumps m->data. Simple ematches don't have to
care and can simple access m->data, more complex ematches can implement
their own change/dump. Does that sound beter?

> whats the .link for?

I use list.h to chain ematch_ops and it's better to have it initialized.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2005-01-03 15:02                                                             ` Thomas Graf
@ 2005-01-03 15:55                                                               ` jamal
  2005-01-03 16:26                                                                 ` Thomas Graf
  0 siblings, 1 reply; 42+ messages in thread
From: jamal @ 2005-01-03 15:55 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David S. Miller, netdev

On Mon, 2005-01-03 at 10:02, Thomas Graf wrote:

> > ok. I realize its optional - but i wouldnt even give the writter of
> > ematch the opportunity to write one. Want something more complex? write
> > a classifier.
> 
> A classifier is at least 300 lines and you lose the ability to use the
> logic relations.

Sure, but simpler-than-classifier is where we started ;->
I think in time we should revamp this a little more, but classifiers we
have today already we just cant get rid of them now. Over time, I agree 
we should revamp.

> > Again allowing for this may be overkill. Just send the same structure
> > the ematch needs in exactly the same form it needs it and you dont need
> > this. 
> 
> Compromise: If change/dump is not provided the api allocates and
> memcpy's itself resptively dumps m->data. Simple ematches don't have to
> care and can simple access m->data, more complex ematches can implement
> their own change/dump. Does that sound beter?

Yes, indeed. A copy instead of reference is owned by the classifier.
To me its a valid compromise. Dont want those matches to be shared
in any case. Also dont want user to know shit about TLVs.
Neither in the kernel, nor in user space. Simplicty.

Sorry, havent looked at the code yet. Is the patch you posted the same
as you have on the url earlier?

> > whats the .link for?
> 
> I use list.h to chain ematch_ops and it's better to have it initialized.

I relaized that - but should the user know anything about this?

cheers,
jamal

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2005-01-03 15:55                                                               ` jamal
@ 2005-01-03 16:26                                                                 ` Thomas Graf
  0 siblings, 0 replies; 42+ messages in thread
From: Thomas Graf @ 2005-01-03 16:26 UTC (permalink / raw)
  To: jamal; +Cc: David S. Miller, netdev

* jamal <1104767726.1049.591.camel@jzny.localdomain> 2005-01-03 10:55
> Sorry, havent looked at the code yet. Is the patch you posted the same
> as you have on the url earlier?

Yes, with a few minor fixes and I made it depend on
CONFIG_NET_CLS_EMATCH and added nil macros when it's not defined.
Those macros need some more work though.

TODO:
 - KConfig doc
 - make change optional
 - dump m->data if no dump implementation is provided

The question I'm yet unsure is whether to strictly return 1/0 or
allow classifier return codes.

Given I'm done with the above, a minimal ematch will look  like:

static int sec_match(struct sk_buff *skb, struct tcf_ematch *m)
{
	return *(u32 *) m->data == skb->security;
}

static struct tcf_ematch_ops sec_ops = {
	.kind = 11,
	.match = sec_match,
	.owner = THIS_MODULE
}

... init/exit calling tcf_em_(un_register ...

What do you think about checking RTA_PAYLOAD of the ematch data
for < sizeof(unsigned long) and assign it to m->data without
allocation? Would save us nonse allocations. The yet unused 8bit
in tcf_ematch_hdr would hold a flag to indicate that m->data
contains the data directly.

> I relaized that - but should the user know anything about this?

Actually no but initializing it in tcf_em_register wouldn't serve
any purpose so we can either remove it or leave it there.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2004-12-31 20:51                                                   ` jamal
  2005-01-01 12:10                                                     ` Thomas Graf
@ 2005-01-01 18:32                                                     ` Thomas Graf
  2005-01-01 23:42                                                       ` jamal
  1 sibling, 1 reply; 42+ messages in thread
From: Thomas Graf @ 2005-01-01 18:32 UTC (permalink / raw)
  To: jamal; +Cc: David S. Miller, netdev

* jamal <1104526311.1047.379.camel@jzny.localdomain> 2004-12-31 15:51
> >  T=1 generic selector header
> >
> >  T=2 classifier specific selector header (u32 hashsing stuff goes here)
> >  T=3 ematch 1
> >  T=N ematch N
> 
> I thought we already agreed on the layout:
> SEL2- which may nest E/MATCHEs TLVs. Sel2 not being very different from
> original selector. May be i didnt follow. 

OK, I changed my mind while implementing it and a selector now looks
like this:

Selector TLV:

 +----------------------------+
 | TCA_EMATCH_TREE_HDR        |
 +----------------------------+
 | TCA_EMATCH_TREE_LIST       |
 | +------------------------+ |
 | | T=1 Match 1            | |
 | +------------------------+ |
 | | T=2 Match 2            | |
 | +------------------------+ |
 | | T=N Match N            | |
 | +------------------------+ |
 +----------------------------+

So we can put more into the selector if needed without breaking
compatibility. TCA_EMATCH_TREE_HDR currently contains `nmatches'
specifying N and progid holding the PID you talked about.
The match TLVs must have a continous numbering because I don't
want to define limits as done in the action code.

I'll post an RFC patch tomorrow implementing the API and a
simple ematch.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2005-01-01 18:32                                                     ` Thomas Graf
@ 2005-01-01 23:42                                                       ` jamal
  2005-01-02  0:13                                                         ` Thomas Graf
  0 siblings, 1 reply; 42+ messages in thread
From: jamal @ 2005-01-01 23:42 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David S. Miller, netdev

On Sat, 2005-01-01 at 13:32, Thomas Graf wrote:

> OK, I changed my mind while implementing it and a selector now looks
> like this:
> 
> Selector TLV:
> 
>  +----------------------------+
>  | TCA_EMATCH_TREE_HDR        |
>  +----------------------------+
>  | TCA_EMATCH_TREE_LIST       |
>  | +------------------------+ |
>  | | T=1 Match 1            | |
>  | +------------------------+ |
>  | | T=2 Match 2            | |
>  | +------------------------+ |
>  | | T=N Match N            | |
>  | +------------------------+ |
>  +----------------------------+
> 

what happened to the good old SEL TLV (which i believe we called SEL2
now); or maybe thats what contains this TLV?

> So we can put more into the selector if needed without breaking
> compatibility. TCA_EMATCH_TREE_HDR currently contains `nmatches'
> specifying N and progid holding the PID you talked about.

Ok, so i think you may be saying the old selector stays intact then
(sans the matches)?
Why do you need to specify "nmatches".
You know exactly where each one starts and ends (from the TLVs).

What is TCA_EMATCH_TREE_LIST for? Looks like another TLV nesting. Not
needed, you just plumb the T=1,..T=N right after the header.

> The match TLVs must have a continous numbering because I don't
> want to define limits as done in the action code.

I think the way you have it is fine - and believe it is the way the
action code has it for the list.

> I'll post an RFC patch tomorrow implementing the API and a
> simple ematch.

Nice. I have started implementing the eaction code but too obsessed with
this other thing i am working on - hopefully i will get to it before my
vacation expires.

cheers,
jamal

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2005-01-01 23:42                                                       ` jamal
@ 2005-01-02  0:13                                                         ` Thomas Graf
  2005-01-03 14:39                                                           ` jamal
  0 siblings, 1 reply; 42+ messages in thread
From: Thomas Graf @ 2005-01-02  0:13 UTC (permalink / raw)
  To: jamal; +Cc: David S. Miller, netdev

* jamal <1104622934.1047.460.camel@jzny.localdomain> 2005-01-01 18:42
> what happened to the good old SEL TLV (which i believe we called SEL2
> now); or maybe thats what contains this TLV?

Please look at the patch I posted in the other post. I think
we missudnerstand each other.

> > So we can put more into the selector if needed without breaking
> > compatibility. TCA_EMATCH_TREE_HDR currently contains `nmatches'
> > specifying N and progid holding the PID you talked about.
> 
> Ok, so i think you may be saying the old selector stays intact then
> (sans the matches)?

Right, old selector TLV statys as-is. Although I have to look
u32 closely before I can make final statements.

> Why do you need to specify "nmatches".

It's mainly a shortcut to validate precedence jumps so I
can avoid traversing the RTA chain twice. It could be
avoided but is quite handy to speed things up and
also acts for validation purposes to check consistency of
the match list.

> What is TCA_EMATCH_TREE_LIST for? Looks like another TLV nesting. Not
> needed, you just plumb the T=1,..T=N right after the header.

No, what if we need some more stuff in the selector TLV? We can't
modify the header TLV w/o breaking backwards compatibility. Adding
this addtional nesting allows to simply add stuff after TREE_LIST
TLV.

> I think the way you have it is fine - and believe it is the way the
> action code has it for the list.

You're using a maximum prio aren't you? I use a RTA_OK() loop
supporting unlimited number of matches without the need to
allocate rtattr pointer array.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PKT_SCHED]: Allow using nfmark as key in U32 classifier.
  2005-01-02  0:13                                                         ` Thomas Graf
@ 2005-01-03 14:39                                                           ` jamal
  0 siblings, 0 replies; 42+ messages in thread
From: jamal @ 2005-01-03 14:39 UTC (permalink / raw)
  To: Thomas Graf; +Cc: David S. Miller, netdev

On Sat, 2005-01-01 at 19:13, Thomas Graf wrote:
> * jamal <1104622934.1047.460.camel@jzny.localdomain> 2005-01-01 18:42
> > what happened to the good old SEL TLV (which i believe we called SEL2
> > now); or maybe thats what contains this TLV?
> 
> Please look at the patch I posted in the other post. I think
> we missudnerstand each other.

Will do.

> > Why do you need to specify "nmatches".
> 
> It's mainly a shortcut to validate precedence jumps so I
> can avoid traversing the RTA chain twice. It could be
> avoided but is quite handy to speed things up and
> also acts for validation purposes to check consistency of
> the match list.

Ok. 

> > What is TCA_EMATCH_TREE_LIST for? Looks like another TLV nesting. Not
> > needed, you just plumb the T=1,..T=N right after the header.
> 
> No, what if we need some more stuff in the selector TLV? We can't
> modify the header TLV w/o breaking backwards compatibility. Adding
> this addtional nesting allows to simply add stuff after TREE_LIST
> TLV.

Good point.

> > I think the way you have it is fine - and believe it is the way the
> > action code has it for the list.
> 
> You're using a maximum prio aren't you? I use a RTA_OK() loop
> supporting unlimited number of matches without the need to
> allocate rtattr pointer array.

Sounds reasonable;  
Note in my case, you probably want a limit on how many actions you chain
in one policy - I would suggest you do the same as well. In other words
even if you do it that way, have a limit check/setting somewhere.

cheers,
jamal

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2005-01-03 16:26 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <200412270715.iBR7Fffe026855@hera.kernel.org>
2004-12-27 12:16 ` [PKT_SCHED]: Allow using nfmark as key in U32 classifier Thomas Graf
2004-12-28 13:20   ` jamal
2004-12-28 13:40     ` Thomas Graf
2004-12-28 13:59       ` jamal
2004-12-28 14:50         ` Thomas Graf
2004-12-28 15:55           ` jamal
2004-12-28 16:11         ` Thomas Graf
2004-12-28 16:36           ` jamal
2004-12-28 16:51             ` jamal
2004-12-28 19:26             ` Thomas Graf
2004-12-28 21:14               ` jamal
2004-12-28 22:10                 ` Thomas Graf
2004-12-28 23:06                   ` jamal
2004-12-28 23:19                     ` Thomas Graf
2004-12-28 23:39                       ` jamal
2004-12-29  0:09                         ` Thomas Graf
2004-12-29  1:13                           ` jamal
2004-12-29 12:48                             ` Thomas Graf
2004-12-29 14:20                               ` jamal
2004-12-29 15:01                                 ` Thomas Graf
2004-12-29 15:53                                   ` jamal
2004-12-30 17:43                                     ` Thomas Graf
2004-12-31  4:58                                       ` jamal
2004-12-31 11:08                                         ` Thomas Graf
2004-12-31 14:59                                           ` jamal
2004-12-31 15:39                                             ` Thomas Graf
2004-12-31 16:44                                               ` jamal
2004-12-31 17:32                                                 ` jamal
2004-12-31 18:11                                                 ` Thomas Graf
2004-12-31 18:19                                                   ` Thomas Graf
2004-12-31 20:51                                                   ` jamal
2005-01-01 12:10                                                     ` Thomas Graf
2005-01-01 23:29                                                       ` jamal
2005-01-02  0:06                                                         ` Thomas Graf
2005-01-03 14:36                                                           ` jamal
2005-01-03 15:02                                                             ` Thomas Graf
2005-01-03 15:55                                                               ` jamal
2005-01-03 16:26                                                                 ` Thomas Graf
2005-01-01 18:32                                                     ` Thomas Graf
2005-01-01 23:42                                                       ` jamal
2005-01-02  0:13                                                         ` Thomas Graf
2005-01-03 14:39                                                           ` jamal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).