* More tc action mess
@ 2005-01-19 5:09 Patrick McHardy
2005-01-19 13:10 ` jamal
0 siblings, 1 reply; 6+ messages in thread
From: Patrick McHardy @ 2005-01-19 5:09 UTC (permalink / raw)
To: jamal; +Cc: Maillist netdev
I probably found the reason for the problems with the ipt action you were
talking about. netfilter targets, like tc actions, expect a struct
sk_buff **,
but the ipt action does:
struct sk_buff *skb = *pskb;
...
ret = p->t->u.kernel.target->target(&skb, skb->dev, NULL,
which of course doesn't make much sense. Unfortunately, tcf_action_exec
does the same nonsense:
int tcf_action_exec(struct sk_buff *skb, struct tc_action *act,
...
ret = a->ops->act(&skb, a);
This means we must convert all paths on which tcf_action_exec is called
to use struct sk_buff ** :(
On egress q->enqueue owns the skb, so for this path we must convert the
qdiscs classification function and all classifierts. On ingress the skb
is owned by netif_receive_skb, so we need to convert the ingress qdisc's
enqueue function to take a sk_buff **, or better add a ingress_filter
function to avoid changing all enqueue calls in other qdiscs.
I have an big, ugly, incomplete patch that does this, but teaching
classifiers to behave correctly when the packet changes under them in
the middle of classification is hard, so any other solutions are welcome.
Regards
Patrick
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: More tc action mess
2005-01-19 5:09 More tc action mess Patrick McHardy
@ 2005-01-19 13:10 ` jamal
2005-01-19 20:22 ` Patrick McHardy
0 siblings, 1 reply; 6+ messages in thread
From: jamal @ 2005-01-19 13:10 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Maillist netdev
On Wed, 2005-01-19 at 00:09, Patrick McHardy wrote:
> I probably found the reason for the problems with the ipt action you were
> talking about. netfilter targets, like tc actions, expect a struct
> sk_buff **,
> but the ipt action does:
>
> struct sk_buff *skb = *pskb;
> ...
> ret = p->t->u.kernel.target->target(&skb, skb->dev, NULL,
>
> which of course doesn't make much sense. Unfortunately, tcf_action_exec
> does the same nonsense:
>
> int tcf_action_exec(struct sk_buff *skb, struct tc_action *act,
> ...
> ret = a->ops->act(&skb, a);
>
> This means we must convert all paths on which tcf_action_exec is called
> to use struct sk_buff ** :(
No, just restore the code that you took out in one of your patches
right above that line which reads:
----
if (skb_cloned(skb)) {
if (pskb_expand_head(skb, 0, 0, GFP_ATOMIC)) {
return -1;
}
}
----
Depending on what you do in netfilter lately, you may wanna take out
the skb_cloned() call.
cheers,
jamal
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: More tc action mess
2005-01-19 13:10 ` jamal
@ 2005-01-19 20:22 ` Patrick McHardy
2005-01-20 4:30 ` jamal
0 siblings, 1 reply; 6+ messages in thread
From: Patrick McHardy @ 2005-01-19 20:22 UTC (permalink / raw)
To: hadi; +Cc: Maillist netdev
jamal wrote:
>On Wed, 2005-01-19 at 00:09, Patrick McHardy wrote:
>
>
>>This means we must convert all paths on which tcf_action_exec is called
>>to use struct sk_buff ** :(
>>
>>
>
>
>No, just restore the code that you took out in one of your patches
>right above that line which reads:
>
>----
> if (skb_cloned(skb)) {
> if (pskb_expand_head(skb, 0, 0, GFP_ATOMIC)) {
> return -1;
> }
> }
>----
>
>Depending on what you do in netfilter lately, you may wanna take out
>the skb_cloned() call.
>
>
This does not help. Netfilter calls skb_ip_make_writable if it has to
touch the packet, if it is shared or cloned the packet will be copied.
Despite this, this is hardly a fix as long as the ->act function takes
a struct sk_buff **.
Regards
Patrick
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: More tc action mess
2005-01-19 20:22 ` Patrick McHardy
@ 2005-01-20 4:30 ` jamal
2005-01-20 6:32 ` Patrick McHardy
0 siblings, 1 reply; 6+ messages in thread
From: jamal @ 2005-01-20 4:30 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Maillist netdev
On Wed, 2005-01-19 at 15:22, Patrick McHardy wrote:
> jamal wrote:
>
> >On Wed, 2005-01-19 at 00:09, Patrick McHardy wrote:
> >
> >
> >>This means we must convert all paths on which tcf_action_exec is called
> >>to use struct sk_buff ** :(
> >>
> >>
> >
> >
> >No, just restore the code that you took out in one of your patches
> >right above that line which reads:
> >
> >----
> > if (skb_cloned(skb)) {
> > if (pskb_expand_head(skb, 0, 0, GFP_ATOMIC)) {
> > return -1;
> > }
> > }
> >----
> >
> >Depending on what you do in netfilter lately, you may wanna take out
> >the skb_cloned() call.
> >
> >
>
> This does not help. Netfilter calls skb_ip_make_writable if it has to
> touch the packet, if it is shared or cloned the packet will be copied.
for ipt, just restore:
if (pskb_expand_head(skb, 0, 0, GFP_ATOMIC))
return -ENOMEM
for the rest, the action code will handle all just fine. There will only
be one copy per serious of tramplings. This is design intent.
Note when i said i had issues with iptable targets, it has absolutely
nothing to do with this - quiet a few of iptable targets i tested work
fine; the problem was that the ones that didnt work were crashing and i
did trace to some missing checks (which would be non-issue if the
invocation was made from netfilter). I still need your help on this.
> Despite this, this is hardly a fix as long as the ->act function takes
> a struct sk_buff **.
Are shared skbs broken or do you see any memory leaks? These would be
worth fixing[1].
Give me some examples which show something is broken then we can have a
more coherent discussion. I am willing to kill ipt if it is the problem.
cheers,
jamal
[1]Yes, someday it may be worth going through all of the stack and
easing the way skbs are shuttled around so we dont need to play games;
but even i didnt dare when i was doing this code given the amount of
work and intrusiveness involved.
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: More tc action mess
2005-01-20 4:30 ` jamal
@ 2005-01-20 6:32 ` Patrick McHardy
2005-01-20 13:44 ` jamal
0 siblings, 1 reply; 6+ messages in thread
From: Patrick McHardy @ 2005-01-20 6:32 UTC (permalink / raw)
To: hadi; +Cc: Maillist netdev
jamal wrote:
>On Wed, 2005-01-19 at 15:22, Patrick McHardy wrote:
>
>
>>This does not help. Netfilter calls skb_ip_make_writable if it has to
>>touch the packet, if it is shared or cloned the packet will be copied.
>>
>>
>
>for ipt, just restore:
>
>if (pskb_expand_head(skb, 0, 0, GFP_ATOMIC))
> return -ENOMEM
>
>for the rest, the action code will handle all just fine. There will only
>be one copy per serious of tramplings. This is design intent.
>
>
It seems to be more clever than I initially realized :) But it does need
some fixing for ipt. The ->act function takes a struct sk_buff **, which
implies the skb might get replaced, but tcf_action_exec only takes a struct
sk_buff * and doesn't own the skb. So if the skb is replaced in the action
the owner ends up with a pointer to freed memory. pskb_expand_head is not
enough to stop netfilter from copying, at ingress the the skb might be
nonlinear, in which case it is copied. On egress it seems fine.
>Give me some examples which show something is broken then we can have a
>more coherent discussion. I am willing to kill ipt if it is the problem.
>
Instead of killing ipt we could teach netfilter to be smarter about this.
If the data that needs to be mangled is in the non-linear range we could
just linearize the skb. Additionally we should change act_api to only pass
single skb pointers around, to avoid all confusion and possible trouble.
This seems a lot better than changing almost entire net/sched to pass
double pointers around :)
Regards
Patrick
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: More tc action mess
2005-01-20 6:32 ` Patrick McHardy
@ 2005-01-20 13:44 ` jamal
0 siblings, 0 replies; 6+ messages in thread
From: jamal @ 2005-01-20 13:44 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Maillist netdev
On Thu, 2005-01-20 at 01:32, Patrick McHardy wrote:
> It seems to be more clever than I initially realized :)
Praise be to the people behind pskb_expand (Alexey and Dave). Amen.
> But it does need
> some fixing for ipt. The ->act function takes a struct sk_buff **, which
> implies the skb might get replaced, but tcf_action_exec only takes a struct
> sk_buff * and doesn't own the skb.
> So if the skb is replaced in the action the owner ends up with a pointer to
> freed memory. pskb_expand_head is not
> enough to stop netfilter from copying, at ingress the the skb might be
> nonlinear, in which case it is copied. On egress it seems fine.
The whole idea behind the psk_expand is to _copy_ the data while
preserving the skb head i.e anyone in the path before the expand
continues to see the old data (tcpdump for example or copied packets via
mirror in the preceeding actions etc) and anyone after the expand sees
the new data ;->. Thats the punch line. So there is no problem with
tcf_action_exec() way of life.
I have some very simple environmental rules which say
"thou shalt call skb_expand in the case someone else is referencing the
skb". Unfortunately since i have no control over netfilter, its safer to
just expand everytime. I added the check for skb cloned after i saw the
changes for make_writeable(?) going in. But it may be too dangerous an
assumption.
I thought netfilter already freed after copying - unless this has
changed since about 2.6.5 when i last studied things, there should be no
memory leak.
Can you get non-linear skbs on ingress? Recall, this is right after
packet comes off the NIC. I dont see them at the moment until some
clever NIC appears in the market.
> >Give me some examples which show something is broken then we can have a
> >more coherent discussion. I am willing to kill ipt if it is the problem.
> >
> Instead of killing ipt we could teach netfilter to be smarter about this.
> If the data that needs to be mangled is in the non-linear range we could
> just linearize the skb.
I have no issue incuring a penalty when using ipt. Its unfortunate, but
fine. I cant see any memory leak if copying includes freeing in
netfilter already (and in particular that the new scheme seemed to be
using expand as well). If that is changed in netilter, restoring it
would probably be a good idea.
One thing that would be really useful is to be able to tell netfilter
its ok to mangle the packet without copying it. Or vice versa receive a
signal from it to say "packet has been mangled" so actions can trample
some more on it.
> Additionally we should change act_api to only pass
> single skb pointers around, to avoid all confusion and possible trouble.
tou mean exec() call? I would prefer not to do this. My brain has
already sweated enough to make sure that this works and performance is
good.
> This seems a lot better than changing almost entire net/sched to pass
> double pointers around :)
While i did entertain this thought before submitting the code, and
infact that was the avenue i started at, i was pretty sure Dave+Alexey
would never have accepted the code (and for a very good reason too) ;->
So nobody ever saw the code ;->
cheers,
jamal
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2005-01-20 13:44 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-01-19 5:09 More tc action mess Patrick McHardy
2005-01-19 13:10 ` jamal
2005-01-19 20:22 ` Patrick McHardy
2005-01-20 4:30 ` jamal
2005-01-20 6:32 ` Patrick McHardy
2005-01-20 13:44 ` jamal
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).