* IMQ / new Dummy device post.
From: Andy Furniss @ 2004-04-15 9:42 UTC
To: netdev
>What am i looking for?
>1) users and authors of IMQ to tell me if this achieves what IMQ started
>as. I have to say I DON'T like the level of obtrusiveness from IMQ as is
>today. The code added by this is small (100 or fewer lines on top of
>dummy) and doesn't touch any of the main core bits.
>2) testing of the above by people who use IMQ
>3) If someone has better ideas - i am not religious about keeping this;
>but it certainly can't be the blasphemy that IMQ introduces
I am just a user and would drop IMQ without hesitation for something you
consider more elegant, but I am not sure whether or not dummy will do
what I want.
The only reason I use IMQ (+ NAT patch) is that I need to shape ingress
(I know I can't shape it "properly" from the wrong end of the bottleneck
without an intelligent app, but the ingress policer does not let me
share local and forwarded bandwidth and is not fair per user if I just
throttle the whole link).
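For comparison, the ingress policer referred to here is usually set up along the following lines. This is a minimal sketch, assuming the external link is ppp0 and an illustrative 500 kbit/s cap; the helper name `police_ingress` is made up. Setting `TC="echo tc"` prints the commands instead of running them.

```shell
# Sketch: police all IPv4 traffic arriving on one interface.
# police_ingress is a hypothetical helper; TC="echo tc" gives a dry run.
police_ingress() {
    dev=$1      # external interface, e.g. ppp0
    rate=$2     # cap in kbit/s, e.g. 500 (illustrative value)
    tc_cmd=${TC:-tc}
    $tc_cmd qdisc add dev "$dev" handle ffff: ingress
    # Packets above the rate are dropped, not queued, so there is no
    # per-host fairness and no borrowing between local and forwarded flows.
    $tc_cmd filter add dev "$dev" parent ffff: protocol ip prio 50 \
        u32 match ip src 0.0.0.0/0 \
        police rate "${rate}kbit" burst 10k drop flowid :1
}
```

Run as root, e.g. `police_ingress ppp0 500`.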
I am not sure if dummy will sort this for me, there may be some other way?
Basically all I need is something I can use HTB on where the qos ingress
box is on this diagram.
http://www.docum.org/stef.coene/qos/kptd/
Andy.
* Re: IMQ / new Dummy device post.
From: jamal @ 2004-04-15 12:15 UTC
To: Andy Furniss; +Cc: netdev
On Thu, 2004-04-15 at 05:42, Andy Furniss wrote:
>
> The only reason I use IMQ (+ NAT patch) is that I need to shape ingress
> (I know I can't shape it "properly" from the wrong end of the bottleneck
> without an intelligent app, but the ingress policer does not let me
> share local and forwarded bandwidth and is not fair per user if I just
> throttle the whole link).
>
> I am not sure if dummy will sort this for me, there may be some other way?
The summary is dummy can do what IMQ used to; it is however not related
to iptables/netfilter.
> Basically all I need is something I can use HTB on where the qos ingress
> box is on this diagram.
Yes you can attach a HTB. Look at the posted example in the previous
email and replace prio with HTB.
Not sure i answered your questions.
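Following the suggestion to replace prio with HTB, a minimal sketch of the dummy0 side might look like this. It assumes the same fwmarks (1 and 2) as the posted example, a total rate passed in kbit/s, and an even three-way split of the guaranteed rate; `htb_on_dummy` is an illustrative name, and `TC="echo tc"` gives a dry run.

```shell
# Sketch: jamal's dummy0 setup with HTB in place of prio.
# htb_on_dummy is a hypothetical helper; TC="echo tc" gives a dry run.
htb_on_dummy() {
    dev=$1      # e.g. dummy0
    total=$2    # total rate in kbit/s (assumed; match your link)
    tc_cmd=${TC:-tc}
    share=$((total / 3))
    # Unmarked traffic falls into the default class 1:30.
    $tc_cmd qdisc add dev "$dev" root handle 1: htb default 30
    $tc_cmd class add dev "$dev" parent 1: classid 1:1 htb rate "${total}kbit"
    for c in 10 20 30; do
        # Each leaf is guaranteed a third and may borrow up to the total.
        $tc_cmd class add dev "$dev" parent 1:1 classid "1:$c" \
            htb rate "${share}kbit" ceil "${total}kbit"
        $tc_cmd qdisc add dev "$dev" parent "1:$c" handle "$c:" sfq perturb 10
    done
    # fw classifier: nfmark 1 -> 1:10, nfmark 2 -> 1:20
    $tc_cmd filter add dev "$dev" parent 1: protocol ip prio 1 handle 1 fw classid 1:10
    $tc_cmd filter add dev "$dev" parent 1: protocol ip prio 2 handle 2 fw classid 1:20
}
```

For example `htb_on_dummy dummy0 900`; the eth0 redirect filters from the posted example can stay as they are, with mark 1 landing in 1:10 and mark 2 in 1:20.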
Again to emphasize: I will send patches only to people interested.
People have to ask directly;
this is my way of monitoring what is being tested. At some point i will
make the latest patches available to everyone.
cheers,
jamal
* Re: IMQ / new Dummy device post.
From: Andy Furniss @ 2004-04-15 19:35 UTC
To: hadi; +Cc: netdev
jamal wrote:
> On Thu, 2004-04-15 at 05:42, Andy Furniss wrote:
>
>
>>The only reason I use IMQ (+ NAT patch) is that I need to shape ingress
>>(I know I can't shape it "properly" from the wrong end of the bottleneck
>>without an intelligent app, but the ingress policer does not let me
>>share local and forwarded bandwidth and is not fair per user if I just
>>throttle the whole link).
>>
>>I am not sure if dummy will sort this for me, there may be some other way?
>
>
> The summary is dummy can do what IMQ used to; it is however not related
> to iptables/netfilter.
>
>
>>Basically all I need is something I can use HTB on where the qos ingress
>>box is on this diagram.
>
>
> Yes you can attach a HTB. Look at the posted example in the previous
> email and replace prio with HTB.
> Not sure i answered your questions.
What I want to know is what state IP packets will be in if I
filter/shape with dummy - In my case I would need them to have been
demasqueraded so I can tell the difference between local and to-be-forwarded
ingress traffic.
I.e. where on the KPTD would dummy be - IMQ appears twice and by using
the IMQ nat patch I can use the prerouting one to filter/shape the
packets after they are denatted.
Andy.
>
> Again to emphasize: I will send patches only to people interested.
> People have to ask directly;
> this is my way of monitoring what is being tested. At some point i will
> make the latest patches available to everyone.
>
> cheers,
> jamal
* Re: IMQ / new Dummy device post.
From: jamal @ 2004-04-16 3:52 UTC
To: Andy Furniss; +Cc: netdev
On Thu, 2004-04-15 at 15:35, Andy Furniss wrote:
> jamal wrote:
> What I want to know is what state IP packets will be in if I
Just to be sure, this is not specific just to IP; it could be ARP, IPX,
v6 etc.
>
> filter/shape with dummy - In my case I would need them to have been
> demasqueraded so I can tell the difference between local and to-be-forwarded
> ingress traffic.
The packets are grabbed before NAT on the way in and after NAT on the
way out.
Coming from non-local machines before NAT you can redirect to a dummy
device; and also be able to redirect on their way back to the non-local;
to use the example i posted earlier:
----
TC=tc            # path to the tc binary (set it to suit your system)
modprobe dummy   # creates dummy0 if the driver is built as a module
$TC qdisc add dev dummy0 root handle 1: prio
$TC qdisc add dev dummy0 parent 1:1 handle 10: sfq
$TC qdisc add dev dummy0 parent 1:2 handle 20: tbf rate 20kbit buffer 1600 limit 3000
$TC qdisc add dev dummy0 parent 1:3 handle 30: sfq
# fw classifier: nfmark 1 -> class 1:1, nfmark 2 -> class 1:2
$TC filter add dev dummy0 protocol ip pref 1 parent 1: handle 1 fw classid 1:1
$TC filter add dev dummy0 protocol ip pref 2 parent 1: handle 2 fw classid 1:2
ifconfig dummy0 up
#deal with ingress of eth0 first
$TC qdisc add dev eth0 ingress
# redirect all IP packets arriving from 10.0.0.21/24 on eth0 to dummy0
# (u32 applies the /24 mask, so this matches all of 10.0.0.0/24)
# use mark 1 --> puts them onto class 1:1 of dummy
#
$TC filter add dev eth0 parent ffff: protocol ip prio 10 u32 \
match ip src 10.0.0.21/24 flowid 1:1 \
action ipt -j MARK --set-mark 1 \
action mirred egress redirect dev dummy0
#deal with egress of eth0
$TC qdisc add dev eth0 root handle 1: prio
# redirect all IP packets leaving eth0 for 10.0.0.21/24 to dummy0
# use mark 2 --> puts them onto class 1:2 of dummy
#
$TC filter add dev eth0 parent 1:0 protocol ip prio 10 u32 \
match ip dst 10.0.0.21/24 flowid 1:1 \
action ipt -j MARK --set-mark 2 \
action mirred egress redirect dev dummy0
-----
I haven't tested the above but it should work (sans syntax bugs). If it
doesn't, then we have a bug that needs fixing.
> I.e. where on the KPTD would dummy be - IMQ appears twice and by using
> the IMQ nat patch I can use the prerouting one to filter/shape the
> packets after they are denatted.
>
does the above help?
cheers,
jamal
* Re: IMQ / new Dummy device post.
From: Andy Furniss @ 2004-04-16 19:35 UTC
To: hadi; +Cc: netdev
jamal wrote:
> On Thu, 2004-04-15 at 15:35, Andy Furniss wrote:
>
>>jamal wrote:
>
>
>>What I want to know is what state IP packets will be in if I
>
>
> Just to be sure, this is not specific just to IP; it could be ARP, IPX,
> v6 etc.
>
>
>>
>>filter/shape with dummy - In my case I would need them to have been
>>demasqueraded so I can tell the difference between local and to-be-forwarded
>>ingress traffic.
>
>
> The packets are grabbed before NAT on the way in and after NAT on the
> way out.
This is what I wanted to know. Is it possible to make an option to get
them after NAT in and pre NAT out?
> Coming from non-local machines before NAT you can redirect to a dummy
> device; and also be able to redirect on their way back to the non-local;
> to use the example i posted earlier:
>
> ----
> $TC qdisc add dev dummy0 root handle 1: prio
> $TC qdisc add dev dummy0 parent 1:1 handle 10: sfq
> $TC qdisc add dev dummy0 parent 1:2 handle 20: tbf rate 20kbit buffer 1600 limit 3000
> $TC qdisc add dev dummy0 parent 1:3 handle 30: sfq
>
> $TC filter add dev dummy0 protocol ip pref 1 parent 1: handle 1 fw classid 1:1
> $TC filter add dev dummy0 protocol ip pref 2 parent 1: handle 2 fw classid 1:2
>
> ifconfig dummy0 up
>
> #deal with ingress of eth0 first
> $TC qdisc add dev eth0 ingress
>
> # redirect all IP packets arriving from 10.0.0.21/24 in eth0 to dummy0
> # use mark 1 --> puts them onto class 1:1 of dummy
> #
> $TC filter add dev eth0 parent ffff: protocol ip prio 10 u32 \
> match ip src 10.0.0.21/24 flowid 1:1 \
> action ipt -j MARK --set-mark 1 \
> action mirred egress redirect dev dummy0
>
> #deal with egress of eth0
> $TC qdisc add dev eth0 root handle 1: prio
>
> # redirect all IP packets going to 10.0.0.21/24 in eth0 to dummy0
> # use mark 2 --> puts them onto class 1:2 of dummy
> #
> $TC filter add dev eth0 parent 1:0 protocol ip prio 10 u32 \
> match ip dst 10.0.0.21/24 flowid 1:1 \
> action ipt -j MARK --set-mark 2 \
> action mirred egress redirect dev dummy0
> -----
>
> I haven't tested the above but it should work (sans syntax bugs). If it
> doesn't, then we have a bug that needs fixing.
I don't think this applies to my setup: masquerading many local machines
onto one real address.
>
>
>>I.e. where on the KPTD would dummy be - IMQ appears twice and by using
>>the IMQ nat patch I can use the prerouting one to filter/shape the
>>packets after they are denatted.
>>
>
>
> does the above help?
Yes - Thanks.
Andy.
>
> cheers,
> jamal
>
>
* Re: IMQ / new Dummy device post.
From: Andy Furniss @ 2004-04-17 10:39 UTC
To: hadi; +Cc: netdev
jamal wrote:
> On Fri, 2004-04-16 at 15:35, Andy Furniss wrote:
>
>
>>This is what I wanted to know. Is it possible to make an option to get
>>them after NAT in and pre NAT out?
>
>
> No, i don't plan to. Why do you want to go that path?
I think it's the only way I can shape/share my ingress traffic between a
process (eg. bittorrent/squid) running on my shaping machine and
traffic that is forwarded to my LAN. I masquerade onto one real dynamic IP.
In the case of pre nat outbound - I know people can mark pre NAT and
shape on that, but it would allow people with big LANs doing NAT to use
WRR/ESFQ on src for egress traffic.
>
>
>>I don't think this applies to my setup: masquerading many local machines
>>onto one real address.
>
>
> If you have local on eth0(or substitute with whatever device you have
> local on), the example i gave should work fine. You just have to change
> the way you approach the setup. In case i didn't understand you, please
> post the details of your setup.
My setup is very simple - the only reason I use IMQ+NAT patch is because
I want to use my gateway/shaping PC to run bittorrent and I want the LAN
machines to have priority/fair share of incoming traffic. I guess my
setup is not that common - more common are people who run squid on the
same PC they shape/do NAT on.
ppp0 one dynamic real IP -> gateway PC -> eth0 -> LAN 192.168.0.0/24
                                |
                                -> local process.
Andy.
* Re: IMQ / new Dummy device post.
From: jamal @ 2004-04-17 12:09 UTC
To: Andy Furniss; +Cc: netdev
On Sat, 2004-04-17 at 06:39, Andy Furniss wrote:
> > No, i don't plan to. Why do you want to go that path?
>
> I think it's the only way I can shape/share my ingress traffic between a
> process (eg. bittorrent/squid) running on my shaping machine and
> traffic that is forwarded to my LAN. I masquerade onto one real dynamic IP.
I think i am almost understanding you now. Your main concern is people
using bittorrent to upload to you, correct?
Is there a way to recognize packets going to/from bittorrent?
> In the case of pre nat outbound - I know people can mark pre NAT and
> shape on that, but it would allow people with big LANs doing NAT to use
> WRR/ESFQ on src for egress traffic.
Don't jump into the HOW; let's get to your setup and dissect it. Like i
said, don't think in terms of IMQ, but still think in terms of meeting
your requirements.
Your setup is certainly new to me (at least from what i have been told
or read on how people use IMQ) - so thanks for posting. This is the kind
of thing i needed to hear about.
> My setup is very simple - the only reason I use IMQ+NAT patch is because
> I want to use my gateway/shaping PC to run bittorrent and I want the LAN
> machines to have priority/fair share of incoming traffic. I guess my
> setup is not that common - more common are people who run squid on the
> same PC they shape/do NAT on.
>
> ppp0 one dynamic real IP -> gateway PC -> eth0 -> LAN 192.168.0.0/24
>                                 |
>                                 -> local process.
Ok good. Assuming you have attached your HTB etc on one or more dummy
devices.
- packets from the local LAN can be marked at ingress and redirected to a
dummy if needed. In fact you can do this on the egress at ppp0 as well
using the new tc -i <inputdev> that i introduced. So this is easy.
- packets from the bittorrent process can be marked by iptables before
they get NATed (is this right?). Such packets can then be redirected to
dummy from egress of ppp0 using the fw classifier. So again this is easy.
- The third path is packets that come in from ppp0, get demasqueraded,
then have to go either a) to the LAN/eth0 or b) to the localhost bittorrent
process. You want to restrict b) - is that correct? I have some
suggestion, but need you to verify this part.
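The first two paths could be sketched as follows. This assumes eth0 faces the LAN and ppp0 the ISP, that dummy0 already carries the shaping qdisc, and that the local BitTorrent client runs as a dedicated user (the name `btuser` is hypothetical) so the iptables owner match can recognise its packets in OUTPUT, before masquerading happens in POSTROUTING. `redirect_paths` is an illustrative name; `TC="echo tc" IPT="echo iptables"` gives a dry run.

```shell
# Sketch of paths 1 and 2; redirect_paths is a hypothetical helper.
redirect_paths() {
    bt_user=${1:-btuser}    # assumed user the BitTorrent client runs as
    tc_cmd=${TC:-tc}
    ipt_cmd=${IPT:-iptables}
    # Path 1: LAN packets entering eth0 get mark 1 and go to dummy0.
    $tc_cmd qdisc add dev eth0 ingress
    $tc_cmd filter add dev eth0 parent ffff: protocol ip prio 10 u32 \
        match ip src 192.168.0.0/24 flowid 1:1 \
        action ipt -j MARK --set-mark 1 \
        action mirred egress redirect dev dummy0
    # Path 2: locally generated packets from the client get mark 2 in
    # mangle OUTPUT, i.e. before they are masqueraded ...
    $ipt_cmd -t mangle -A OUTPUT -m owner --uid-owner "$bt_user" \
        -j MARK --set-mark 2
    # ... and the fw classifier on ppp0 egress redirects them to dummy0.
    $tc_cmd qdisc add dev ppp0 root handle 1: prio
    $tc_cmd filter add dev ppp0 parent 1: protocol ip prio 10 handle 2 fw \
        classid 1:1 action mirred egress redirect dev dummy0
}
```

The owner match is only one way to recognise the client's traffic; a port-based match would also work if the client's ports are fixed.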
cheers,
jamal
* Re: IMQ / new Dummy device post.
From: Andy Furniss @ 2004-04-17 21:56 UTC
To: hadi; +Cc: netdev
jamal wrote:
> On Sat, 2004-04-17 at 06:39, Andy Furniss wrote:
>
>
>>>No i dont plan to. Why do you want to go that path?
>>
>>I think it's the only way I can shape/share my ingress traffic between a
>> process (eg. bittorrent/squid) running on my shaping machine and
>>traffic that is forwarded to my LAN. I masquerade onto one real dynamic IP.
>
>
> I think i am almost understanding you now. Your main concern is people
> using bittorrent to upload to you, correct?
> Is there a way to recognize packets going to/from bittorrent?
Quite possibly (though I think it uses connmark which I can't use as I
use connbytes to get new tcps out of slowstart).
I also sometimes use wget and I've seen posts on LARTC from people who
use squid and need to solve the same problem.
>
>
>>In the case of pre nat outbound - I know people can mark pre NAT and
>>shape on that, but it would allow people with big LANs doing NAT to use
>>WRR/ESFQ on src for egress traffic.
>
>
> Don't jump into the HOW; let's get to your setup and dissect it. Like i
> said, don't think in terms of IMQ, but still think in terms of meeting
> your requirements.
> Your setup is certainly new to me (at least from what i have been told
> or read on how people use IMQ) - so thanks for posting. This is the kind
> of thing i needed to hear about.
>
>
>>My setup is very simple - the only reason I use IMQ+NAT patch is because
>>I want to use my gateway/shaping PC to run bittorrent and I want the LAN
>>machines to have priority/fair share of incoming traffic. I guess my
>>setup is not that common - more common are people who run squid on the
>>same PC they shape/do NAT on.
>>
>>ppp0 one dynamic real IP -> gateway PC -> eth0 -> LAN 192.168.0.0/24
>>                                |
>>                                -> local process.
>
>
>
> Ok good. Assuming you have attached your HTB etc on one or more dummy
> devices.
>
> - packets from the local LAN can be marked at ingress and redirected to a
> dummy if needed. In fact you can do this on the egress at ppp0 as well
> using the new tc -i <inputdev> that i introduced. So this is easy.
>
> - packets from the bittorrent process can be marked by iptables before
> they get NATed (is this right?). Such packets can then be redirected to
> dummy from egress of ppp0 using the fw classifier. So again this is easy.
Yes - egress is sortable without IMQ.
>
> - The third path is packets that come in from ppp0, get demasqueraded,
> then have to go either a) to the LAN/eth0 or b) to the localhost bittorrent
> process. You want to restrict b)
Well not just restrict - dynamically share per IP total incoming
bandwidth with LAN traffic using HTB.
Andy.
> - is that correct? I have some
> suggestion, but need you to verify this part.
>
> cheers,
> jamal
>
>
* Re: IMQ / new Dummy device post.
From: jamal @ 2004-04-18 14:28 UTC
To: Andy Furniss; +Cc: netdev
On Sat, 2004-04-17 at 17:56, Andy Furniss wrote:
> jamal wrote:
> > I think i am almost understanding you now. Your main concern is people
> > using bittorrent to upload to you, correct?
> > Is there a way to recognize packets going to/from bittorrent?
>
> Quite possibly (though I think it uses connmark which I can't use as I
> use connbytes to get new tcps out of slowstart).
You are speaking Inuit to me. What is connmark, and what is its relation
to TCP slowstart?
> I also sometimes use wget and I've seen posts on LARTC from people who
> use squid and need to solve the same problem.
I am gonna assume that you have some way to recognize the flows destined
to localhost which you want to punish.
> >
> >
> >>ppp0 one dynamic real IP -> gateway PC -> eth0 -> LAN 192.168.0.0/24
> >>                                |
> >>                                -> local process.
> >
> >
> >
> > Ok good. Assuming you have attached your HTB etc on one or more dummy
> > devices.
> > - The third path is packets that come in from ppp0, get demasqueraded,
> > then have to go either a) to the LAN/eth0 or b) to the localhost bittorrent
> > process. You want to restrict b)
>
> Well not just restrict - dynamically share per IP total incoming
> bandwidth with LAN traffic using HTB.
Sure - that's assumed since you attach HTB to the dummy device.
To accommodate your need for b), the idea would be as follows:
packet gets demasqueraded, mark it with a fwmark based on some recognition
you have for bittorrent or squid and lastly policy route it to the dummy
device based on fwmark (since routing happens last).
I will need to modify the dummy to not drop such packets which are
fwmarked.
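A minimal sketch of that idea, assuming the modified dummy jamal describes (one that re-injects fwmarked packets instead of dropping them; with the stock driver these packets would simply be discarded). Mark 3, routing table 100 and rule priority 1000 are arbitrary choices. The mark is set in mangle PREROUTING, which runs before the nat stage undoes masquerading; the routing decision that consults the mark comes after both. One caveat worth checking: the kernel's `local` rule (priority 0) is consulted before any added rule, so packets addressed to the gateway itself may be delivered locally without ever reaching table 100. `route_ingress_to_dummy` is an illustrative name; `IPT="echo iptables" IPROUTE="echo ip"` gives a dry run.

```shell
# Sketch: steer demasqueraded ppp0 traffic to dummy0 by fwmark.
# route_ingress_to_dummy is a hypothetical helper.
route_ingress_to_dummy() {
    ipt_cmd=${IPT:-iptables}
    ip_cmd=${IPROUTE:-ip}
    # Mark everything arriving from the ISP (narrow this match to taste).
    $ipt_cmd -t mangle -A PREROUTING -i ppp0 -j MARK --set-mark 3
    # Marked packets use table 100, whose only route points at dummy0.
    $ip_cmd rule add fwmark 3 table 100 priority 1000
    $ip_cmd route add default dev dummy0 table 100
}
```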
cheers,
jamal
* Re: IMQ / new Dummy device post.
From: Andy Furniss @ 2004-04-18 16:35 UTC
To: hadi; +Cc: netdev
jamal wrote:
> On Sat, 2004-04-17 at 17:56, Andy Furniss wrote:
>
>>jamal wrote:
>
>
>
>>>I think i am almost understanding you now. Your main concern is people
>>>using bittorrent to upload to you, correct?
>>>Is there a way to recognize packets going to/from bittorrent?
>>
>>Quite possibly (though I think it uses connmark which I can't use as I
>>use connbytes to get new tcps out of slowstart).
>
>
> You are speaking Inuit to me. What is connmark, and what is its relation
> to TCP slowstart?
>
>
Connmark is a netfilter patch which is required by the type of P2P
limiting/marking projects on sf.net that could mark bittorrent traffic.
It is incompatible with the connbytes patch, which I use to mark the
first x KB of new connections. Doing this lets me send new TCPs to a
short queue which is capped at 50% of my bandwidth. This means that some
packets get dropped and the slowstart phase is ended before its
exponential nature floods my ISP's buffer.
Put another way - I can game without latency spikes while a couple of
people are browsing "heavy .jpg" type websites. It only works well if my
link is otherwise clear - but this is a common situation for my home
setup.
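The connbytes trick described above might be sketched like this, using the option names of current iptables (the 2004 patch-o-matic version may have differed). It assumes an HTB tree on dummy0 with a parent class 1:1; the 20000-byte threshold, mark 4, class 1:40 and the 5-packet pfifo are illustrative values, and `new_conn_class` is a hypothetical name. `TC="echo tc" IPT="echo iptables"` gives a dry run.

```shell
# Sketch: cap the first bytes of each new TCP connection at half the link.
# new_conn_class is a hypothetical helper.
new_conn_class() {
    link=$1             # link rate in kbit/s
    half=$((link / 2))
    tc_cmd=${TC:-tc}
    ipt_cmd=${IPT:-iptables}
    # Mark TCP packets while the connection has carried at most
    # 20000 bytes in total (both directions counted).
    $ipt_cmd -t mangle -A PREROUTING -i ppp0 -p tcp \
        -m connbytes --connbytes 0:20000 --connbytes-dir both \
        --connbytes-mode bytes -j MARK --set-mark 4
    # No borrowing: rate and ceil are both half the link.
    $tc_cmd class add dev dummy0 parent 1:1 classid 1:40 \
        htb rate "${half}kbit" ceil "${half}kbit"
    # A deliberately short queue, so slow start hits drops here early.
    $tc_cmd qdisc add dev dummy0 parent 1:40 handle 40: pfifo limit 5
    $tc_cmd filter add dev dummy0 parent 1: protocol ip prio 3 handle 4 fw classid 1:40
}
```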
>>I also sometimes use wget and I've seen posts on LARTC from people who
>>use squid and need to solve the same problem.
>
>
> I am gonna assume that you have some way to recognize the flows destined
> to localhost which you want to punish.
>
>
>>>
>
>
>>>>ppp0 one dynamic real IP -> gateway PC -> eth0 -> LAN 192.168.0.0/24
>>>>                                |
>>>>                                -> local process.
>>>
>>>
>>>
>>>Ok good. Assuming you have attached your HTB etc on one or more dummy
>>>devices.
>
>
>>>- The third path is packets that come in from ppp0, get demasqueraded,
>>>then have to go either a) to the LAN/eth0 or b) to the localhost bittorrent
>>>process. You want to restrict b)
>>
>>Well not just restrict - dynamically share per IP total incoming
>>bandwidth with LAN traffic using HTB.
>
>
> Sure - that's assumed since you attach HTB to the dummy device.
>
> To accommodate your need for b), the idea would be as follows:
> packet gets demasqueraded, mark it with a fwmark
I guess you really mean mark then demasquerade.
> based on some recognition
> you have for bittorrent or squid and lastly policy route it to the dummy
> device based on fwmark (since routing happens last).
> I will need to modify the dummy to not drop such packets which are
> fwmarked.
OK I can see this as a possibility - assuming I can mark. Maybe connmark
will be OK with connbytes sometime. I don't really know how to use it,
but if it is possible to mark egress connections in output and have
connmark match their incoming packets that would be a solution. I
haven't got a clue if connmark can do this, though, just speculating.
Does anyone else know, and why it's not compatible with connbytes?
Andy.
> cheers,
> jamal
>
>
>
* Re: IMQ / new Dummy device post.
From: Andy Furniss @ 2004-04-18 20:34 UTC
To: Andy Furniss; +Cc: hadi, netdev
Andy Furniss wrote:
<snip>
>> To accommodate your need for b), the idea would be as follows:
>> packet gets demasqueraded, mark it with a fwmark
>
>
> I guess you really mean mark then demasquerade.
>
>> based on some recognition
>> you have for bittorrent or squid and lastly policy route it to the dummy
>> device based on fwmark (since routing happens last).
>> I will need to modify the dummy to not drop such packets which are
>> fwmarked.
>
>
> OK I can see this as a possibility - assuming I can mark. Maybe connmark
> will be OK with connbytes sometime. I don't really know how to use it,
> but if it is possible to mark egress connections in output and have
> connmark match their incoming packets that would be a solution. I
> haven't got a clue if connmark can do this, though, just speculating.
Hmm second thoughts - if I can route packets to dummy after demasquerade
then I don't need to mark - I can use u32 as I do now to separate per
IP. Am I missing something here?
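Per-IP sharing with u32 on the dummy device could be sketched as below, assuming packets reach dummy0 after demasquerading (so LAN destination addresses are visible) and that an HTB root 1: with parent class 1:1 already exists. Each address gets an equal guaranteed share and may borrow up to the full rate; to give local processes their own class, the ppp0 address can be passed as one of the addresses. `per_ip_classes`, the rates and the addresses are illustrative; `TC="echo tc"` gives a dry run.

```shell
# Sketch: one HTB class per destination IP on dummy0.
# per_ip_classes is a hypothetical helper.
per_ip_classes() {
    total=$1; shift     # total rate in kbit/s; remaining args are IPs
    [ "$#" -gt 0 ] || return 1
    tc_cmd=${TC:-tc}
    share=$((total / $#))
    id=10
    for addr in "$@"; do
        $tc_cmd class add dev dummy0 parent 1:1 classid "1:$id" \
            htb rate "${share}kbit" ceil "${total}kbit"
        $tc_cmd qdisc add dev dummy0 parent "1:$id" handle "$id:" sfq perturb 10
        # u32 picks the class by (demasqueraded) destination address.
        $tc_cmd filter add dev dummy0 parent 1: protocol ip prio 5 u32 \
            match ip dst "$addr/32" flowid "1:$id"
        id=$((id + 1))
    done
}
```

tc reads class minor numbers as hex, so "1:10", "1:11", ... are simply unique IDs here, not decimal counts.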
>
> Does anyone else know, and why it's not compatible with connbytes?
>
> Andy.
>
>> cheers,
>> jamal
>>
>>
>>
>
>
>
>
* Re: IMQ / new Dummy device post.
From: jamal @ 2004-04-18 20:53 UTC
To: Andy Furniss; +Cc: netdev
On Sun, 2004-04-18 at 12:35, Andy Furniss wrote:
> Connmark is a netfilter patch which is required by the type of P2P
> limiting/marking projects on sf.net that could mark bittorrent traffic.
just from the sounds of it, appears it may be able to mark a group of
related flows with the same fwmark.
> It is incompatible with the connbytes patch, which I use to mark the
> first x KB of new connections. Doing this lets me send new TCPs to a
> short queue which is capped at 50% of my bandwidth. This means that some
> packets get dropped and the slowstart phase is ended before its
> exponential nature floods my ISP's buffer.
seems very similar in concept to what Alex (alex@pilotsoft.com) was
trying to achieve.
> > To accommodate your need for b), the idea would be as follows:
> > packet gets demasqueraded, mark it with a fwmark
>
> I guess you really mean mark then demasquerade.
Either should work fine.
> OK I can see this as a possibility - assuming I can mark. Maybe connmark
sounds like connmark may be what you want.
> will be OK with connbytes sometime. I don't really know how to use it,
> but if it is possible to mark egress connections in output and have
> connmark match their incoming packets that would be a solution. I
> haven't got a clue if connmark can do this, though, just speculating.
>
> Does anyone else know, and why it's not compatible with connbytes?
>
some of the netfilter people should be able to help.
cheers,
jamal
* Re: IMQ / new Dummy device post.
From: jamal @ 2004-04-18 21:07 UTC
To: Andy Furniss; +Cc: netdev
On Sun, 2004-04-18 at 16:34, Andy Furniss wrote:
> Hmm second thoughts - if I can route packets to dummy after demasquerade
> then I don't need to mark - I can use u32 as I do now to separate per
> IP. Am I missing something here?
The problem is that dummy had a special reason for existence: in the old
days of slip/ppp, dummy acted as a blackhole;
some apps insist(ed) on getting a static IP address
on the primary interface when you are offline. So people would typically
set up routes to the dummy device where packets just get swallowed.
I have a feeling there are people who still use this functionality
somewhere on the globe (sorry, i am from .ca, don't know what that means
anymore ;->). And i don't want to break this functionality.
So what i was thinking is i will have dummy spare any fwmarked packets
and reinject them.
Another alternative is to just fsck this backward compatibility mode,
because people could use blackhole routes today.
Yet another alternative is to create a brand new device and call it
something like imq2. For such little code, this may be overkill.
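The two ways of discarding traffic mentioned here can be contrasted in a small sketch: the old trick of routing a prefix to dummy0, and a blackhole route, which needs no device at all. The prefix and the helper name `discard_prefix` are illustrative; `IPROUTE="echo ip"` gives a dry run.

```shell
# Sketch: two ways to make traffic for a prefix disappear.
# discard_prefix is a hypothetical helper; IPROUTE="echo ip" = dry run.
discard_prefix() {
    prefix=$1
    method=$2           # "dummy" or "blackhole"
    ip_cmd=${IPROUTE:-ip}
    # dummy: the dummy driver silently swallows what it is asked to send.
    # blackhole: the routing code discards the packet, no device needed.
    case $method in
        dummy)     $ip_cmd route add "$prefix" dev dummy0 ;;
        blackhole) $ip_cmd route add blackhole "$prefix" ;;
        *)         echo "unknown method: $method" >&2; return 1 ;;
    esac
}
```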
cheers,
jamal
* Re: IMQ / new Dummy device post.
From: Martin Josefsson @ 2004-04-18 21:23 UTC
To: jamal; +Cc: Andy Furniss, netdev
On Sun, 2004-04-18 at 22:53, jamal wrote:
> On Sun, 2004-04-18 at 12:35, Andy Furniss wrote:
>
> > Connmark is a netfilter patch which is required by the type of P2P
> > limiting/marking projects on sf.net that could mark bittorrent traffic.
>
> just from the sounds of it, appears it may be able to mark a group of
> related flows with the same fwmark.
connmark is like nfmark but it marks the connection-entry in
ip_conntrack instead. And then you can "restore" that mark to the nfmark
of the packet at any time you want with filter rules.
> > will be OK with connbytes sometime. I don't really know how to use it,
> > but if it is possible to mark egress connections in output and have
> > connmark match their incoming packets that would be a solution. I
> > haven't got a clue if connmark can do this, though, just speculating.
> >
> > Does anyone else know, and why it's not compatible with connbytes?
> >
>
> some of the netfilter people should be able to help.
with connmark you mark the connection, and then you can "restore" that
mark to packets in either direction in the mangle table of iptables.
connmark isn't incompatible with connbytes. It's just that both patches
modify the same part of the code, a struct, and the patch program can't
handle that. You'll have to fix some rejects by hand, that's it.
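Martin's description, applied to Andy's question about marking local connections in OUTPUT and restoring the marks in PREROUTING, might look like this with the CONNMARK target as found in current iptables (the 2004 patch's option names may have differed). Mark 5 is arbitrary and `mark_local_connections` is a hypothetical name. In current kernels tc's ingress hook runs before netfilter, so a restored mark is visible to later stages such as policy routing or filters on dummy0, not to an ingress filter on ppp0. `IPT="echo iptables"` gives a dry run.

```shell
# Sketch: mark connections in OUTPUT, restore the mark in PREROUTING.
# mark_local_connections is a hypothetical helper.
mark_local_connections() {
    ipt_cmd=${IPT:-iptables}
    # Store mark 5 on the conntrack entry of every connection a local
    # process sends packets on (the mark lives on the connection).
    $ipt_cmd -t mangle -A OUTPUT -j CONNMARK --set-mark 5
    # Copy the connection's mark into the nfmark of incoming packets.
    $ipt_cmd -t mangle -A PREROUTING -i ppp0 -j CONNMARK --restore-mark
}
```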
--
/Martin
* Re: IMQ / new Dummy device post.
From: Andy Furniss @ 2004-04-18 21:31 UTC
To: hadi; +Cc: netdev
jamal wrote:
> On Sun, 2004-04-18 at 16:34, Andy Furniss wrote:
>
>
>
>>Hmm second thoughts - if I can route packets to dummy after demasquerade
>>then I don't need to mark - I can use u32 as I do now to separate per
>>IP. Am I missing something here?
>
>
> The problem is that dummy had a special reason for existence: in the old
> days of slip/ppp, dummy acted as a blackhole;
> some apps insist(ed) on getting a static IP address
> on the primary interface when you are offline. So people would typically
> set up routes to the dummy device where packets just get swallowed.
>
> I have a feeling there are people who still use this functionality
> somewhere on the globe (sorry, i am from .ca, don't know what that means
> anymore ;->). And i don't want to break this functionality.
> So what i was thinking is i will have dummy spare any fwmarked packets
> and reinject them.
I think this would still be a solution for me - I already mark
everything coming in on ppp0 in prerouting filter (pre-demasquerade)
into three classes - interactive, new and bulk. I then use u32 to
further share bulk per dst IP post-demasquerade on the HTB/IMQ. So as
long as I can route to dummy post-demasquerade, I don't need IMQ. This
would be a lot better than messing around with connmark.
Andy.
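The three-way marking described above might be sketched as follows. The classifiers are illustrative assumptions, not Andy's actual rules: ssh replies (source port 22) as interactive, the first 20000 bytes of a TCP connection as new (connbytes, current iptables syntax), and everything else as bulk. Because MARK does not stop rule traversal, the bulk mark is set first and the later, more specific rules overwrite it. `mark_three_classes` and the mark values 1 to 3 are hypothetical; `IPT="echo iptables"` gives a dry run.

```shell
# Sketch: interactive / new / bulk marks on traffic arriving on ppp0.
# mark_three_classes is a hypothetical helper.
mark_three_classes() {
    ipt_cmd=${IPT:-iptables}
    # 3 = bulk: default for everything from the ISP.
    $ipt_cmd -t mangle -A PREROUTING -i ppp0 -j MARK --set-mark 3
    # 2 = new: connections that have carried at most 20000 bytes so far.
    $ipt_cmd -t mangle -A PREROUTING -i ppp0 -p tcp \
        -m connbytes --connbytes 0:20000 --connbytes-dir both \
        --connbytes-mode bytes -j MARK --set-mark 2
    # 1 = interactive: replies from ssh servers; the last match wins.
    $ipt_cmd -t mangle -A PREROUTING -i ppp0 -p tcp --sport 22 \
        -j MARK --set-mark 1
}
```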
> Another alternative is to just fsck this backward compatibility mode,
> because people could use blackhole routes today.
> Yet another alternative is to create a brand new device and call it
> something like imq2. For such little code, this may be overkill.
>
> cheers,
> jamal
* Re: IMQ / new Dummy device post.
From: Andy Furniss @ 2004-04-18 21:45 UTC
Andy Furniss wrote:
> I think this would still be a solution for me - I already mark
> everything coming in on ppp0 in prerouting filter
I meant mangle not filter of course :-)
Andy.
* Re: IMQ / new Dummy device post.
From: Andy Furniss @ 2004-04-18 21:58 UTC
To: Martin Josefsson; +Cc: jamal, netdev
Martin Josefsson wrote:
> On Sun, 2004-04-18 at 22:53, jamal wrote:
>
>>On Sun, 2004-04-18 at 12:35, Andy Furniss wrote:
>>
>>
>>>Connmark is a netfilter patch which is required by the type of P2P
>>>limiting/marking projects on sf.net that could mark bittorrent traffic.
>>
>>just from the sounds of it, appears it may be able to mark a group of
>>related flows with the same fwmark.
>
>
> connmark is like nfmark but it marks the connection-entry in
> ip_conntrack instead. And then you can "restore" that mark to the nfmark
> of the packet at any time you want with filter rules.
>
>
>>>will be OK with connbytes sometime. I don't really know how to use it,
>>>but if it is possible to mark egress connections in output and have
>>>connmark match their incoming packets that would be a solution. I
>>>haven't got a clue if connmark can do this, though, just speculating.
>>>
>>>Does anyone else know, and why it's not compatible with connbytes?
>>>
>>
>>some of the netfilter people should be able to help.
>
>
> with connmark you mark the connection, and then you can "restore" that
> mark to packets in either direction in the mangle table of iptables.
>
> connmark isn't incompatible with connbytes. It's just that both patches
> modify the same part of the code, a struct, and the patch program can't
> handle that. You'll have to fix some rejects by hand, that's it.
>
Thanks for that - though I hope not to have to use it now, just to
confirm - does it work in all of the 5 mangle tables or more
specifically could I mark every connection from local processes in
output and restore the marks in prerouting?
Andy.
* Re: IMQ / new Dummy device post.
2004-04-18 21:58 ` Andy Furniss
@ 2004-04-19 8:14 ` Martin Josefsson
0 siblings, 0 replies; 35+ messages in thread
From: Martin Josefsson @ 2004-04-19 8:14 UTC (permalink / raw)
To: Andy Furniss; +Cc: jamal, netdev
On Sun, 18 Apr 2004, Andy Furniss wrote:
> > with connmark you mark the connection, and then you can "restore" that
> > mark to packets in either direction in the mangle table of iptables.
> >
> > connmark isn't incompatible with connbytes. It's just that both patches
> > modify the same part of the code, a struct, and the patch program can't
> > handle that. You'll have to fix some rejects by hand, that's it.
> >
>
> Thanks for that - though I hope not to have to use it now, just to
> confirm - does it work in all of the 5 mangle tables or more
> specifically could I mark every connection from local processes in
> output and restore the marks in prerouting?
It works in all of the 5 mangle chains iirc.
/Martin
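Martin's description of connmark (the mark is stored on the conntrack entry and can be restored onto packets travelling in either direction) can be pictured with a small C model. This is a sketch under assumptions: struct conn, struct pkt, connmark_save() and connmark_restore() are hypothetical names standing in for the save/restore steps of the CONNMARK target; they are not kernel or iptables code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: the mark lives on the connection entry. */
struct conn {
    uint32_t connmark;   /* mark stored in the conntrack entry */
};

struct pkt {
    struct conn *ct;     /* connection this packet belongs to (may be NULL) */
    uint32_t nfmark;     /* per-packet mark, what a "fw" filter looks at */
};

/* Save step: copy the packet's mark into its connection entry. */
static void connmark_save(struct pkt *p)
{
    if (p->ct)
        p->ct->connmark = p->nfmark;
}

/* Restore step: copy the connection's mark back onto a packet. */
static void connmark_restore(struct pkt *p)
{
    if (p->ct)
        p->nfmark = p->ct->connmark;
}
```

In Andy's scenario a rule in OUTPUT would play the role of connmark_save() for locally originated connections, and a rule in PREROUTING the role of connmark_restore() for their replies, since both directions share one connection entry.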
* Re: IMQ / new Dummy device post.
2004-04-17 12:09 ` jamal
2004-04-17 21:56 ` Andy Furniss
@ 2004-04-19 12:33 ` syrius.ml
1 sibling, 0 replies; 35+ messages in thread
From: syrius.ml @ 2004-04-19 12:33 UTC (permalink / raw)
To: hadi; +Cc: Andy Furniss, netdev
[...]
> - packets from the local LAN can be marked at ingress and redirected to a
> dummy device if needed. In fact you can do this on the egress at ppp0 as well
> using the new tc -i <inputdev> that I introduced. So this is easy.
Hmm, I don't see the -i option in the patches.
What's the point of having a -i <inputdev> option?
Does it mean I can work on egress at ppp0 and use -i eth0 to match packets
that originally came from eth0?
I personally use netfilter to mark packets and 'handle $mark fw' to
filter them (that may explain why I don't see the point :-).
--
* Re: IMQ / new Dummy device post.
@ 2004-04-19 14:22 syrius.ml
2004-04-20 2:15 ` jamal
0 siblings, 1 reply; 35+ messages in thread
From: syrius.ml @ 2004-04-19 14:22 UTC (permalink / raw)
To: hadi; +Cc: netdev
OK, it seems to be working as expected for IPv4 traffic.
Here is how I'm using it
(I use netfilter, for both IPv4 and IPv6, to mark packets):
OUT=dummy0
$TC qdisc add dev $OUT root handle 1: htb default 20
# CLASSES
$TC class add dev $OUT parent 1: classid 1:1 htb rate $UL ceil $ULC \
prio 0
$TC class add dev $OUT parent 1:1 classid 1:10 htb rate \
$UL1 ceil $ULC1 quantum $QU1 prio 1
$TC class add dev $OUT parent 1:1 classid 1:2 htb rate $UL01 ceil \
$ULC01 quantum $QU01 prio 2
$TC class add dev $OUT parent 1:2 classid 1:20 htb rate $UL2 ceil \
$ULC2 quantum $QU2 prio 3
$TC class add dev $OUT parent 1:2 classid 1:3 htb rate $UL02 ceil \
$ULC02 quantum $QU02 prio 4
$TC class add dev $OUT parent 1:3 classid 1:30 htb rate $UL3 ceil \
$ULC3 quantum $QU3 prio 5
$TC class add dev $OUT parent 1:3 classid 1:40 htb rate $UL4 ceil \
$ULC4 quantum $QU4 prio 5
$TC class add dev $OUT parent 1:3 classid 1:50 htb rate $UL5 ceil \
$ULC5 quantum $QU5 prio 7
$TC qdisc add dev $OUT parent 1:10 handle 110: pfifo limit 50
$TC qdisc add dev $OUT parent 1:20 handle 120: sfq perturb 10
$TC qdisc add dev $OUT parent 1:30 handle 130: sfq perturb 10
$TC qdisc add dev $OUT parent 1:40 handle 140: sfq perturb 10
$TC qdisc add dev $OUT parent 1:50 handle 150: sfq perturb 10
# FILTERS
$TC filter add dev $OUT parent 1: protocol ip prio 10 handle 1 fw \
flowid 1:10
$TC filter add dev $OUT parent 1: protocol ipv6 prio 11 handle 1 fw \
flowid 1:10
$TC filter add dev $OUT parent 1: protocol ip prio 12 handle 2 fw \
flowid 1:20
$TC filter add dev $OUT parent 1: protocol ipv6 prio 13 handle 2 fw \
flowid 1:20
$TC filter add dev $OUT parent 1: protocol ip prio 14 handle 3 fw \
flowid 1:30
$TC filter add dev $OUT parent 1: protocol ipv6 prio 15 handle 3 fw \
flowid 1:30
$TC filter add dev $OUT parent 1: protocol ip prio 16 handle 4 fw \
flowid 1:40
$TC filter add dev $OUT parent 1: protocol ipv6 prio 17 handle 4 fw \
flowid 1:40
$TC filter add dev $OUT parent 1: protocol ip prio 18 handle 5 fw \
flowid 1:50
$TC filter add dev $OUT parent 1: protocol ipv6 prio 19 handle 5 fw \
flowid 1:50
$IP link set $OUT up
$TC qdisc add dev ppp0 root handle 1: prio
$TC filter add dev ppp0 parent 1:0 protocol ip prio 10 u32 \
match u32 0 0 flowid 1:1 action mirred egress redirect dev dummy0
$TC qdisc add dev tun0 root handle 1: prio
$TC filter add dev tun0 parent 1:0 protocol ip prio 10 u32 \
match u32 0 0 flowid 1:1 action mirred egress redirect dev dummy0
$TC qdisc add dev sit1 root handle 1: prio
$TC filter add dev sit1 parent 1:0 protocol ipv6 prio 10 u32 \
match u32 0 0 flowid 1:1 action mirred egress redirect dev dummy0
But it doesn't work with IPv6 traffic.
If I try to ping6 some host, I sometimes get "ping: sendmsg: No buffer
space available" messages.
In any case, nothing goes out on sit1.
Is this the correct way to do it?
--
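The fw filters in the script above map a netfilter mark to an HTB class, and anything left unclassified falls into the class named by "htb default 20". A hypothetical C sketch of that mapping, assuming tc's convention that class ids are hexadecimal major:minor pairs (so 1:10 is major 0x1, minor 0x10); classid() and classify_by_mark() are illustrations, not the fw classifier's code.

```c
#include <assert.h>
#include <stdint.h>

/* tc class ids are major:minor in hex, packed as major<<16 | minor. */
static uint32_t classid(uint16_t major, uint16_t minor)
{
    return ((uint32_t)major << 16) | minor;
}

/* Mirrors the script: fw handle 1..5 -> classes 1:10..1:50.
 * Any other mark falls through to the HTB default class 1:20. */
static uint32_t classify_by_mark(uint32_t mark)
{
    switch (mark) {
    case 1: return classid(0x1, 0x10);
    case 2: return classid(0x1, 0x20);
    case 3: return classid(0x1, 0x30);
    case 4: return classid(0x1, 0x40);
    case 5: return classid(0x1, 0x50);
    default: return classid(0x1, 0x20);  /* "htb default 20" */
    }
}
```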
* Re: IMQ / new Dummy device post.
2004-04-19 14:22 IMQ / new Dummy device post syrius.ml
@ 2004-04-20 2:15 ` jamal
2004-04-21 1:43 ` syrius.ml
0 siblings, 1 reply; 35+ messages in thread
From: jamal @ 2004-04-20 2:15 UTC (permalink / raw)
To: syrius.ml; +Cc: netdev
On Mon, 2004-04-19 at 10:22, syrius.ml@no-log.org wrote:
[..]
If you have already marked the packets before they hit egress, then you
don't need to use the ipt mark action. So what you are doing is correct.
> $TC qdisc add dev ppp0 root handle 1: prio
> $TC filter add dev ppp0 parent 1:0 protocol ip prio 10 u32 \
> match u32 0 0 flowid 1:1 action mirred egress redirect dev dummy0
Note: this will only do for IPv4. If you want IPv6, add a new rule (in
addition to the one above, if you still want IPv4) with "protocol ip"
replaced by "protocol ipv6".
> $TC qdisc add dev tun0 root handle 1: prio
> $TC filter add dev tun0 parent 1:0 protocol ip prio 10 u32 \
> match u32 0 0 flowid 1:1 action mirred egress redirect dev dummy0
>
> $TC qdisc add dev sit1 root handle 1: prio
> $TC filter add dev sit1 parent 1:0 protocol ipv6 prio 10 u32 \
> match u32 0 0 flowid 1:1 action mirred egress redirect dev dummy0
Not sure if you need the above, but I don't know your setup well enough
to be 100% sure.
> but it doesn't work with ipv6 traffic.
> If I try to ping6 somehost, i sometimes get "ping: sendmsg: No buffer
> space available" messages
> anyway, there's nothing going out on sit1.
>
> Is it the correct way to do it ?
Seems right. Try adding the new IPv6 rule on ppp0, and if you are still
having problems, try dumping some stats for the filters and see if they
are incrementing, e.g.
tc -s filter show parent 1:0 dev ppp0
Also, ifconfig on dummy0 should show its stats going up.
cheers,
jamal
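Jamal's point that a filter added with "protocol ip" only ever sees IPv4 comes from the classifier comparing the packet's protocol with each filter's protocol before running it. A simplified, hypothetical model in C: host byte order, no real tc structures, and struct filter and redirected() are made up for illustration; only the ethertype values and ETH_P_ALL ("protocol all") are standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_P_IP   0x0800
#define ETH_P_IPV6 0x86DD
#define ETH_P_ALL  0x0003   /* matches any protocol */

struct filter {
    uint16_t protocol;  /* "protocol ip" / "protocol ipv6" on the tc line */
    int redirect;       /* 1 if the attached action redirects to dummy0 */
};

/* Returns 1 if some filter whose protocol matches redirects the packet. */
static int redirected(const struct filter *f, size_t n, uint16_t skb_protocol)
{
    for (size_t i = 0; i < n; i++) {
        if (f[i].protocol != ETH_P_ALL && f[i].protocol != skb_protocol)
            continue;   /* this filter never sees the packet */
        if (f[i].redirect)
            return 1;
    }
    return 0;
}
```

With only the IPv4 rule, an IPv6 packet walks past the filter list without being redirected; adding a second entry with ETH_P_IPV6 is the model's equivalent of the extra "protocol ipv6" rule.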
* Re: IMQ / new Dummy device post.
2004-04-20 2:15 ` jamal
@ 2004-04-21 1:43 ` syrius.ml
2004-04-21 12:49 ` syrius.ml
0 siblings, 1 reply; 35+ messages in thread
From: syrius.ml @ 2004-04-21 1:43 UTC (permalink / raw)
To: hadi; +Cc: netdev
jamal <hadi@cyberus.ca> writes:
[...]
>> $TC qdisc add dev ppp0 root handle 1: prio
>> $TC filter add dev ppp0 parent 1:0 protocol ip prio 10 u32 \
>> match u32 0 0 flowid 1:1 action mirred egress redirect dev dummy0
> Note: this will only do for IPv4. If you want IPv6, add a new rule (in
> addition to the one above, if you still want IPv4) with "protocol ip"
> replaced by "protocol ipv6".
>> $TC qdisc add dev tun0 root handle 1: prio
>> $TC filter add dev tun0 parent 1:0 protocol ip prio 10 u32 \
>> match u32 0 0 flowid 1:1 action mirred egress redirect dev dummy0
>> $TC qdisc add dev sit1 root handle 1: prio
>> $TC filter add dev sit1 parent 1:0 protocol ipv6 prio 10 u32 \
>> match u32 0 0 flowid 1:1 action mirred egress redirect dev dummy0
> Not sure if you need the above, but I don't know your setup well enough
> to be 100% sure.
Using 'protocol ipv6' on ppp0 rather than sit1 did the trick.
It's even simpler! I don't have to create filters for each IPv6 tunnel.
As for the IPv4-over-UDP tun0 tunnel, I guess I should prevent
those UDP packets from being matched by the filter on ppp0.
I'll optimize it later.
Cheers
--
* Re: IMQ / new Dummy device post.
2004-04-21 1:43 ` syrius.ml
@ 2004-04-21 12:49 ` syrius.ml
2004-04-21 20:19 ` syrius.ml
0 siblings, 1 reply; 35+ messages in thread
From: syrius.ml @ 2004-04-21 12:49 UTC (permalink / raw)
To: hadi; +Cc: netdev
skput:under: c88c93aa:98 put:14 dev:tun0kernel BUG at skbuff.c:113!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c01bc2e9>] Tainted: P
EFLAGS: 00010286
eax: 00000028 ebx: c46afe60 ecx: 00000000 edx: c6a5e000
esi: c88c6780 edi: 00000001 ebp: c5573d78 esp: c5573d64
ds: 0018 es: 0018 ss: 0018
Process openvpn (pid: 940, stackpage=c5573000)
Stack: c022a180 c88c93aa 00000062 0000000e c5db4824 c5573da8 c88c93b2 c582e3a0
0000000e c88c93aa 42924e9d c58eccc0 c582ebe0 c582e3a0 c57dae00 c582ebe0
c5f17f20 c5573dbc c01cd277 c5573dc4 c57dae00 c5573e60 c5573e28 c890f104
Call Trace: [<c88c93aa>] [<c88c93b2>] [<c88c93aa>] [<c01cd277>] [<c890f104>]
[<c01bc5bd>] [<c01bc736>] [<c01f20fc>] [<c89123f9>] [<c89120fc>] [<c01c0a6a>]
[<c01c0b6a>] [<c01c0d66>] [<c01c0e83>] [<c0119507>] [<c892cc35>] [<c01347a0>]
[<c0134883>] [<c0106ff3>]
Code: 0f 0b 71 00 0a 92 22 c0 89 ec 5d c3 8d 74 26 00 8d bc 27 00
<0>Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
I'm going to recompile the kernel with frame pointers, and I'll feed
the oops through ksymoops.
First I have to narrow the problem down so I can tell how to reproduce it,
and then I'll give more information.
--
* Re: IMQ / new Dummy device post.
2004-04-21 12:49 ` syrius.ml
@ 2004-04-21 20:19 ` syrius.ml
2004-04-22 13:16 ` jamal
0 siblings, 1 reply; 35+ messages in thread
From: syrius.ml @ 2004-04-21 20:19 UTC (permalink / raw)
To: hadi; +Cc: netdev
[-- Attachment #1: Type: text/plain, Size: 1192 bytes --]
OK, I'm able to reproduce it with a simpler setup.
Suppose I'm using the new dummy device on a machine connected to
an Ethernet LAN. This host uses OpenVPN to establish a VPN tunnel.
debian:~# ip a l dev eth0
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 00:0c:29:25:4a:b6 brd ff:ff:ff:ff:ff:ff
inet 192.168.5.4/24 brd 192.168.5.255 scope global eth0
debian:~# ip a l dev tun0
4: tun0: <POINTOPOINT,MULTICAST,NOARP,UP> mtu 1447 qdisc pfifo_fast qlen 10
link/ppp
inet 172.16.0.2 peer 172.16.0.1/32 scope global tun0
I've attached the script I'm using to set up a simple new dummy+HTB configuration.
It's very simple: I do not use iptables to mark packets, and I do not use
filters with HTB; everything goes to the default classes (1:20 and 2:20).
I can verify it's working as expected with a simple ping -f 192.168.5.1
(or a ping -f 192.168.5.4 from 192.168.5.1).
After the ping -f 192.168.5.1 (say I let it run for 30 seconds), if I
ping 172.16.0.1, the oops appears!
I've attached the ksymoops output.
Please tell me if you're able to reproduce it.
I'm OK with trying other VPN software, but I don't think it has
anything to do with OpenVPN.
[-- Attachment #2: qos script --]
[-- Type: text/x-sh, Size: 1830 bytes --]
#!/bin/sh
TC=/usr/local/iproute2/sbin/tc
IP=/usr/local/iproute2/sbin/ip
OUT=dummy0
IN=dummy1
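# use $TC throughout; ">/dev/null 2>&1" instead of the bash-only "&>",
# which a POSIX /bin/sh parses as "run in background, truncate /dev/null"
$TC qdisc ls | sed -e 's/^.*dev \([a-zA-Z0-9]\+\) .*$/\1/' | sort -u | \
while read -r a; do
    $TC qdisc del dev $a root >/dev/null 2>&1
    $TC qdisc del dev $a ingress >/dev/null 2>&1
done
ifconfig $IN down >/dev/null 2>&1
ifconfig $OUT down >/dev/null 2>&1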
if [ "$1" = "stop" ]
then
exit
fi
###### uplink
# ROOT
$TC qdisc add dev $OUT root handle 1: htb default 20
# CLASSES
$TC class add dev $OUT parent 1: classid 1:1 htb rate 200kbit ceil 200kbit prio 0
$TC class add dev $OUT parent 1:1 classid 1:10 htb rate 100kbit ceil 200kbit prio 1
$TC class add dev $OUT parent 1:1 classid 1:20 htb rate 100kbit ceil 100kbit prio 2
$IP link set $OUT up
$TC qdisc add dev eth0 root handle 1: prio
$TC filter add dev eth0 parent 1:0 protocol ip prio 10 u32 \
match u32 0 0 flowid 1:1 action mirred egress redirect dev $OUT
$TC qdisc add dev tun0 root handle 1: prio
$TC filter add dev tun0 parent 1:0 protocol ip prio 10 u32 \
match u32 0 0 flowid 1:1 action mirred egress redirect dev $OUT
modprobe dummy
###### downlink
$TC qdisc add dev $IN root handle 2: htb default 20
# CLASSES
$TC class add dev $IN parent 2: classid 2:1 htb rate 100kbit ceil 200kbit prio 0
$TC class add dev $IN parent 2:1 classid 2:10 htb rate 100kbit ceil 200kbit prio 1
$TC class add dev $IN parent 2:1 classid 2:20 htb rate 100kbit ceil 100kbit prio 3
$IP link set $IN up
$TC qdisc add dev eth0 ingress
$TC filter add dev eth0 parent ffff: protocol ip prio 10 u32 \
match u32 0 0 flowid 1:1 action mirred egress redirect dev $IN
$TC qdisc add dev tun0 ingress
$TC filter add dev tun0 parent ffff: protocol ip prio 10 u32 \
match u32 0 0 flowid 1:1 action mirred egress redirect dev $IN
[-- Attachment #3: ksymoops result --]
[-- Type: text/plain, Size: 2859 bytes --]
ksymoops 2.4.9 on i686 2.4.25. Options used
-v /mnt/vmlinux (specified)
-k /mnt/ksyms (specified)
-l /mnt/modules (specified)
-o /lib/modules/2.4.25 (specified)
-m /mnt/System.map (specified)
skput:under: c88863d8:126 put:14 dev:tun0kernel BUG at skbuff.c:113!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c01e088b>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00000282
eax: 00000029 ebx: c639ff40 ecx: c6434000 edx: c6777f7c
esi: c8888780 edi: 00000001 ebp: c6435db4 esp: c6435da0
ds: 0018 es: 0018 ss: 0018
Process openvpn (pid: 415, stackpage=c6435000)
Stack: c025e0a0 c88863d8 0000007e 0000000e c6706224 c6435de4 c88863e4 c67c0480
0000000e c88863d8 c1167420 c61eae00 c67c0540 c67c0480 c5ce8dc0 00000000
c6424880 c6435df8 c01f294d c6435e00 c5ce8dc0 c67c0540 c6435e64 c888311a
Call Trace: [<c88863d8>] [<c88863e4>] [<c88863d8>] [<c01f294d>] [<c888311a>]
[<c021f17d>] [<c888a462>] [<c888a12b>] [<c01e5108>] [<c01e520f>] [<c01e54a9>]
[<c01e55aa>] [<c011a854>] [<c8843c9f>] [<c8843353>] [<c013642b>] [<c01071bb>]
Code: 0f 0b 71 00 27 d1 25 c0 c9 c3 8d 74 26 00 8d bc 27 00 00 00
>>EIP; c01e088b <skb_under_panic+3b/50> <=====
>>ebx; c639ff40 <_end+609c4e0/84fc620>
>>ecx; c6434000 <_end+61305a0/84fc620>
>>edx; c6777f7c <_end+647451c/84fc620>
>>esi; c8888780 <[dummy]__module_license+a8/19a8>
>>ebp; c6435db4 <_end+6132354/84fc620>
>>esp; c6435da0 <_end+6132340/84fc620>
Trace; c88863d8 <[mirred]tcf_mirred+168/1d0>
Trace; c88863e4 <[mirred]tcf_mirred+174/1d0>
Trace; c88863d8 <[mirred]tcf_mirred+168/1d0>
Trace; c01f294d <tcf_action_exec+5d/90>
Trace; c888311a <[cls_u32]u32_classify+9a/1d0>
Trace; c021f17d <inet_recvmsg+4d/70>
Trace; c888a462 <[sch_ingress]tc_classify+52/cf>
Trace; c888a12b <[sch_ingress]ingress_enqueue+2b/80>
Trace; c01e5108 <ing_filter+68/c0>
Trace; c01e520f <netif_receive_skb+af/2d0>
Trace; c01e54a9 <process_backlog+79/110>
Trace; c01e55aa <net_rx_action+6a/100>
Trace; c011a854 <do_softirq+94/a0>
Trace; c8843c9f <[tun]tun_get_user+df/165>
Trace; c8843353 <[tun]tun_chr_write+33/40>
Trace; c013642b <sys_write+9b/130>
Trace; c01071bb <system_call+33/38>
Code; c01e088b <skb_under_panic+3b/50>
00000000 <_EIP>:
Code; c01e088b <skb_under_panic+3b/50> <=====
0: 0f 0b ud2a <=====
Code; c01e088d <skb_under_panic+3d/50>
2: 71 00 jno 4 <_EIP+0x4>
Code; c01e088f <skb_under_panic+3f/50>
4: 27 daa
Code; c01e0890 <skb_under_panic+40/50>
5: d1 25 c0 c9 c3 8d shll 0x8dc3c9c0
Code; c01e0896 <skb_under_panic+46/50>
b: 74 26 je 33 <_EIP+0x33>
Code; c01e0898 <skb_under_panic+48/50>
d: 00 8d bc 27 00 00 add %cl,0x27bc(%ebp)
<0>Kernel panic: Aiee, killing interrupt handler!
[-- Attachment #4: Type: text/plain, Size: 6 bytes --]
--
* Re: IMQ / new Dummy device post.
2004-04-21 20:19 ` syrius.ml
@ 2004-04-22 13:16 ` jamal
2004-04-22 17:43 ` syrius.ml
0 siblings, 1 reply; 35+ messages in thread
From: jamal @ 2004-04-22 13:16 UTC (permalink / raw)
To: syrius.ml; +Cc: netdev
Hi there,
On Wed, 2004-04-21 at 16:19, syrius.ml@no-log.org wrote:
> ok, i'm able to reproduce it with a simpler setup.
>
> Let's consider I'm using the new dummy device on machine connected to
> a ethernet lan. this host is using openvpn to establish a vpn tunnel.
[..]
> after the ping -f 192.168.5.1 (let's say i let it run 30sec), if a do
> ping 172.16.0.1 the oops appears !
>
> I attach the result of ksymoops.
>
> Please tell me if you're able to reproduce it.
> I'm ok to try with another vpn software, but I don't think it has
> anything to do with openvpn.
>
It may be more related to the tun device that software uses. Tun is an
interesting netdevice. I don't have a setup to reproduce this.
BTW, doesn't the packet coming from the VPN eventually make it to eth0?
Isn't the other direction also true (it always starts at the eth0 level)?
If yes, why do you have to redirect packets from the tun device?
Try the following to debug:
remove the egress qdisc from the tun device and run the test
(this part: $TC qdisc add dev tun0 root handle 1: prio).
If that still oopses, remove the ingress attachment to the tun.
And if that still fails, compile both tun and dummy into the kernel
(as opposed to modules) and reproduce the oops.
Additionally, some useful tools are stats on the dummy devices as well
as on the actions (example: tc -s filter ls dev eth0 parent ffff:).
cheers,
jamal
* Re: IMQ / new Dummy device post.
2004-04-22 13:16 ` jamal
@ 2004-04-22 17:43 ` syrius.ml
2004-04-23 11:29 ` jamal
0 siblings, 1 reply; 35+ messages in thread
From: syrius.ml @ 2004-04-22 17:43 UTC (permalink / raw)
To: hadi; +Cc: netdev
Hi,
It oopses when using the ingress qdisc + action mirred egress redirect
filter on tun0 (no egress at all, no ingress on eth0).
It doesn't oops using an ingress qdisc + a simple police+drop filter
on tun0...
--
* Re: IMQ / new Dummy device post.
2004-04-22 17:43 ` syrius.ml
@ 2004-04-23 11:29 ` jamal
2004-04-24 14:14 ` tun device - bug or feature? WAS(Re: " jamal
0 siblings, 1 reply; 35+ messages in thread
From: jamal @ 2004-04-23 11:29 UTC (permalink / raw)
To: syrius.ml; +Cc: netdev
Hi there,
On Thu, 2004-04-22 at 13:43, syrius.ml@no-log.org wrote:
> Hi,
>
> It oopses when using the ingress qdisc + action mirred egress redirect
> filter on tun0 (no egress at all, no ingress on eth0).
> It doesn't oops using an ingress qdisc + a simple police+drop filter
> on tun0...
OK, so you have narrowed it down to mirred, tun and the ingress qdisc; is
that correct? Were you using OpenVPN to recreate this?
BTW, would this happen if you don't issue the ping -f first in the above
setup? If yes, before you trigger the problem, can you send a few
pings through the tun device and send me the output of dmesg?
I just did a simple test with a basic program and couldn't reproduce it.
I don't have the proper setup; can you do a basic test with some other
tunneling software?
The doc for tun mentions:
http://vtun.sourceforge.net and http://perso.enst.fr/~beyssac/pipsec/
Please compile tun and dummy into the kernel.
BTW, I think we should take this offline; send your response directly to
me. Anyone else interested in this conversation, email both of us and we
will Cc you.
cheers,
jamal
* tun device - bug or feature? WAS(Re: IMQ / new Dummy device post.
2004-04-23 11:29 ` jamal
@ 2004-04-24 14:14 ` jamal
2004-04-26 4:38 ` David S. Miller
2004-04-26 19:31 ` Max Krasnyansky
0 siblings, 2 replies; 35+ messages in thread
From: jamal @ 2004-04-24 14:14 UTC (permalink / raw)
To: netdev; +Cc: syrius.ml, Maxim Krasnyansky, Jeff Garzik, David S. Miller
Maxim,
When TUN_TUN_DEV is used, before the packet is injected via
netif_rx(), only skb->mac.raw = skb->data is set; the other headers
are not adjusted. Typically netdevs would do a
skb_pull(skb, dev->hard_header_len) to make the adjustment.
I have a feeling this is by design; that's why I didn't send you a
patch.
Jeff, Dave: Would it be fair to say that when packets get injected into the
stack by a netdev via netif_rx(), the skb headers are expected to be
pointing to some specific places? I am not sure if there's a hard and
fast rule defined anywhere.
cheers,
jamal
On Fri, 2004-04-23 at 07:29, jamal wrote:
> Hi there,
>
> On Thu, 2004-04-22 at 13:43, syrius.ml@no-log.org wrote:
> > Hi,
> >
> > It oopses when using the ingress qdisc + action mirred egress redirect
> > filter on tun0 (no egress at all, no ingress on eth0).
> > It doesn't oops using an ingress qdisc + a simple police+drop filter
> > on tun0...
>
> OK, so you have narrowed it down to mirred, tun and the ingress qdisc; is
> that correct? Were you using OpenVPN to recreate this?
> BTW, would this happen if you don't issue the ping -f first in the above
> setup? If yes, before you trigger the problem, can you send a few
> pings through the tun device and send me the output of dmesg?
>
> I just did a simple test with a basic program and couldn't reproduce it.
> I don't have the proper setup; can you do a basic test with some other
> tunneling software?
> The doc for tun mentions:
> http://vtun.sourceforge.net and http://perso.enst.fr/~beyssac/pipsec/
> Please compile tun and dummy into the kernel.
>
> BTW, I think we should take this offline; send your response directly to
> me. Anyone else interested in this conversation, email both of us and we
> will Cc you.
>
> cheers,
> jamal
>
>
>
>
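Jamal's observation can be pictured with a simplified user-space model of the skb linear buffer. An Ethernet-style driver pulls its 14-byte header, so 14 bytes of headroom remain in front of skb->data; tun only sets skb->mac.raw = skb->data and pulls nothing. The earlier oops line ("skput:under: ... put:14 dev:tun0") is what the kernel prints when skb_push() would move data before head. struct buf, buf_pull() and buf_push() are hypothetical, and the 2 bytes of headroom given to the tun-like packet are an arbitrary assumption, not tun's actual allocation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an sk_buff's linear buffer. */
struct buf {
    unsigned char *head;  /* start of allocated memory */
    unsigned char *data;  /* start of current packet data */
    size_t len;           /* bytes from data onward */
};

/* Like skb_pull(): strip n bytes from the front; NULL if too short. */
static unsigned char *buf_pull(struct buf *b, size_t n)
{
    if (n > b->len)
        return NULL;
    b->data += n;
    b->len -= n;
    return b->data;
}

/* Like skb_push(): prepend n bytes. Where the kernel would call
 * skb_under_panic(), this model refuses and returns NULL. */
static unsigned char *buf_push(struct buf *b, size_t n)
{
    if ((size_t)(b->data - b->head) < n)
        return NULL;      /* not enough headroom */
    b->data -= n;
    b->len += n;
    return b->data;
}
```

Which component issued the 14-byte push is not established by this model; it only shows why a push sized for an Ethernet header fails on a packet from a device with a zero-length link header.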
* Re: tun device - bug or feature? WAS(Re: IMQ / new Dummy device post.
2004-04-24 14:14 ` tun device - bug or feature? WAS(Re: " jamal
@ 2004-04-26 4:38 ` David S. Miller
2004-04-26 19:31 ` Max Krasnyansky
1 sibling, 0 replies; 35+ messages in thread
From: David S. Miller @ 2004-04-26 4:38 UTC (permalink / raw)
To: hadi; +Cc: netdev, syrius.ml, maxk, jgarzik
On 24 Apr 2004 10:14:43 -0400
jamal <hadi@cyberus.ca> wrote:
> Jeff, Dave: Would it be fair to say that when packets get injected into the
> stack by a netdev via netif_rx(), the skb headers are expected to be
> pointing to some specific places? I am not sure if there's a hard and
> fast rule defined anywhere.
What do ipv4 tunnels do? They merely modify 'nh' and 'mac' ".raw" and
pass the packet in.
* Re: tun device - bug or feature? WAS(Re: IMQ / new Dummy device post.
2004-04-24 14:14 ` tun device - bug or feature? WAS(Re: " jamal
2004-04-26 4:38 ` David S. Miller
@ 2004-04-26 19:31 ` Max Krasnyansky
2004-04-27 2:22 ` jamal
2004-05-08 11:55 ` jamal
1 sibling, 2 replies; 35+ messages in thread
From: Max Krasnyansky @ 2004-04-26 19:31 UTC (permalink / raw)
To: hadi; +Cc: netdev, syrius.ml, Jeff Garzik, David S. Miller
On Sat, 2004-04-24 at 07:14, jamal wrote:
> Maxim,
>
> When TUN_TUN_DEV is used, before the packet is injected via
> netif_rx(), only skb->mac.raw = skb->data is set; the other headers
> are not adjusted. Typically netdevs would do a
> skb_pull(skb, dev->hard_header_len) to make the adjustment.
> I have a feeling this is by design; that's why I didn't send you a
> patch.
Well, TUN does not have any hw headers so there is nothing to pull :).
Basically it does whatever the PPP driver does, which is:
skb_pull(skb, 2); /* chop off protocol */
skb->dev = ppp->dev;
skb->protocol = htons(npindex_to_ethertype[npi]);
skb->mac.raw = skb->data;
netif_rx(skb);
Max
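The receive steps in Max's PPP excerpt can be mirrored in a small user-space sketch: read and strip the 2-byte protocol field, translate it to an ethertype, and point the mac pointer at the remaining data, so the link header has zero length. The switch in ppp_to_ethertype() is a stand-in for the driver's npindex_to_ethertype[] table and covers only IPv4 (0x0021) and IPv6 (0x0057); struct rxpkt and both functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_P_IP   0x0800
#define ETH_P_IPV6 0x86DD

struct rxpkt {
    const unsigned char *data;  /* current start of packet */
    size_t len;
    const unsigned char *mac;   /* stands in for skb->mac.raw */
    uint16_t protocol;          /* stands in for skb->protocol (host order) */
};

/* Map a PPP protocol number to an ethertype; 0 means "not handled". */
static uint16_t ppp_to_ethertype(uint16_t ppp_proto)
{
    switch (ppp_proto) {
    case 0x0021: return ETH_P_IP;    /* PPP IPv4 */
    case 0x0057: return ETH_P_IPV6;  /* PPP IPv6 */
    default:     return 0;
    }
}

/* Same order as the excerpt: chop the protocol field, set the
 * protocol, then point mac at the now header-less data. */
static int ppp_like_rx(struct rxpkt *p)
{
    if (p->len < 2)
        return -1;
    uint16_t proto = (uint16_t)((p->data[0] << 8) | p->data[1]);
    p->data += 2;                     /* skb_pull(skb, 2) */
    p->len -= 2;
    p->protocol = ppp_to_ethertype(proto);
    p->mac = p->data;                 /* skb->mac.raw = skb->data */
    return p->protocol ? 0 : -1;
}
```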
* Re: tun device - bug or feature? WAS(Re: IMQ / new Dummy device post.
2004-04-26 19:31 ` Max Krasnyansky
@ 2004-04-27 2:22 ` jamal
2004-05-08 11:55 ` jamal
1 sibling, 0 replies; 35+ messages in thread
From: jamal @ 2004-04-27 2:22 UTC (permalink / raw)
To: Max Krasnyansky; +Cc: netdev, syrius.ml, Jeff Garzik, David S. Miller
On Mon, 2004-04-26 at 15:31, Max Krasnyansky wrote:
> Well, TUN does not have any hw headers so there is nothing to pull :).
I didn't notice before that dev->hard_header_len is 0 ;->
In that case it makes sense to have nothing to pull.
There are about 5 devices like that.
I need to rethink the behavior of mirred a little for devices that have
no hardware headers. I may special-case them.
cheers,
jamal
* Re: tun device - bug or feature? WAS(Re: IMQ / new Dummy device post.
2004-04-26 19:31 ` Max Krasnyansky
2004-04-27 2:22 ` jamal
@ 2004-05-08 11:55 ` jamal
2004-05-10 17:18 ` Max Krasnyansky
1 sibling, 1 reply; 35+ messages in thread
From: jamal @ 2004-05-08 11:55 UTC (permalink / raw)
To: Max Krasnyansky; +Cc: netdev, syrius.ml, Jeff Garzik, David S. Miller
Max, Dave, Jeff,
I get what was bothering me now; it took me a while to formulate it:
TUN_TUN_DEV dev->type is ARPHRD_PPP.
dev->type is really related to the link-layer header. Perhaps at the low
level, if neighbor discovery works well, we have a link-headerless
packet which gets the correct header added by some generic
code. dev->type and dev->hard_header_len work
together to achieve this.
In the case of TUN_TUN_DEV, the header_len is 0 ;->
To be of type ARPHRD_PPP, tun needs to have a header_len which is the
size of the L2 PPP header.
As an example, TUN_TAP_DEV is fine as type ARPHRD_ETHER with a header_len
of ETH_HLEN.
A lot of devices are abusing this system; tun is not the only one.
My suggestion is to change dev->type to ARPHRD_VOID for TUN_TUN_DEV, or
to introduce something like ARPHRD_NONE for devices with link-layer
headers of size 0.
thoughts?
cheers,
jamal
On Mon, 2004-04-26 at 15:31, Max Krasnyansky wrote:
> On Sat, 2004-04-24 at 07:14, jamal wrote:
> > Maxim,
> >
> > When TUN_TUN_DEV is used, before the packet is injected via
> > netif_rx(), only skb->mac.raw = skb->data is set; the other headers
> > are not adjusted. Typically netdevs would do a
> > skb_pull(skb, dev->hard_header_len) to make the adjustment.
> > I have a feeling this is by design; that's why I didn't send you a
> > patch.
> Well, TUN does not have any hw headers so there is nothing to pull :).
> Basically it does whatever the PPP driver does, which is:
>
> skb_pull(skb, 2); /* chop off protocol */
> skb->dev = ppp->dev;
> skb->protocol = htons(npindex_to_ethertype[npi]);
> skb->mac.raw = skb->data;
> netif_rx(skb);
>
> Max
>
>
>
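Jamal's rule, that dev->type should describe the link-layer header which dev->hard_header_len reserves, can be written as a small check. This is a hypothetical sketch covering only the types discussed in the thread: the ARPHRD_NONE value is the one used in the patch later in the thread, and the PPP case only requires a nonzero length because the exact PPP header size is not given here.

```c
#include <assert.h>
#include <stdint.h>

#define ARPHRD_ETHER 1
#define ARPHRD_PPP   512
#define ARPHRD_NONE  0xFFFE   /* zero header length (patch in this thread) */
#define ETH_HLEN     14

/* Returns 1 if type and header length agree, 0 if they do not,
 * and -1 for a type this model does not cover. */
static int type_matches_header_len(uint16_t type, unsigned hard_header_len)
{
    switch (type) {
    case ARPHRD_ETHER: return hard_header_len == ETH_HLEN;
    case ARPHRD_PPP:   return hard_header_len > 0;   /* some L2 PPP header */
    case ARPHRD_NONE:  return hard_header_len == 0;
    default:           return -1;
    }
}
```

Under this check, tun before the patch (ARPHRD_PPP with length 0) is flagged, tun after the patch (ARPHRD_NONE with length 0) passes, and tap (ARPHRD_ETHER with ETH_HLEN) passes.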
* Re: tun device - bug or feature? WAS(Re: IMQ / new Dummy device post.
2004-05-08 11:55 ` jamal
@ 2004-05-10 17:18 ` Max Krasnyansky
2004-06-05 13:24 ` PATCH: " jamal
0 siblings, 1 reply; 35+ messages in thread
From: Max Krasnyansky @ 2004-05-10 17:18 UTC (permalink / raw)
To: hadi; +Cc: netdev, syrius.ml, Jeff Garzik, David S. Miller
On Sat, 2004-05-08 at 04:55, jamal wrote:
> Max, Dave, Jeff,
>
> I get what was bothering me now; it took me a while to formulate it:
>
> TUN_TUN_DEV dev->type is ARPHRD_PPP.
> dev->type is really related to the link-layer header. Perhaps at the low
> level, if neighbor discovery works well, we have a link-headerless
> packet which gets the correct header added by some generic
> code. dev->type and dev->hard_header_len work
> together to achieve this.
> In the case of TUN_TUN_DEV, the header_len is 0 ;->
> To be of type ARPHRD_PPP, tun needs to have a header_len which is the
> size of the L2 PPP header.
> As an example, TUN_TAP_DEV is fine as type ARPHRD_ETHER with a header_len
> of ETH_HLEN.
>
> A lot of devices are abusing this system; tun is not the only one.
>
> My suggestion is to change dev->type to ARPHRD_VOID for TUN_TUN_DEV, or
> to introduce something like ARPHRD_NONE for devices with link-layer
> headers of size 0.
>
> thoughts?
I have no problem with that, i.e. introducing a new ARPHRD_ type.
ARPHRD_PPP was simply the most appropriate for TUN; that's why I picked it.
I vote for ARPHRD_NONE.
Thanks
Max
* PATCH: Re: tun device - bug or feature? WAS(Re: IMQ / new Dummy device post.
2004-05-10 17:18 ` Max Krasnyansky
@ 2004-06-05 13:24 ` jamal
2004-06-05 21:42 ` David S. Miller
0 siblings, 1 reply; 35+ messages in thread
From: jamal @ 2004-06-05 13:24 UTC (permalink / raw)
To: Max Krasnyansky; +Cc: netdev, syrius.ml, Jeff Garzik, David S. Miller
[-- Attachment #1: Type: text/plain, Size: 1443 bytes --]
OK, trivial patch attached. It applies to both the latest 2.6 and 2.4.
I will go hunting for more drivers that do this; for now, this is a good
start.
cheers,
jamal
On Mon, 2004-05-10 at 13:18, Max Krasnyansky wrote:
> On Sat, 2004-05-08 at 04:55, jamal wrote:
> > Max, Dave, Jeff,
> >
> > I get what was bothering me now; it took me a while to formulate it:
> >
> > TUN_TUN_DEV dev->type is ARPHRD_PPP.
> > dev->type is really related to the link-layer header. Perhaps at the low
> > level, if neighbor discovery works well, we have a link-headerless
> > packet which gets the correct header added by some generic
> > code. dev->type and dev->hard_header_len work
> > together to achieve this.
> > In the case of TUN_TUN_DEV, the header_len is 0 ;->
> > To be of type ARPHRD_PPP, tun needs to have a header_len which is the
> > size of the L2 PPP header.
> > As an example, TUN_TAP_DEV is fine as type ARPHRD_ETHER with a header_len
> > of ETH_HLEN.
> >
> > A lot of devices are abusing this system; tun is not the only one.
> >
> > My suggestion is to change dev->type to ARPHRD_VOID for TUN_TUN_DEV, or
> > to introduce something like ARPHRD_NONE for devices with link-layer
> > headers of size 0.
> >
> > thoughts?
>
> I have no problem with that, i.e. introducing a new ARPHRD_ type.
> ARPHRD_PPP was simply the most appropriate for TUN; that's why I picked it.
> I vote for ARPHRD_NONE.
>
> Thanks
> Max
>
>
>
>
[-- Attachment #2: tun24 --]
[-- Type: text/plain, Size: 878 bytes --]
--- /usr/src/2426/include/linux/if_arp.h 2002-02-25 14:38:13.000000000 -0500
+++ /usr/src/2426-mod/include/linux/if_arp.h 2004-06-04 15:10:15.000000000 -0400
@@ -85,6 +85,7 @@
#define ARPHRD_IEEE80211_PRISM 802 /* IEEE 802.11 + Prism2 header */
#define ARPHRD_VOID 0xFFFF /* Void type, nothing is known */
+#define ARPHRD_NONE 0xFFFE /* zero header length */
/* ARP protocol opcodes. */
#define ARPOP_REQUEST 1 /* ARP request */
--- /usr/src/2426/drivers/net/tun.c 2002-08-02 20:39:44.000000000 -0400
+++ /usr/src/2426-mod/drivers/net/tun.c 2004-06-04 15:10:50.000000000 -0400
@@ -138,8 +138,8 @@
dev->addr_len = 0;
dev->mtu = 1500;
- /* Type PPP seems most suitable */
- dev->type = ARPHRD_PPP;
+ /* Zero header length */
+ dev->type = ARPHRD_NONE;
dev->flags = IFF_POINTOPOINT | IFF_NOARP | IFF_MULTICAST;
dev->tx_queue_len = 10;
break;
* Re: PATCH: Re: tun device - bug or feature? WAS(Re: IMQ / new Dummy device post.
2004-06-05 13:24 ` PATCH: " jamal
@ 2004-06-05 21:42 ` David S. Miller
0 siblings, 0 replies; 35+ messages in thread
From: David S. Miller @ 2004-06-05 21:42 UTC (permalink / raw)
To: hadi; +Cc: maxk, netdev, syrius.ml, jgarzik
On 05 Jun 2004 09:24:56 -0400
jamal <hadi@cyberus.ca> wrote:
> OK, trivial patch attached. It applies to both the latest 2.6 and 2.4.
> I will go hunting for more drivers that do this; for now, this is a good
> start.
Applied to both 2.4.x and 2.6.x, thanks Jamal.
Thread overview: 35+ messages
2004-04-19 14:22 IMQ / new Dummy device post syrius.ml
2004-04-20 2:15 ` jamal
2004-04-21 1:43 ` syrius.ml
2004-04-21 12:49 ` syrius.ml
2004-04-21 20:19 ` syrius.ml
2004-04-22 13:16 ` jamal
2004-04-22 17:43 ` syrius.ml
2004-04-23 11:29 ` jamal
2004-04-24 14:14 ` tun device - bug or feature? WAS(Re: " jamal
2004-04-26 4:38 ` David S. Miller
2004-04-26 19:31 ` Max Krasnyansky
2004-04-27 2:22 ` jamal
2004-05-08 11:55 ` jamal
2004-05-10 17:18 ` Max Krasnyansky
2004-06-05 13:24 ` PATCH: " jamal
2004-06-05 21:42 ` David S. Miller
-- strict thread matches above, loose matches on Subject: below --
2004-04-15 9:42 Andy Furniss
2004-04-15 12:15 ` jamal
2004-04-15 19:35 ` Andy Furniss
2004-04-16 3:52 ` jamal
2004-04-16 19:35 ` Andy Furniss
[not found] ` <1082145341.1026.125.camel@jzny.localdomain>
2004-04-17 10:39 ` Andy Furniss
2004-04-17 12:09 ` jamal
2004-04-17 21:56 ` Andy Furniss
2004-04-18 14:28 ` jamal
2004-04-18 16:35 ` Andy Furniss
2004-04-18 20:34 ` Andy Furniss
2004-04-18 21:07 ` jamal
2004-04-18 21:31 ` Andy Furniss
2004-04-18 21:45 ` Andy Furniss
2004-04-18 20:53 ` jamal
2004-04-18 21:23 ` Martin Josefsson
2004-04-18 21:58 ` Andy Furniss
2004-04-19 8:14 ` Martin Josefsson
2004-04-19 12:33 ` syrius.ml