* Re: IMQ / new Dummy device post.
From: syrius.ml @ 2004-04-19 14:22 UTC
To: hadi; +Cc: netdev
OK, it seems to be working as expected for ipv4 traffic.
Here is how I'm actually using it
(I use netfilter, for both ipv4 and ipv6, to mark packets):
OUT=dummy0
$TC qdisc add dev $OUT root handle 1: htb default 20
# CLASSES
$TC class add dev $OUT parent 1: classid 1:1 htb rate $UL ceil $ULC \
prio 0
$TC class add dev $OUT parent 1:1 classid 1:10 htb rate \
$UL1 ceil $ULC1 quantum $QU1 prio 1
$TC class add dev $OUT parent 1:1 classid 1:2 htb rate $UL01 ceil \
$ULC01 quantum $QU01 prio 2
$TC class add dev $OUT parent 1:2 classid 1:20 htb rate $UL2 ceil \
$ULC2 quantum $QU2 prio 3
$TC class add dev $OUT parent 1:2 classid 1:3 htb rate $UL02 ceil \
$ULC02 quantum $QU02 prio 4
$TC class add dev $OUT parent 1:3 classid 1:30 htb rate $UL3 ceil \
$ULC3 quantum $QU3 prio 5
$TC class add dev $OUT parent 1:3 classid 1:40 htb rate $UL4 ceil \
$ULC4 quantum $QU4 prio 5
$TC class add dev $OUT parent 1:3 classid 1:50 htb rate $UL5 ceil \
$ULC5 quantum $QU5 prio 7
$TC qdisc add dev $OUT parent 1:10 handle 110: pfifo limit 50
$TC qdisc add dev $OUT parent 1:20 handle 120: sfq perturb 10
$TC qdisc add dev $OUT parent 1:30 handle 130: sfq perturb 10
$TC qdisc add dev $OUT parent 1:40 handle 140: sfq perturb 10
$TC qdisc add dev $OUT parent 1:50 handle 150: sfq perturb 10
# FILTERS
$TC filter add dev $OUT parent 1: protocol ip prio 10 handle 1 fw \
flowid 1:10
$TC filter add dev $OUT parent 1: protocol ipv6 prio 11 handle 1 fw \
flowid 1:10
$TC filter add dev $OUT parent 1: protocol ip prio 12 handle 2 fw \
flowid 1:20
$TC filter add dev $OUT parent 1: protocol ipv6 prio 13 handle 2 fw \
flowid 1:20
$TC filter add dev $OUT parent 1: protocol ip prio 14 handle 3 fw \
flowid 1:30
$TC filter add dev $OUT parent 1: protocol ipv6 prio 15 handle 3 fw \
flowid 1:30
$TC filter add dev $OUT parent 1: protocol ip prio 16 handle 4 fw \
flowid 1:40
$TC filter add dev $OUT parent 1: protocol ipv6 prio 17 handle 4 fw \
flowid 1:40
$TC filter add dev $OUT parent 1: protocol ip prio 18 handle 5 fw \
flowid 1:50
$TC filter add dev $OUT parent 1: protocol ipv6 prio 19 handle 5 fw \
flowid 1:50
$IP link set $OUT up
$TC qdisc add dev ppp0 root handle 1: prio
$TC filter add dev ppp0 parent 1:0 protocol ip prio 10 u32 \
match u32 0 0 flowid 1:1 action mirred egress redirect dev dummy0
$TC qdisc add dev tun0 root handle 1: prio
$TC filter add dev tun0 parent 1:0 protocol ip prio 10 u32 \
match u32 0 0 flowid 1:1 action mirred egress redirect dev dummy0
$TC qdisc add dev sit1 root handle 1: prio
$TC filter add dev sit1 parent 1:0 protocol ipv6 prio 10 u32 \
match u32 0 0 flowid 1:1 action mirred egress redirect dev dummy0
But it doesn't work with ipv6 traffic.
If I try to ping6 somehost, I sometimes get "ping: sendmsg: No buffer
space available" messages; in any case, nothing goes out on sit1.
Is this the correct way to do it?
--
* Re: IMQ / new Dummy device post.
From: jamal @ 2004-04-20 2:15 UTC
To: syrius.ml; +Cc: netdev
On Mon, 2004-04-19 at 10:22, syrius.ml@no-log.org wrote:
[..]
If you have already marked the packets before they hit egress, then you
don't need to use the ipt mark action. So what you are doing is correct.
> $TC qdisc add dev ppp0 root handle 1: prio
> $TC filter add dev ppp0 parent 1:0 protocol ip prio 10 u32 \
> match u32 0 0 flowid 1:1 action mirred egress redirect dev dummy0
Note: this only covers ipv4. If you want ipv6, add a new rule (in
addition to the one above, if you still want ipv4) with "protocol ip"
replaced by "protocol ipv6".
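A minimal sketch of the two rules described above, reusing the $TC variable and the prio/u32/mirred commands from this thread. The helper name add_redirect_filters is hypothetical. Giving the ipv6 rule its own filter priority (11) mirrors the separate ip/ipv6 priorities of the fw filters in the original script.

```shell
#!/bin/sh
# Sketch: redirect both ipv4 and ipv6 egress traffic of one device to a dummy
# device. TC defaults to "tc"; set TC="echo tc" to print the commands instead.
TC=${TC:-tc}

# add_redirect_filters DEV DUMMY  (hypothetical helper name)
add_redirect_filters() {
    dev=$1
    dummy=$2
    $TC qdisc add dev "$dev" root handle 1: prio
    # ipv4 rule, as in the original script
    $TC filter add dev "$dev" parent 1:0 protocol ip prio 10 u32 \
        match u32 0 0 flowid 1:1 action mirred egress redirect dev "$dummy"
    # ipv6 rule: same match, "protocol ipv6", and its own filter priority
    $TC filter add dev "$dev" parent 1:0 protocol ipv6 prio 11 u32 \
        match u32 0 0 flowid 1:1 action mirred egress redirect dev "$dummy"
}
```

For example, add_redirect_filters ppp0 dummy0 sets up ppp0; with TC="echo tc" it only prints the three commands.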
> $TC qdisc add dev tun0 root handle 1: prio
> $TC filter add dev tun0 parent 1:0 protocol ip prio 10 u32 \
> match u32 0 0 flowid 1:1 action mirred egress redirect dev dummy0
>
> $TC qdisc add dev sit1 root handle 1: prio
> $TC filter add dev sit1 parent 1:0 protocol ipv6 prio 10 u32 \
> match u32 0 0 flowid 1:1 action mirred egress redirect dev dummy0
Not sure if you need the above, but I don't know your setup well
enough to be 100% sure.
> But it doesn't work with ipv6 traffic.
> If I try to ping6 somehost, I sometimes get "ping: sendmsg: No buffer
> space available" messages; in any case, nothing goes out on sit1.
>
> Is this the correct way to do it?
Seems right. Try adding the new ipv6 rule on ppp0, and if you are still
having problems, try dumping some stats for the filters and see if they
are incrementing, e.g.:
tc -s filter show parent 1:0 dev ppp0
Also, an ifconfig on dummy0 should show its stats going up.
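The two checks above can be bundled into one helper and run a few seconds apart while pinging: if neither the filter/action counters nor the dummy0 counters move, traffic is not reaching the redirect. This is a sketch; the helper name is hypothetical, and ip -s link is used as an equivalent of the counters ifconfig prints.

```shell
#!/bin/sh
# Sketch: dump filter/action counters on the redirecting device and the
# interface counters of the dummy device. TC/IP can be overridden for dry runs.
TC=${TC:-tc}
IP=${IP:-ip}

# show_redirect_stats DEV DUMMY  (hypothetical helper name)
show_redirect_stats() {
    dev=$1
    dummy=$2
    # per-filter and per-action statistics on DEV
    $TC -s filter show parent 1:0 dev "$dev"
    # packet/byte counters of the dummy device
    $IP -s link show dev "$dummy"
}
```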
cheers,
jamal
* Re: IMQ / new Dummy device post.
From: syrius.ml @ 2004-04-21 1:43 UTC
To: hadi; +Cc: netdev
jamal <hadi@cyberus.ca> writes:
[...]
>> $TC qdisc add dev ppp0 root handle 1: prio
>> $TC filter add dev ppp0 parent 1:0 protocol ip prio 10 u32 \
>> match u32 0 0 flowid 1:1 action mirred egress redirect dev dummy0
> Note: this only covers ipv4. If you want ipv6, add a new rule (in
> addition to the one above, if you still want ipv4) with "protocol ip"
> replaced by "protocol ipv6".
>> $TC qdisc add dev tun0 root handle 1: prio
>> $TC filter add dev tun0 parent 1:0 protocol ip prio 10 u32 \
>> match u32 0 0 flowid 1:1 action mirred egress redirect dev dummy0
>> $TC qdisc add dev sit1 root handle 1: prio
>> $TC filter add dev sit1 parent 1:0 protocol ipv6 prio 10 u32 \
>> match u32 0 0 flowid 1:1 action mirred egress redirect dev dummy0
> Not sure if you need the above, but I don't know your setup well
> enough to be 100% sure.
Using 'protocol ipv6' on ppp0 rather than sit1 did the trick.
It's even simpler! I don't have to create filters for each ipv6 tunnel.
As for the ipv4-over-UDP tun0 tunnel, I guess I should prevent
those UDP packets from being matched by the filter on ppp0.
I'll optimize it later.
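One way the exclusion could look, assuming the openvpn packets leave ppp0 as UDP to a fixed remote port. VPN_PORT (5000 by default here) and the helper name are placeholders, not taken from the thread. The sketch relies on tc trying filters in ascending prio order and stopping at the first match, so a prio 5 rule that only classifies into band 1:1, with no action, keeps those packets away from the prio 10 mirred redirect.

```shell
#!/bin/sh
# Sketch: keep the tunnel's UDP packets leaving ppp0 out of the dummy0
# redirect, so tun0 traffic (already redirected on tun0) is not shaped twice.
TC=${TC:-tc}
VPN_PORT=${VPN_PORT:-5000}   # placeholder: use the tunnel's real remote port

# exclude_vpn_udp DEV  (hypothetical helper name)
exclude_vpn_udp() {
    dev=$1
    # prio 5 is consulted before the prio 10 redirect filter; a match here
    # only classifies into band 1:1 and runs no action, so no redirect.
    # "match ip dport" assumes an IP header without options.
    $TC filter add dev "$dev" parent 1:0 protocol ip prio 5 u32 \
        match ip protocol 17 0xff \
        match ip dport "$VPN_PORT" 0xffff \
        flowid 1:1
}
```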
Cheers
--
* Re: IMQ / new Dummy device post.
From: syrius.ml @ 2004-04-21 12:49 UTC
To: hadi; +Cc: netdev
skput:under: c88c93aa:98 put:14 dev:tun0kernel BUG at skbuff.c:113!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c01bc2e9>] Tainted: P
EFLAGS: 00010286
eax: 00000028 ebx: c46afe60 ecx: 00000000 edx: c6a5e000
esi: c88c6780 edi: 00000001 ebp: c5573d78 esp: c5573d64
ds: 0018 es: 0018 ss: 0018
Process openvpn (pid: 940, stackpage=c5573000)
Stack: c022a180 c88c93aa 00000062 0000000e c5db4824 c5573da8 c88c93b2 c582e3a0
0000000e c88c93aa 42924e9d c58eccc0 c582ebe0 c582e3a0 c57dae00 c582ebe0
c5f17f20 c5573dbc c01cd277 c5573dc4 c57dae00 c5573e60 c5573e28 c890f104
Call Trace: [<c88c93aa>] [<c88c93b2>] [<c88c93aa>] [<c01cd277>] [<c890f104>]
[<c01bc5bd>] [<c01bc736>] [<c01f20fc>] [<c89123f9>] [<c89120fc>] [<c01c0a6a>]
[<c01c0b6a>] [<c01c0d66>] [<c01c0e83>] [<c0119507>] [<c892cc35>] [<c01347a0>]
[<c0134883>] [<c0106ff3>]
Code: 0f 0b 71 00 0a 92 22 c0 89 ec 5d c3 8d 74 26 00 8d bc 27 00
<0>Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing
I'm going to recompile the kernel with frame pointers, and I'll feed
the oops through ksymoops.
First I have to narrow down the problem so I can tell how to reproduce it;
then I'll give more information.
--
* Re: IMQ / new Dummy device post.
From: syrius.ml @ 2004-04-21 20:19 UTC
To: hadi; +Cc: netdev
[-- Attachment #1: Type: text/plain, Size: 1192 bytes --]
OK, I'm able to reproduce it with a simpler setup.
Let's say I'm using the new dummy device on a machine connected to
an Ethernet LAN. This host uses openvpn to establish a VPN tunnel.
debian:~# ip a l dev eth0
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
link/ether 00:0c:29:25:4a:b6 brd ff:ff:ff:ff:ff:ff
inet 192.168.5.4/24 brd 192.168.5.255 scope global eth0
debian:~# ip a l dev tun0
4: tun0: <POINTOPOINT,MULTICAST,NOARP,UP> mtu 1447 qdisc pfifo_fast qlen 10
link/ppp
inet 172.16.0.2 peer 172.16.0.1/32 scope global tun0
I attach the script I'm using to set up a simple new dummy+htb configuration.
It's very simple: I do not use iptables to mark packets and I do not use
filters with htb, so everything goes to the default classes (1:20 & 2:20).
I can verify it's working as expected with a simple ping -f 192.168.5.1
(or a ping -f 192.168.5.4 from 192.168.5.1).
After the ping -f 192.168.5.1 (let's say I let it run for 30 seconds), if I do
ping 172.16.0.1, the oops appears!
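The two-step reproduction above, written out as a script. On a kernel with this bug the second step panics the machine, so it belongs on a test box only. The addresses are the ones from this report; using iputils' -w 30 deadline for "let it run 30 seconds", -c 3 for the tunnel pings, and the PING override for dry runs are assumptions.

```shell
#!/bin/sh
# Sketch of the reproduction described above. On an affected kernel the
# second step panics the machine: run it on a test box only.
PING=${PING:-ping}
LAN_PEER=${LAN_PEER:-192.168.5.1}
VPN_PEER=${VPN_PEER:-172.16.0.1}

reproduce() {
    # step 1: flood-ping the LAN peer for about 30 seconds (needs root)
    $PING -f -w 30 "$LAN_PEER"
    # step 2: a few ordinary pings through the openvpn tun0 tunnel
    $PING -c 3 "$VPN_PEER"
}
```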
I attach the result of ksymoops.
Please tell me if you're able to reproduce it.
I'm OK to try with other VPN software, but I don't think it has
anything to do with openvpn.
[-- Attachment #2: qos script --]
[-- Type: text/x-sh, Size: 1830 bytes --]
#!/bin/sh
TC=/usr/local/iproute2/sbin/tc
IP=/usr/local/iproute2/sbin/ip
OUT=dummy0
IN=dummy1
# remove existing root/ingress qdiscs; ">/dev/null 2>&1" rather than "&>",
# which a plain /bin/sh parses as "run in background"
$TC qdisc ls | sed -e 's/^.*dev \([a-zA-Z0-9]\+\) .*$/\1/' | sort -u | \
while read a; do
$TC qdisc del dev "$a" root >/dev/null 2>&1
$TC qdisc del dev "$a" ingress >/dev/null 2>&1
done
ifconfig $IN down >/dev/null 2>&1
ifconfig $OUT down >/dev/null 2>&1
if [ "$1" = "stop" ]
then
exit
fi
# load the dummy driver before dummy0/dummy1 are configured
modprobe dummy
###### uplink
# ROOT
$TC qdisc add dev $OUT root handle 1: htb default 20
# CLASSES
$TC class add dev $OUT parent 1: classid 1:1 htb rate 200kbit ceil 200kbit prio 0
$TC class add dev $OUT parent 1:1 classid 1:10 htb rate 100kbit ceil 200kbit prio 1
$TC class add dev $OUT parent 1:1 classid 1:20 htb rate 100kbit ceil 100kbit prio 2
$IP link set $OUT up
$TC qdisc add dev eth0 root handle 1: prio
$TC filter add dev eth0 parent 1:0 protocol ip prio 10 u32 \
match u32 0 0 flowid 1:1 action mirred egress redirect dev $OUT
$TC qdisc add dev tun0 root handle 1: prio
$TC filter add dev tun0 parent 1:0 protocol ip prio 10 u32 \
match u32 0 0 flowid 1:1 action mirred egress redirect dev $OUT
###### downlink
$TC qdisc add dev $IN root handle 2: htb default 20
# CLASSES
$TC class add dev $IN parent 2: classid 2:1 htb rate 100kbit ceil 200kbit prio 0
$TC class add dev $IN parent 2:1 classid 2:10 htb rate 100kbit ceil 200kbit prio 1
$TC class add dev $IN parent 2:1 classid 2:20 htb rate 100kbit ceil 100kbit prio 3
$IP link set $IN up
$TC qdisc add dev eth0 ingress
$TC filter add dev eth0 parent ffff: protocol ip prio 10 u32 \
match u32 0 0 flowid 1:1 action mirred egress redirect dev $IN
$TC qdisc add dev tun0 ingress
$TC filter add dev tun0 parent ffff: protocol ip prio 10 u32 \
match u32 0 0 flowid 1:1 action mirred egress redirect dev $IN
[-- Attachment #3: ksymoops result --]
[-- Type: text/plain, Size: 2859 bytes --]
ksymoops 2.4.9 on i686 2.4.25. Options used
-v /mnt/vmlinux (specified)
-k /mnt/ksyms (specified)
-l /mnt/modules (specified)
-o /lib/modules/2.4.25 (specified)
-m /mnt/System.map (specified)
skput:under: c88863d8:126 put:14 dev:tun0kernel BUG at skbuff.c:113!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c01e088b>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00000282
eax: 00000029 ebx: c639ff40 ecx: c6434000 edx: c6777f7c
esi: c8888780 edi: 00000001 ebp: c6435db4 esp: c6435da0
ds: 0018 es: 0018 ss: 0018
Process openvpn (pid: 415, stackpage=c6435000)
Stack: c025e0a0 c88863d8 0000007e 0000000e c6706224 c6435de4 c88863e4 c67c0480
0000000e c88863d8 c1167420 c61eae00 c67c0540 c67c0480 c5ce8dc0 00000000
c6424880 c6435df8 c01f294d c6435e00 c5ce8dc0 c67c0540 c6435e64 c888311a
Call Trace: [<c88863d8>] [<c88863e4>] [<c88863d8>] [<c01f294d>] [<c888311a>]
[<c021f17d>] [<c888a462>] [<c888a12b>] [<c01e5108>] [<c01e520f>] [<c01e54a9>]
[<c01e55aa>] [<c011a854>] [<c8843c9f>] [<c8843353>] [<c013642b>] [<c01071bb>]
Code: 0f 0b 71 00 27 d1 25 c0 c9 c3 8d 74 26 00 8d bc 27 00 00 00
>>EIP; c01e088b <skb_under_panic+3b/50> <=====
>>ebx; c639ff40 <_end+609c4e0/84fc620>
>>ecx; c6434000 <_end+61305a0/84fc620>
>>edx; c6777f7c <_end+647451c/84fc620>
>>esi; c8888780 <[dummy]__module_license+a8/19a8>
>>ebp; c6435db4 <_end+6132354/84fc620>
>>esp; c6435da0 <_end+6132340/84fc620>
Trace; c88863d8 <[mirred]tcf_mirred+168/1d0>
Trace; c88863e4 <[mirred]tcf_mirred+174/1d0>
Trace; c88863d8 <[mirred]tcf_mirred+168/1d0>
Trace; c01f294d <tcf_action_exec+5d/90>
Trace; c888311a <[cls_u32]u32_classify+9a/1d0>
Trace; c021f17d <inet_recvmsg+4d/70>
Trace; c888a462 <[sch_ingress]tc_classify+52/cf>
Trace; c888a12b <[sch_ingress]ingress_enqueue+2b/80>
Trace; c01e5108 <ing_filter+68/c0>
Trace; c01e520f <netif_receive_skb+af/2d0>
Trace; c01e54a9 <process_backlog+79/110>
Trace; c01e55aa <net_rx_action+6a/100>
Trace; c011a854 <do_softirq+94/a0>
Trace; c8843c9f <[tun]tun_get_user+df/165>
Trace; c8843353 <[tun]tun_chr_write+33/40>
Trace; c013642b <sys_write+9b/130>
Trace; c01071bb <system_call+33/38>
Code; c01e088b <skb_under_panic+3b/50>
00000000 <_EIP>:
Code; c01e088b <skb_under_panic+3b/50> <=====
0: 0f 0b ud2a <=====
Code; c01e088d <skb_under_panic+3d/50>
2: 71 00 jno 4 <_EIP+0x4>
Code; c01e088f <skb_under_panic+3f/50>
4: 27 daa
Code; c01e0890 <skb_under_panic+40/50>
5: d1 25 c0 c9 c3 8d shll 0x8dc3c9c0
Code; c01e0896 <skb_under_panic+46/50>
b: 74 26 je 33 <_EIP+0x33>
Code; c01e0898 <skb_under_panic+48/50>
d: 00 8d bc 27 00 00 add %cl,0x27bc(%ebp)
<0>Kernel panic: Aiee, killing interrupt handler!
[-- Attachment #4: Type: text/plain, Size: 6 bytes --]
--
* Re: IMQ / new Dummy device post.
From: jamal @ 2004-04-22 13:16 UTC
To: syrius.ml; +Cc: netdev
Hi there,
On Wed, 2004-04-21 at 16:19, syrius.ml@no-log.org wrote:
> OK, I'm able to reproduce it with a simpler setup.
>
> Let's say I'm using the new dummy device on a machine connected to
> an Ethernet LAN. This host uses openvpn to establish a VPN tunnel.
[..]
> After the ping -f 192.168.5.1 (let's say I let it run for 30 seconds), if I do
> ping 172.16.0.1, the oops appears!
>
> I attach the result of ksymoops.
>
> Please tell me if you're able to reproduce it.
> I'm OK to try with other VPN software, but I don't think it has
> anything to do with openvpn.
>
It may be more related to the tun device that the software uses. Tun is an
interesting netdevice. I don't have a setup to reproduce this.
BTW, doesn't the packet coming from the VPN eventually make it to eth0?
And isn't the same true in the other direction (it always starts at the
eth0 level)? If yes, why do you have to redirect packets from the tap device?
Try the following to debug:
1. Remove the egress qdisc from the tap device and run the test
   (this part: $TC qdisc add dev tun0 root handle 1: prio).
2. If that still oopses, remove the ingress attachment on tun0.
3. If that still fails, compile both tun and dummy into the kernel
   (as opposed to modules) and reproduce the oops.
Additionally, useful tools are the stats on the dummy devices as well
as on the actions (example: tc -s filter ls dev eth0 parent ffff:).
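The first two steps can be written as small helpers so each one is easy to apply before rerunning the ping test. This is a sketch: the function names are hypothetical, device names follow the script in this thread, and step 3 (building tun and dummy into the kernel) is a kernel configuration change that no script can do. Deleting a qdisc also removes the filters and actions attached to it.

```shell
#!/bin/sh
# Sketch of the debugging steps above; rerun the ping test after each step.
TC=${TC:-tc}
TUN=${TUN:-tun0}

# step 1: remove the egress prio qdisc (and its mirred filter) from the tunnel
step1_drop_egress() {
    $TC qdisc del dev "$TUN" root
}

# step 2: remove the ingress qdisc (and its mirred filter) from the tunnel
step2_drop_ingress() {
    $TC qdisc del dev "$TUN" ingress
}

# between steps: action/filter counters on eth0 ingress and qdisc stats
# on both dummy devices
show_counters() {
    $TC -s filter ls dev eth0 parent ffff:
    $TC -s qdisc ls dev dummy0
    $TC -s qdisc ls dev dummy1
}
```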
cheers,
jamal
* Re: IMQ / new Dummy device post.
From: syrius.ml @ 2004-04-22 17:43 UTC
To: hadi; +Cc: netdev
Hi,
It oopses when using the ingress qdisc + an action mirred egress redirect
filter on tun0 (no egress qdisc at all, no ingress on eth0).
It doesn't oops when using an ingress qdisc + a simple police+drop filter
on tun0...
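For reference, the two tun0 ingress configurations being compared can be isolated as below, in the thread's own tc syntax. The function names are hypothetical, dummy1 is the ingress dummy from the earlier script, and the police rate and burst are arbitrary placeholders (the report does not give the values used).

```shell
#!/bin/sh
# Sketch of the two tun0 ingress setups compared above
# (no egress qdisc, nothing on eth0).
TC=${TC:-tc}

# variant that oopses in this report: ingress + mirred redirect to dummy1
tun_ingress_mirred() {
    $TC qdisc add dev tun0 ingress
    $TC filter add dev tun0 parent ffff: protocol ip prio 10 u32 \
        match u32 0 0 flowid 1:1 action mirred egress redirect dev dummy1
}

# variant that does not oops: ingress + a plain police/drop filter
# (rate and burst are placeholders)
tun_ingress_police() {
    $TC qdisc add dev tun0 ingress
    $TC filter add dev tun0 parent ffff: protocol ip prio 10 u32 \
        match u32 0 0 police rate 100kbit burst 10k drop flowid :1
}
```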
--
* Re: IMQ / new Dummy device post.
From: jamal @ 2004-04-23 11:29 UTC
To: syrius.ml; +Cc: netdev
Hi there,
On Thu, 2004-04-22 at 13:43, syrius.ml@no-log.org wrote:
> Hi,
>
> It oopses when using the ingress qdisc + an action mirred egress redirect
> filter on tun0 (no egress qdisc at all, no ingress on eth0).
> It doesn't oops when using an ingress qdisc + a simple police+drop filter
> on tun0...
OK, so you have narrowed it down to mirred, tun and the ingress qdisc; is
that correct? Were you using openvpn to recreate this?
BTW, would this happen if you don't issue the ping -f initially in the above
setup? If yes, before you trigger the problem, can you send a few
pings through the tap device and send me the output of dmesg?
I just did a simple test with a basic program and couldn't reproduce it.
I don't have the proper setup; can you do a basic test with some other
tunneling software?
The doc for tun mentions:
http://vtun.sourceforge.net and http://perso.enst.fr/~beyssac/pipsec/
Please compile tun and dummy into the kernel.
BTW, I think we should take this offline; send your response directly to
me. Anyone else interested in this conversation: email both of us and we
will Cc you.
cheers,
jamal
* tun device - bug or feature? WAS(Re: IMQ / new Dummy device post.
From: jamal @ 2004-04-24 14:14 UTC
To: netdev; +Cc: syrius.ml, Maxim Krasnyansky, Jeff Garzik, David S. Miller
Maxim,
When TUN_TUN_DEV is used, only skb->mac.raw = skb->data is set before
the packet is injected via netif_rx(); the other headers are not
adjusted. Typically netdevs would do a
skb_pull(skb,dev->hard_header_len) to make the adjustment.
I have a feeling this is by design; that's why I didn't send you a
patch.
Jeff, Dave: Would it be fair to say that when packets get injected into
the stack by a netdev via netif_rx(), the skb headers are expected to
be pointing to some specific places? I am not sure if there's a hard
and fast rule defined anywhere.
cheers,
jamal
* Re: tun device - bug or feature? WAS(Re: IMQ / new Dummy device post.
From: David S. Miller @ 2004-04-26 4:38 UTC
To: hadi; +Cc: netdev, syrius.ml, maxk, jgarzik
On 24 Apr 2004 10:14:43 -0400
jamal <hadi@cyberus.ca> wrote:
> Jeff, Dave: Would it be fair to say that when packets get injected into
> the stack by a netdev via netif_rx(), the skb headers are expected to
> be pointing to some specific places? I am not sure if there's a hard
> and fast rule defined anywhere.
What do ipv4 tunnels do? They merely modify 'nh' and 'mac' ".raw" and
pass the packet in.
* Re: tun device - bug or feature? WAS(Re: IMQ / new Dummy device post.
From: Max Krasnyansky @ 2004-04-26 19:31 UTC
To: hadi; +Cc: netdev, syrius.ml, Jeff Garzik, David S. Miller
On Sat, 2004-04-24 at 07:14, jamal wrote:
> Maxim,
>
> When TUN_TUN_DEV is used, only skb->mac.raw = skb->data is set before
> the packet is injected via netif_rx(); the other headers are not
> adjusted. Typically netdevs would do a
> skb_pull(skb,dev->hard_header_len) to make the adjustment.
> I have a feeling this is by design; that's why I didn't send you a
> patch.
Well, TUN does not have any hw headers, so there is nothing to pull :).
Basically it does whatever the PPP driver does, which is:
skb_pull(skb, 2); /* chop off protocol */
skb->dev = ppp->dev;
skb->protocol = htons(npindex_to_ethertype[npi]);
skb->mac.raw = skb->data;
netif_rx(skb);
Max
* Re: tun device - bug or feature? WAS(Re: IMQ / new Dummy device post.
From: jamal @ 2004-04-27 2:22 UTC
To: Max Krasnyansky; +Cc: netdev, syrius.ml, Jeff Garzik, David S. Miller
On Mon, 2004-04-26 at 15:31, Max Krasnyansky wrote:
> Well, TUN does not have any hw headers, so there is nothing to pull :).
I didn't notice dev->hard_header_len being 0 before ;->
In that case it makes sense that there is nothing to pull.
There are about 5 devices like that.
I need to rethink the behavior of mirred a little for devices that have
no hardware headers. I may special-case them.
cheers,
jamal
* Re: tun device - bug or feature? WAS(Re: IMQ / new Dummy device post.
From: jamal @ 2004-05-08 11:55 UTC
To: Max Krasnyansky; +Cc: netdev, syrius.ml, Jeff Garzik, David S. Miller
Max, Dave, Jeff,
I see what was bothering me now; it took me a while to formulate it:
For TUN_TUN_DEV, dev->type is ARPHRD_PPP.
dev->type is really tied to the link-layer header: at the low level,
if neighbor discovery works well, we have a link-headerless packet
which gets the correct header put on it by some generic code.
dev->type and dev->hard_header_len work together to achieve this.
In the case of TUN_TUN_DEV, hard_header_len is 0 ;->
To be of type ARPHRD_PPP, tun would need a hard_header_len equal to
the size of the L2 PPP header.
As an example, TUN_TAP_DEV is fine as type ARPHRD_ETHER with a
hard_header_len of ETH_HLEN.
A lot of devices abuse this system; tun is not the only one.
My suggestion is to change dev->type to ARPHRD_VOID for TUN_TUN_DEV, or
to introduce something like ARPHRD_NONE for devices with zero-length
link-layer headers.
thoughts?
cheers,
jamal
* Re: tun device - bug or feature? WAS(Re: IMQ / new Dummy device post.
From: Max Krasnyansky @ 2004-05-10 17:18 UTC
To: hadi; +Cc: netdev, syrius.ml, Jeff Garzik, David S. Miller
On Sat, 2004-05-08 at 04:55, jamal wrote:
> Max, Dave, Jeff,
>
> I see what was bothering me now; it took me a while to formulate it:
>
> For TUN_TUN_DEV, dev->type is ARPHRD_PPP.
> dev->type is really tied to the link-layer header: at the low level,
> if neighbor discovery works well, we have a link-headerless packet
> which gets the correct header put on it by some generic code.
> dev->type and dev->hard_header_len work together to achieve this.
> In the case of TUN_TUN_DEV, hard_header_len is 0 ;->
> To be of type ARPHRD_PPP, tun would need a hard_header_len equal to
> the size of the L2 PPP header.
> As an example, TUN_TAP_DEV is fine as type ARPHRD_ETHER with a
> hard_header_len of ETH_HLEN.
>
> A lot of devices abuse this system; tun is not the only one.
>
> My suggestion is to change dev->type to ARPHRD_VOID for TUN_TUN_DEV, or
> to introduce something like ARPHRD_NONE for devices with zero-length
> link-layer headers.
>
> thoughts?
I have no problem with that, I mean with introducing a new ARPHRD_ type.
ARPHRD_PPP was simply the most appropriate for TUN; that's why I picked it.
I vote for ARPHRD_NONE.
Thanks
Max
* PATCH: Re: tun device - bug or feature? WAS(Re: IMQ / new Dummy device post.
From: jamal @ 2004-06-05 13:24 UTC
To: Max Krasnyansky; +Cc: netdev, syrius.ml, Jeff Garzik, David S. Miller
[-- Attachment #1: Type: text/plain, Size: 1443 bytes --]
OK, trivial patch attached. It applies to both the latest 2.6 and 2.4.
I will go hunting for more drivers that do this; for now, this is a good
start.
cheers,
jamal
[-- Attachment #2: tun24 --]
[-- Type: text/plain, Size: 878 bytes --]
--- /usr/src/2426/include/linux/if_arp.h 2002-02-25 14:38:13.000000000 -0500
+++ /usr/src/2426-mod/include/linux/if_arp.h 2004-06-04 15:10:15.000000000 -0400
@@ -85,6 +85,7 @@
#define ARPHRD_IEEE80211_PRISM 802 /* IEEE 802.11 + Prism2 header */
#define ARPHRD_VOID 0xFFFF /* Void type, nothing is known */
+#define ARPHRD_NONE 0xFFFE /* zero header length */
/* ARP protocol opcodes. */
#define ARPOP_REQUEST 1 /* ARP request */
--- /usr/src/2426/drivers/net/tun.c 2002-08-02 20:39:44.000000000 -0400
+++ /usr/src/2426-mod/drivers/net/tun.c 2004-06-04 15:10:50.000000000 -0400
@@ -138,8 +138,8 @@
dev->addr_len = 0;
dev->mtu = 1500;
- /* Type PPP seems most suitable */
- dev->type = ARPHRD_PPP;
+ /* Zero header length */
+ dev->type = ARPHRD_NONE;
dev->flags = IFF_POINTOPOINT | IFF_NOARP | IFF_MULTICAST;
dev->tx_queue_len = 10;
break;
* Re: PATCH: Re: tun device - bug or feature? WAS(Re: IMQ / new Dummy device post.
From: David S. Miller @ 2004-06-05 21:42 UTC
To: hadi; +Cc: maxk, netdev, syrius.ml, jgarzik
On 05 Jun 2004 09:24:56 -0400
jamal <hadi@cyberus.ca> wrote:
> OK, trivial patch attached. It applies to both the latest 2.6 and 2.4.
> I will go hunting for more drivers that do this; for now, this is a good
> start.
Applied to both 2.4.x and 2.6.x, thanks Jamal.
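After this change a tun-mode device should advertise ARPHRD_NONE (0xFFFE, i.e. 65534) rather than ARPHRD_PPP (512). A minimal sketch for checking this from userspace, assuming a kernel that exposes the decimal link type in /sys/class/net/DEV/type (2.6 sysfs, not 2.4). The helper name link_type and the SYSFS_NET override are hypothetical, the latter added only so the function can be tried against a fake directory.

```shell
#!/bin/sh
# Sketch: print the ARPHRD link type a network device advertises, using the
# decimal value in sysfs. Only a few types from if_arp.h are named here.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

# link_type DEV  (hypothetical helper name)
link_type() {
    t=$(cat "$SYSFS_NET/$1/type") || return 1
    case $t in
        1)     echo "ARPHRD_ETHER" ;;
        512)   echo "ARPHRD_PPP" ;;
        65534) echo "ARPHRD_NONE" ;;   # 0xFFFE, added by the patch above
        65535) echo "ARPHRD_VOID" ;;   # 0xFFFF
        *)     echo "other ($t)" ;;
    esac
}
```

On a patched kernel, link_type tun0 should print ARPHRD_NONE; current iproute2 likewise shows such a device as link/none instead of the link/ppp seen in the ip output earlier in this thread.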
* IMQ / new Dummy device post.
From: Andy Furniss @ 2004-04-15 9:42 UTC
To: netdev
>What am I looking for?
>1) Users and authors of IMQ to tell me if this achieves what IMQ started
>as. I have to say I DON'T like the level of obtrusiveness of IMQ as it is
>today. The code added by this is small (100 lines or fewer on top of
>dummy) and doesn't touch any of the main core bits.
>2) Testing of the above by people who use IMQ.
>3) If someone has better ideas: I am not religious about keeping this,
>but it certainly can't be the blasphemy that IMQ introduces.
I am just a user and would drop IMQ without hesitation for something you
consider more elegant, but I am not sure whether or not dummy will do
what I want.
The only reason I use IMQ (+ NAT patch) is that I need to shape ingress
(I know I can't shape it "properly" from the wrong end of the bottleneck
without an intelligent app, but the ingress policer does not let me
share local and forwarded bandwidth and is not fair per user if I just
throttle the whole link).
I am not sure if dummy will sort this for me, there may be some other way?
Basically all I need is something I can use HTB on where the qos ingress
box is on this diagram.
http://www.docum.org/stef.coene/qos/kptd/
Andy.
* Re: IMQ / new Dummy device post.
From: jamal @ 2004-04-15 12:15 UTC
To: Andy Furniss; +Cc: netdev
On Thu, 2004-04-15 at 05:42, Andy Furniss wrote:
>
> The only reason I use IMQ (+ NAT patch) is that I need to shape ingress
> (I know I can't shape it "properly" from the wrong end of the bottleneck
> without an intelligent app, but the ingress policer does not let me
> share local and forwarded bandwidth and is not fair per user if I just
> throttle the whole link).
>
> I am not sure if dummy will sort this for me, there may be some other way?
In summary, dummy can do what IMQ used to; however, it is not tied
to iptables/netfilter.
> Basically all I need is something I can use HTB on where the qos ingress
> box is on this diagram.
Yes, you can attach an HTB. Look at the example posted in the previous
email and replace prio with HTB.
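A minimal sketch of that suggestion for Andy's case (shaping traffic arriving on eth0), using the ingress + mirred redirect syntax from this thread with HTB on the dummy device. The helper name, the device names passed in and the single RATE placeholder are assumptions; a real setup would add more classes and filters under 2:1.

```shell
#!/bin/sh
# Sketch: shape DEV's ingress by redirecting it to a dummy device with HTB.
# RATE is a placeholder; pick a value a bit below the real downlink speed.
TC=${TC:-tc}
IP=${IP:-ip}
RATE=${RATE:-512kbit}

# shape_ingress_via_dummy DEV DUMMY  (hypothetical helper name)
shape_ingress_via_dummy() {
    dev=$1
    dummy=$2
    # HTB on the dummy device; unclassified traffic goes to class 2:20
    $TC qdisc add dev "$dummy" root handle 2: htb default 20
    $TC class add dev "$dummy" parent 2: classid 2:1 htb rate "$RATE" ceil "$RATE"
    $TC class add dev "$dummy" parent 2:1 classid 2:20 htb rate "$RATE" ceil "$RATE"
    $IP link set "$dummy" up
    # redirect everything arriving on DEV to the dummy device
    $TC qdisc add dev "$dev" ingress
    $TC filter add dev "$dev" parent ffff: protocol ip prio 10 u32 \
        match u32 0 0 flowid 1:1 action mirred egress redirect dev "$dummy"
}
```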
Not sure I answered your questions.
Again, to emphasize: I will send patches only to people who are interested.
People have to ask directly;
this is my way of monitoring what is being tested. At some point I will
make the latest patches available to everyone.
cheers,
jamal
* Re: IMQ / new Dummy device post.
2004-04-15 12:15 ` jamal
@ 2004-04-15 19:35 ` Andy Furniss
2004-04-16 3:52 ` jamal
0 siblings, 1 reply; 35+ messages in thread
From: Andy Furniss @ 2004-04-15 19:35 UTC (permalink / raw)
To: hadi; +Cc: netdev
jamal wrote:
> On Thu, 2004-04-15 at 05:42, Andy Furniss wrote:
>
>
>>The only reason I use IMQ (+ NAT patch) is that I need to shape ingress
>>(I know I can't shape it "properly" from the wrong end of the bottleneck
>>without an intelligent app, but the ingress policer does not let me
>>share local and forwarded bandwidth and is not fair per user if I just
>>throttle the whole link).
>>
>>I am not sure if dummy will sort this for me, there may be some other way?
>
>
> In summary, dummy can do what IMQ used to; however, it is not tied
> to iptables/netfilter.
>
>
>>Basically all I need is something I can use HTB on where the qos ingress
>>box is on this diagram.
>
>
> Yes, you can attach an HTB. Look at the example posted in the previous
> email and replace prio with HTB.
> Not sure I answered your questions.
What I want to know is what state IP packets will be in if I
filter/shape with dummy. In my case I would need them to have been
demasqueraded so I can tell the difference between local and
to-be-forwarded ingress traffic.
I.e. where on the KPTD would dummy be? IMQ appears twice, and by using
the IMQ NAT patch I can use the PREROUTING one to filter/shape the
packets after they are de-NATed.
Andy.
>
> Again, to emphasize: I will send patches only to people who are interested.
> People have to ask directly;
> this is my way of monitoring what is being tested. At some point I will
> make the latest patches available to everyone.
>
> cheers,
> jamal
* Re: IMQ / new Dummy device post.
2004-04-15 19:35 ` Andy Furniss
@ 2004-04-16 3:52 ` jamal
2004-04-16 19:35 ` Andy Furniss
0 siblings, 1 reply; 35+ messages in thread
From: jamal @ 2004-04-16 3:52 UTC (permalink / raw)
To: Andy Furniss; +Cc: netdev
On Thu, 2004-04-15 at 15:35, Andy Furniss wrote:
> jamal wrote:
> What I want to know is what state IP packets will be in if I
Just to be clear, this is not specific to IP; it could be ARP, IPX,
IPv6, etc.
>
> filter/shape with dummy. In my case I would need them to have been
> demasqueraded so I can tell the difference between local and
> to-be-forwarded ingress traffic.
The packets are grabbed before NAT on the way in and after NAT on the
way out.
Packets coming from non-local machines can be redirected to a dummy
device before NAT, and redirected again on their way back to the
non-local machines. To use the example I posted earlier:
----
$TC qdisc add dev dummy0 root handle 1: prio
$TC qdisc add dev dummy0 parent 1:1 handle 10: sfq
$TC qdisc add dev dummy0 parent 1:2 handle 20: tbf rate 20kbit buffer 1600 limit 3000
$TC qdisc add dev dummy0 parent 1:3 handle 30: sfq
$TC filter add dev dummy0 protocol ip pref 1 parent 1: handle 1 fw classid 1:1
$TC filter add dev dummy0 protocol ip pref 2 parent 1: handle 2 fw classid 1:2
ifconfig dummy0 up
#deal with ingress of eth0 first
$TC qdisc add dev eth0 ingress
# redirect all IP packets arriving from 10.0.0.21/24 in eth0 to dummy0
# use mark 1 --> puts them onto class 1:1 of dummy
#
$TC filter add dev eth0 parent ffff: protocol ip prio 10 u32 \
match ip src 10.0.0.21/24 flowid 1:1 \
action ipt -j MARK --set-mark 1 \
action mirred egress redirect dev dummy0
#deal with egress of eth0
$TC qdisc add dev eth0 root handle 1: prio
# redirect all IP packets going to 10.0.0.21/24 in eth0 to dummy0
# use mark 2 --> puts them onto class 1:2 of dummy
#
$TC filter add dev eth0 parent 1:0 protocol ip prio 10 u32 \
match ip dst 10.0.0.21/24 flowid 1:1 \
action ipt -j MARK --set-mark 2 \
action mirred egress redirect dev dummy0
-----
I haven't tested the above, but it should work (sans syntax bugs). If it
doesn't, then we have a bug that needs fixing.
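One way to check whether the untested example behaves as intended is to watch the tc statistics after sending some traffic: the dummy0 qdisc and class counters should grow if the mirred redirects and fw marks work. This is a sketch under assumptions: it reuses the device names and handles from the example above, and whether per-action counters appear in the filter output depends on the tc and kernel versions in use. By default it only prints the commands (TC expands to `echo tc`); run it as root with TC=tc to see real counters.

```shell
#!/bin/sh
# Sketch: inspect tc statistics to see whether the redirect example works.
# Assumes the dummy0/eth0 setup from the example above is already applied.
# Dry run by default (TC expands to "echo tc"); use TC=tc as root for real output.
TC="${TC:-echo tc}"

show_dummy_stats() {
    # Per-qdisc packet/byte counters on dummy0: should grow if redirects work
    $TC -s qdisc show dev dummy0
    # Per-class counters: mark 1 should land in 1:1, mark 2 in 1:2
    $TC -s class show dev dummy0
    # Filters (and their actions) on eth0 ingress (ffff:) and the egress root (1:)
    $TC -s filter show dev eth0 parent ffff:
    $TC -s filter show dev eth0 parent 1:
}

show_dummy_stats
```

If dummy0's counters stay at zero while eth0 carries matching traffic, the problem is in the eth0 filters or redirect actions rather than in the dummy0 qdisc tree.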
> I.e. where on the KPTD would dummy be? IMQ appears twice, and by using
> the IMQ NAT patch I can use the PREROUTING one to filter/shape the
> packets after they are de-NATed.
>
Does the above help?
cheers,
jamal
* Re: IMQ / new Dummy device post.
2004-04-16 3:52 ` jamal
@ 2004-04-16 19:35 ` Andy Furniss
[not found] ` <1082145341.1026.125.camel@jzny.localdomain>
0 siblings, 1 reply; 35+ messages in thread
From: Andy Furniss @ 2004-04-16 19:35 UTC (permalink / raw)
To: hadi; +Cc: netdev
jamal wrote:
> On Thu, 2004-04-15 at 15:35, Andy Furniss wrote:
>
>>jamal wrote:
>
>
>>What I want to know is what state IP packets will be in if I
>
>
> Just to be clear, this is not specific to IP; it could be ARP, IPX,
> IPv6, etc.
>
>
>>
>>filter/shape with dummy. In my case I would need them to have been
>>demasqueraded so I can tell the difference between local and
>>to-be-forwarded ingress traffic.
>
>
> The packets are grabbed before NAT on the way in and after NAT on the
> way out.
This is what I wanted to know. Is it possible to add an option to grab
them after NAT on the way in and before NAT on the way out?
> Packets coming from non-local machines can be redirected to a dummy
> device before NAT, and redirected again on their way back to the
> non-local machines. To use the example I posted earlier:
>
> ----
> $TC qdisc add dev dummy0 root handle 1: prio
> $TC qdisc add dev dummy0 parent 1:1 handle 10: sfq
> $TC qdisc add dev dummy0 parent 1:2 handle 20: tbf rate 20kbit buffer 1600 limit 3000
> $TC qdisc add dev dummy0 parent 1:3 handle 30: sfq
>
> $TC filter add dev dummy0 protocol ip pref 1 parent 1: handle 1 fw classid 1:1
> $TC filter add dev dummy0 protocol ip pref 2 parent 1: handle 2 fw classid 1:2
>
> ifconfig dummy0 up
>
> #deal with ingress of eth0 first
> $TC qdisc add dev eth0 ingress
>
> # redirect all IP packets arriving from 10.0.0.21/24 in eth0 to dummy0
> # use mark 1 --> puts them onto class 1:1 of dummy
> #
> $TC filter add dev eth0 parent ffff: protocol ip prio 10 u32 \
> match ip src 10.0.0.21/24 flowid 1:1 \
> action ipt -j MARK --set-mark 1 \
> action mirred egress redirect dev dummy0
>
> #deal with egress of eth0
> $TC qdisc add dev eth0 root handle 1: prio
>
> # redirect all IP packets going to 10.0.0.21/24 in eth0 to dummy0
> # use mark 2 --> puts them onto class 1:2 of dummy
> #
> $TC filter add dev eth0 parent 1:0 protocol ip prio 10 u32 \
> match ip dst 10.0.0.21/24 flowid 1:1 \
> action ipt -j MARK --set-mark 2 \
> action mirred egress redirect dev dummy0
> -----
>
> I haven't tested the above, but it should work (sans syntax bugs). If it
> doesn't, then we have a bug that needs fixing.
I don't think this applies to my setup: masquerading many local hosts
onto one real address.
>
>
>>I.e. where on the KPTD would dummy be? IMQ appears twice, and by using
>>the IMQ NAT patch I can use the PREROUTING one to filter/shape the
>>packets after they are de-NATed.
>>
>
>
> Does the above help?
Yes - Thanks.
Andy.
>
> cheers,
> jamal
>
>
Thread overview: 35+ messages
2004-04-19 14:22 IMQ / new Dummy device post syrius.ml
2004-04-20 2:15 ` jamal
2004-04-21 1:43 ` syrius.ml
2004-04-21 12:49 ` syrius.ml
2004-04-21 20:19 ` syrius.ml
2004-04-22 13:16 ` jamal
2004-04-22 17:43 ` syrius.ml
2004-04-23 11:29 ` jamal
2004-04-24 14:14 ` tun device - bug or feature? WAS(Re: " jamal
2004-04-26 4:38 ` David S. Miller
2004-04-26 19:31 ` Max Krasnyansky
2004-04-27 2:22 ` jamal
2004-05-08 11:55 ` jamal
2004-05-10 17:18 ` Max Krasnyansky
2004-06-05 13:24 ` PATCH: " jamal
2004-06-05 21:42 ` David S. Miller
-- strict thread matches above, loose matches on Subject: below --
2004-04-15 9:42 Andy Furniss
2004-04-15 12:15 ` jamal
2004-04-15 19:35 ` Andy Furniss
2004-04-16 3:52 ` jamal
2004-04-16 19:35 ` Andy Furniss
[not found] ` <1082145341.1026.125.camel@jzny.localdomain>
2004-04-17 10:39 ` Andy Furniss
2004-04-17 12:09 ` jamal
2004-04-17 21:56 ` Andy Furniss
2004-04-18 14:28 ` jamal
2004-04-18 16:35 ` Andy Furniss
2004-04-18 20:34 ` Andy Furniss
2004-04-18 21:07 ` jamal
2004-04-18 21:31 ` Andy Furniss
2004-04-18 21:45 ` Andy Furniss
2004-04-18 20:53 ` jamal
2004-04-18 21:23 ` Martin Josefsson
2004-04-18 21:58 ` Andy Furniss
2004-04-19 8:14 ` Martin Josefsson
2004-04-19 12:33 ` syrius.ml