* [LARTC] HTB and PRIO qdiscs introducing extra latency when
@ 2005-07-27 10:28 Jonathan Lynch
  2005-07-27 13:25 ` [LARTC] HTB and PRIO qdiscs introducing extra latency when output Andy Furniss
                   ` (9 more replies)
  0 siblings, 10 replies; 14+ messages in thread
From: Jonathan Lynch @ 2005-07-27 10:28 UTC
  To: lartc


I'm using a Linux machine with standard PC hardware and 3 separate PCI
network interfaces as a DiffServ core router using Linux traffic
control. The machine is a P4 2.8 GHz with 512 MB RAM running Fedora
Core 3 with the 2.6.12.3 kernel. All links and network interfaces are
full duplex Fast Ethernet. IP forwarding is enabled in the kernel. All
hosts on the network have their time synchronised using a stratum 1
server on the same VLAN. Below is an ASCII diagram of the network.

(network A) edge router ------>core router---->edge router (network C)
                                    ^
                                    |
                                    |
                               edge router
                               (network B)

Core Router Configuration:
---------------------------
The core router implements the expedited forwarding (EF) PHB. I have
tried 2 different configurations.
1. HTB qdisc with two HTB classes. The first services VoIP traffic
(marked with the EF codepoint) and guarantees that it is serviced at a
minimum rate of 1500 kbit. This HTB class is serviced by a FIFO queue
with a limit of 5 packets. The 2nd HTB class guarantees that all other
traffic is serviced at a minimum rate of 5 Mbit. A RED qdisc services
this HTB class.

2. PRIO qdisc with a token bucket filter to service VoIP traffic (marked
with the EF codepoint) at a guaranteed minimum rate of 1500 kbit, and a
RED qdisc to service all other traffic.

Test 1.
---------------------------
VoIP traffic originates from network A and is destined to network C. The
throughput of VoIP traffic is 350 kbit. No other traffic passes through
the core router during this time. These Voip packets are marked with the
EF codepoint. Using either of the above mentioned configurations for the
core router, the delay of the VoIP traffic in travelling from network A
to network C passing through the core router is 0.25 milliseconds.

Test 2.
---------------------------
Again VoIP traffic originates from network A and is destined to network
C with a throughput of 350 kbit. TCP traffic also originates from
another host in network A and is destined for another host in network C.
More TCP traffic originates from network B and is destined to network C.
This TCP traffic comes from transferring large files over HTTP. As a
result a bottleneck is created at the outgoing interface of the core
router to network C. The combined TCP traffic from these sources is
nearly 100 Mbit. Using either of the above mentioned configurations for
the core router, the delay of the VoIP traffic in travelling from
network A to network C passing through the core router is 30
milliseconds with 0% loss. A considerable number of TCP packets are
dropped.

Could anyone tell me why the delay is so high (30 ms) for VoIP packets
which are treated with the EF PHB when the outgoing interface of the
core router to network C is saturated?

Is it due to operating system factors ?
Has anyone else had similar experiences ?

Also I would appreciate it if anyone could give me performance metrics
as to approximately how many packets per second a router running Linux
with standard PC hardware can forward, or even mention any factors that
would affect this performance. I assume the system timer interrupt
frequency HZ will affect performance in some way.

Jonathan Lynch


Note: I already posted the same question to the list a few weeks back
but got no reply. I have reworded my question so it is clearer.

-----------------------------------------------------------------------------------------------
The config I used for each setup is included below. These are slight
modifications of the examples that are supplied with the iproute2
source code.

Config 1 using htb
-------------------
# Main dsmark & classifier
tc qdisc add dev $1 handle 1:0 root dsmark indices 64 set_tc_index
tc filter add dev $1 parent 1:0 protocol ip prio 1 \
    tcindex mask 0xfc shift 2

# Main htb qdisc & class
tc qdisc add dev $1 parent 1:0 handle 2:0 htb
tc class add dev $1 parent 2:0 classid 2:1 htb rate 100Mbit ceil 100Mbit

# EF Class (2:10)
tc class add dev $1 parent 2:1 classid 2:10 htb rate 1500Kbit \
    ceil 100Mbit
tc qdisc add dev $1 parent 2:10 pfifo limit 5
tc filter add dev $1 parent 2:0 protocol ip prio 1 handle 0x2e \
    tcindex classid 2:10 pass_on

# BE Class (2:20)
tc class add dev $1 parent 2:1 classid 2:20 htb rate 5Mbit ceil 100Mbit
tc qdisc add dev $1 parent 2:20 red limit 60KB min 15KB max 45KB \
    burst 20 avpkt 1000 bandwidth 100Mbit probability 0.4
tc filter add dev $1 parent 2:0 protocol ip prio 2 handle 0 \
    tcindex mask 0 classid 2:20 pass_on

Config 2 using PRIO
-------------------
# Main dsmark & classifier
tc qdisc add dev $1 handle 1:0 root dsmark indices 64 set_tc_index
tc filter add dev $1 parent 1:0 protocol ip prio 1 \
    tcindex mask 0xfc shift 2

# Main prio queue
tc qdisc add dev $1 parent 1:0 handle 2:0 prio
tc qdisc add dev $1 parent 2:1 tbf rate 1.5Mbit burst 1.5kB limit 1.6kB
tc filter add dev $1 parent 2:0 protocol ip prio 1 handle 0x2e \
    tcindex classid 2:1 pass_on

# BE class (2:2)
tc qdisc add dev $1 parent 2:2 red limit 60KB min 15KB max 45KB \
    burst 20 avpkt 1000 bandwidth 100Mbit probability 0.4
tc filter add dev $1 parent 2:0 protocol ip prio 2 handle 0 \
    tcindex mask 0 classid 2:2 pass_on


_______________________________________________
LARTC mailing list
LARTC@mailman.ds9a.nl
http://mailman.ds9a.nl/cgi-bin/mailman/listinfo/lartc

* Re: [LARTC] HTB and PRIO qdiscs introducing extra latency when output
@ 2005-08-10 18:37 Jonathan Lynch
  2005-08-11 16:36 ` Andy Furniss
  0 siblings, 1 reply; 14+ messages in thread
From: Jonathan Lynch @ 2005-08-10 18:37 UTC
  To: lartc

Andy, thanks for all the feedback. I was away on holidays for the last
week and am only back today. I have a few more questions which are
listed below.



On Wed, 2005-08-03 at 15:04 +0100, Andy Furniss wrote:
> Jonathan Lynch wrote:
> > I did the same tests that I outlined earlier, but this time by setting
> > hysteresis to 0. The config for the core router is included at the
> > bottom. The graphs for the delay of the voip stream and the traffic
> > going through the core router can be found at the following addresses.
> > 
> > http://140.203.56.30/~jlynch/htb/core_router_hysteresis.png
> > http://140.203.56.30/~jlynch/htb/voip_stream_24761_hysteresis.png
> > 
> > 
> > The max delay of the stream has dropped to 1.8ms. Again the jitter seems
> > to be around 1ms. There seems to be a pattern going whereby the delay
> > reaches about 1.6ms then drops back to 0.4 ms, jumps back to 1.6ms and
> > then back to 0.4ms repeatedly and then it rises from 0.5ms gradually and
> > repeats this behaviour. Is there any explanation to this pattern ?
> > 
> > Would it have anything to do with burst being 1ms ?
> 
> Yes I suppose if you could sample truly randomly you would get a proper 
> distribution - I guess the pattern arises because your timers are 
> synchronised for the test.



I don't understand what you mean when you say "if you could sample truly
randomly you would get a proper distribution".

Also having the timers synchronized will allow for more accurate
measurements of the delay. I can't see how this would have an impact on
the pattern.


> > 
> > When the ceil is specified as being 90mbit, is this at IP level ? 
> > What does this correspond to when a Mbit = 1,000,000 bits? I'm a bit
> > confused by the way tc interprets this rate.
> 
> Yes htb uses ip level length (but you can specify overhead & min size) , 
> the rate calculations use a lookup table which is likely to have a 
> granularity of 8 bytes (you can see this with tc -s -d class ls .. look 
> for /8 after the burst/cburst).
> 
> There is a choice in 2.6 configs about using CPU/jiffies/gettimeofday - 
> I use CPU and now I've got a ping that does < 1 sec I get the same 
> results as you.
> 

I have the default setting, which is jiffies. There is a comment in the
kernel config help for the packet scheduler clock source which says of
jiffies that "its resolution is too low for accurate shaping except at
very low speed". I will recompile the kernel and try the CPU option
tomorrow to see if there is any change.

> > 
> > If the ceil is based at IP level then the max ceil is going to be a
> > value between 54 Mbit and 97 Mbit (not the tc values) for a 100 Mbit
> > interface depending on the size of the packets passing through, right ?
> > 
> > Minimum Ethernet frame
> > 148,809 * (46 * 8) =   148,809 * 368 = 54,761,712 bps
> > 
> > Maximum Ethernet frame
> > 8,127 * (1500 * 8) =   8,127 * 12,000 =  97,524,000 bps
> 
> If you use the overhead option I think you will be able to overcome 
> this limitation and push the rates closer to 100mbit.
> 
> 
> > About the red settings, I don't understand properly how to configure
> > the settings. I was using the configuration that came with the examples.
> 
> I don't use red it was just something I noticed - maybe making it longer 
> would help, maybe my test wasn't representative.
> 
> FWIW I had a play around with HFSC (not that I know what I am doing 
> really) and at 92mbit managed to get -
> 
> rtt min/avg/max/mdev = 0.330/0.414/0.493/0.051 ms loaded
> from
> rtt min/avg/max/mdev = 0.114/0.133/0.187/0.028 ms idle
> 
> and that was through a really cheap switch.
> 
> Andy.


> I looked up ethernet overheads and found the figure of 38 bytes per 
> frame, the 46 is min eth payload size? and looking at the way mpu is 
> handled by the tc rate table generator I think you would need to use 
> 46 + 38 as mpu.
> 
> So on every htb line that has a rate put ..... overhead 38 mpu 84
> 
> I haven't checked those figures or tested close to limits though, the 
> 12k burst would need increasing a bit as well or that will slightly 
> over limit rate at HZ=1000.
> 
> It seems that htb still uses ip level for burst so 12k is enough.
> 
> With the overhead at 38 I can ceil at 99mbit OK.


I didn't realise such options existed for htb (mpu + overhead). These
parameters are not mentioned in the man pages or in the htb manual.
I presume I have to patch tc to get these features?


Yep, 46 is the minimum eth payload size and 38 is the minimum overhead
for ethernet frames.

interframe gap   96 bits   12 bytes
+ preamble       56 bits    7 bytes
+ sfd             8 bits    1 byte
+ eth header               14 bytes
+ crc                       4 bytes
                           --------
                           38 bytes overhead per ethernet frame.
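
The 54 Mbit and 97 Mbit figures quoted earlier follow directly from this
38 byte overhead. A minimal sketch of the arithmetic, assuming a 100 Mbit
link running at full line rate and counting only whole frames per second
(the function names are made up for illustration):

```python
LINK_BPS = 100_000_000   # Fast Ethernet line rate in bits per second
ETH_OVERHEAD = 38        # per-frame bytes outside the payload (table above)


def frames_per_second(payload_bytes, overhead=ETH_OVERHEAD, link_bps=LINK_BPS):
    """Whole frames per second the link can carry for a given payload size."""
    return link_bps // ((payload_bytes + overhead) * 8)


def ip_level_bps(payload_bytes, overhead=ETH_OVERHEAD, link_bps=LINK_BPS):
    """Payload (IP level) bits per second when the link is saturated."""
    return frames_per_second(payload_bytes, overhead, link_bps) * payload_bytes * 8


for payload in (46, 1500):
    # 46 bytes is the minimum Ethernet payload, 1500 bytes the maximum
    print(payload, frames_per_second(payload), ip_level_bps(payload))
```

For a 46 byte payload this gives 148,809 frames/s and 54,761,712 bit/s at
IP level; for a 1500 byte payload, 8,127 frames/s and 97,524,000 bit/s.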



Jonathan



* Re: [LARTC] HTB and PRIO qdiscs introducing extra latency when output
@ 2005-08-20 20:51 Jonathan Lynch
  0 siblings, 0 replies; 14+ messages in thread
From: Jonathan Lynch @ 2005-08-20 20:51 UTC
  To: lartc


I did a number of tests and there doesn't appear to be any noticeable
difference between using CPU and JIFFIES (HZ=1000) as the packet
scheduler clock source.

I didn't mention that the outgoing interface on the core router has 2 IP
addresses. One is VLAN tagged for the test network I'm running and the
other is not tagged, so the vast majority of outgoing traffic will be
tagged on the outgoing interface.


Question 1. When sending a packet on a tagged interface, the 4 byte VLAN
header is added between the source address and the ethertype field. As
far as I know the tag is added and removed in the device driver and so
it does not show up when using ethereal or tcpdump. The driver in use is
e1000, which supports VLANs. As a result the minimum frame size for
tagged packets is 68 bytes and the maximum is 1522 bytes.

When using MPU and overhead in htb, does that mean that the overhead
should be increased to 42 (38 + 4 for the VLAN header) and the MPU to 88
(42 + 46)? There are both tagged and untagged frames on this interface.
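
To make the question concrete, here is a small sketch of how the two
parameter sets would charge a packet. It assumes the size used for rate
accounting is the IP length plus overhead, raised to mpu if it is smaller,
which is what the suggestion of mpu = 46 + overhead implies. That rule and
the names below are assumptions for illustration, not taken from the tc
source.

```python
def charged_size(ip_len, overhead, mpu):
    """Bytes charged against the rate for one packet (assumed rule)."""
    return max(ip_len + overhead, mpu)


UNTAGGED = {"overhead": 38, "mpu": 84}   # plain Ethernet: 38, 46 + 38
TAGGED = {"overhead": 42, "mpu": 88}     # VLAN tagged: 38 + 4, 46 + 42

for ip_len in (40, 1500):
    print(ip_len,
          charged_size(ip_len, **UNTAGGED),
          charged_size(ip_len, **TAGGED))
```

Under this rule a 40 byte packet is charged 84 bytes untagged and 88 bytes
tagged, and a 1500 byte packet 1538 and 1542 bytes, so using the untagged
values on mostly tagged traffic would undercount each frame by 4 bytes.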


I have run tests using MPU of 84 and overhead of 38 and mpu 88 and
overhead of 42 and increasing the ceil to 90, 95, 97 and 99Mbit. There
are spikes in the delay and the higher I increase the ceil, the higher
the spikes are. 

I have graphs of the tests I did. The filename of each graph should
explain the settings that were used in the test: hysteresis set to 0,
CPU as the packet scheduler clock source, mpu and overhead (84 and 38
respectively) or vlan mpu and overhead (88 and 42 respectively), and
then the ceiling at either 90, 95 or 99 Mbit.


21957_hysteresis_CPU_mpu_overhead_95_ceil.png    
22762_hysteresis_CPU_vlan_mpu_overhead_90_ceil.png 
22875_hysteresis_CPU_vlan_mpu_overhead_99_ceil.png 
24135_hysteresis_CPU_mpu_overhead_90_ceil.png      
24143_hysteresis_CPU_vlan_mpu_overhead_95_ceil.png 
24262_hysteresis_CPU_mpu_overhead_99_ceil.png   

They can be found at

http://www.compsoc.nuigalway.ie/~jlynch/htb/

   

MPU and overhead seem to be used mainly in places where the size of
frames is fixed. Does it make a difference using them with Ethernet,
where the frame size is variable?

When you said you could ceil at 99mbit OK, did you look at the max
delay? Did you notice spikes like the ones in my graphs? Do you have any
idea what could be causing these spikes?



Other observations: 

I was using values from /proc/net/dev to measure the throughput going
through the core router and I noticed that different network drivers
increment the byte counters differently. For a packet with the maximum
Ethernet payload of 1500 bytes, e1000 increments the byte counter by
1518 bytes, which includes the eth header and trailer. 3c59x on the
other hand increments it by 1514 bytes, which does not include the eth
trailer. For VLAN tagged packets e1000 increments the byte counter by
1522 bytes and 3c59x by 1518 bytes. Have you come across this before?
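
One way to compare readings from the two drivers is to add back the bytes
each driver leaves out before converting to a rate. The sketch below is
only an illustration: the helper names are made up, and the per-packet
corrections (20 bytes for e1000, 24 bytes for 3c59x on untagged frames)
are derived from the counter increments above and the 38 byte overhead
figure, not measured.

```python
def tx_counters(proc_net_dev_text, iface):
    """Return (tx_bytes, tx_packets) for iface from /proc/net/dev contents."""
    for line in proc_net_dev_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == iface:
            fields = rest.split()
            # 8 receive columns come first, then transmit bytes and packets
            return int(fields[8]), int(fields[9])
    raise KeyError(iface)


def tx_rate_bps(before, after, seconds, extra_bytes_per_packet=0):
    """Bits per second between two (bytes, packets) snapshots.

    extra_bytes_per_packet adds back whatever the driver does not count.
    """
    d_bytes = after[0] - before[0]
    d_packets = after[1] - before[1]
    return (d_bytes + d_packets * extra_bytes_per_packet) * 8 / seconds


# 1000 full-size untagged frames in one second, as each driver would count them
print(tx_rate_bps((0, 0), (1518000, 1000), 1.0, extra_bytes_per_packet=20))
print(tx_rate_bps((0, 0), (1514000, 1000), 1.0, extra_bytes_per_packet=24))
```

Both calls give the same wire-level rate once the correction is applied,
which is the point of normalising before comparing interfaces.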

Also, the source of tc, specifically tc_util.c, refers to this page
http://physics.nist.gov/cuu/Units/binary.html for how rates are
specified in tc. So I presume tc uses the table below to interpret rate,
ceil etc. This doesn't seem to be mentioned anywhere.

static const struct rate_suffix {
        const char *name;
        double scale;
} suffixes[] = {
        { "bit",        1. },
        { "Kibit",      1024. },
        { "kbit",       1000. },
        { "mibit",      1024.*1024. },
        { "mbit",       1000000. },
        { "gibit",      1024.*1024.*1024. },
        { "gbit",       1000000000. },
        { "tibit",      1024.*1024.*1024.*1024. },
        { "tbit",       1000000000000. },
        { "Bps",        8. },
        { "KiBps",      8.*1024. },
        { "KBps",       8000. },
        { "MiBps",      8.*1024*1024. },
        { "MBps",       8000000. },
        { "GiBps",      8.*1024.*1024.*1024. },
        { "GBps",       8000000000. },
        { "TiBps",      8.*1024.*1024.*1024.*1024. },
        { "TBps",       8000000000000. },
        { NULL }
};
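
Read as a lookup table, this answers the earlier question about what
"90mbit" means. The following sketch applies the same scale factors; it is
not tc's parser. Matching suffixes case-insensitively and rejecting a bare
number are assumptions made here for simplicity.

```python
import re

# Scale factors copied from the suffixes[] table above, keys lowercased
SUFFIXES = {
    "bit": 1.0, "kibit": 1024.0, "kbit": 1000.0,
    "mibit": 1024.0 ** 2, "mbit": 1e6,
    "gibit": 1024.0 ** 3, "gbit": 1e9,
    "tibit": 1024.0 ** 4, "tbit": 1e12,
    "bps": 8.0, "kibps": 8 * 1024.0, "kbps": 8e3,
    "mibps": 8 * 1024.0 ** 2, "mbps": 8e6,
    "gibps": 8 * 1024.0 ** 3, "gbps": 8e9,
    "tibps": 8 * 1024.0 ** 4, "tbps": 8e12,
}


def rate_to_bits_per_second(text):
    """Convert a string such as '90mbit' to bits per second."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*([A-Za-z]+)\s*", text)
    if match is None:
        raise ValueError("expected <number><suffix>, got %r" % text)
    number, suffix = match.groups()
    try:
        return float(number) * SUFFIXES[suffix.lower()]
    except KeyError:
        raise ValueError("unknown rate suffix %r" % suffix) from None


for rate in ("90mbit", "1500Kbit", "1.5Mbit", "1Mibit", "1mbps"):
    print(rate, rate_to_bits_per_second(rate))
```

So "mbit" is decimal (90mbit is 90,000,000 bit/s), the "i" variants are
binary, and the "bps" suffixes are bytes per second rather than bits.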


Jonathan



On Thu, 2005-08-11 at 17:36 +0100, Andy Furniss wrote: 
> Jonathan Lynch wrote:
> 
> > 
> > I dont understand what you mean when you say "if you could sample truly
> > randomly you would get a proper distribution". 
> > 
> > Also having the timers synchronized will allow for more accurate
> > measurements of the delay. I cant see how this would have an impact on
> > the pattern.
> 
> I mean it's possibly just to do with the test if a 0ms - 1ms delay is 
> expected then you could see patterns arising depending on how you 
> measure delay/clock drift or something.
> 
> Now I have two pings that do intervals < 1 sec - the inetutils GNU ping 
> guys implemented it for me :-),  and I also have the iputils one I can 
> simulate a stream better.
> 
> While doing this I noticed that iputils ping actually gives lower 
> latency readings when sending many pps. Using tcpdump deltas I can see 
> the network latency is the same however many pps I do - it's just that 
> when measuring <1ms delays and doing many pps it seems that some code 
> gets cached (guess) and the reported delay changes as a result.
> 
> I mention that just to illustrate that measuring small delays can be 
> misleading and influenced by the exact nature of your setup.
> 
> 
> 
> > 
> > I have the default setting which is to set it to jiffies. There is a
> > comment in the kernel config for Packet scheduler clock source that
> > mentions that Jiffies "its resolution is too low for accurate shaping
> > except at very low speed". I will recompile the kernel and try the CPU
> > option tomorrow to see if there is any change.
> 
> Maybe not in the case of htb - I use CPU and see similar results, the 
> comment about accurate shaping was probably written when HZ=100, but I 
> suppose it will be better for something :-)
> 
> 
> 
> > 
> > I didn't realise such options existed for htb (mpu + overhead). These parameters are not mentioned in the man pages or in the htb manual. 
> > I presume I have to patch tc to get these features?
> 
> 
> There is mention on the htb page - it was added as a patch so was not 
> designed in, which explains why burst doesn't use it.
> 
> You don't need to patch recent iproute2 it's already in there.
> 
> Andy.


end of thread, other threads:[~2005-10-19 10:59 UTC | newest]

Thread overview: 14+ messages
2005-07-27 10:28 [LARTC] HTB and PRIO qdiscs introducing extra latency when Jonathan Lynch
2005-07-27 13:25 ` [LARTC] HTB and PRIO qdiscs introducing extra latency when output Andy Furniss
2005-07-27 15:37 ` Jonathan Lynch
2005-07-27 21:53 ` Andy Furniss
2005-07-28 16:37 ` Jonathan Lynch
2005-07-28 21:49 ` Andy Furniss
2005-08-02 20:59 ` Jonathan Lynch
2005-08-03 14:04 ` Andy Furniss
2005-08-03 19:32 ` Andy Furniss
2005-08-04 18:06 ` Andy Furniss
2005-10-19 10:59 ` Andy Furniss
  -- strict thread matches above, loose matches on Subject: below --
2005-08-10 18:37 Jonathan Lynch
2005-08-11 16:36 ` Andy Furniss
2005-08-20 20:51 Jonathan Lynch
