netdev.vger.kernel.org archive mirror
* HFSC classes going out of bounds, regression in recent kernels?
@ 2010-04-06 15:22 Denys Fedorysychenko
  2010-04-06 15:37 ` Patrick McHardy
  0 siblings, 1 reply; 6+ messages in thread
From: Denys Fedorysychenko @ 2010-04-06 15:22 UTC
  To: netdev, Jeff Garzik, Eric Dumazet, Patrick McHardy

Hi

I noticed on one of my QoS machines that HFSC classes have started exceeding
their bandwidth limits. The worst part is that it happens suddenly, and if I
just rerun the QoS script, everything works fine again.
I can't rule out a mistake on my side, but most probably it is a bug.
I can't tell for sure when it started; the previous kernel on this machine was
2.6.28, I guess, or maybe even older.

The only other cause I can think of is that I need to set up stab on each
qdisc, not only on root. But on the graphs it overshoots by too much for that
(up to 25-26 Mbps on a 5 min average).
Here is the qdisc setup:

qdisc bfifo 110: parent 1:110 limit 100000b
qdisc hfsc 1: root refcnt 2
 linklayer ethernet overhead 4 mpu 64 mtu 2047 tsize 512
qdisc bfifo 120: parent 1:120 limit 100000b
qdisc bfifo 130: parent 1:130 limit 100000b
qdisc bfifo 140: parent 1:140 limit 100000b
qdisc bfifo 150: parent 1:150 limit 1000000b
qdisc bfifo 160: parent 1:160 limit 100000b
qdisc bfifo 170: parent 1:170 limit 100000b
qdisc bfifo 180: parent 1:180 limit 100000b
qdisc bfifo 190: parent 1:190 limit 100000b


All offloading (except checksumming) is disabled on eth0 and eth0.33.


Here is an example of the stats. In this snapshot total bandwidth goes past
the 20 Mbit/s limit, to 21.6 Mbit/s.

Router# tc -s -d class show dev eth0.33;sleep 10;tc -s -d class show dev eth0.33;
class hfsc 1:110 parent 1:100 leaf 110: sc m1 0bit d 0us m2 8000bit                                                                                                                                       
 Sent 1544928 bytes 18392 pkt (dropped 0, overlimits 0 requeues 0)                                                                                                                                        
 rate 1344bit 2pps backlog 0b 0p requeues 0                                                                                                                                                               
 period 18392 work 1544928 bytes rtwork 1411368 bytes level 0                                                                                                                                             
                                                                                                                                                                                                          
class hfsc 1: root                                                                                                                                                                                        
 Sent 4180 bytes 24 pkt (dropped 0, overlimits 0 requeues 0)                                                                                                                                              
 backlog 0b 0p requeues 0                                                                                                                                                                                 
 period 0 level 2                                                                                                                                                                                         
                                                                                                                                                                                                          
class hfsc 1:100 parent 1: sc m1 0bit d 0us m2 20000Kbit ul m1 0bit d 0us m2 
20000Kbit                                                                                                                    
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)                                                                                                                                                  
 rate 0bit 0pps backlog 0b 0p requeues 0                                                                                                                                                                  
 period 74 work 22978893888 bytes level 1                                                                                                                                                                 
                                                                                                                                                                                                          
class hfsc 1:130 parent 1:100 leaf 130: sc m1 0bit d 0us m2 7000Kbit ul m1 
0bit d 0us m2 20000Kbit                                                                                                        
 Sent 5900522796 bytes 19334198 pkt (dropped 0, overlimits 0 requeues 0)                                                                                                                                  
 rate 5826Kbit 2360pps backlog 0b 0p requeues 0                                                                                                                                                           
 period 13454851 work 5900522796 bytes rtwork 3473525428 bytes level 0                                                                                                                                    
                                                                                                                                                                                                          
class hfsc 1:120 parent 1:100 leaf 120: sc m1 0bit d 0us m2 3000Kbit ul m1 
0bit d 0us m2 20000Kbit                                                                                                        
 Sent 659289392 bytes 3151731 pkt (dropped 484, overlimits 0 requeues 0)                                                                                                                                  
 rate 490560bit 329pps backlog 0b 0p requeues 0                                                                                                                                                           
 period 2902223 work 659289392 bytes rtwork 391958652 bytes level 0                                                                                                                                       
                                                                                                                                                                                                          
class hfsc 1:150 parent 1:100 leaf 150: sc m1 0bit d 0us m2 2000Kbit ul m1 
0bit d 0us m2 20000Kbit                                                                                                        
 Sent 15792563632 bytes 79195890 pkt (dropped 376595, overlimits 0 requeues 0)                                                                                                                            
 rate 14354Kbit 8650pps backlog 0b 0p requeues 0                                                                                                                                                          
 period 9906952 work 15792563632 bytes rtwork 2219169504 bytes level 0                                                                                                                                    
                                                                                                                                                                                                          
class hfsc 1:140 parent 1:100 leaf 140: sc m1 0bit d 0us m2 2000Kbit ul m1 
0bit d 0us m2 20000Kbit                                                                                                        
 Sent 87850052 bytes 540018 pkt (dropped 0, overlimits 0 requeues 0)                                                                                                                                      
 rate 47024bit 70pps backlog 0b 0p requeues 0                                                                                                                                                             
 period 527595 work 87850052 bytes rtwork 82404272 bytes level 0                                                                                                                                          
                                                                                                                                                                                                          
class hfsc 1:170 parent 1:100 leaf 170: sc m1 0bit d 0us m2 256000bit ul m1 
0bit d 0us m2 20000Kbit                                                                                                       
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)                                                                                                                                                  
 rate 0bit 0pps backlog 0b 0p requeues 0                                                                                                                                                                  
 period 0 level 0                                                                                                                                                                                         
                                                                                                                                                                                                          
class hfsc 1:160 parent 1:100 leaf 160: sc m1 0bit d 0us m2 256000bit ul m1 
0bit d 0us m2 20000Kbit                                                                                                       
 Sent 75659416 bytes 58924 pkt (dropped 201, overlimits 0 requeues 0)                                                                                                                                     
 rate 256bit 0pps backlog 0b 0p requeues 0                                                                                                                                                                
 period 27786 work 75659416 bytes rtwork 14074548 bytes level 0                                                                                                                                           
                                                                                                                                                                                                          
class hfsc 1:190 parent 1:100 leaf 190: sc m1 0bit d 0us m2 200000bit ul m1 
0bit d 0us m2 400000bit                                                                                                       
 Sent 459905752 bytes 5083942 pkt (dropped 2305517, overlimits 0 requeues 0)                                                                                                                              
 rate 400112bit 548pps backlog 0b 1094p requeues 0                                                                                                                                                        
 period 3 work 459806144 bytes rtwork 229903112 bytes level 0                                                                                                                                             
                                                                                                                                                                                                          
class hfsc 1:180 parent 1:100 leaf 180: sc m1 0bit d 0us m2 2000Kbit ul m1 
0bit d 0us m2 20000Kbit                                                                                                        
 Sent 1657528 bytes 7985 pkt (dropped 0, overlimits 0 requeues 0)                                                                                                                                         
 rate 1320bit 1pps backlog 0b 0p requeues 0                                                                                                                                                               
 period 7097 work 1657528 bytes rtwork 1380512 bytes level 0                                                                                                                                              

------------------------
After 10 seconds
                                                                                                                                                                                                          
class hfsc 1:110 parent 1:100 leaf 110: sc m1 0bit d 0us m2 8000bit                                                                                                                                       
 Sent 1546608 bytes 18412 pkt (dropped 0, overlimits 0 requeues 0)                                                                                                                                        
 rate 1344bit 2pps backlog 0b 0p requeues 0                                                                                                                                                               
 period 18412 work 1546608 bytes rtwork 1413048 bytes level 0                                                                                                                                             
                                                                                                                                                                                                          
class hfsc 1: root                                                                                                                                                                                        
 Sent 4180 bytes 24 pkt (dropped 0, overlimits 0 requeues 0)                                                                                                                                              
 backlog 0b 0p requeues 0                                                                                                                                                                                 
 period 0 level 2                                                                                                                                                                                         
                                                                                                                                                                                                          
class hfsc 1:100 parent 1: sc m1 0bit d 0us m2 20000Kbit ul m1 0bit d 0us m2 
20000Kbit                                                                                                                    
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)                                                                                                                                                  
 rate 0bit 0pps backlog 0b 0p requeues 0                                                                                                                                                                  
 period 74 work 23004936640 bytes level 1                                                                                                                                                                 

class hfsc 1:130 parent 1:100 leaf 130: sc m1 0bit d 0us m2 7000Kbit ul m1 
0bit d 0us m2 20000Kbit
 Sent 5907643408 bytes 19355979 pkt (dropped 0, overlimits 0 requeues 0)
 rate 5663Kbit 2226pps backlog 0b 6p requeues 0
 period 13471026 work 5907643020 bytes rtwork 3477346092 bytes level 0

class hfsc 1:120 parent 1:100 leaf 120: sc m1 0bit d 0us m2 3000Kbit ul m1 
0bit d 0us m2 20000Kbit
 Sent 659705168 bytes 3154632 pkt (dropped 484, overlimits 0 requeues 0)
 rate 380008bit 303pps backlog 0b 0p requeues 0
 period 2905063 work 659705168 bytes rtwork 392246028 bytes level 0

class hfsc 1:150 parent 1:100 leaf 150: sc m1 0bit d 0us m2 2000Kbit ul m1 
0bit d 0us m2 20000Kbit
 Sent 15810495520 bytes 79281120 pkt (dropped 376595, overlimits 0 requeues 0)
 rate 14362Kbit 8547pps backlog 0b 0p requeues 0
 period 9928987 work 15810495520 bytes rtwork 2221499996 bytes level 0

class hfsc 1:140 parent 1:100 leaf 140: sc m1 0bit d 0us m2 2000Kbit ul m1 
0bit d 0us m2 20000Kbit
 Sent 87918300 bytes 540801 pkt (dropped 0, overlimits 0 requeues 0)
 rate 50528bit 74pps backlog 0b 0p requeues 0
 period 528353 work 87918300 bytes rtwork 82463492 bytes level 0

class hfsc 1:170 parent 1:100 leaf 170: sc m1 0bit d 0us m2 256000bit ul m1 
0bit d 0us m2 20000Kbit
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
 period 0 level 0

class hfsc 1:160 parent 1:100 leaf 160: sc m1 0bit d 0us m2 256000bit ul m1 
0bit d 0us m2 20000Kbit
 Sent 75659480 bytes 58925 pkt (dropped 201, overlimits 0 requeues 0)
 rate 96bit 0pps backlog 0b 0p requeues 0
 period 27787 work 75659480 bytes rtwork 14074612 bytes level 0

class hfsc 1:190 parent 1:100 leaf 190: sc m1 0bit d 0us m2 200000bit ul m1 
0bit d 0us m2 400000bit
 Sent 460405988 bytes 5089449 pkt (dropped 2307857, overlimits 0 requeues 0)
 rate 399960bit 548pps backlog 0b 1109p requeues 0
 period 3 work 460306304 bytes rtwork 230153172 bytes level 0

class hfsc 1:180 parent 1:100 leaf 180: sc m1 0bit d 0us m2 2000Kbit ul m1 
0bit d 0us m2 20000Kbit
 Sent 1662240 bytes 8007 pkt (dropped 0, overlimits 0 requeues 0)
 rate 2088bit 1pps backlog 0b 0p requeues 0
 period 7118 work 1662240 bytes rtwork 1384776 bytes level 0
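
As a rough cross-check of the two snapshots, the aggregate rate through class 1:100 can be estimated from the growth of its "work" counter. The Python sketch here is an illustration and not part of the original report; the helper name rate_bps is made up, and the 10 second interval is only approximate because the tc runs themselves take some time.

```python
# Estimate average throughput from two readings of a class's "work" counter.
# Counter values are copied from the class 1:100 lines in the two snapshots;
# the 10 s interval comes from the "sleep 10" and is therefore approximate.

def rate_bps(work_before: int, work_after: int, seconds: float) -> float:
    """Average rate in bit/s between two byte-counter readings."""
    return (work_after - work_before) * 8 / seconds

WORK_T0 = 22978893888   # bytes, first snapshot
WORK_T1 = 23004936640   # bytes, second snapshot
UL_LIMIT = 20_000_000   # bit/s, "ul ... m2 20000Kbit" on class 1:100

measured = rate_bps(WORK_T0, WORK_T1, 10.0)
print(f"measured ~{measured / 1e6:.2f} Mbit/s, limit {UL_LIMIT / 1e6:.0f} Mbit/s")
print("over limit" if measured > UL_LIMIT else "within limit")
```

With these counters the estimate lands above the configured 20 Mbit/s upper limit, so the overshoot is visible in the raw counters and not only on the graphs.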

qdisc setup


* Re: HFSC classes going out of bounds, regression in recent kernels?
  2010-04-06 15:22 HFSC classes going out of bounds, regression in recent kernels? Denys Fedorysychenko
@ 2010-04-06 15:37 ` Patrick McHardy
  2010-04-06 15:45   ` Denys Fedorysychenko
                     ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Patrick McHardy @ 2010-04-06 15:37 UTC
  To: Denys Fedorysychenko; +Cc: netdev, Jeff Garzik, Eric Dumazet

[-- Attachment #1: Type: text/plain, Size: 593 bytes --]

Denys Fedorysychenko wrote:
> I notice on one of my QoS machines that HFSC start going out of bandwidth 
> limits. The most terrible thing - it happens suddenly, and if i just relaunch 
> QoS script - everything will work fine.

That sounds like there's an overflow somewhere.

> I'm not sure it is not my mistake, but most probably it is a bug.
> I can't tell for sure when it is happened, last kernel was on this machine 
> 2.6.28 i guess, or maybe even older.

Looking through the recent patches in this area, my prime suspect
is the attached patch. Does reverting it make any difference?


[-- Attachment #2: x --]
[-- Type: text/plain, Size: 1726 bytes --]

commit a4a710c4a7490587406462bf1d54504b7783d7d7
Author: Jarek Poplawski <jarkao2@gmail.com>
Date:   Mon Jun 8 22:05:13 2009 +0000

    pkt_sched: Change PSCHED_SHIFT from 10 to 6
    
    Change PSCHED_SHIFT from 10 to 6 to increase schedulers time
    resolution. This will increase 16x a number of (internal) ticks per
    nanosecond, and is needed to improve accuracy of schedulers based on
    rate tables, like HTB, TBF or CBQ, with rates above 100Mbit. It is
    assumed this change is safe for 32bit accounting of time diffs up
    to 2 minutes, which should be enough for common use (extremely low
    rate values may overflow, so get inaccurate instead). To make full
    use of this change an updated iproute2 will be needed. (But using
    older iproute2 should be safe too.)
    
    This change breaks ticks - microseconds similarity, so some minor code
    fixes might be needed. It is also planned to change naming adequately
    eg. to PSCHED_TICKS2NS() etc. in the near future.
    
    Reported-by: Antonio Almeida <vexwek@gmail.com>
    Tested-by: Antonio Almeida <vexwek@gmail.com>
    Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index cd0e026..120935b 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -41,8 +41,8 @@ static inline void *qdisc_priv(struct Qdisc *q)
 typedef u64	psched_time_t;
 typedef long	psched_tdiff_t;
 
-/* Avoid doing 64 bit divide by 1000 */
-#define PSCHED_SHIFT			10
+/* Avoid doing 64 bit divide */
+#define PSCHED_SHIFT			6
 #define PSCHED_US2NS(x)			((s64)(x) << PSCHED_SHIFT)
 #define PSCHED_NS2US(x)			((x) >> PSCHED_SHIFT)
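
To make the trade-off in the commit message concrete, here is a small Python sketch (not kernel code) of what the shift change does to the tick length and to how long a time difference fits in a 32-bit tick count. The function names are invented for this illustration, and treating the difference as a signed 32-bit value is an assumption based on psched_tdiff_t being a long on 32-bit builds.

```python
# One scheduler tick lasts 2**PSCHED_SHIFT nanoseconds, since
# PSCHED_NS2US(x) is ((x) >> PSCHED_SHIFT) in the patch.

def tick_ns(psched_shift: int) -> int:
    """Length of one scheduler tick in nanoseconds."""
    return 1 << psched_shift

def max_diff_seconds(psched_shift: int, bits: int = 32) -> float:
    """Largest time difference that fits in a signed `bits`-wide tick count."""
    max_ticks = (1 << (bits - 1)) - 1
    return max_ticks * tick_ns(psched_shift) / 1e9

for shift in (10, 6):
    print(f"PSCHED_SHIFT={shift}: tick={tick_ns(shift)} ns, "
          f"32-bit diff limit ~{max_diff_seconds(shift):.0f} s")
```

At shift 6 the limit comes out a little over two minutes, which matches the "up to 2 minutes" figure in the commit message; at shift 10 it is 16 times longer.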
 


* Re: HFSC classes going out of bounds, regression in recent kernels?
  2010-04-06 15:37 ` Patrick McHardy
@ 2010-04-06 15:45   ` Denys Fedorysychenko
  2010-04-11  2:10   ` Denys Fedorysychenko
  2010-05-20  1:34   ` Denys Fedorysychenko
  2 siblings, 0 replies; 6+ messages in thread
From: Denys Fedorysychenko @ 2010-04-06 15:45 UTC
  To: Patrick McHardy; +Cc: netdev, Jeff Garzik, Eric Dumazet

On Tuesday 06 April 2010 18:37:39 Patrick McHardy wrote:
> Denys Fedorysychenko wrote:
> > I notice on one of my QoS machines that HFSC start going out of bandwidth
> > limits. The most terrible thing - it happens suddenly, and if i just
> > relaunch QoS script - everything will work fine.
> 
> That sounds like there's an overflow somewhere.
> 
> > I'm not sure it is not my mistake, but most probably it is a bug.
> > I can't tell for sure when it is happened, last kernel was on this
> > machine 2.6.28 i guess, or maybe even older.
> 
> Looking through the recent patches in this area, my prime suspect
> is the attached patch. Does reverting it make any difference?
> 
I will try to upgrade soon; it is a critical router, so I will probably do
this tonight.
I guess reverting this patch will also hurt shaper resolution at high
speeds... not an issue for me, but it would be for other people.


* Re: HFSC classes going out of bounds, regression in recent kernels?
  2010-04-06 15:37 ` Patrick McHardy
  2010-04-06 15:45   ` Denys Fedorysychenko
@ 2010-04-11  2:10   ` Denys Fedorysychenko
  2010-05-20  1:34   ` Denys Fedorysychenko
  2 siblings, 0 replies; 6+ messages in thread
From: Denys Fedorysychenko @ 2010-04-11  2:10 UTC
  To: Patrick McHardy; +Cc: netdev, Jeff Garzik, Eric Dumazet

On Tuesday 06 April 2010 18:37:39 Patrick McHardy wrote:
> Denys Fedorysychenko wrote:
> > I notice on one of my QoS machines that HFSC start going out of bandwidth
> > limits. The most terrible thing - it happens suddenly, and if i just
> > relaunch QoS script - everything will work fine.
> 
> That sounds like there's an overflow somewhere.
> 
> > I'm not sure it is not my mistake, but most probably it is a bug.
> > I can't tell for sure when it is happened, last kernel was on this
> > machine 2.6.28 i guess, or maybe even older.
> 
> Looking through the recent patches in this area, my prime suspect
> is the attached patch. Does reverting it make any difference?
> 
Hi, I made sure: the bug is not related to this patch. I tested with the patch
reverted (I also had to test the latest stable release, to make sure the bug
appears there too), and each test takes 1 day. If I "restart" the script to
set HFSC up again, it works fine for a while. The next day, during the peak
period, it goes wild.

There is one more thing I am going to try now: the old kernel was 64-bit, but
the current one is 32-bit, so I will try moving back to 64-bit.

It may not be a regression, just a bug. I guess not many people use HFSC, even
though it is definitely better than HTB. Or am I wrong?


* Re: HFSC classes going out of bounds, regression in recent kernels?
  2010-04-06 15:37 ` Patrick McHardy
  2010-04-06 15:45   ` Denys Fedorysychenko
  2010-04-11  2:10   ` Denys Fedorysychenko
@ 2010-05-20  1:34   ` Denys Fedorysychenko
  2010-05-31 17:53     ` Michal Soltys
  2 siblings, 1 reply; 6+ messages in thread
From: Denys Fedorysychenko @ 2010-05-20  1:34 UTC
  To: Patrick McHardy; +Cc: netdev, Jeff Garzik, Eric Dumazet

I am still trying to track down the HFSC bug.
It seems that, in the end, it most probably is related to PSCHED_SHIFT; I am
testing again. I will do a completely clean build this time, since last time
some .o file may have been left over, or I forgot to run make clean.

SM_SHIFT in HFSC is calculated as 30 - PSCHED_SHIFT, and with the new changes
it shifts too much (or not enough); ISM_SHIFT seems wrong too. So most
probably it is an overflow, or not enough resolution.
I will change PSCHED_SHIFT back to confirm that; at least I have found a way
to reproduce the bug.
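
The dependence on PSCHED_SHIFT can be sketched outside the kernel. The Python here is an illustration, not code taken from sch_hfsc.c: SM_SHIFT = 30 - PSCHED_SHIFT is as stated in this message, while ISM_SHIFT = 8 + PSCHED_SHIFT and the round-up conversion in rate_to_sm are assumptions made for the example.

```python
# Fixed-point slope: a rate expressed as bytes per scheduler tick,
# scaled by 2**SM_SHIFT so that it can be stored as an integer.
NSEC_PER_SEC = 1_000_000_000

def hfsc_shifts(psched_shift: int) -> tuple[int, int]:
    sm_shift = 30 - psched_shift   # as stated in the message
    ism_shift = 8 + psched_shift   # assumption for this sketch
    return sm_shift, ism_shift

def rate_to_sm(bytes_per_sec: int, psched_shift: int) -> int:
    """Bytes per tick scaled by 2**SM_SHIFT, rounded up (assumed conversion)."""
    sm_shift, _ = hfsc_shifts(psched_shift)
    ticks_per_sec = NSEC_PER_SEC >> psched_shift
    return ((bytes_per_sec << sm_shift) + ticks_per_sec - 1) // ticks_per_sec

for shift in (10, 6):
    sm_shift, ism_shift = hfsc_shifts(shift)
    sm = rate_to_sm(20_000_000 // 8, shift)  # the 20000Kbit class
    print(f"PSCHED_SHIFT={shift}: SM_SHIFT={sm_shift} ISM_SHIFT={ism_shift} "
          f"sm={sm} ({sm.bit_length()} bits)")
```

Printing the bit length of the scaled slope gives a quick feel for how much headroom is left before a multiplication by a tick count could overflow a fixed-width integer.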

Additionally, a comment in sch_hfsc.c says that PSCHED_SHIFT 10 means one tick
per 1024us, but when I recalculate their table (in the source comments), it
doesn't match my calculations based on 1024us/tick; it fits well with 1024
nanoseconds.

Was it 1024ns per tick before, and 64ns per tick now? Or is it microseconds
(us)?


* Re: HFSC classes going out of bounds, regression in recent kernels?
  2010-05-20  1:34   ` Denys Fedorysychenko
@ 2010-05-31 17:53     ` Michal Soltys
  0 siblings, 0 replies; 6+ messages in thread
From: Michal Soltys @ 2010-05-31 17:53 UTC
  To: Denys Fedorysychenko; +Cc: Patrick McHardy, netdev, Jeff Garzik, Eric Dumazet

On 10-05-20 03:34, Denys Fedorysychenko wrote:
> I am trying to track down HFSC bug.
> It seems, most probably it is related to PSCHED_SHIFT at the end, i am doing
> testing again. I will try to do complete clean build, maybe last time some .o
> was left or i forgot to do make clean.
> 
> SM_SHIFT in HFSC is calculated as 30 - PSCHED_SHIFT, and it is shifted too
> much (or not enough) with new changes (ISM_SHIFT seems wrong too). So it is
> most probably overflow or not enough resolution.
> I will try to change PSCHED_SHIFT back to confirm that, and at least i found
> way to reproduce bug.
> 
> Additionally in sch_hfsc.c i notice mentioned that PSCHED_SHIFT 10 is tick per
> 1024us, but i try to calculate their table (in source comments), it doesn't
> fit with my calculations based on 1024us/tick, but fits well with 1024
> nanosecond.
> 
> Is it was 1024ns per tick and now 64ns per tick? Or it is microseconds(us) ?
> --

In the old table (when PSCHED_SHIFT was 10) in the hfsc file, the requirements
to keep 4 decimal digits were given as 20 and 18 for SM_ and ISM_ respectively.
For reference:

 *  bits/sec      100Kbps     1Mbps     10Mbps     100Mbps    1Gbps
 *  ------------+-------------------------------------------------------
 *  bytes/1.024us 12.8e-3    128e-3     1280e-3    12800e-3   128000e-3
 *  1.024us/byte  78.125     7.8125     0.78125    0.078125   0.0078125
 *
 * So, for PSCHED_SHIFT 10 we need: SM_SHIFT 20, ISM_SHIFT 18.


Considering the table - and if I didn't miss anything - they were swapped.

psticks/byte in its "corner" case (0.0078125) requires 1e6, which corresponds
to ISM_SHIFT == 20.

Similarly, the corner case for bytes/pstick (0.0128) requires 1e5, where
SM_SHIFT == 17 would be sufficient.

The above assumes a pstick is 1.024us.

Currently, with PSCHED_SHIFT == 6 (so a pstick is 64ns), the define macros
actually hit the spot, giving the following values:

bytes/pstick: SM_SHIFT  24 (0.0008 @ 100kbit)
psticks/byte: ISM_SHIFT 14 (0.125  @ 1gbit)

But in the generic case the macros will not yield appropriate results: e.g.
for PSCHED_SHIFT == 10, SM_SHIFT would be 20 and ISM_SHIFT would be 18 (not
enough for 4 decimal digits).

On a related subject: it would probably be desirable to consider speeds lower
than 100kbit, as ADSL links with a mere 128kbit of total upstream are nothing
out of the ordinary. If we took 10kbit as the reference value, we would need
SM_SHIFT 27 to cover 0.00008 @ 10kbit. I am not sure whether 10gbit is
desirable too; ISM_SHIFT would need to be 17 then. All of this assumes that
4 decimal digits is what we really need.
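
The shift values quoted in this message can be reproduced with a few lines of exact arithmetic. This Python sketch is only an illustration: the function names are made up, and the rule it encodes (scale the value by the smallest power of ten that leaves 4 significant digits, then take the smallest power of two that is at least that large) is an interpretation of the reasoning in this message, not something taken from the kernel source.

```python
from fractions import Fraction

def shift_for_digits(value: Fraction, digits: int = 4) -> int:
    """Smallest shift s with 2**s >= 10**k, where 10**k is the smallest
    power of ten that scales `value` up to `digits` significant digits."""
    k = 0
    while value * 10**k < 10**(digits - 1):
        k += 1
    return (10**k - 1).bit_length()

def bytes_per_tick(rate_bps: int, tick_ns: int) -> Fraction:
    return Fraction(rate_bps, 8) * Fraction(tick_ns, 10**9)

def ticks_per_byte(rate_bps: int, tick_ns: int) -> Fraction:
    return 1 / bytes_per_tick(rate_bps, tick_ns)

KBIT, GBIT = 1_000, 1_000_000_000

for tick in (1024, 64):  # pstick length in ns for PSCHED_SHIFT 10 and 6
    sm = shift_for_digits(bytes_per_tick(100 * KBIT, tick))
    ism = shift_for_digits(ticks_per_byte(1 * GBIT, tick))
    print(f"tick={tick}ns: SM_SHIFT>={sm} ISM_SHIFT>={ism}")

# Wider reference range considered at the end of the message (64 ns tick).
print("10kbit :", shift_for_digits(bytes_per_tick(10 * KBIT, 64)))
print("10gbit :", shift_for_digits(ticks_per_byte(10 * GBIT, 64)))
```

Exact fractions are used so that borderline cases such as 0.0008 are not pushed over a power-of-ten boundary by floating-point rounding.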


