Re: [PATCH 2/2]: [NET_SCHED]: Making rate table lookups more flexible.

From: Jesper Dangaard Brouer <jdb@comx.dk>
To: Patrick McHardy <kaber@trash.net>
Cc: Jesper Dangaard Brouer <hawk@diku.dk>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	Stephen Hemminger <shemminger@linux-foundation.org>
Subject: Re: [PATCH 2/2]: [NET_SCHED]: Making rate table lookups more flexible.
Date: Wed, 05 Sep 2007 15:58:18 +0200
Message-ID: <1189000698.28083.22.camel@localhost.localdomain>
In-Reply-To: <46DD86F9.8000902@trash.net>

[-- Attachment #1: Type: text/plain, Size: 3532 bytes --]

On Tue, 2007-09-04 at 18:25 +0200, Patrick McHardy wrote:
> Jesper Dangaard Brouer wrote:
> > On Sun, 2007-09-02 at 23:16 +0200, Patrick McHardy wrote:
> > 
> >>Jesper Dangaard Brouer wrote:
> >>
> >>>On Sun, 2 Sep 2007, Patrick McHardy wrote:
> >>>
> >>>Lets focus on the general case, where the functionality actually is 
> >>>needed right away.
> >>>
> >>>In the general case:
> >>>
> >>>- The rate table needs to be aligned (cell_align=-1).
> >>>  (currently, we miscalculates up to 7 bytes on every lookup)
> >>
> >>We will always do that, thats a consequence of storing the
> >>transmission times for multiples of 8b.
> > 
> > 
> > The issue is that we use the lower boundary for calculating the transmit
> > cost. Thus, a 15 bytes packet only have a transmit cost of 8 bytes.
> 
> I believe this is something that should be fixed anyway,
> its better to overestimate than underestimate to stay
> in control of the queue. 

Well, I have attached a patch that uses the upper boundary instead.
The patch uses the cell_align feature.

The patch itself is very simple, but figuring out what happens to the
rtab array requires a little illustration:

Illustrating the rate table array:
 Legend description
  rtab[x]   : Array index x of rtab[x]
  xmit_sz   : Transmit size represented by rtab[x] (normally a transmit time)
  maps[a-b] : Packet sizes from a to b, will map into rtab[x]

Current/old rate table mapping (cell_log:3):
 rtab[0]:=xmit_sz:0  maps[0-7]
 rtab[1]:=xmit_sz:8  maps[8-15]
 rtab[2]:=xmit_sz:16 maps[16-23]
 rtab[3]:=xmit_sz:24 maps[24-31]
 rtab[4]:=xmit_sz:32 maps[32-39]
 rtab[5]:=xmit_sz:40 maps[40-47]
 rtab[6]:=xmit_sz:48 maps[48-55]

New rate table mapping, with kernel cell_align support.
 rtab[0]:=xmit_sz:8  maps[0-8]
 rtab[1]:=xmit_sz:16 maps[9-16]
 rtab[2]:=xmit_sz:24 maps[17-24]
 rtab[3]:=xmit_sz:32 maps[25-32]
 rtab[4]:=xmit_sz:40 maps[33-40]
 rtab[5]:=xmit_sz:48 maps[41-48]
 rtab[6]:=xmit_sz:56 maps[49-56]

New TC util on a kernel WITHOUT support for cell_align
 rtab[0]:=xmit_sz:8 maps[0-7]
 rtab[1]:=xmit_sz:16 maps[8-15]
 rtab[2]:=xmit_sz:24 maps[16-23]
 rtab[3]:=xmit_sz:32 maps[24-31]
 rtab[4]:=xmit_sz:40 maps[32-39]
 rtab[5]:=xmit_sz:48 maps[40-47]
 rtab[6]:=xmit_sz:56 maps[48-55]

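Notice that without the kernel cell_align feature, the mapping is only
off by one byte: a packet whose size is an exact multiple of the cell
size (8, 16, ...) is charged for one extra cell.  That should be
acceptable when somebody uses a new TC util on an old kernel.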
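
The three mappings above can be reproduced with a small sketch.  It
assumes the kernel applies cell_align by adding it to the packet length
before shifting by cell_log, and clamps a negative result to slot 0.
The function names are made up for this illustration and are not taken
from the kernel or iproute2; the table entries hold sizes rather than
transmit times, as in the illustration.

```c
#define CELL_LOG 3	/* 8-byte cells, as in the tables above */

/* Old table: rtab[i] represents the lower boundary of cell i. */
unsigned old_entry(unsigned i)
{
	return i << CELL_LOG;
}

/* New table: rtab[i] represents the upper boundary of cell i. */
unsigned new_entry(unsigned i)
{
	return (i + 1) << CELL_LOG;
}

/*
 * Assumed lookup: cell_align is 0 on a kernel without support and -1
 * when the new TC util runs on a kernel with support.
 */
unsigned lookup_slot(unsigned pkt_len, int cell_align)
{
	int slot = (int)pkt_len + cell_align;

	if (slot < 0)	/* a zero-length packet must not index rtab[-1] */
		slot = 0;
	return (unsigned)slot >> CELL_LOG;
}
```

For a 15 byte packet, old_entry(lookup_slot(15, 0)) charges for 8 bytes,
while new_entry(lookup_slot(15, -1)) charges for 16.  For an 8 byte
packet, the new table with cell_align=-1 charges 8, and the new table
on an old kernel (cell_align effectively 0) charges 16.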

> We could additionally make the
> rate tables more finegrained (optionally).

That is actually already possible with the approach used to handle
overflow of the rate table ("TSO" large packet support).  By setting
cell_log=0, and letting the overflow code handle the rest, we get a very
fine-grained lookup.
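
A rough sketch of how that could look.  It assumes the overflow code
splits a slot above 255 into a multiple of rtab[255] plus a remainder
entry; treat that formula, and the helper names, as assumptions made
for this illustration rather than as the exact kernel code.  The table
is filled with upper-bound sizes instead of transmit times so the
result is easy to check by hand.

```c
#include <stdint.h>

#define RTAB_SIZE 256

/* Fill the table with upper-bound sizes: (i+1) << cell_log. */
void fill_rtab(uint32_t *rtab, int cell_log)
{
	for (int i = 0; i < RTAB_SIZE; i++)
		rtab[i] = (uint32_t)(i + 1) << cell_log;
}

/*
 * Assumed lookup with overflow handling: slots beyond the table are
 * charged as whole multiples of the last entry plus the remainder.
 */
uint32_t lookup_cost(const uint32_t *rtab, int cell_log, int cell_align,
		     unsigned pkt_len)
{
	int slot = (int)pkt_len + cell_align;

	if (slot < 0)
		slot = 0;
	slot >>= cell_log;
	if (slot > 255)
		return rtab[255] * (uint32_t)(slot >> 8) + rtab[slot & 0xFF];
	return rtab[slot];
}
```

With cell_log=0 and cell_align=-1, a 1500 byte packet gives slot 1499,
so the cost is rtab[255]*5 + rtab[219] = 256*5 + 220 = 1500, which is
byte-exact.  With cell_log=3 the same packet lands in rtab[187] and is
charged for 1504 bytes.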


> >>>- The existing tc overhead calc can be made more accurate.
> >>>  (by adding overhead before doing the lookup, instead of the
> >>>   current solution where the rate table is modified with its
> >>>   limited resolution)
> >>
> >>Please demonstrate this with patches (one for the overhead
> >>calculation, one for the cell_align thing), then we can
> >>continue this discussion.
> > 
> > 
> > I have attached a patch for the overhead calculation.

Attached is a patch that uses "the cell_align thing".

> Thanks, I probably won't get to looking into this until
> after the netfilter workshop next week.

Okay, but I'll see you at the workshop, so I might bug you there ;-)

-- 
Med venlig hilsen / Best regards
  Jesper Brouer
  ComX Networks A/S
  Linux Network developer
  Cand. Scient Datalog / MSc.
  Author of http://adsl-optimizer.dk

[-- Attachment #2: upperbound_rate_table_aligned.patch --]
[-- Type: text/x-patch, Size: 2156 bytes --]

commit 9a21e8bd56a5f057fc9f605e061c22d264ec27ef
Author: Jesper Dangaard Brouer <hawk@comx.dk>
Date:   Wed Sep 5 15:24:51 2007 +0200

    [IPROUTE2]: Change the rate table calc of transmit cost to use upper bound value.
    
    Patrick McHardy, Cite: 'its better to overestimate than underestimate
    to stay in control of the queue'.
    
    Illustrating the rate table array:
     Legend description
       rtab[x]   : Array index x of rtab[x]
       xmit_sz   : Transmit size contained in rtab[x] (normally transmit time)
       maps[a-b] : Packet sizes from a to b, will map into rtab[x]
    
    Current/old rate table mapping (cell_log:3):
     rtab[0]:=xmit_sz:0  maps[0-7]
     rtab[1]:=xmit_sz:8  maps[8-15]
     rtab[2]:=xmit_sz:16 maps[16-23]
     rtab[3]:=xmit_sz:24 maps[24-31]
     rtab[4]:=xmit_sz:32 maps[32-39]
     rtab[5]:=xmit_sz:40 maps[40-47]
     rtab[6]:=xmit_sz:48 maps[48-55]
    
    New rate table mapping, with kernel cell_align support.
     rtab[0]:=xmit_sz:8  maps[0-8]
     rtab[1]:=xmit_sz:16 maps[9-16]
     rtab[2]:=xmit_sz:24 maps[17-24]
     rtab[3]:=xmit_sz:32 maps[25-32]
     rtab[4]:=xmit_sz:40 maps[33-40]
     rtab[5]:=xmit_sz:48 maps[41-48]
     rtab[6]:=xmit_sz:56 maps[49-56]
    
    New TC util on a kernel WITHOUT support for cell_align
     rtab[0]:=xmit_sz:8 maps[0-7]
     rtab[1]:=xmit_sz:16 maps[8-15]
     rtab[2]:=xmit_sz:24 maps[16-23]
     rtab[3]:=xmit_sz:32 maps[24-31]
     rtab[4]:=xmit_sz:40 maps[32-39]
     rtab[5]:=xmit_sz:48 maps[40-47]
     rtab[6]:=xmit_sz:56 maps[48-55]
    
    Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>

diff --git a/tc/tc_core.c b/tc/tc_core.c
index c713a18..752b07c 100644
--- a/tc/tc_core.c
+++ b/tc/tc_core.c
@@ -84,11 +84,12 @@ int tc_calc_rtable(struct tc_ratespec *r, __u32 *rtab, int cell_log, unsigned mt
 			cell_log++;
 	}
 	for (i=0; i<256; i++) {
-		unsigned sz = (i<<cell_log);
+		unsigned sz = ((i+1)<<cell_log);
 		if (sz < mpu)
 			sz = mpu;
 		rtab[i] = tc_calc_xmittime(bps, sz);
 	}
+	r->cell_align=-1; // Due to the sz calc
 	r->cell_log=cell_log;
 	return cell_log;
 }

[-- Attachment #3: cleanup_tc_calc_rtable_git.patch --]
[-- Type: text/x-patch, Size: 6028 bytes --]

commit 29044ac37e30d9662ad1bb83290a007c492ad7b2
Author: Jesper Dangaard Brouer <hawk@comx.dk>
Date:   Wed Sep 5 10:47:47 2007 +0200

    [IPROUTE2]: Cleanup: tc_calc_rtable().
    
    Change tc_calc_rtable() to take a tc_ratespec struct as an
    argument. (cell_log still needs to be passed on as a parameter,
    because -1 indicate that the cell_log needs to be computed by the
    function.).
    
    Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>

diff --git a/tc/m_police.c b/tc/m_police.c
index 5d2528b..acdfd22 100644
--- a/tc/m_police.c
+++ b/tc/m_police.c
@@ -263,22 +263,20 @@ int act_parse_police(struct action_util *a,int *argc_p, char ***argv_p, int tca_
 	}
 
 	if (p.rate.rate) {
-		if ((Rcell_log = tc_calc_rtable(p.rate.rate, rtab, Rcell_log, mtu, mpu)) < 0) {
+		p.rate.mpu = mpu;
+		if (tc_calc_rtable(&p.rate, rtab, Rcell_log, mtu) < 0) {
 			fprintf(stderr, "TBF: failed to calculate rate table.\n");
 			return -1;
 		}
 		p.burst = tc_calc_xmittime(p.rate.rate, buffer);
-		p.rate.cell_log = Rcell_log;
-		p.rate.mpu = mpu;
 	}
 	p.mtu = mtu;
 	if (p.peakrate.rate) {
-		if ((Pcell_log = tc_calc_rtable(p.peakrate.rate, ptab, Pcell_log, mtu, mpu)) < 0) {
+		p.peakrate.mpu = mpu;
+		if (tc_calc_rtable(&p.peakrate, ptab, Pcell_log, mtu) < 0) {
 			fprintf(stderr, "POLICE: failed to calculate peak rate table.\n");
 			return -1;
 		}
-		p.peakrate.cell_log = Pcell_log;
-		p.peakrate.mpu = mpu;
 	}
 
 	tail = NLMSG_TAIL(n);
diff --git a/tc/q_cbq.c b/tc/q_cbq.c
index f2b4ce8..df98312 100644
--- a/tc/q_cbq.c
+++ b/tc/q_cbq.c
@@ -137,12 +137,11 @@ static int cbq_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl
 	if (allot < (avpkt*3)/2)
 		allot = (avpkt*3)/2;
 
-	if ((cell_log = tc_calc_rtable(r.rate, rtab, cell_log, allot, mpu)) < 0) {
+	r.mpu = mpu;
+	if (tc_calc_rtable(&r, rtab, cell_log, allot) < 0) {
 		fprintf(stderr, "CBQ: failed to calculate rate table.\n");
 		return -1;
 	}
-	r.cell_log = cell_log;
-	r.mpu = mpu;
 
 	if (ewma_log < 0)
 		ewma_log = TC_CBQ_DEF_EWMA;
@@ -336,12 +335,11 @@ static int cbq_parse_class_opt(struct qdisc_util *qu, int argc, char **argv, str
 		unsigned pktsize = wrr.allot;
 		if (wrr.allot < (lss.avpkt*3)/2)
 			wrr.allot = (lss.avpkt*3)/2;
-		if ((cell_log = tc_calc_rtable(r.rate, rtab, cell_log, pktsize, mpu)) < 0) {
+		r.mpu = mpu;
+		if (tc_calc_rtable(&r, rtab, cell_log, pktsize) < 0) {
 			fprintf(stderr, "CBQ: failed to calculate rate table.\n");
 			return -1;
 		}
-		r.cell_log = cell_log;
-		r.mpu = mpu;
 	}
 	if (ewma_log < 0)
 		ewma_log = TC_CBQ_DEF_EWMA;
diff --git a/tc/q_htb.c b/tc/q_htb.c
index b579ebe..cca77fa 100644
--- a/tc/q_htb.c
+++ b/tc/q_htb.c
@@ -212,19 +212,17 @@ static int htb_parse_class_opt(struct qdisc_util *qu, int argc, char **argv, str
 	opt.ceil.mpu = mpu;
 	opt.rate.mpu = mpu;
 
-	if ((cell_log = tc_calc_rtable(opt.rate.rate, rtab, cell_log, mtu, mpu)) < 0) {
+	if (tc_calc_rtable(&opt.rate, rtab, cell_log, mtu) < 0) {
 		fprintf(stderr, "htb: failed to calculate rate table.\n");
 		return -1;
 	}
 	opt.buffer = tc_calc_xmittime(opt.rate.rate, buffer);
-	opt.rate.cell_log = cell_log;
 
-	if ((ccell_log = tc_calc_rtable(opt.ceil.rate, ctab, cell_log, mtu, mpu)) < 0) {
+	if (tc_calc_rtable(&opt.ceil, ctab, ccell_log, mtu) < 0) {
 		fprintf(stderr, "htb: failed to calculate ceil rate table.\n");
 		return -1;
 	}
 	opt.cbuffer = tc_calc_xmittime(opt.ceil.rate, cbuffer);
-	opt.ceil.cell_log = ccell_log;
 
 	tail = NLMSG_TAIL(n);
 	addattr_l(n, 1024, TCA_OPTIONS, NULL, 0);
diff --git a/tc/q_tbf.c b/tc/q_tbf.c
index 1fc05f4..c7b4f0f 100644
--- a/tc/q_tbf.c
+++ b/tc/q_tbf.c
@@ -170,21 +170,20 @@ static int tbf_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct nl
 		opt.limit = lim;
 	}
 
-	if ((Rcell_log = tc_calc_rtable(opt.rate.rate, rtab, Rcell_log, mtu, mpu)) < 0) {
+	opt.rate.mpu = mpu;
+	if (tc_calc_rtable(&opt.rate, rtab, Rcell_log, mtu) < 0) {
 		fprintf(stderr, "TBF: failed to calculate rate table.\n");
 		return -1;
 	}
 	opt.buffer = tc_calc_xmittime(opt.rate.rate, buffer);
-	opt.rate.cell_log = Rcell_log;
-	opt.rate.mpu = mpu;
+
 	if (opt.peakrate.rate) {
-		if ((Pcell_log = tc_calc_rtable(opt.peakrate.rate, ptab, Pcell_log, mtu, mpu)) < 0) {
+		opt.peakrate.mpu = mpu;
+		if (tc_calc_rtable(&opt.peakrate, ptab, Pcell_log, mtu) < 0) {
 			fprintf(stderr, "TBF: failed to calculate peak rate table.\n");
 			return -1;
 		}
 		opt.mtu = tc_calc_xmittime(opt.peakrate.rate, mtu);
-		opt.peakrate.cell_log = Pcell_log;
-		opt.peakrate.mpu = mpu;
 	}
 
 	tail = NLMSG_TAIL(n);
diff --git a/tc/tc_core.c b/tc/tc_core.c
index 1ab0ba0..c713a18 100644
--- a/tc/tc_core.c
+++ b/tc/tc_core.c
@@ -69,10 +69,11 @@ unsigned tc_calc_xmitsize(unsigned rate, unsigned ticks)
    rtab[pkt_len>>cell_log] = pkt_xmit_time
  */
 
-int tc_calc_rtable(unsigned bps, __u32 *rtab, int cell_log, unsigned mtu,
-		   unsigned mpu)
+int tc_calc_rtable(struct tc_ratespec *r, __u32 *rtab, int cell_log, unsigned mtu)
 {
 	int i;
+	unsigned bps = r->rate;
+	unsigned mpu = r->mpu;
 
 	if (mtu == 0)
 		mtu = 2047;
@@ -88,6 +89,7 @@ int tc_calc_rtable(unsigned bps, __u32 *rtab, int cell_log, unsigned mtu,
 			sz = mpu;
 		rtab[i] = tc_calc_xmittime(bps, sz);
 	}
+	r->cell_log=cell_log;
 	return cell_log;
 }
 
diff --git a/tc/tc_core.h b/tc/tc_core.h
index a139da6..e98a7b4 100644
--- a/tc/tc_core.h
+++ b/tc/tc_core.h
@@ -13,7 +13,7 @@ long tc_core_time2ktime(long time);
 long tc_core_ktime2time(long ktime);
 unsigned tc_calc_xmittime(unsigned rate, unsigned size);
 unsigned tc_calc_xmitsize(unsigned rate, unsigned ticks);
-int tc_calc_rtable(unsigned bps, __u32 *rtab, int cell_log, unsigned mtu, unsigned mpu);
+int tc_calc_rtable(struct tc_ratespec *r, __u32 *rtab, int cell_log, unsigned mtu);
 
 int tc_setup_estimator(unsigned A, unsigned time_const, struct tc_estimator *est);
