* [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
@ 2006-10-16 23:34 Russell Stuart
2006-10-17 13:07 ` jamal
0 siblings, 1 reply; 25+ messages in thread
From: Russell Stuart @ 2006-10-16 23:34 UTC (permalink / raw)
To: netdev, David Miller
Cc: Jamal Hadi Salim, Patrick McHardy, Jesper Dangaard Brouer
The Linux traffic's control engine inaccurately calculates
transmission times for packets sent over ADSL links. For
some packet sizes the error rises to over 50%. This occurs
because ADSL uses ATM as its link layer transport, and ATM
transmits packets in fixed sized 53 byte cells.
This changes the kernel rate table lookup, to be able to lookup
packet transmission times over all ATM links, including ADSL,
with perfect accuracy. The accuracy is dependent on the rate
table that is calculated in userspace by iproute2 command tc.
A longer presentation of the patch, its rational, what it
does and how to use it can be found here:
http://www.stuart.id.au/russell/files/tc/tc-atm/
A earlier version of the patch, and a _detailed_ empirical
investigation of its effects can be found here:
http://www.adsl-optimizer.dk/
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: Russell Stuart <russell-tcatm@stuart.id.au>
---
diff -Nurp kernel-source-2.6.16.orig/include/linux/pkt_sched.h kernel-source-2.6.16/include/linux/pkt_sched.h
--- kernel-source-2.6.16.orig/include/linux/pkt_sched.h 2006-03-20 15:53:29.000000000 +1000
+++ kernel-source-2.6.16/include/linux/pkt_sched.h 2006-06-13 11:42:12.000000000 +1000
@@ -77,8 +77,9 @@ struct tc_ratespec
{
unsigned char cell_log;
unsigned char __reserved;
- unsigned short feature;
- short addend;
+ unsigned short feature; /* Always 0 in pre-atm patch kernels */
+ char cell_align; /* Always 0 in pre-atm patch kernels */
+ unsigned char __unused;
unsigned short mpu;
__u32 rate;
};
diff -Nurp kernel-source-2.6.16.orig/include/net/sch_generic.h kernel-source-2.6.16/include/net/sch_generic.h
--- kernel-source-2.6.16.orig/include/net/sch_generic.h 2006-03-20 15:53:29.000000000 +1000
+++ kernel-source-2.6.16/include/net/sch_generic.h 2006-06-13 11:42:12.000000000 +1000
@@ -307,4 +307,18 @@ drop:
return NET_XMIT_DROP;
}
+/* Lookup a qdisc_rate_table to determine how long it will take to send a
+ packet given its size.
+ */
+static inline u32 qdisc_l2t(struct qdisc_rate_table* rtab, int pktlen)
+{
+ int slot = pktlen + rtab->rate.cell_align;
+ if (slot < 0)
+ slot = 0;
+ slot >>= rtab->rate.cell_log;
+ if (slot > 255)
+ return rtab->data[255] + 1;
+ return rtab->data[slot];
+}
+
#endif
diff -Nurp kernel-source-2.6.16.orig/net/sched/act_police.c kernel-source-2.6.16/net/sched/act_police.c
--- kernel-source-2.6.16.orig/net/sched/act_police.c 2006-03-20 15:53:29.000000000 +1000
+++ kernel-source-2.6.16/net/sched/act_police.c 2006-06-13 11:42:12.000000000 +1000
@@ -33,8 +33,8 @@
#include <net/sock.h>
#include <net/act_api.h>
-#define L2T(p,L) ((p)->R_tab->data[(L)>>(p)->R_tab->rate.cell_log])
-#define L2T_P(p,L) ((p)->P_tab->data[(L)>>(p)->P_tab->rate.cell_log])
+#define L2T(p,L) qdisc_l2t((p)->R_tab,L)
+#define L2T_P(p,L) qdisc_l2t((p)->P_tab,L)
#define PRIV(a) ((struct tcf_police *) (a)->priv)
/* use generic hash table */
diff -Nurp kernel-source-2.6.16.orig/net/sched/sch_cbq.c kernel-source-2.6.16/net/sched/sch_cbq.c
--- kernel-source-2.6.16.orig/net/sched/sch_cbq.c 2006-03-20 15:53:29.000000000 +1000
+++ kernel-source-2.6.16/net/sched/sch_cbq.c 2006-06-13 11:42:12.000000000 +1000
@@ -193,7 +193,7 @@ struct cbq_sched_data
};
-#define L2T(cl,len) ((cl)->R_tab->data[(len)>>(cl)->R_tab->rate.cell_log])
+#define L2T(cl,len) qdisc_l2t((cl)->R_tab,len)
static __inline__ unsigned cbq_hash(u32 h)
diff -Nurp kernel-source-2.6.16.orig/net/sched/sch_htb.c kernel-source-2.6.16/net/sched/sch_htb.c
--- kernel-source-2.6.16.orig/net/sched/sch_htb.c 2006-03-20 15:53:29.000000000 +1000
+++ kernel-source-2.6.16/net/sched/sch_htb.c 2006-06-13 11:42:12.000000000 +1000
@@ -206,12 +206,10 @@ struct htb_class
static __inline__ long L2T(struct htb_class *cl,struct qdisc_rate_table *rate,
int size)
{
- int slot = size >> rate->rate.cell_log;
- if (slot > 255) {
+ long result = qdisc_l2t(rate, size);
+ if (result > rate->data[255])
cl->xstats.giants++;
- slot = 255;
- }
- return rate->data[slot];
+ return result;
}
struct htb_sched
diff -Nurp kernel-source-2.6.16.orig/net/sched/sch_tbf.c kernel-source-2.6.16/net/sched/sch_tbf.c
--- kernel-source-2.6.16.orig/net/sched/sch_tbf.c 2006-03-20 15:53:29.000000000 +1000
+++ kernel-source-2.6.16/net/sched/sch_tbf.c 2006-06-13 11:42:12.000000000 +1000
@@ -132,8 +132,8 @@ struct tbf_sched_data
struct Qdisc *qdisc; /* Inner qdisc, default - bfifo queue */
};
-#define L2T(q,L) ((q)->R_tab->data[(L)>>(q)->R_tab->rate.cell_log])
-#define L2T_P(q,L) ((q)->P_tab->data[(L)>>(q)->P_tab->rate.cell_log])
+#define L2T(q,L) qdisc_l2t((q)->R_tab,L)
+#define L2T_P(q,L) qdisc_l2t((q)->P_tab,L)
static int tbf_enqueue(struct sk_buff *skb, struct Qdisc* sch)
{
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
2006-10-16 23:34 [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel) Russell Stuart
@ 2006-10-17 13:07 ` jamal
2006-10-19 3:41 ` David Miller
2006-10-19 14:38 ` Patrick McHardy
0 siblings, 2 replies; 25+ messages in thread
From: jamal @ 2006-10-17 13:07 UTC (permalink / raw)
To: Russell Stuart
Cc: netdev, David Miller, Patrick McHardy, Jesper Dangaard Brouer
On Tue, 2006-17-10 at 09:34 +1000, Russell Stuart wrote:
> The Linux traffic's control engine inaccurately calculates
> transmission times for packets sent over ADSL links. For
> some packet sizes the error rises to over 50%. This occurs
> because ADSL uses ATM as its link layer transport, and ATM
> transmits packets in fixed sized 53 byte cells.
>
> This changes the kernel rate table lookup, to be able to lookup
> packet transmission times over all ATM links, including ADSL,
> with perfect accuracy. The accuracy is dependent on the rate
> table that is calculated in userspace by iproute2 command tc.
>
> A longer presentation of the patch, its rational, what it
> does and how to use it can be found here:
> http://www.stuart.id.au/russell/files/tc/tc-atm/
>
> A earlier version of the patch, and a _detailed_ empirical
> investigation of its effects can be found here:
> http://www.adsl-optimizer.dk/
>
> Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
> Signed-off-by: Russell Stuart <russell-tcatm@stuart.id.au>
ACKed-by: Jamal Hadi Salim
When Patrick has his patch ready after this goes in we can revisit.
cheers,
jamal
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
2006-10-17 13:07 ` jamal
@ 2006-10-19 3:41 ` David Miller
2006-10-19 14:38 ` Patrick McHardy
1 sibling, 0 replies; 25+ messages in thread
From: David Miller @ 2006-10-19 3:41 UTC (permalink / raw)
To: hadi; +Cc: russell-tcatm, netdev, kaber, hawk
From: jamal <hadi@cyberus.ca>
Date: Tue, 17 Oct 2006 09:07:24 -0400
> ACKed-by: Jamal Hadi Salim
>
> When Patrick has his patch ready after this goes in we can revisit.
If anything it's too late to put this into 2.6.19 so I'll likely
therefore toss it into net-2.6.20 whenever I open that tree up.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
2006-10-17 13:07 ` jamal
2006-10-19 3:41 ` David Miller
@ 2006-10-19 14:38 ` Patrick McHardy
2006-10-20 0:49 ` jamal
2006-10-23 11:22 ` Russell Stuart
1 sibling, 2 replies; 25+ messages in thread
From: Patrick McHardy @ 2006-10-19 14:38 UTC (permalink / raw)
To: hadi; +Cc: Russell Stuart, netdev, David Miller, Jesper Dangaard Brouer
jamal wrote:
> ACKed-by: Jamal Hadi Salim
>
> When Patrick has his patch ready after this goes in we can revisit.
NACK.
I still think this patch shouldn't go in. There's no point in doing the
same thing twice, and I haven't heard a compelling argument why it has
to be done in a way that only helps qdiscs using rtabs while ignoring
statistics and estimators (I even provided a patch to show how to do
it without these limitations).
Besides that:
+static inline u32 qdisc_l2t(struct qdisc_rate_table* rtab, int pktlen)
+{
+ int slot = pktlen + rtab->rate.cell_align;
+ if (slot < 0)
+ slot = 0;
Why would it go negative? A negative cell_align doesn't make sense I
guess.
+ slot >>= rtab->rate.cell_log;
+ if (slot > 255)
+ return rtab->data[255] + 1;
Whats the point of this? Is it just to keep htb giant statistics
working?
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
2006-10-19 14:38 ` Patrick McHardy
@ 2006-10-20 0:49 ` jamal
2006-10-20 8:54 ` Patrick McHardy
2006-10-23 11:22 ` Russell Stuart
1 sibling, 1 reply; 25+ messages in thread
From: jamal @ 2006-10-20 0:49 UTC (permalink / raw)
To: Patrick McHardy
Cc: Russell Stuart, netdev, David Miller, Jesper Dangaard Brouer
On Thu, 2006-19-10 at 16:38 +0200, Patrick McHardy wrote:
> jamal wrote:
> > ACKed-by: Jamal Hadi Salim
> >
> > When Patrick has his patch ready after this goes in we can revisit.
>
> NACK.
>
> I still think this patch shouldn't go in. There's no point in doing the
> same thing twice, and I haven't heard a compelling argument why it has
> to be done in a way that only helps qdiscs using rtabs while ignoring
> statistics and estimators (I even provided a patch to show how to do
> it without these limitations).
The poor guy has been persistent and has some good ideas and we need to
encourage him to stick around. Why dont you help him get the patch in
the shape you think is reasonable? I know you are busy elsewhere and
your patch has been a while since you last promised. I will try to help
as well.
> Besides that:
I will let Russell respond to the critique (As well as the concept of
stats and estimator); Russell please try to be brief and to the point (I
still have to learn that lesson myself ;->).
cheers,
jamal
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
2006-10-20 0:49 ` jamal
@ 2006-10-20 8:54 ` Patrick McHardy
0 siblings, 0 replies; 25+ messages in thread
From: Patrick McHardy @ 2006-10-20 8:54 UTC (permalink / raw)
To: hadi; +Cc: Russell Stuart, netdev, David Miller, Jesper Dangaard Brouer
jamal wrote:
> The poor guy has been persistent and has some good ideas and we need to
> encourage him to stick around. Why dont you help him get the patch in
> the shape you think is reasonable? I know you are busy elsewhere and
> your patch has been a while since you last promised. I will try to help
> as well.
I have explained what I would like to see several times and sent
a patch to demonstrate it. I'd be happy to help further (I would
like to see this go in eventually), but there's nothing I can do
besides repeating myself as long as we only get to see the same
patch again and again.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
2006-10-19 14:38 ` Patrick McHardy
2006-10-20 0:49 ` jamal
@ 2006-10-23 11:22 ` Russell Stuart
2006-10-23 12:39 ` Patrick McHardy
1 sibling, 1 reply; 25+ messages in thread
From: Russell Stuart @ 2006-10-23 11:22 UTC (permalink / raw)
To: Patrick McHardy; +Cc: hadi, netdev, David Miller, Jesper Dangaard Brouer
On Thu, 2006-10-19 at 16:38 +0200, Patrick McHardy wrote:
> I still think this patch shouldn't go in. There's no point in doing the
> same thing twice, and I haven't heard a compelling argument why it has
> to be done in a way that only helps qdiscs using rtabs while ignoring
> statistics and estimators (I even provided a patch to show how to do
> it without these limitations).
As far as I can see one patch changes the way the
kernel calculates packet lengths, and the other modifies
RTAB. I can not see the significant overlap between the
patches that you talk about.
As for why haven't got a new patch from me that addresses
the "doing the same thing twice" issue - it is because I
can see no such issue. I have asked you repeatedly to
explain it further, but you have not done so.
As for "providing a patch" - I believe at the time you
called it something you were working on. As far as I
know you still are working on it. Besides, even if you
did intend me to take it further, I am not particularly
interested in the packet length issue as it does not
effect the real world performance of any of the qdiscs
I use.
> Besides that:
>
> +static inline u32 qdisc_l2t(struct qdisc_rate_table* rtab, int pktlen)
> +{
> + int slot = pktlen + rtab->rate.cell_align;
> + if (slot < 0)
> + slot = 0;
>
> Why would it go negative? A negative cell_align doesn't make sense I
> guess.
A negative cell align is possible, and in fact is typical.
If slot ended up being less than 0 then the configuration
parameters passed to "tc" would of been wrong - they could
not of matched the actual traffic. The error can't be
detected in "tc", but it can't be allowed to cause the
kernel to index off the end of an array either.
> + slot >>= rtab->rate.cell_log;
> + if (slot > 255)
> + return rtab->data[255] + 1;
>
> Whats the point of this? Is it just to keep htb giant statistics
> working?
Yes.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
2006-10-23 11:22 ` Russell Stuart
@ 2006-10-23 12:39 ` Patrick McHardy
2006-10-23 21:54 ` Russell Stuart
0 siblings, 1 reply; 25+ messages in thread
From: Patrick McHardy @ 2006-10-23 12:39 UTC (permalink / raw)
To: Russell Stuart; +Cc: hadi, netdev, David Miller, Jesper Dangaard Brouer
Russell Stuart wrote:
> On Thu, 2006-10-19 at 16:38 +0200, Patrick McHardy wrote:
>
>>I still think this patch shouldn't go in. There's no point in doing the
>>same thing twice, and I haven't heard a compelling argument why it has
>>to be done in a way that only helps qdiscs using rtabs while ignoring
>>statistics and estimators (I even provided a patch to show how to do
>>it without these limitations).
>
>
> As far as I can see one patch changes the way the
> kernel calculates packet lengths, and the other modifies
> RTAB. I can not see the significant overlap between the
> patches that you talk about.
The implementation may be different, but the intention and the
result is the same. I probably would mind less if it wouldn't
affect userspace compatibility, but we need to carry this stuff
for ever even if we add another implementation that covers all
qdiscs.
> As for why haven't got a new patch from me that addresses
> the "doing the same thing twice" issue - it is because I
> can see no such issue. I have asked you repeatedly to
> explain it further, but you have not done so.
What do I need to explain further? As I stated several times,
I would like to see a patch that handles all qdiscs. And it
should probably not have hardcoded ATM values, there is other
media that behaves similar.
> As for "providing a patch" - I believe at the time you
> called it something you were working on. As far as I
> know you still are working on it. Besides, even if you
> did intend me to take it further, I am not particularly
> interested in the packet length issue as it does not
> effect the real world performance of any of the qdiscs
> I use.
No, I'm not working on it currently, it was more meant for
demonstration purposes. If you're not interested in taking
the opinion of people working on the code into account thats
your problem.
>>Besides that:
>>
>>+static inline u32 qdisc_l2t(struct qdisc_rate_table* rtab, int pktlen)
>>+{
>>+ int slot = pktlen + rtab->rate.cell_align;
>>+ if (slot < 0)
>>+ slot = 0;
>>
>>Why would it go negative? A negative cell_align doesn't make sense I
>>guess.
>
>
> A negative cell align is possible, and in fact is typical.
> If slot ended up being less than 0 then the configuration
> parameters passed to "tc" would of been wrong - they could
> not of matched the actual traffic. The error can't be
> detected in "tc", but it can't be allowed to cause the
> kernel to index off the end of an array either.
I'm not sure I understand what you're saying here. The transmission
time gets _smaller_ by transmitting over ATM? Does this have anything
to do with the off-by-one during rate table calculation you or
Jesper noticed?
>>+ slot >>= rtab->rate.cell_log;
>>+ if (slot > 255)
>>+ return rtab->data[255] + 1;
>>
>>Whats the point of this? Is it just to keep htb giant statistics
>>working?
>
>
> Yes.
TBF and policers already make sure this can never happen, this is
what HTB should do as well. If it is also needed for CBQ it is
a bugfix and should be done seperately.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
2006-10-23 12:39 ` Patrick McHardy
@ 2006-10-23 21:54 ` Russell Stuart
2006-10-24 16:19 ` Patrick McHardy
0 siblings, 1 reply; 25+ messages in thread
From: Russell Stuart @ 2006-10-23 21:54 UTC (permalink / raw)
To: Patrick McHardy; +Cc: hadi, netdev, David Miller, Jesper Dangaard Brouer
On Mon, 2006-10-23 at 14:39 +0200, Patrick McHardy wrote:
> The implementation may be different, but the intention and the
> result is the same. I probably would mind less if it wouldn't
> affect userspace compatibility, but we need to carry this stuff
> for ever even if we add another implementation that covers all
> qdiscs.
So where is the overlap? Your patch will make qdiscs
that use packet length work with ATM. Ours makes the
rest, ie those that use Alexy's RTAB, work with ATM.
They would probably both apply with minimal conflicts.
You have previously said you have no intention of
changing the current RTAB implantation with your STAB
patch.
As an aside non-work conserving qdisc's that do
scheduling are the real targets of ATM patch. The
rest are not effected by ATM overly. The only one
of those that doesn't use Alexy's RTAB is the one you
introduced - HFSC. You are the best person to fix
things so HFSC does work with ATM, and that is what
I thought you were doing with the STAB patch.
> What do I need to explain further? As I stated several times,
> I would like to see a patch that handles all qdiscs. And it
> should probably not have hardcoded ATM values, there is other
> media that behaves similar.
I am not aware of other media that behaves in a similar
way, although I am no expert. What have I missed? The
hard coded ATM values don't effect this patch btw, they
are a user space thing only.
> >>Besides that:
> >>
> >>+static inline u32 qdisc_l2t(struct qdisc_rate_table* rtab, int pktlen)
> >>+{
> >>+ int slot = pktlen + rtab->rate.cell_align;
> >>+ if (slot < 0)
> >>+ slot = 0;
> >>
> >>Why would it go negative? A negative cell_align doesn't make sense I
> >>guess.
> >
> >
> > A negative cell align is possible, and in fact is typical.
> > If slot ended up being less than 0 then the configuration
> > parameters passed to "tc" would of been wrong - they could
> > not of matched the actual traffic. The error can't be
> > detected in "tc", but it can't be allowed to cause the
> > kernel to index off the end of an array either.
>
> I'm not sure I understand what you're saying here. The transmission
> time gets _smaller_ by transmitting over ATM? Does this have anything
> to do with the off-by-one during rate table calculation you or
> Jesper noticed?
There is nothing I would describe as an "off-by-one
error" in the RTAB calculation, so I can't be sure
what you are referring to. The packetisation done
by ATM does introduce rounding / truncation errors
of up to ((1 << cell_log) - 1). Negative cell
alignments is the easiest way to fix that.
> >>+ slot >>= rtab->rate.cell_log;
> >>+ if (slot > 255)
> >>+ return rtab->data[255] + 1;
> >>
> >>Whats the point of this? Is it just to keep htb giant statistics
> >>working?
> >
> >
> > Yes.
>
> TBF and policers already make sure this can never happen, this is
> what HTB should do as well. If it is also needed for CBQ it is
> a bugfix and should be done seperately.
OK. I will change it.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
2006-10-23 21:54 ` Russell Stuart
@ 2006-10-24 16:19 ` Patrick McHardy
2006-10-24 20:00 ` Jesper Dangaard Brouer
2006-10-24 23:46 ` Russell Stuart
0 siblings, 2 replies; 25+ messages in thread
From: Patrick McHardy @ 2006-10-24 16:19 UTC (permalink / raw)
To: Russell Stuart; +Cc: hadi, netdev, David Miller, Jesper Dangaard Brouer
Russell Stuart wrote:
> On Mon, 2006-10-23 at 14:39 +0200, Patrick McHardy wrote:
>
>>The implementation may be different, but the intention and the
>>result is the same. I probably would mind less if it wouldn't
>>affect userspace compatibility, but we need to carry this stuff
>>for ever even if we add another implementation that covers all
>>qdiscs.
>
>
> So where is the overlap? Your patch will make qdiscs
> that use packet length work with ATM. Ours makes the
> rest, ie those that use Alexy's RTAB, work with ATM.
> They would probably both apply with minimal conflicts.
> You have previously said you have no intention of
> changing the current RTAB implantation with your STAB
> patch.
No, my patch works for qdiscs with and without RTABs, this
is where they overlap.
> As an aside non-work conserving qdisc's that do
> scheduling are the real targets of ATM patch. The
> rest are not effected by ATM overly. The only one
> of those that doesn't use Alexy's RTAB is the one you
> introduced - HFSC. You are the best person to fix
> things so HFSC does work with ATM, and that is what
> I thought you were doing with the STAB patch.
No, as we already discussed, SFQ uses the packet size for
calculating remaining quanta, and fairness would increase
if the real transmission size (and time) were used. RED
uses the backlog size to calculate the drop probabilty
(and supports attaching inner qdiscs nowadays), so keeping
accurate backlog statistics seems to be a win as well
(besides their use for estimators). It is also possible
to specify the maximum latency for TBF instead of a byte
limit (which is passed as max. backlog value to the inner
bfifo qdisc), this would also need accurate backlog statistics.
>>What do I need to explain further? As I stated several times,
>>I would like to see a patch that handles all qdiscs. And it
>>should probably not have hardcoded ATM values, there is other
>>media that behaves similar.
>
>
> I am not aware of other media that behaves in a similar
> way, although I am no expert.
Ethernet, VLAN, Tunnels, ... its especially useful for tunnels
if you also shape on the underlying device since the qdisc
on the tunnel device and the qdisc on the underlying device
should ideally be in sync (otherwise no accurate bandwidth
reservation is possible).
> What have I missed? The
> hard coded ATM values don't effect this patch btw, they
> are a user space thing only.
Sure, I'm just mentioning it. Is seems to be quite deeply codified
in userspace though.
>>>A negative cell align is possible, and in fact is typical.
>>>If slot ended up being less than 0 then the configuration
>>>parameters passed to "tc" would of been wrong - they could
>>>not of matched the actual traffic. The error can't be
>>>detected in "tc", but it can't be allowed to cause the
>>>kernel to index off the end of an array either.
>>
>>I'm not sure I understand what you're saying here. The transmission
>>time gets _smaller_ by transmitting over ATM? Does this have anything
>>to do with the off-by-one during rate table calculation you or
>>Jesper noticed?
>
>
> There is nothing I would describe as an "off-by-one
> error" in the RTAB calculation, so I can't be sure
> what you are referring to.
Either you or Jesper pointed to this code in iproute:
for (i=0; i<256; i++) {
unsigned sz = (i<<cell_log);
...
rtab[i] = tc_core_usec2tick(1000000*((double)sz/bps));
which tends to underestimate the transmission time by using
the smallest possible size for each cell.
> The packetisation done
> by ATM does introduce rounding / truncation errors
> of up to ((1 << cell_log) - 1). Negative cell
> alignments is the easiest way to fix that.
Doesn't this cause underestimates as well for minimum sized (within
a cell size) packets? The error is the same as the one I mentioned
above (apparently in the opposite direction though), which is why I
thought this might be related.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
2006-10-24 16:19 ` Patrick McHardy
@ 2006-10-24 20:00 ` Jesper Dangaard Brouer
2006-10-24 23:46 ` Russell Stuart
1 sibling, 0 replies; 25+ messages in thread
From: Jesper Dangaard Brouer @ 2006-10-24 20:00 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Russell Stuart, hadi, netdev, David Miller
On Tue, 24 Oct 2006, Patrick McHardy wrote:
> Russell Stuart wrote:
>> On Mon, 2006-10-23 at 14:39 +0200, Patrick McHardy wrote:
>>
>>> The implementation may be different, but the intention and the
>>> result is the same. I probably would mind less if it wouldn't
>>> affect userspace compatibility, but we need to carry this stuff
>>> for ever even if we add another implementation that covers all
>>> qdiscs.
>>
>>
>> So where is the overlap? Your patch will make qdiscs
>> that use packet length work with ATM. Ours makes the
>> rest, ie those that use Alexy's RTAB, work with ATM.
>> They would probably both apply with minimal conflicts.
>> You have previously said you have no intention of
>> changing the current RTAB implantation with your STAB
>> patch.
>
> No, my patch works for qdiscs with and without RTABs, this
> is where they overlap.
Patrick, your patch does not solve the alignment issue of the RTABs.
(The kernel part of) our patch (Me and Russells), simply gives the ability
to align the RTAB. We will still need this ability when your STAB patch
is in place.
<cut>
>
> Sure, I'm just mentioning it. Is seems to be quite deeply codified
> in userspace though.
Yes, the userspace patch is where we do the "ugly" complicated stuff.
<cut>
>>> I'm not sure I understand what you're saying here. The transmission
>>> time gets _smaller_ by transmitting over ATM? Does this have anything
>>> to do with the off-by-one during rate table calculation you or
>>> Jesper noticed?
>>
>>
>> There is nothing I would describe as an "off-by-one
>> error" in the RTAB calculation, so I can't be sure
>> what you are referring to.
>
> Either you or Jesper pointed to this code in iproute:
It was me, Jesper.
> for (i=0; i<256; i++) {
> unsigned sz = (i<<cell_log);
> ...
> rtab[i] = tc_core_usec2tick(1000000*((double)sz/bps));
>
> which tends to underestimate the transmission time by using
> the smallest possible size for each cell.
Correct.
My original patch only corrected this "off-by-one error" (and parsed on an
overhead argument to the kernel).
Russell, took the step further and made it possible to adjust the
alignment from userspace.
<cut>
Cheers,
Jesper Brouer
--
-------------------------------------------------------------------
MSc. Master of Computer Science
Dept. of Computer Science, University of Copenhagen
Author of http://www.adsl-optimizer.dk
-------------------------------------------------------------------
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
2006-10-24 16:19 ` Patrick McHardy
2006-10-24 20:00 ` Jesper Dangaard Brouer
@ 2006-10-24 23:46 ` Russell Stuart
2006-11-30 13:07 ` Patrick McHardy
1 sibling, 1 reply; 25+ messages in thread
From: Russell Stuart @ 2006-10-24 23:46 UTC (permalink / raw)
To: Patrick McHardy; +Cc: hadi, netdev, David Miller, Jesper Dangaard Brouer
On Tue, 2006-10-24 at 18:19 +0200, Patrick McHardy wrote:
> No, my patch works for qdiscs with and without RTABs, this
> is where they overlap.
Could you explain how this works? I didn't see how
qdiscs that used RTAB to measure rates of transmission
could use your STAB to do the same thing. At least not
without substantial modifications to your patch.
> > As an aside non-work conserving qdisc's that do
> > scheduling are the real targets of ATM patch. The
> > rest are not effected by ATM overly. The only one
> > of those that doesn't use Alexy's RTAB is the one you
> > introduced - HFSC. You are the best person to fix
> > things so HFSC does work with ATM, and that is what
> > I thought you were doing with the STAB patch.
>
> No, as we already discussed, SFQ uses the packet size for
> calculating remaining quanta, and fairness would increase
> if the real transmission size (and time) were used. RED
> uses the backlog size to calculate the drop probabilty
> (and supports attaching inner qdiscs nowadays), so keeping
> accurate backlog statistics seems to be a win as well
> (besides their use for estimators). It is also possible
> to specify the maximum latency for TBF instead of a byte
> limit (which is passed as max. backlog value to the inner
> bfifo qdisc), this would also need accurate backlog statistics.
This is all beside the point if you can show how
you patch gets rid of RTAB - currently I am acting
under the assumption it doesn't. If it does you
get all you describe for free.
Otherwise - yes, you are correct. The ATM patch does
not introduce accurate packet lengths into the kernel,
which is what is required to give the benefits you
describe. But that was never the ATM patches goal.
The ATM patch gives accurate rate calculations for ATM
links, nothing more. Accurate packet length calculations
is apparently the goal of your patch, and I wish you
luck with it.
> Ethernet, VLAN, Tunnels, ... its especially useful for tunnels
> if you also shape on the underlying device since the qdisc
> on the tunnel device and the qdisc on the underlying device
> should ideally be in sync (otherwise no accurate bandwidth
> reservation is possible).
Hmmm - not as far as I am aware. In all those cases
the IP layer breaks up the data into MTU sized packets
before they get to the scheduler. ATM is the only
technology I am known of where setting the MTU to be
bigger than the end to end link can support is normal.
> > What have I missed? The
> > hard coded ATM values don't effect this patch btw, they
> > are a user space thing only.
>
> Sure, I'm just mentioning it. Is seems to be quite deeply codified
> in userspace though.
We will have to disagree on that. Jesper and I had the
discussion. It came down to using a #define, or letting
it be defined on the command line. I actually wrote
code for both. In the end I decided the command line
option was a waste of time. The difference in lines of
code is not huge, however.
> Either you or Jesper pointed to this code in iproute:
>
> for (i=0; i<256; i++) {
> unsigned sz = (i<<cell_log);
> ...
> rtab[i] = tc_core_usec2tick(1000000*((double)sz/bps));
>
> which tends to underestimate the transmission time by using
> the smallest possible size for each cell.
This is going to be long, Patrick. I know you don't
like long emails, so if you don't have the patience
for it stop reading now. The remainder of this email
addresses this one point.
Firstly, yes you are correct. It will under some
circumstances underestimate the number of cells it
takes to send a packet. The reason is because the
whole aim of the ATM patch was to make maximum use
of the ATM link, while at the same time keeping
control of scheduling decisions. To keep control of
scheduling decisions, we must _never_ overestimate
the speed of the link. If we do the ISP will take
control of the scheduling.
At first sight this seems a minor issue. Its not, because
the error can be large. An example of overestimating the
link speed would be were one RTAB entry covers both the
2 and 3 cell cases. If we say the IP packet is going to
use 2 cells, and in fact it uses 3, then the error is 50%.
This is a huge error, and in fact eliminating this error
is the whole point of the ATM patch.
As an example of its impact, I was trying to make VOIP
work over a shared link. If the ISP starts making the
scheduling decisions then VOIP packets start being
dropped or delayed, rendering VOIP unusable. So in
order to use VOIP on the link I have to understate the
link capacity by 50%. As it happens, VOIP generates a
stream of packets in the 2-3 cell size range, the actual
size depending on the codec negotiated by the end points.
Jesper in his thesis gives perhaps an more important
example what happens if you overestimate the link speed.
It turns out in interacts with TCP's flow control badly,
slowing down all TCP flows over the link. The reasons
are subtle so I won't go into it here. But the end
result is if you overestimate the link speed and let the
ISP do the scheduling, you end up under-utilising the
ATM link.
So in the ATM patch there is a deliberate design decision -
we always assign an RTAB entry the smallest cell size it
covers. Originally Jesper and I wrote our own versions
of the ATM patch independently, and we both made the same
design decision - I presume for the same reason.
Secondly, and possibly more importantly, the ATM patch is
designed so that a single RTAB entry always covers exactly
one cell size. So on a patched kernel the underestimate
never occurs - the rate returned by the RTAB is always
exactly correct. In fact, that aspect of it seems to cause
you the most trouble - the off by one error and so on. The
code you point out is only there so the new version of "tc"
also works as well as it can for non-patched kernels.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
2006-10-24 23:46 ` Russell Stuart
@ 2006-11-30 13:07 ` Patrick McHardy
2007-01-17 23:07 ` Russell Stuart
0 siblings, 1 reply; 25+ messages in thread
From: Patrick McHardy @ 2006-11-30 13:07 UTC (permalink / raw)
To: Russell Stuart; +Cc: hadi, netdev, David Miller, Jesper Dangaard Brouer
First, sorry for letting you wait so long ..
Russell Stuart wrote:
> On Tue, 2006-10-24 at 18:19 +0200, Patrick McHardy wrote:
>
>>No, my patch works for qdiscs with and without RTABs, this
>>is where they overlap.
>
>
> Could you explain how this works? I didn't see how
> qdiscs that used RTAB to measure rates of transmission
> could use your STAB to do the same thing. At least not
> without substantial modifications to your patch.
Qdiscs don't use RTABs to measure rates but to calculate
transmission times. Transmission time is always related
to the length, the difference between our patches is that
you modify the RTABs in advance to include the overhead
in the calculation, my patch changes the length used to
look up the transmission time. Which works with or
without RTABs.
>>No, as we already discussed, SFQ uses the packet size for
>>calculating remaining quanta, and fairness would increase
>>if the real transmission size (and time) were used. RED
>>uses the backlog size to calculate the drop probabilty
>>(and supports attaching inner qdiscs nowadays), so keeping
>>accurate backlog statistics seems to be a win as well
>>(besides their use for estimators). It is also possible
>>to specify the maximum latency for TBF instead of a byte
>>limit (which is passed as max. backlog value to the inner
>>bfifo qdisc), this would also need accurate backlog statistics.
>
>
> This is all beside the point if you can show how
> you patch gets rid of RTAB - currently I am acting
> under the assumption it doesn't. If it does you
> get all you describe for free.
Why?
> Otherwise - yes, you are correct. The ATM patch does
> not introduce accurate packet lengths into the kernel,
> which is what is required to give the benefits you
> describe. But that was never the ATM patches goal.
> The ATM patch gives accurate rate calculations for ATM
> links, nothing more. Accurate packet length calculations
> is apparently the goal of your patch, and I wish you
> luck with it.
Again, its not rate calculations but transmission time
calculations, which _are a function of the length_.
>>Ethernet, VLAN, Tunnels, ... its especially useful for tunnels
>>if you also shape on the underlying device since the qdisc
>>on the tunnel device and the qdisc on the underlying device
>>should ideally be in sync (otherwise no accurate bandwidth
>>reservation is possible).
>
>
> Hmmm - not as far as I am aware. In all those cases
> the IP layer breaks up the data into MTU sized packets
> before they get to the scheduler. ATM is the only
> technology I am known of where setting the MTU to be
> bigger than the end to end link can support is normal.
Thats not the point. If I want to do scheduling on the
ipip device and on the underlying device at the same
time I need to reserve the amount of bandwidth given to
the ipip device + the bandwidth uses for encapsulation
on the underlying device. The easy way to do this is
to use the same amount of bandwidth on both devices
and make the scheduler on the ipip device aware that
some overhead is going to be added. The hard way is
to calculate the worst case (bandwidth / minimum packet
size * overhead per packet) and add that on the
underlying device.
>>Either you or Jesper pointed to this code in iproute:
>>
>> for (i=0; i<256; i++) {
>> unsigned sz = (i<<cell_log);
>>...
>> rtab[i] = tc_core_usec2tick(1000000*((double)sz/bps));
>>
>>which tends to underestimate the transmission time by using
>>the smallest possible size for each cell.
>
>
> Firstly, yes you are correct. It will under some
> circumstances underestimate the number of cells it
> takes to send a packet. The reason is because the
> whole aim of the ATM patch was to make maximum use
> of the ATM link, while at the same time keeping
> control of scheduling decisions. To keep control of
> scheduling decisions, we must _never_ overestimate
> the speed of the link. If we do the ISP will take
> control of the scheduling.
Underestimating the transmission time is equivalent to
overestimating the rate.
> At first sight this seems a minor issue. Its not, because
> the error can be large. An example of overestimating the
> link speed would be were one RTAB entry covers both the
> 2 and 3 cell cases. If we say the IP packet is going to
> use 2 cells, and in fact it uses 3, then the error is 50%.
> This is a huge error, and in fact eliminating this error
> is the whole point of the ATM patch.
>
> As an example of its impact, I was trying to make VOIP
> work over a shared link. If the ISP starts making the
> scheduling decisions then VOIP packets start being
> dropped or delayed, rendering VOIP unusable. So in
> order to use VOIP on the link I have to understate the
> link capacity by 50%. As it happens, VOIP generates a
> stream of packets in the 2-3 cell size range, the actual
> size depending on the codec negotiated by the end points.
>
> Jesper in his thesis gives perhaps an more important
> example what happens if you overestimate the link speed.
> It turns out in interacts with TCP's flow control badly,
> slowing down all TCP flows over the link. The reasons
> are subtle so I won't go into it here. But the end
> result is if you overestimate the link speed and let the
> ISP do the scheduling, you end up under-utilising the
> ATM link.
>
> So in the ATM patch there is a deliberate design decision -
> we always assign an RTAB entry the smallest cell size it
> covers. Originally Jesper and I wrote our own versions
> of the ATM patch independently, and we both made the same
> design decision - I presume for the same reason.
>
> Secondly, and possibly more importantly, the ATM patch is
> designed so that a single RTAB entry always covers exactly
> one cell size. So on a patched kernel the underestimate
> never occurs - the rate returned by the RTAB is always
> exactly correct. In fact, that aspect of it seems to cause
> you the most trouble - the off by one error and so on. The
> code you point out is only there so the new version of "tc"
> also works as well as it can for non-patched kernels.
I'm not really convinced, but I mostly lost interest in this
in the mean time, so let me retract my NACK and let others
decide.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
2006-11-30 13:07 ` Patrick McHardy
@ 2007-01-17 23:07 ` Russell Stuart
2007-01-18 4:05 ` Patrick McHardy
0 siblings, 1 reply; 25+ messages in thread
From: Russell Stuart @ 2007-01-17 23:07 UTC (permalink / raw)
To: Patrick McHardy; +Cc: hadi, netdev, David Miller, Jesper Dangaard Brouer
On Thu, 2006-11-30 at 14:07 +0100, Patrick McHardy wrote:
> Qdiscs don't use RTABs to measure rates but to calculate
> transmission times. Transmission time is always related
> to the length, the difference between our patches is that
> you modify the RTABs in advance to include the overhead
> in the calculation, my patch changes the length used to
> look up the transmission time. Which works with or
> without RTABs.
Thanks, I understand now.
I had decided that that STAB was indeed the way forward,
and hence I should devote my time to making it work
with ADSL/ATM. The only issue is now I don't have
any spare time right now.
Yesterday I was chatting about this at LCA 2007, and
it dawned on me that there is a problem with the dual
RTAB/STAB approach.
Currently the lookup in the kernel is
time_to_transmit_a_packet = RTAB[packet_length_seen_by_kernel]
As I understand it, you are proposing to change that to
time_to_transmit_a_packet
= RTAB[STAB[packet_length_seen_by_kernel]]
= RTAB[packet_length_seen_on_the_wire]
Given RTAB is the same in both cases the results of the
calculation will be different (and ergo wrong in one case
or the other). RTAB can't change and remain compatible
with old kernels. Ergo this approach breaks backward
compatibility.
I can't see any way forward but to break the link
between RTAB and STAB. Personally I think this is a
good thing, as I think STAB should replace RTAB
completely. One way to do this is to give the kernel
STAB + transmission rate, so the calculation above
becomes:
time_to_transmit_a_packet
= STAB[packet_length_seen_by_kernel] * transmission_rate
RTAB disappears in new kernels, and the compatibility
problem goes with it.
Can you think of other ways forward? I need some way
forward so I can get this thing working.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
2007-01-17 23:07 ` Russell Stuart
@ 2007-01-18 4:05 ` Patrick McHardy
2007-01-18 6:16 ` Russell Stuart
0 siblings, 1 reply; 25+ messages in thread
From: Patrick McHardy @ 2007-01-18 4:05 UTC (permalink / raw)
To: Russell Stuart; +Cc: hadi, netdev, David Miller, Jesper Dangaard Brouer
Russell Stuart wrote:
> On Thu, 2006-11-30 at 14:07 +0100, Patrick McHardy wrote:
>
>>Qdiscs don't use RTABs to measure rates but to calculate
>>transmission times. Transmission time is always related
>>to the length, the difference between our patches is that
>>you modify the RTABs in advance to include the overhead
>>in the calculation, my patch changes the length used to
>>look up the transmission time. Which works with or
>>without RTABs.
>
>
> Thanks, I understand now.
Glad we found some common ground :)
> I had decided that that STAB was indeed the way forward,
> and hence I should devote my time to making it work
> with ADSL/ATM. The only issue is now I don't have
> any spare time right now.
>
> Yesterday I was chatting about this at LCA 2007, and
> it dawned on me that there is a problem with the dual
> RTAB/STAB approach.
>
> Currently the lookup in the kernel is
> time_to_transmit_a_packet = RTAB[packet_length_seen_by_kernel]
>
> As I understand it, you are proposing to change that to
> time_to_transmit_a_packet
> = RTAB[STAB[packet_length_seen_by_kernel]]
> = RTAB[packet_length_seen_on_the_wire]
>
> Given RTAB is the same in both cases the results of the
> calculation will be different (and ergo wrong in one case
> or the other). RTAB can't change and remain compatible
> with old kernels. Ergo this approach breaks backward
> compatibility.
RTABs don't change, they continue to work as before. But when
an STAB is present the lookup is based on the STAB size mapping.
Neither one is wrong, RTABs calculate the transmission time based
only on the specified rate, RTABs + STABs calculate the
transmission time based on the rate, but include external overhead.
> I can't see any way forward but to break the link
> between RTAB and STAB. Personally I think this is a
> good thing, as I think STAB should replace RTAB
> completely. One way to do this is to give the kernel
> STAB + transmission rate, so the calculation above
> becomes:
>
> time_to_transmit_a_packet
> = STAB[packet_length_seen_by_kernel] * transmission_rate
>
> RTAB disappears in new kernels, and the compatibility
> problem goes with it.
(STAB[..] / transmission_rate)
Removing RTABs would break compatibilty, I don't really see why
adding STABs would.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
2007-01-18 4:05 ` Patrick McHardy
@ 2007-01-18 6:16 ` Russell Stuart
[not found] ` <45AF5C02.1010005@trash.net>
0 siblings, 1 reply; 25+ messages in thread
From: Russell Stuart @ 2007-01-18 6:16 UTC (permalink / raw)
To: hadi, netdev, David Miller, Jesper Dangaard Brouer
On Thu, 2007-01-18 at 05:05 +0100, Patrick McHardy wrote:
> > Yesterday I was chatting about this at LCA 2007, and
> > it dawned on me that there is a problem with the dual
> > RTAB/STAB approach.
> >
> > Currently the lookup in the kernel is
> > time_to_transmit_a_packet = RTAB[packet_length_seen_by_kernel]
> >
> > As I understand it, you are proposing to change that to
> > time_to_transmit_a_packet
> > = RTAB[STAB[packet_length_seen_by_kernel]]
> > = RTAB[packet_length_seen_on_the_wire]
> >
> > Given RTAB is the same in both cases the results of the
> > calculation will be different (and ergo wrong in one case
> > or the other). RTAB can't change and remain compatible
> > with old kernels. Ergo this approach breaks backward
> > compatibility.
>
> RTABs don't change, they continue to work as before. But when
> an STAB is present the lookup is based on the STAB size mapping.
> Neither one is wrong, RTABs calculate the transmission time based
> only on the specified rate, RTABs + STABs calculate the
> transmission time based on the rate, but include external overhead.
No argument with "RTAB works as before". But aren't
you proposing to feed it the accurate packet lengths
calculated STAB? For example, if the VOIP IP datagram
is 60 bytes, older kernels will index RTAB by
(60 + ethernet_header_size), STAB kernels will index
it by 159 bytes (3 ATM cells - say). If "tc" doesn't
do a uname call or something, then it will send the
same RTAB to both kernels. Obviously the results
returned by the RTAB lookup will be different.
If you aren't proposing to feed the RTAB lookup with
the output of STAB, then I still don't understand
why the current ATM patch isn't needed.
Or are you proposing tc behave differently on different
kernel versions. (I have no problem with that, but
isn't it officially frowned upon?)
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
[not found] ` <45AF5C02.1010005@trash.net>
@ 2007-01-19 3:11 ` Russell Stuart
2007-01-19 12:19 ` Patrick McHardy
0 siblings, 1 reply; 25+ messages in thread
From: Russell Stuart @ 2007-01-19 3:11 UTC (permalink / raw)
To: Patrick McHardy; +Cc: hadi, netdev, David Miller
On Thu, 2007-01-18 at 12:37 +0100, Patrick McHardy wrote:
> > Or are you proposing tc behave differently on different
> > kernel versions. (I have no problem with that, but
> > isn't it officially frowned upon?)
>
> Yes. There is no way you can make this work on old kernels,
> nobody expects that. The important part is that everything
> continues to work as before and that both old and new iproute
> binaries work properly on both old and new kernels (new
> iproute on old kernels without STABs obviously).
I thought that some degree of compatibility was
expected. At the very least the newest version
of "tc" must work on _any_ kernel as least as
well as the version it replaces did.
I also though newer kernels should work older
version of iproute2, albeit without the features
added in the newer versions.
Are you saying this is not so?
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
2007-01-19 3:11 ` Russell Stuart
@ 2007-01-19 12:19 ` Patrick McHardy
2007-01-20 3:25 ` Russell Stuart
0 siblings, 1 reply; 25+ messages in thread
From: Patrick McHardy @ 2007-01-19 12:19 UTC (permalink / raw)
To: Russell Stuart; +Cc: hadi, netdev, David Miller
Russell Stuart wrote:
> On Thu, 2007-01-18 at 12:37 +0100, Patrick McHardy wrote:
>
>>>Or are you proposing tc behave differently on different
>>>kernel versions. (I have no problem with that, but
>>>isn't it officially frowned upon?)
>>
>>Yes. There is no way you can make this work on old kernels,
>>nobody expects that. The important part is that everything
>>continues to work as before and that both old and new iproute
>>binaries work properly on both old and new kernels (new
>>iproute on old kernels without STABs obviously).
>
>
> I thought that some degree of compatibility was
> expected. At the very least the newest version
> of "tc" must work on _any_ kernel as least as
> well as the version it replaces did.
>
> I also though newer kernels should work older
> version of iproute2, albeit without the features
> added in the newer versions.
>
> Are you saying this is not so?
No, thats exactly what I'm saying.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
2007-01-19 12:19 ` Patrick McHardy
@ 2007-01-20 3:25 ` Russell Stuart
2007-01-20 8:47 ` Patrick McHardy
0 siblings, 1 reply; 25+ messages in thread
From: Russell Stuart @ 2007-01-20 3:25 UTC (permalink / raw)
To: Patrick McHardy; +Cc: hadi, netdev, David Miller
On Fri, 2007-01-19 at 13:19 +0100, Patrick McHardy wrote:
> Russell Stuart wrote:
> > I thought that some degree of compatibility was
> > expected. At the very least the newest version
> > of "tc" must work on _any_ kernel as least as
> > well as the version it replaces did.
> >
> > I also though newer kernels should work older
> > version of iproute2, albeit without the features
> > added in the newer versions.
> >
> > Are you saying this is not so?
>
> No, thats exactly what I'm saying.
I don't understand - too many negates here
without parens. Are you saying:
a. Backward / Forward compatibility between the kernel
and its user space tools isn't an issue, or
b. There is no compatibility problem.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
2007-01-20 3:25 ` Russell Stuart
@ 2007-01-20 8:47 ` Patrick McHardy
2007-01-21 7:45 ` Russell Stuart
0 siblings, 1 reply; 25+ messages in thread
From: Patrick McHardy @ 2007-01-20 8:47 UTC (permalink / raw)
To: Russell Stuart; +Cc: hadi, netdev, David Miller
Russell Stuart wrote:
> On Fri, 2007-01-19 at 13:19 +0100, Patrick McHardy wrote:
>
>>[...]
>
> I don't understand - too many negates here
> without parens. Are you saying:
>
> a. Backward / Forward compatibility between the kernel
> and its user space tools isn't an issue, or
>
> b. There is no compatibility problem.
Again, (b). You seem to have something in mind, it would be
easier if you would just explain exactly where you think there
is a problem.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
2007-01-20 8:47 ` Patrick McHardy
@ 2007-01-21 7:45 ` Russell Stuart
2007-01-24 16:38 ` Patrick McHardy
0 siblings, 1 reply; 25+ messages in thread
From: Russell Stuart @ 2007-01-21 7:45 UTC (permalink / raw)
To: Patrick McHardy; +Cc: hadi, netdev, David Miller
On Sat, 2007-01-20 at 09:47 +0100, Patrick McHardy wrote:
> Russell Stuart wrote:
> > b. There is no compatibility problem.
>
> Again, (b). You seem to have something in mind, it would be
> easier if you would just explain exactly where you think there
> is a problem.
I though I had :(.
Consider:
Line speed is 256 K bits/sec.
Protocol: ADSL/ATM (PPPoE VC/LLC) (Overhead is 42 bytes + cell pad).
Kernel HZ is 1000.
cell_log = 8.
Below is a table which shows the RTAB that would be sent
to a pre-STAB kernel:
IP Datagram Packet Size Packet Size Ticks to
Packet Size Seen by Kernel On the Wire Send packet
RTAB[0]=2 -14..-7 0..7 53..53 1.656
RTAB[1]=2 -6..1 8..15 53..53 1.656
RTAB[2]=3 2..9 16..23 53..106 3.313
RTAB[3]=3 10..17 24..31 106..106 3.313
...
RTAB[9]=5 58..63 72..79 106..159 4.968
RTAB[10]=5 64..71 80..87 159..159 4.968
Below is the same thing for a post-STAB kernel:
IP Datagram Packet Size Packet Size Ticks to
Packet Size Seen by Kernel On the Wire Send packet
RTAB[0]=0 - Undefined as no STAB entry is 0.
RTAB[1]=0 - Undefined as no STAB entry is 0.
...
RTAB[5]=0 - Undefined as no STAB entry is 0.
RTAB[6]=2 -14..-7 0..7 53..53 1.656
RTAB[7]=2 -6..1 8..15 53..53 1.656
RTAB[8]=3 2..9 16..23 53..106 3.313
RTAB[9]=3 10..17 24..31 106..106 3.313
...
RTAB[15]=5 58..63 72..79 106..159 4.968
RTAB[16]=5 64..71 80..87 159..159 4.968
The two RTAB's are different. Thus you must send
different RTAB's to pre-STAB and post-STAB kernels.
How is "tc" to decide which one to send? I did add
code that checked uname once to solve a very
similar problem in "tc", but that got my wrist
slapped.
Replacing RTAB with STAB would solve the problem, BTW,
as the post-STAB kernel would ignore the RTAB.
It would also solve another problem. The granularity
of RTAB sucks for VOIP (my area of interest). Eg on
a 2 M bit link, one ATM cell takes 0.0848 ticks to
send, two cells 0.170 ticks, three cells 0.2544 ticks.
RTP voice packets are typically two or three cells.
RTAB only holds an integral number of ticks of course,
making the current traffic control engine useless for
VOIP links with speeds of around 2.5M bit or above.
This could be fixed in an STAB implementation.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
2007-01-21 7:45 ` Russell Stuart
@ 2007-01-24 16:38 ` Patrick McHardy
2007-01-24 22:32 ` Russell Stuart
0 siblings, 1 reply; 25+ messages in thread
From: Patrick McHardy @ 2007-01-24 16:38 UTC (permalink / raw)
To: Russell Stuart; +Cc: hadi, netdev, David Miller
Russell Stuart wrote:
> On Sat, 2007-01-20 at 09:47 +0100, Patrick McHardy wrote:
>
>>Russell Stuart wrote:
>>
>>>b. There is no compatibility problem.
>>
>>Again, (b). You seem to have something in mind, it would be
>>easier if you would just explain exactly where you think there
>>is a problem.
>
>
> I though I had :(.
>
> Consider:
> Line speed is 256 K bits/sec.
> Protocol: ADSL/ATM (PPPoE VC/LLC) (Overhead is 42 bytes + cell pad).
> Kernel HZ is 1000.
> cell_log = 8.
>
> Below is a table which shows the RTAB that would be sent
> to a pre-STAB kernel:
>
> IP Datagram Packet Size Packet Size Ticks to
> Packet Size Seen by Kernel On the Wire Send packet
> RTAB[0]=2 -14..-7 0..7 53..53 1.656
> RTAB[1]=2 -6..1 8..15 53..53 1.656
> RTAB[2]=3 2..9 16..23 53..106 3.313
> RTAB[3]=3 10..17 24..31 106..106 3.313
> ...
> RTAB[9]=5 58..63 72..79 106..159 4.968
> RTAB[10]=5 64..71 80..87 159..159 4.968
>
> Below is the same thing for a post-STAB kernel:
>
> IP Datagram Packet Size Packet Size Ticks to
> Packet Size Seen by Kernel On the Wire Send packet
> RTAB[0]=0 - Undefined as no STAB entry is 0.
> RTAB[1]=0 - Undefined as no STAB entry is 0.
> ...
> RTAB[5]=0 - Undefined as no STAB entry is 0.
> RTAB[6]=2 -14..-7 0..7 53..53 1.656
> RTAB[7]=2 -6..1 8..15 53..53 1.656
> RTAB[8]=3 2..9 16..23 53..106 3.313
> RTAB[9]=3 10..17 24..31 106..106 3.313
> ...
> RTAB[15]=5 58..63 72..79 106..159 4.968
> RTAB[16]=5 64..71 80..87 159..159 4.968
>
> The two RTAB's are different. Thus you must send
> different RTAB's to pre-STAB and post-STAB kernels.
> How is "tc" to decide which one to send? I did add
> code that checked uname once to solve a very
> similar problem in "tc", but that got my wrist
> slapped.
If the users asks to use STABs, send the modified RTAB.
If the kernel doesn't support STABs it will return an error,
which is good enough.
> Replacing RTAB with STAB would solve the problem, BTW,
> as the post-STAB kernel would ignore the RTAB.
>
> It would also solve another problem. The granularity
> of RTAB sucks for VOIP (my area of interest). Eg on
> a 2 M bit link, one ATM cell takes 0.0848 ticks to
> send, two cells 0.170 ticks, three cells 0.2544 ticks.
> RTP voice packets are typically two or three cells.
> RTAB only holds an integral number of ticks of course,
> making the current traffic control engine useless for
> VOIP links with speeds of around 2.5M bit or above.
> This could be fixed in an STAB implementation.
I think this is a different problem. If you replace RTABs
by STABs you again can't use it for anything that is only
interested in the size, not the transmission time (HFSC,
SFQ, ...).
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
2007-01-24 16:38 ` Patrick McHardy
@ 2007-01-24 22:32 ` Russell Stuart
2007-01-25 0:06 ` Patrick McHardy
0 siblings, 1 reply; 25+ messages in thread
From: Russell Stuart @ 2007-01-24 22:32 UTC (permalink / raw)
To: Patrick McHardy; +Cc: hadi, netdev, Jesper Dangaard Brouer
On Wed, 2007-01-24 at 17:38 +0100, Patrick McHardy wrote:
> > The two RTAB's are different. Thus you must send
> > different RTAB's to pre-STAB and post-STAB kernels.
> > How is "tc" to decide which one to send? I did add
> > code that checked uname once to solve a very
> > similar problem in "tc", but that got my wrist
> > slapped.
>
> If the users asks to use STABs, send the modified RTAB.
> If the kernel doesn't support STABs it will return an error,
> which is good enough.
Yuk! Now the user has to say whether he wants to use
STAB's or not? Currently, apart from some debugging
params to tc, the user isn't even aware that the
traffic control is implemented in terms of RTAB's.
That is how it should be - it is an implementation
detail.
> I think this is a different problem. If you replace RTABs
> by STABs you again can't use it for anything that is only
> interested in the size, not the transmission time (HFSC,
> SFQ, ...).
I was a little too brief.
The comment stems from the observation that in all
current implementations:
const A_CONSTANT;
for (i = 0; i < 256; i += 1)
assert(RTAB[i] == STAB[i] * A_CONSTANT);
Ergo, if in addition to implementing STAB as you
plan to, A_CONSTANT was shipped to the kernel then
RTAB could be replaced. A_CONSTANT could be set
so the calculation would return the time it would
take to send a packet in micro seconds, say (a
figure I just pulled out of the air). This is
1000 times more precise than the kernel can do
now.
It wouldn't be perfect - the kernel would send the
packets in bursts. But it would be good enough
to solve my problem with VOIP, I think.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
2007-01-24 22:32 ` Russell Stuart
@ 2007-01-25 0:06 ` Patrick McHardy
2007-01-25 0:55 ` Russell Stuart
0 siblings, 1 reply; 25+ messages in thread
From: Patrick McHardy @ 2007-01-25 0:06 UTC (permalink / raw)
To: Russell Stuart; +Cc: hadi, netdev, Jesper Dangaard Brouer
Russell Stuart wrote:
> Yuk! Now the user has to say whether he wants to use
> STAB's or not? Currently, apart from some debugging
> params to tc, the user isn't even aware that the
> traffic control is implemented in terms of RTAB's.
> That is how it should be - it is an implementation
> detail.
Of course he has to, just like your "atm" parameter. In case
of stabs it would be something like "stab atm".
>>I think this is a different problem. If you replace RTABs
>>by STABs you again can't use it for anything that is only
>>interested in the size, not the transmission time (HFSC,
>>SFQ, ...).
>
>
> I was a little too brief.
>
> The comment stems from the observation that in all
> current implementations:
>
> const A_CONSTANT;
> for (i = 0; i < 256; i += 1)
> assert(RTAB[i] == STAB[i] * A_CONSTANT);
>
> Ergo, if in addition to implementing STAB as you
> plan to, A_CONSTANT was shipped to the kernel then
> RTAB could be replaced.
At least look at the patch I sent. STAB mapping is _not_ a
multiplication by a constant (which wouldn't be able to
express minimum packet size or padding to multiples of cell
sizes).
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
2007-01-25 0:06 ` Patrick McHardy
@ 2007-01-25 0:55 ` Russell Stuart
0 siblings, 0 replies; 25+ messages in thread
From: Russell Stuart @ 2007-01-25 0:55 UTC (permalink / raw)
To: Patrick McHardy; +Cc: hadi, netdev, Jesper Dangaard Brouer
On Thu, 2007-01-25 at 01:06 +0100, Patrick McHardy wrote:
> Of course he has to, just like your "atm" parameter. In case
> of stabs it would be something like "stab atm".
The difference being with "atm" he is telling the kernel
he has an ATM line, but with STAB he is telling the
kernel how it should internally calculate packet lengths
and transmission times. Why should he care how the
kernel does it? Surely this should be hidden. As it
is now, BTW.
> > I was a little too brief.
> >
> > The comment stems from the observation that in all
> > current implementations:
> >
> > const A_CONSTANT;
> > for (i = 0; i < 256; i += 1)
> > assert(RTAB[i] == STAB[i] * A_CONSTANT);
> >
> > Ergo, if in addition to implementing STAB as you
> > plan to, A_CONSTANT was shipped to the kernel then
> > RTAB could be replaced.
>
> At least look at the patch I sent. STAB mapping is _not_ a
> multiplication by a constant (which wouldn't be able to
> express minimum packet size or padding to multiples of cell
> sizes).
You have read this the wrong way. Firstly I did look
at the patch you posted - and commented on it in some
detail. Admittedly it was a while ago. The details
are fuzzy now, but I am pretty sure I know what you
want achieve and how you plan to go about it.
Secondly, I am not saying STAB is a multiplication by
a constant - any more than RTAB is multiplication by
a constant. I am fully aware that STAB can take into
account MPU's, cell padding, additional protocol
overhead and whatnot - just like RTAB does now.
What I am saying is that on or about is summed up in
the tc_calc_xmittime() function in tc_core.c. It is
a grand total of 1 line long:
return tc_core_usec2tick(1000000*((double)size/rate));
Later on in the code, we have this assignment:
rtab[i] = tc_calc_xmittime(bps, sz);
Which can be re-written:
rtab[i] = tc_core_usec2tick(1000000*((double)size/rate));
If STAB was introduced and was kept to a fixed 256
elements (I am not suggesting it should be), the
core change would be to change the line above to
read:
stab[i] = size;
rtab[i] = tc_core_usec2tick(1000000*((double)stab[i]/rate));
Thus my assertion that:
RTAB[i] = STAB[i] * A_CONSTANT
is literally true. A_CONSTANT is currently always
1000000 / rate * HZ.
All I am saying is that in the kernel, wherever we
have a reference to rtab[i] now, after your stab is
introduced that reference could be replaced by
stab[i] * A_CONSTANT, provided A_CONSTANT was sent
with stab by tc.
Doing this would have several advantages:
a. Uses less space than having both RTAB and STAB.
b. Provides the framework to allow the kernel to
overcomes my problem with HZ being too slow
for ATM links > 2M bips/sec.
c. Puts an end to my whinging about compatibility
problems above. We can just send the old RTAB.
It will be ignored by kernels that use STAB.
Are you objecting to this because I hadn't spelt it
out clearly, or is there something deeper I am
missing? It doesn't seem like a radical proposal to
me.
^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2007-01-25 0:55 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-10-16 23:34 [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel) Russell Stuart
2006-10-17 13:07 ` jamal
2006-10-19 3:41 ` David Miller
2006-10-19 14:38 ` Patrick McHardy
2006-10-20 0:49 ` jamal
2006-10-20 8:54 ` Patrick McHardy
2006-10-23 11:22 ` Russell Stuart
2006-10-23 12:39 ` Patrick McHardy
2006-10-23 21:54 ` Russell Stuart
2006-10-24 16:19 ` Patrick McHardy
2006-10-24 20:00 ` Jesper Dangaard Brouer
2006-10-24 23:46 ` Russell Stuart
2006-11-30 13:07 ` Patrick McHardy
2007-01-17 23:07 ` Russell Stuart
2007-01-18 4:05 ` Patrick McHardy
2007-01-18 6:16 ` Russell Stuart
[not found] ` <45AF5C02.1010005@trash.net>
2007-01-19 3:11 ` Russell Stuart
2007-01-19 12:19 ` Patrick McHardy
2007-01-20 3:25 ` Russell Stuart
2007-01-20 8:47 ` Patrick McHardy
2007-01-21 7:45 ` Russell Stuart
2007-01-24 16:38 ` Patrick McHardy
2007-01-24 22:32 ` Russell Stuart
2007-01-25 0:06 ` Patrick McHardy
2007-01-25 0:55 ` Russell Stuart
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).