[PATCH] tbf scheduler: TSO support

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] tbf scheduler: TSO support
@ 2007-05-10 16:00 Hirokazu Takahashi
  2007-05-10 16:01 ` Patrick McHardy
  0 siblings, 1 reply; 9+ messages in thread
From: Hirokazu Takahashi @ 2007-05-10 16:00 UTC (permalink / raw)
  To: netdev

Hi,

TBF --- Simple Token Bucket Filter --- packet scheduler doesn't
work correctly with TSO on that it slows down to send out packets.
TSO packets will be discarded since the size can be larger than
the scheduler expects. But it won't cause serious problems
because the retransmitted packets can be passed.

So I made the scheduler allow to pass TSO packets:

 - tbf_enqueue() accepts packets with any size if the netdevice
   has TSO ability.

 - tbf_dequeue() can handle the packets whose size is larger than
   the bucket, which keeps tokens.
   Any packet, which may be TSO packet, can be sent if the bucket is
   full of tokens. this may lead that the number of tokens in
   the bucket turns into negative value, which means kind of debt.
   But we don't have to mind it because this will be filled up
   with tokens in a short time and it will turns into positive value
   again.

I'm not sure if this approach is the best. I appreciate any comments.
Thank you.


Signed-off-by: Hirokazu Takahashi <taka@valinux.co.jp>

--- linux-2.6.21/net/sched/sch_tbf.c.ORG	2007-05-08 20:59:28.000000000 +0900
+++ linux-2.6.21/net/sched/sch_tbf.c	2007-05-10 16:12:54.000000000 +0900
@@ -139,7 +139,7 @@ static int tbf_enqueue(struct sk_buff *s
 	struct tbf_sched_data *q = qdisc_priv(sch);
 	int ret;
 
-	if (skb->len > q->max_size) {
+	if (skb->len > q->max_size && !(sch->dev->features & NETIF_F_GSO_MASK)) {
 		sch->qstats.drops++;
 #ifdef CONFIG_NET_CLS_POLICE
 		if (sch->reshape_fail == NULL || sch->reshape_fail(skb, sch))
@@ -197,6 +197,7 @@ static struct sk_buff *tbf_dequeue(struc
 {
 	struct tbf_sched_data *q = qdisc_priv(sch);
 	struct sk_buff *skb;
+	int full = 0;
 
 	skb = q->qdisc->dequeue(q->qdisc);
 
@@ -204,7 +205,8 @@ static struct sk_buff *tbf_dequeue(struc
 		psched_time_t now;
 		long toks, delay;
 		long ptoks = 0;
-		unsigned int len = skb->len;
+		unsigned int len;
+		int n;
 
 		PSCHED_GET_TIME(now);
 
@@ -212,16 +214,26 @@ static struct sk_buff *tbf_dequeue(struc
 
 		if (q->P_tab) {
 			ptoks = toks + q->ptokens;
-			if (ptoks > (long)q->mtu)
+			if (ptoks >= (long)q->mtu) {
 				ptoks = q->mtu;
-			ptoks -= L2T_P(q, len);
+				full = 1;
+			}
+			for (len = skb->len; len > 0; len -= n) {
+				n = min(q->max_size, len);
+				ptoks -= L2T_P(q, n);
+			}
 		}
 		toks += q->tokens;
-		if (toks > (long)q->buffer)
+		if (toks >= (long)q->buffer) {
 			toks = q->buffer;
-		toks -= L2T(q, len);
+			full = 1;
+		}
+		for (len = skb->len; len > 0; len -= n) {
+			n = min(q->max_size, len);
+			toks -= L2T(q, n);
+		}
 
-		if ((toks|ptoks) >= 0) {
+		if ((toks|ptoks) >= 0 || full) {
 			q->t_c = now;
 			q->tokens = toks;
 			q->ptokens = ptoks;

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tbf scheduler: TSO support
  2007-05-10 16:00 [PATCH] tbf scheduler: TSO support Hirokazu Takahashi
@ 2007-05-10 16:01 ` Patrick McHardy
  0 siblings, 0 replies; 9+ messages in thread
From: Patrick McHardy @ 2007-05-10 16:01 UTC (permalink / raw)
  To: Hirokazu Takahashi; +Cc: netdev

Hirokazu Takahashi wrote:
> TBF --- Simple Token Bucket Filter --- packet scheduler doesn't
> work correctly with TSO on that it slows down to send out packets.
> TSO packets will be discarded since the size can be larger than
> the scheduler expects. But it won't cause serious problems
> because the retransmitted packets can be passed.


I already CCed netdev in my reply, so please use that thread instead
of starting a new one.

^ permalink raw reply	[flat|nested] 9+ messages in thread

[parent not found: <20070510.210556.05145104.taka@valinux.co.jp>]

* Re: [PATCH] tbf scheduler: TSO support
       [not found] <20070510.210556.05145104.taka@valinux.co.jp>
@ 2007-05-10 12:56 ` Patrick McHardy
  2007-05-10 21:13   ` David Miller
  0 siblings, 1 reply; 9+ messages in thread
From: Patrick McHardy @ 2007-05-10 12:56 UTC (permalink / raw)
  To: Hirokazu Takahashi; +Cc: linux-net, Linux Netdev List

Hirokazu Takahashi wrote:
> TBF --- Simple Token Bucket Filter --- packet scheduler doesn't
> work correctly with TSO on that it slows down to send out packets.
> TSO packets will be discarded since the size can be larger than
> the scheduler expects. But it won't cause serious problems
> because the retransmitted packets can be passed.
> 
> So I made the scheduler allow to pass TSO packets:
> 
>  - tbf_enqueue() accepts packets with any size if the netdevice
>    has TSO ability.
> 
>  - tbf_dequeue() can handle the packets whose size is larger than
>    the bucket, which keeps tokens.
>    Any packet, which may be TSO packet, can be sent if the bucket is
>    full of tokens. this may lead that the number of tokens in
>    the bucket turns into negative value, which means kind of debt.
>    But we don't have to mind it because this will be filled up
>    with tokens in a short time and it will turns into positive value
>    again.
> 
> I'm not sure if this approach is the best. I appreciate any comments.


I don't see why this is needed, the correct way to use TBF with TSO
is to specify a larger MTU value, in which case it won't drop TSO
packets.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tbf scheduler: TSO support
  2007-05-10 12:56 ` Patrick McHardy
@ 2007-05-10 21:13   ` David Miller
  2007-05-11  1:04     ` Patrick McHardy
  0 siblings, 1 reply; 9+ messages in thread
From: David Miller @ 2007-05-10 21:13 UTC (permalink / raw)
  To: kaber; +Cc: taka, linux-net, netdev

From: Patrick McHardy <kaber@trash.net>
Date: Thu, 10 May 2007 14:56:39 +0200

> Hirokazu Takahashi wrote:
> > TBF --- Simple Token Bucket Filter --- packet scheduler doesn't
> > work correctly with TSO on that it slows down to send out packets.
> > TSO packets will be discarded since the size can be larger than
> > the scheduler expects. But it won't cause serious problems
> > because the retransmitted packets can be passed.
> > 
> > So I made the scheduler allow to pass TSO packets:
> > 
> >  - tbf_enqueue() accepts packets with any size if the netdevice
> >    has TSO ability.
> > 
> >  - tbf_dequeue() can handle the packets whose size is larger than
> >    the bucket, which keeps tokens.
> >    Any packet, which may be TSO packet, can be sent if the bucket is
> >    full of tokens. this may lead that the number of tokens in
> >    the bucket turns into negative value, which means kind of debt.
> >    But we don't have to mind it because this will be filled up
> >    with tokens in a short time and it will turns into positive value
> >    again.
> > 
> > I'm not sure if this approach is the best. I appreciate any comments.
> 
> 
> I don't see why this is needed, the correct way to use TBF with TSO
> is to specify a larger MTU value, in which case it won't drop TSO
> packets.

Why should a user have to know anything in the world about TSO in
order to configure TBF properly?  I don't think they should have
to at all.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tbf scheduler: TSO support
  2007-05-10 21:13   ` David Miller
@ 2007-05-11  1:04     ` Patrick McHardy
  2007-05-11  4:01       ` Hirokazu Takahashi
  0 siblings, 1 reply; 9+ messages in thread
From: Patrick McHardy @ 2007-05-11  1:04 UTC (permalink / raw)
  To: David Miller; +Cc: taka, linux-net, netdev

David Miller wrote:
> From: Patrick McHardy <kaber@trash.net>
> Date: Thu, 10 May 2007 14:56:39 +0200
> 
>>I don't see why this is needed, the correct way to use TBF with TSO
>>is to specify a larger MTU value, in which case it won't drop TSO
>>packets.
> 
> 
> Why should a user have to know anything in the world about TSO in
> order to configure TBF properly?  I don't think they should have
> to at all.

The user shouldn't necessarily, but userspace should.
The way I see it the MTU is a fundamental parameter for TBF
(the peakrate bucket size) and just because userspace picks
a bad default (2000) this is no reason to change the
implementation to something that is not really TBF anymore
and even affects non-TSO packets _and_ TSO packets even
when the MTU is chosen large enough (granted, the first
point is an implementation detail). The much better solution
would be to let userspace pick an appropriate default value
and still let the user override it.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tbf scheduler: TSO support
  2007-05-11  1:04     ` Patrick McHardy
@ 2007-05-11  4:01       ` Hirokazu Takahashi
  2007-05-11 18:13         ` Patrick McHardy
  0 siblings, 1 reply; 9+ messages in thread
From: Hirokazu Takahashi @ 2007-05-11  4:01 UTC (permalink / raw)
  To: kaber; +Cc: davem, linux-net, netdev

Hi,

> >>I don't see why this is needed, the correct way to use TBF with TSO
> >>is to specify a larger MTU value, in which case it won't drop TSO
> >>packets.
> > 
> > 
> > Why should a user have to know anything in the world about TSO in
> > order to configure TBF properly?  I don't think they should have
> > to at all.
> 
> 
> The user shouldn't necessarily, but userspace should.
> The way I see it the MTU is a fundamental parameter for TBF
> (the peakrate bucket size) and just because userspace picks
> a bad default (2000) this is no reason to change the
> implementation to something that is not really TBF anymore
> and even affects non-TSO packets _and_ TSO packets even
> when the MTU is chosen large enough (granted, the first
> point is an implementation detail). The much better solution
> would be to let userspace pick an appropriate default value
> and still let the user override it.

I think the concept of TBF is quit good but the userspace tools have
become old that it doesn't fit to Gb ethernet environment.
The tools should be updated to care about much faster network and
GbE jumbo frames. I agree with you at this point.

On the other hand, handling TSO packet should be a kernel issue.
A TSO packet is logically just a multiple segmented packet
including several ordinary packets. This feature should be kept
invisible from userspace.

And I also feel it's hard to determine how large TSO packets
can/will be accepted that it depends on ether controllers.
It would be also tough to make the tools catch up with the
latest off loading features.

Thank you,
Hirokazu Takahashi.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tbf scheduler: TSO support
  2007-05-11  4:01       ` Hirokazu Takahashi
@ 2007-05-11 18:13         ` Patrick McHardy
  2007-05-12  8:49           ` Hirokazu Takahashi
  0 siblings, 1 reply; 9+ messages in thread
From: Patrick McHardy @ 2007-05-11 18:13 UTC (permalink / raw)
  To: Hirokazu Takahashi; +Cc: davem, linux-net, netdev

Hirokazu Takahashi wrote:
> I think the concept of TBF is quit good but the userspace tools have
> become old that it doesn't fit to Gb ethernet environment.
> The tools should be updated to care about much faster network and
> GbE jumbo frames. I agree with you at this point.
> 
> On the other hand, handling TSO packet should be a kernel issue.
> A TSO packet is logically just a multiple segmented packet
> including several ordinary packets. This feature should be kept
> invisible from userspace.

Putting aside this question, this cannot work properly without userspace
since the rate table is only built for the given MTU. Your patch tries
to work around that by summing up the result of L2T in q->max_size
steps, which gives incorrect results with MPU or overhead settings.

I think we can only do two things without userspace support:

- disable TSO (not a good idea)
- split the skb into q->max_size chunks on dequeue

The later would allow people to take full advantage of TSO with properly
configured TBF, but it would still at least work with a too small mtu
value.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tbf scheduler: TSO support
  2007-05-11 18:13         ` Patrick McHardy
@ 2007-05-12  8:49           ` Hirokazu Takahashi
  2007-05-12  9:09             ` Hirokazu Takahashi
  0 siblings, 1 reply; 9+ messages in thread
From: Hirokazu Takahashi @ 2007-05-12  8:49 UTC (permalink / raw)
  To: kaber; +Cc: davem, linux-net, netdev

Hi,

> > I think the concept of TBF is quit good but the userspace tools have
> > become old that it doesn't fit to Gb ethernet environment.
> > The tools should be updated to care about much faster network and
> > GbE jumbo frames. I agree with you at this point.
> > 
> > On the other hand, handling TSO packet should be a kernel issue.
> > A TSO packet is logically just a multiple segmented packet
> > including several ordinary packets. This feature should be kept
> > invisible from userspace.
> 
> 
> Putting aside this question, this cannot work properly without userspace
> since the rate table is only built for the given MTU. Your patch tries
> to work around that by summing up the result of L2T in q->max_size
> steps, which gives incorrect results with MPU or overhead settings.

You have a point. sch->dev->mtu should be used instead of q->max_size,
which gives precisely correct results. Every TSO packet will be split
into several packets with sch->dev->mtu length, which administrators
specify. The splitting is automatically done by TSO supporting controllers.

To minimize the overhead of the calculation I'd re-implement it like:

   segs = skb->len/sch->dev->mtu;
   rest = skb->len - sch->dev->mtu*segs;
   toks -= L2T(q, sch->dev->mtu) * segs;
   if (rest)
        toks -= L2T(q, rest);

I think the problem is that sch->dev->mtu might not be set correctly,
so some error checking is needed.

> I think we can only do two things without userspace support:
> 
> - disable TSO (not a good idea)

This is just what I'm doing now as a workaround.

> - split the skb into q->max_size chunks on dequeue

It sure is difficult because the splitting deeply depends on the
protocol stack which made the skb and it will make some overhead,
which is why TSO and other offloading features are introduced.

I'm now thinking I can make it just hold a TSO packet until p->tokens
reaches the size of the packet. I think it is straightforward
implementation. I'll try this.

> The later would allow people to take full advantage of TSO with properly
> configured TBF, but it would still at least work with a too small mtu
> value.

Thanks,
Hirokazu Takahashi.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] tbf scheduler: TSO support
  2007-05-12  8:49           ` Hirokazu Takahashi
@ 2007-05-12  9:09             ` Hirokazu Takahashi
  0 siblings, 0 replies; 9+ messages in thread
From: Hirokazu Takahashi @ 2007-05-12  9:09 UTC (permalink / raw)
  To: kaber; +Cc: davem, linux-net, netdev

Hi,

> > > I think the concept of TBF is quit good but the userspace tools have
> > > become old that it doesn't fit to Gb ethernet environment.
> > > The tools should be updated to care about much faster network and
> > > GbE jumbo frames. I agree with you at this point.
> > > 
> > > On the other hand, handling TSO packet should be a kernel issue.
> > > A TSO packet is logically just a multiple segmented packet
> > > including several ordinary packets. This feature should be kept
> > > invisible from userspace.
> > 
> > 
> > Putting aside this question, this cannot work properly without userspace
> > since the rate table is only built for the given MTU. Your patch tries
> > to work around that by summing up the result of L2T in q->max_size
> > steps, which gives incorrect results with MPU or overhead settings.
> 
> You have a point. sch->dev->mtu should be used instead of q->max_size,
> which gives precisely correct results.

Oops, this isn't really true because TSO supporting controllers will
probably add some headers to the split packets.
So I'd say "it will give almost correct results".

> Every TSO packet will be split
> into several packets with sch->dev->mtu length, which administrators
> specify. The splitting is automatically done by TSO supporting controllers.
> 
> To minimize the overhead of the calculation I'd re-implement it like:
> 
>    segs = skb->len/sch->dev->mtu;
>    rest = skb->len - sch->dev->mtu*segs;
>    toks -= L2T(q, sch->dev->mtu) * segs;
>    if (rest)
>         toks -= L2T(q, rest);
> 
> I think the problem is that sch->dev->mtu might not be set correctly,
> so some error checking is needed.
> 
> > I think we can only do two things without userspace support:
> > 
> > - disable TSO (not a good idea)
> 
> This is just what I'm doing now as a workaround.
> 
> > - split the skb into q->max_size chunks on dequeue
> 
> It sure is difficult because the splitting deeply depends on the
> protocol stack which made the skb and it will make some overhead,
> which is why TSO and other offloading features are introduced.
> 
> I'm now thinking I can make it just hold a TSO packet until p->tokens
> reaches the size of the packet. I think it is straightforward
> implementation. I'll try this.
> 
> > The later would allow people to take full advantage of TSO with properly
> > configured TBF, but it would still at least work with a too small mtu
> > value.
> 
> Thanks,
> Hirokazu Takahashi.
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-net" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2007-05-12  9:09 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-05-10 16:00 [PATCH] tbf scheduler: TSO support Hirokazu Takahashi
2007-05-10 16:01 ` Patrick McHardy
     [not found] <20070510.210556.05145104.taka@valinux.co.jp>
2007-05-10 12:56 ` Patrick McHardy
2007-05-10 21:13   ` David Miller
2007-05-11  1:04     ` Patrick McHardy
2007-05-11  4:01       ` Hirokazu Takahashi
2007-05-11 18:13         ` Patrick McHardy
2007-05-12  8:49           ` Hirokazu Takahashi
2007-05-12  9:09             ` Hirokazu Takahashi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).