netdev.vger.kernel.org archive mirror
From: Patrick McHardy <kaber@trash.net>
To: hadi@cyberus.ca
Cc: Russell Stuart <russell-tcatm@stuart.id.au>,
	Alan Cox <alan@lxorguk.ukuu.org.uk>,
	Stephen Hemminger <shemminger@osdl.org>,
	netdev@vger.kernel.org, Jesper Dangaard Brouer <hawk@diku.dk>
Subject: Re: [PATCH 2/2] NET: Accurate packet scheduling for ATM/ADSL (userspace)
Date: Tue, 20 Jun 2006 18:51:13 +0200
Message-ID: <44982781.8030301@trash.net>
In-Reply-To: <1150817922.5270.125.camel@jzny2>

jamal wrote:
> On Tue, 2006-20-06 at 16:45 +0200, Patrick McHardy wrote:
> 
>>Actually in the PPPoE case Linux doesn't know about ethernet
>>headers either, since shaping is usually done on the PPP device.
>>But that doesn't really matter since the ethernet link is not
>>the bottleneck - although it does add some delay for packetization.
> 
> 
> good point. But one could argue that is within linux (local) as opposed
> to something downstream at the ISP i.e. i have knowledge of it and i
> could do clever things. The other is: I have to know that the ISP is
> using pigeons as the link layer downstream and compensate for it.
> 
> The issue really is whether Linux should be interested in the
> throughput it is told about or the goodput (also known as effective
> throughput) the service provider offers. Two different issues by
> definition. 


In the case of PPPoE, non-work-conserving qdiscs are already used
to manage a non-local link with knowledge of its bandwidth, in
contrast to a local link, which would be best managed in
work-conserving mode. And I think for better accuracy it is
necessary to manage effective throughput, especially if you're
interested in guaranteed delays.

>>>Yes, Linux can't tell if your service provider is lying to you.
>>
>>I wouldn't call it lying as long as they don't say "1.5mbps IP
>>layer throughput". 
> 
> 
> It is a scam for sure.
> By definition of what throughput is - you are telling the truth; just
> not the whole truth. Most users think in terms of goodput and not
> throughput. 
> i.e. you are not telling the whole truth by not saying "it is 1.5Mbps ATM
> throughput". Typically not an issue until somebody finds that by leaving
> out "ATM" you meant throughput and not goodput. 


I think that point can be used to argue that Linux should be able
to manage effective throughput :)

>>Ethernet doesn't provide 100mbit IP layer
>>throughput either, and with minimum sized IP packets it's actually
>>well below that.
>
> 
> OTOH, nobody has ethernet MTUs of 64 bytes.


Sure, but I might not want my HFSC class with a guaranteed delay of
140us to be disturbed by someone sending small packets that need more
time on the wire than HFSC thinks.
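
To put a rough number on that, here is a minimal sketch in C. The
1.5 Mbit/s rate is borrowed from earlier in the thread; the 40-byte
packet and the helper name tx_time_ns are assumptions made up for
illustration, and 53 bytes is one ATM cell with no further
encapsulation counted.

```c
/*
 * Time in nanoseconds needed to send `bytes` at `rate_bps` bits per
 * second. Hypothetical helper, not an existing kernel function.
 */
static unsigned long long tx_time_ns(unsigned int bytes,
				     unsigned long long rate_bps)
{
	return (unsigned long long)bytes * 8ULL * 1000000000ULL / rate_bps;
}

/* Assumed example values: a 40-byte packet and the one cell it occupies. */
static const unsigned int example_ip_len = 40;
static const unsigned int example_wire_len = 53;
```

With these assumed values, tx_time_ns(example_ip_len, 1500000) is
about 213us while tx_time_ns(example_wire_len, 1500000) is about
283us, so a scheduler that charges skb->len underestimates the time
on the wire by roughly a third for such packets.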

> To be academic and pedantic: The schedulers should be focusing on
> throughput and not goodput.
> Look at it from another angle related to the nature of the link layer
> used:
> If i buy a 1.5 Mbps 802.11JHS (such a link layer technology doesn't
> exist, but assume for the sake of argument it does) from a wireless
> service provider, ethernet headers etc - but in this case the link is so
> bad (because of the link layer technology) i have to retransmit so much
> that 0.5 Mbps is wasted on retransmits, the question becomes: 
> 1)Do i fix the scheduler to compensate for this link layer retransmit?
> or
> 2)Do i find some other creative way to tell the scheduler that
> without making any changes to it that my ftp (despite the retransmits)
> should only chew 100Kbps.?
> 
> I am saying that #2 is the choice to go with hence my assertion earlier,
> it should be fine to tell the scheduler all it has is 1Mbps and nobody
> gets hurt. #1 if i could do it with minimal intrusion and still get to
> use it when i have 802.11g. 
> 
> Not sure i made sense.

HFSC is actually capable of handling this quite well. If you use it
in work-conserving mode (and the card doesn't do (much) internal
queueing) it will get clocked by successful transmissions. Using
link-sharing classes you can define proportions for use of available
bandwidth, possibly with upper limits. No hacks required :)

Anyway, this again goes more in the direction of handling link speed
changes.

>>A non intrusive way is preferred of course, but I can't really see
>>one if you want more than just a special-case solution that only
>>covers qdiscs using rate-tables and even ignores inner qdiscs.
>>HFSC and SFQ for example both need to calculate the wire length
>>at runtime.
>>
> 
> Agreed. That would be equivalent to #1 above.
> 
> 
>>Handling all qdiscs would mean adding a pointer to a mapping table
>>to struct net_device and using something like "skb_wire_len(skb, dev)"
>>instead of skb->len in the queueing layer. 
> 
> 
> That does seem sensible and simpler. I would suspect then that you will
> do this one time with something like
> ip dev add compensate_header 100 bytes

Something like that, but it's a bit more complicated.
For ATM we need some mapping:
[0-48]  -> 53
[49-96] -> 106
...

For Ethernet we need:
[0-60] -> 64
[61-n] -> n + 4

We could do something like this (feel free to imagine nicer names):

ATM:
table = {
	.step = 48,	/* payload bytes per 53-byte cell */
	.map = {
		[0..48] = 53,
		[49..96] = 106,
		...
	}
};

Requiring a table of size 32 for typical MTUs.

Ethernet:

table = {
	.step = 60,
	.map = {
		[0..60] = 60,
		[...] = 0,
	},
	.fixed_overhead = 4,
};

static inline unsigned int
skb_wire_len(struct sk_buff *skb, struct net_device *dev)
{
	unsigned int idx, len;

	if (dev->lengthtable == NULL)
		return skb->len;
	idx = skb->len / dev->lengthtable->step;
	len = dev->lengthtable->map[idx];
	/* a zero map entry means "no rounding": fall back to skb->len */
	return dev->lengthtable->fixed_overhead + (len ? len : skb->len);
}
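
For experimenting outside the kernel, the lookup can be sketched as a
small userspace stand-in. Everything here is hypothetical: struct
length_table, wire_len() and the two example tables are illustration
names, not existing kernel symbols. Two choices are my assumptions
rather than part of the proposal: the index is computed as
(len - 1) / step so that the range boundaries above ([0..48],
[0..60]) hold exactly, and indices beyond the end of the map are
treated like a zero entry. The final expression is parenthesised so
the fixed overhead is added in both cases.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for the per-device length table. */
struct length_table {
	unsigned int step;           /* packet-length bytes covered per map slot */
	unsigned int fixed_overhead; /* added to every packet */
	size_t nslots;               /* number of entries in map[] */
	const unsigned int *map;     /* 0 means "use the packet length as is" */
};

static unsigned int wire_len(unsigned int len, const struct length_table *t)
{
	unsigned int mapped = 0;
	size_t idx;

	if (t == NULL)
		return len;
	/* Assumption: (len - 1) / step keeps exact multiples in the lower slot. */
	idx = len ? (len - 1) / t->step : 0;
	if (idx < t->nslots)
		mapped = t->map[idx];
	return t->fixed_overhead + (mapped ? mapped : len);
}

/* Ethernet: pad to 60 bytes, then add 4 bytes of fixed overhead. */
static const unsigned int eth_map[] = { 60 };
static const struct length_table eth_table = {
	.step = 60,
	.fixed_overhead = 4,
	.nslots = sizeof(eth_map) / sizeof(eth_map[0]),
	.map = eth_map,
};

/* ATM: first four slots only; a table for real MTUs needs more entries. */
static const unsigned int atm_map[] = { 53, 106, 159, 212 };
static const struct length_table atm_table = {
	.step = 48,
	.fixed_overhead = 0,
	.nslots = sizeof(atm_map) / sizeof(atm_map[0]),
	.map = atm_map,
};
```

With these tables wire_len(40, &eth_table) gives 64 and
wire_len(49, &atm_table) gives 106.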

Unfortunately I can't think of a way to handle the ATM case without
a division .. or iteration.
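
For comparison, the two table-free alternatives could look roughly
like this. It is only a sketch: the function and macro names are made
up, and like the mapping above it counts whole 53-byte cells per 48
bytes of packet and ignores any additional encapsulation overhead.

```c
#define ATM_CELL_PAYLOAD 48u
#define ATM_CELL_SIZE    53u

/* Variant 1: one ceiling division per packet. */
static unsigned int atm_wire_len_div(unsigned int len)
{
	unsigned int cells = (len + ATM_CELL_PAYLOAD - 1) / ATM_CELL_PAYLOAD;

	if (cells == 0)
		cells = 1;	/* matches [0-48] -> 53 for len == 0 */
	return cells * ATM_CELL_SIZE;
}

/* Variant 2: no division, but one loop pass per additional cell. */
static unsigned int atm_wire_len_iter(unsigned int len)
{
	unsigned int wire = ATM_CELL_SIZE;

	while (len > ATM_CELL_PAYLOAD) {
		len -= ATM_CELL_PAYLOAD;
		wire += ATM_CELL_SIZE;
	}
	return wire;
}
```

Both return 53 for lengths up to 48, 106 for 49 to 96, and 1696
(32 cells) for a 1500 byte packet; the loop needs 31 passes for the
latter.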

>>That of course doesn't
>>mean that we can't still provide pre-adjusted ratetables for qdiscs
>>that use them.
>>
> 
> 
> But what would the point be then if you can compensate as you did above?

It doesn't need runtime divisions :)
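
A rough sketch of what "pre-adjusted" means here: the division is
done once when the table is built, and the per-packet path is only a
shift and an array index. The layout (256 slots indexed by
len >> cell_log, times in microseconds) and all names are assumptions
for illustration, not the actual rate table format used by any
qdisc.

```c
#include <stdint.h>

#define RTAB_SLOTS 256u

/* Hypothetical pre-adjusted rate table. */
struct rate_table {
	unsigned int cell_log;		/* each slot covers 1 << cell_log bytes */
	uint32_t time_us[RTAB_SLOTS];	/* transmit time per slot, microseconds */
};

/* Round a length up to whole 53-byte cells carrying 48 bytes each. */
static unsigned int atm_align(unsigned int len)
{
	unsigned int cells = (len + 47u) / 48u;

	return (cells ? cells : 1u) * 53u;
}

/* Build time: all divisions happen here, once per table. */
static void rate_table_fill_atm(struct rate_table *rt, unsigned int cell_log,
				uint64_t rate_bps)
{
	unsigned int i;

	rt->cell_log = cell_log;
	for (i = 0; i < RTAB_SLOTS; i++) {
		/* Charge every length in the slot as the largest one in it. */
		unsigned int max_len = ((i + 1u) << cell_log) - 1u;
		uint64_t bits = (uint64_t)atm_align(max_len) * 8u;

		rt->time_us[i] = (uint32_t)(bits * 1000000u / rate_bps);
	}
}

/* Per packet: shift and index only, no division. */
static uint32_t rate_table_lookup(const struct rate_table *rt, unsigned int len)
{
	unsigned int slot = len >> rt->cell_log;

	if (slot >= RTAB_SLOTS)
		slot = RTAB_SLOTS - 1u;
	return rt->time_us[slot];
}
```

The price is accuracy at slot boundaries: with cell_log = 3 a 48 byte
packet shares a slot with lengths up to 55 and is charged for two
cells, so the table errs on the conservative side.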

