Understanding HFSC

* Understanding HFSC
From: John A. Sullivan III @ 2011-12-04  4:57 UTC
  To: netdev

Hello, all.  I hope I am in the right place as this seems to be the
place to ask questions formerly asked on lartc.  For the last three
days, I've been banging my head against the wall trying to understand
HFSC and it's finally starting to crack (the wall, not my head although
that's close, too!).  It seems to be wonderful, powerful, mysterious,
and poorly understood.

I'm not sure I understand it either but it also seems much of what is
written about it is written by people who don't fully grasp it, e.g.,
mostly focusing on guaranteed bandwidth and hierarchical sharing but not
spending much time explaining the important concept of decoupling
latency requirements and bandwidth - the part most interesting to us.
So I'm hoping you'll indulge my questions and my attempt to articulate
my understanding to see if I get it or if I've completely missed the
plot!

One of the most confusing bits to me is, does the m1 rate apply to each
flow handled by the class or only to the entire class once it becomes
active? In other words, suppose I want to ensure that my VoIP packets
jump in front of my FTP bulk transfers, as so fascinatingly illustrated
on page 4 of http://trash.net/~kaber/hfsc/SIGCOM97.pdf, and so specify a
steeper m1 slope for the first 10 ms, and I have a dozen RTP sessions
running.  Does that mean that only the sessions that snuck a packet into
the first 10 ms receive that prioritized treatment and all the rest are
treated at the m2 rate, or is the 10ms acceleration in deadline time
applied to every new RTP flow? I'm hoping the latter but it didn't
appear to be explicitly stated.

Perhaps it is even better illustrated by an example posted on
https://calomel.org/pf_hfsc.html where they describe a web server
serving its 10KB of text and then some large data files.  So if I set
umax=80kbits and dmax=200ms so that I deliver the first 10KB text of the
web page with no more than 200ms delay and then send the rest of the
images, videos, etc., at the m2 rate, what happens with multiple users?
The first user goes to the site, pulls down the 10KB text and then
starts on the 10MB video (assuming they are not pipelining).  This puts
the hfsc class firmly into m2.  A new user comes in while the first user
is still downloading the video.  Is the first 10KB for the second user
scheduled at the m2 rate or does m1 kick in to determine deadline and
jump those text packets in front of both the http video download and any
bulk file transfers that might be happening at the same time?

Second, what must go into the umax size? Let's assume we want umax to be
a single maximum sized packet on a non-jumbo frame Ethernet network.
Should umax be:
1500
1514 (add Ethernet)
1518 (add CRC)
1526 (add preamble)
1538 (add interframe gap)?

To keep this email from growing any longer, I'll put the rest in a
separate email.  Thanks - John


* Re: Understanding HFSC
From: Michal Soltys @ 2011-12-04 12:38 UTC
  To: John A. Sullivan III; +Cc: netdev

On 11-12-04 05:57, John A. Sullivan III wrote:
> Hello, all.  I hope I am in the right place as this seems to be the
> place to ask questions formerly asked on lartc.  For the last three
> days, I've been banging my head against the wall trying to understand
> HFSC and it's finally starting to crack (the wall, not my head
> although that's close, too!).  It seems to be wonderful, powerful,
> mysterious, and poorly understood.
> 
> I'm not sure I understand it either but it also seems much of what is
> written about it is written by people who don't fully grasp it, e.g.,
> mostly focusing on guaranteed bandwidth and hierarchical sharing but
> not spending much time explaining the important concept of decoupling
> latency requirements and bandwidth - the part most interesting to us.
> So I'm hoping you'll indulge my questions and my attempt to articulate
> my understanding to see if I get it or if I've completely missed the
> plot!
> 
> One of the most confusing bits to me is, does the m1 rate apply to
> each flow handled by the class or only to the entire class once it
> becomes active?

Where a packet lands is (generally) determined by tc filters and/or
iptables' mark/classify targets. All the packets that end up in some
leaf node are governed by that node's realtime service curve and, at
times when that criterion is not used, by the ratio of virtual times
(coming from linkshare service curves) between that node and its
siblings. Regardless of the curve used, the smallest vt (linkshare
criterion) or the smallest dt among all eligible (realtime criterion)
wins, and the leaf holding it will fulfill the dequeue call.
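
The selection rule described above can be sketched roughly as follows.
This is only an illustration under simplifying assumptions: the
hierarchy is flattened to a list of leaves, upper limits are ignored,
and the names (Leaf, pick_leaf, the field names) are made up for the
example rather than taken from the kernel. Note that nothing in it
looks at individual flows inside a leaf.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Leaf:
    """Hypothetical per-leaf state; field names are illustrative only."""
    name: str
    backlog: int      # number of queued packets
    eligible: float   # eligible time from the realtime curve
    deadline: float   # deadline time (dt) from the realtime curve
    vt: float         # virtual time from the linkshare curve


def pick_leaf(leaves: Sequence[Leaf], now: float) -> Optional[Leaf]:
    """Pick the leaf that would answer a dequeue call at time `now`."""
    active = [leaf for leaf in leaves if leaf.backlog > 0]
    if not active:
        return None
    # Realtime criterion: among leaves already eligible, smallest deadline.
    eligible = [leaf for leaf in active if leaf.eligible <= now]
    if eligible:
        return min(eligible, key=lambda leaf: leaf.deadline)
    # Otherwise the linkshare criterion: smallest virtual time.
    return min(active, key=lambda leaf: leaf.vt)


if __name__ == "__main__":
    voip = Leaf("voip", backlog=1, eligible=0.0, deadline=5.0, vt=100.0)
    bulk = Leaf("bulk", backlog=9, eligible=0.0, deadline=50.0, vt=1.0)
    print(pick_leaf([voip, bulk], now=1.0).name)
```

In this toy setup the voip leaf wins while it is eligible, even though
bulk has the smaller virtual time.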

If you need more fine grained control "below" such leaf node - you need
to either use deeper hierarchy with presumably "simpler" qdiscs attached
(but more complex marking setup), or shallower hierarchy with more
elaborate qdiscs attached. Think of work conserving qdiscs such as: sfq,
drr - paired with appropriate tc filters (tc-flow perhaps ?) as needed.

Btw - check out the very latest iproute2 tree - fresh tc-hfsc(7) and
tc-stab(8) manuals have been added. I tried to make them as detailed as
possible, but I might have overshot a bit - so the opinion of someone
getting into hfsc territory is invaluable. You can read them (if
installing a fresh iproute2 is out of the question) with a simple:

nroff -mandoc -rLL=<width>n <page> | less

http://git.kernel.org/?p=linux/kernel/git/shemminger/iproute2.git;a=blob_plain;f=man/man7/tc-hfsc.7;hb=HEAD
http://git.kernel.org/?p=linux/kernel/git/shemminger/iproute2.git;a=blob_plain;f=man/man8/tc-stab.8;hb=HEAD

> Second, what must go into the umax size? Let's assume we want umax to
> be a single maximum sized packet on a non-jumbo frame Ethernet
> network.
> Should umax be:
> 1500
> 1514 (add Ethernet)
> 1518 (add CRC)
> 1526 (add preamble)
> 1538 (add interframe gap)?
> 
> To keep this email from growing any longer, I'll put the rest in a
> separate email.  Thanks - John
> 

Note: umax/dmax/rate and m1/d/m2 are just alternative ways to specify
the very same thing.
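
To make the equivalence concrete, here is a small sketch of how a
umax/dmax/rate triple maps onto m1/d/m2, as I understand the conversion.
Treat the exact rounding as an assumption. The function name and the
units (bits, microseconds, bits per second) are chosen for the example
only, and the numbers in the usage part are hypothetical.

```python
def umax_dmax_to_segments(umax_bits: int, dmax_us: int, rate_bps: int):
    """Return (m1, d, m2) for a two-segment service curve.

    m1 and m2 are in bits/s and d is in microseconds. Integer arithmetic
    is used so the results are exact.
    """
    if dmax_us > 0:
        # Slope needed to deliver umax within dmax, rounded up.
        burst_bps = -(-umax_bits * 1_000_000 // dmax_us)
        if burst_bps > rate_bps:
            # Concave curve: steep first segment up to dmax, then rate.
            return burst_bps, dmax_us, rate_bps
    # Convex curve: flat first segment, then rate, arranged so that umax
    # is still delivered by dmax.
    d_us = dmax_us - (umax_bits * 1_000_000 // rate_bps)
    return 0, d_us, rate_bps


if __name__ == "__main__":
    # One 1514-byte frame (12112 bits) within 10ms, long-term 500kbit/s.
    print(umax_dmax_to_segments(12_112, 10_000, 500_000))
    # Same frame within 30ms at 1Mbit/s: rate alone is fast enough.
    print(umax_dmax_to_segments(12_112, 30_000, 1_000_000))
```

The first case yields a concave curve (m1 above m2), the second a convex
one (m1 of zero for the first d microseconds).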

As for your question - in that particular case, assuming no vlan tags
either, that would be 1514, afaik. You can cover for the rest though
with tc-stab (and also for other layers, such as atm). Keep in mind
that if you don't disable send offloading (usually enabled by default
these days), qdiscs might be dealing with, e.g., massive tcp segments,
with the typical side effects.
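
For reference, the candidate sizes from the question are just a running
sum of per-layer byte counts. The sketch below reproduces them. The
layer labels follow the question's wording, and the function name is
made up for the example.

```python
# Per-layer byte counts for a maximum-sized, untagged Ethernet frame.
LAYERS = [
    ("IP packet (MTU)", 1500),
    ("Ethernet header", 14),
    ("CRC", 4),
    ("preamble", 8),
    ("interframe gap", 12),
]


def cumulative_sizes(layers):
    """Return (label, running total in bytes) for each layer in order."""
    total = 0
    sizes = []
    for label, size in layers:
        total += size
        sizes.append((label, total))
    return sizes


if __name__ == "__main__":
    for label, total in cumulative_sizes(LAYERS):
        print(f"{total:5d}  up to and including {label}")
```

If 1514 is what goes into umax, the remaining 24 bytes (CRC, preamble
and interframe gap) are presumably what a tc-stab overhead setting would
have to account for.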

I'll go through your other questions a bit later.


* Re: Understanding HFSC
From: John A. Sullivan III @ 2011-12-06  3:42 UTC
  To: Michal Soltys; +Cc: netdev

Thank you very much, Michal, for taking the time to answer these in depth.  I know they are detailed and long questions and I'm sure you have other demands on your time! I'll respond in-line - John

----- Original Message -----
> From: "Michal Soltys" <soltys@ziu.info>
> To: "John A. Sullivan III" <jsullivan@opensourcedevel.com>
> Cc: netdev@vger.kernel.org
> Sent: Sunday, December 4, 2011 7:38:31 AM
> Subject: Re: Understanding HFSC
> 
> On 11-12-04 05:57, John A. Sullivan III wrote:
><snip>
> > One of the most confusing bits to me is, does the m1 rate apply to
> > each flow handled by the class or only to the entire class once it
> > becomes active?
> 
> Where a packet lands is (generally) determined by tc filters and/or
> iptables' mark/classify targets. All the packets that end up in some
> leaf node are governed by that node's realtime service curve and, at
> times when that criterion is not used, by the ratio of virtual times
> (coming from linkshare service curves) between that node and its
> siblings. Regardless of the curve used, the smallest vt (linkshare
> criterion) or the smallest dt among all eligible (realtime criterion)
> wins, and the leaf holding it will fulfill the dequeue call.
> 
> If you need more fine grained control "below" such leaf node - you
> need to either use deeper hierarchy with presumably "simpler" qdiscs
> attached (but more complex marking setup), or shallower hierarchy with
> more elaborate qdiscs attached. Think of work conserving qdiscs such
> as: sfq, drr - paired with appropriate tc filters (tc-flow perhaps ?)
> as needed.

Hmm . . . If I understand your response correctly, that's the answer I was hoping was not the case :( It sounds like the queue is ignorant of the flow, i.e., it only knows that it has a packet and wants to know if it should dequeue it.  It has no idea that packet 1 is from an existing conversation and packet 2 is from a new one that needs to be jumped to the head of the overall queue if packet 1 and 2 are in the same leaf class.  Let me illustrate with two separate examples to see if I understand.  In the first example, we have periodic traffic such as VoIP and, in the second, we have bulk traffic which is being sent as fast as the originating system can send it.  In both cases, let's assume there is another class which is always backlogged.

In the first case, we are sending 222 byte RTP VoIP packets every 20ms.  Thus, we set rt umax to 222, dmax to 5ms, and rate to some number more than sufficient to handle VoIP.  I think the umax/dmax setting means we reduce deadline time for the RTP packet but only for the first one.  So I start one VoIP conversation, the first packet is "accelerated" and the rest keep arriving in 20ms intervals.  These are having their deadline times calculated according to the rt rate and not the accelerated umax/dmax slope.  1ms after one of those RTP packets arrives, an RTP packet from a new VoIP session arrives.  Is the deadline calculated for this first packet of the new conversation from umax/dmax or rate? I would hope it would be umax/dmax but, since it sounds like the class does not distinguish between separate flows, it will be calculated at rate since we have already exceeded the intersection of m1 and m2.  Hmm . . . or does the time between the VoIP packets arriving every 20ms mean the queue actually is no longer backlogged and therefore resets the service curve so that each packet is effectively on the m1 slope rather than the m2 slope?

So let's go to the second scenario which does not involve periodic packets but a constant flow.  This is the example someone cited of using HFSC to accelerate the text portion of a web page.  So, we have typical text of 80kbits and want no more than 100ms delay serving that initial bit of text.  So: rt umax 80kbits dmax 100ms rate 500kbits.  Someone connects to the web server and we serve the first 80kbits at 800kbits per second and start streaming a large embedded video at 500kbits per second.  While that video is being sent, a second user connects to the web server.  Is their initial 80kbits of text sent at 800kbits per second or 500?
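
As a sanity check on the numbers in the two scenarios, the first-segment
slope is simply umax divided by dmax. A quick sketch, assuming 1 kbit =
1000 bits and using a made-up helper name:

```python
def slope_kbit(umax_bits: int, dmax_ms: int) -> float:
    """Slope in kbit/s; bits per millisecond is the same as kbit/s."""
    return umax_bits / dmax_ms


# Web scenario: 80kbits of text within 100ms.
web_m1 = slope_kbit(80_000, 100)
# VoIP scenario: one 222-byte packet within 5ms.
voip_m1 = slope_kbit(222 * 8, 5)
# A single VoIP stream: one 222-byte packet every 20ms.
voip_stream = slope_kbit(222 * 8, 20)

if __name__ == "__main__":
    print(web_m1, voip_m1, voip_stream)
```

This gives 800 kbit/s for the web text, which matches the figure above,
355.2 kbit/s for the accelerated VoIP packet, and 88.8 kbit/s as the
steady load of one VoIP stream. It says nothing about whether a second
flow in the same class gets the m1 slope again, which is the open
question here.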
> 
> Btw - check out the very latest iproute2 tree - fresh tc-hfsc(7) and
> tc-stab(8) manuals have been added. I tried to make them as detailed
> as possible, but I might have overshot a bit - so the opinion of
> someone getting into hfsc territory is invaluable. You can read them
> (if installing a fresh iproute2 is out of the question) with a simple:
> 
> nroff -mandoc -rLL=<width>n <page> | less
> 
> http://git.kernel.org/?p=linux/kernel/git/shemminger/iproute2.git;a=blob_plain;f=man/man7/tc-hfsc.7;hb=HEAD
> http://git.kernel.org/?p=linux/kernel/git/shemminger/iproute2.git;a=blob_plain;f=man/man8/tc-stab.8;hb=HEAD
> 
><snip>
Thanks.  I think I already read this once or twice on-line but read it again from cover to cover from your links to ensure I had the latest and greatest.  Each time, it makes more sense - John
