Dynamically classifying flows?

netdev.vger.kernel.org archive mirror

* Dynamically classifying flows?
@ 2005-03-07 17:50 Asim Shankar
  2005-03-07 20:34 ` Thomas Graf
  2005-03-07 23:17 ` Jonathan Day
  0 siblings, 2 replies; 6+ messages in thread
From: Asim Shankar @ 2005-03-07 17:50 UTC
  To: netdev

Hi,

I was looking into various queueing disciplines and had some thoughts/queries.
This email is going to be fairly high-level and somewhat long, so I'd be
grateful if you can bear with me.

Okay, so qdiscs can be run in various ways - FIFO, Round Robin (SFQ, PRIO),
HTB etc. Grossly oversimplified, I see all these strategies as allowing
administrators to statically define packet classes and class priorities, and
then possibly ensuring fairness amongst packets with equal class priorities.

This "staticness" of class priorities *may* lead to some problems (well, I'm
going to ask if they can). Consider a huge, popular file on an HTTP server.
Due to its popularity, requests for small pages may suffer. Similarly,
consider an SSH/SFTP server where SFTP traffic for large, popular files may
choke the SSH terminal connections (especially if the application doesn't set
the TOS bits or routers along the way ignore them). So we have interactive
flows (like someone SSHing to do some 'ls'es or many clients viewing a small
web-page) and bulk flows (downloads). By "flow" I mean a connection, not
necessarily an explicit TCP connection but a loose definition - say something
that "ip_conntrack" tracks.

Question 1: Can the number of and speed with which bulk flow packets are
generated adversely affect the interactive flows - i.e., can too many large
file downloads make the 'ls' or the small page downloads slow? Is this a
_likely_ scenario?

Diffserv already in effect tries to classify traffic as interactive or
bulk. However, this classification is still static and requires application
cooperation, which may not always be available or may be overridden. Web
servers for example don't change the TOS fields depending on whether the
file requested was a 700MB CD-image or a 2K homepage.

Question 2: Does the idea of _dynamically_ classifying traffic as interactive
or bulk make any sense at all? Or does the TOS field work well enough for
dynamic classification to not be of any practical interest?

If it does make sense,

Question 3: Has work already been done along these lines? If so, any pointers
would be appreciated.

Can we use ideas from process scheduling to be kinder to the interactive
flows? A "process" becomes a "flow", the "CPU" becomes the "NIC" and "time"
becomes "bytes". Process scheduling tries to keep system responsiveness high
by dynamically classifying processes as interactive or bulk and then making
interactive process priorities higher than non-interactive. A similar strategy
at the qdisc would mean that when the interactive flow has something to send,
it will get a higher priority. Flows will be dynamically assigned priorities
based on the history of traffic they generate.
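
A minimal sketch of the scheduling analogy above, assuming a decayed
per-flow byte count stands in for "recent CPU usage". The class name
FlowHistory, the decay factor and the 10000-byte threshold are all
hypothetical values chosen for illustration, not taken from any qdisc.

```python
from collections import defaultdict


class FlowHistory:
    """Tracks a decayed byte count per flow and maps it to a priority."""

    def __init__(self, decay=0.5, bulk_threshold=10_000):
        self.decay = decay                    # fraction of history kept per tick
        self.bulk_threshold = bulk_threshold  # decayed bytes that mark a flow as bulk
        self.usage = defaultdict(float)       # flow id -> decayed byte count

    def on_packet(self, flow, nbytes):
        """Charge the flow for the bytes it just sent."""
        self.usage[flow] += nbytes

    def tick(self):
        """Age every history, much as a scheduler forgives past CPU use."""
        for flow in self.usage:
            self.usage[flow] *= self.decay

    def priority(self, flow):
        """Return 0 (interactive, served first) or 1 (bulk)."""
        return 0 if self.usage[flow] < self.bulk_threshold else 1


hist = FlowHistory()
for _ in range(20):
    hist.on_packet("download", 1500)  # 30000 bytes within one interval
hist.on_packet("ssh", 100)
```

Here the download is treated as bulk until enough quiet ticks have
decayed its history, while the ssh flow stays in the interactive band.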

Applying process scheduling would be somewhat expensive (we're keeping track
of connections). RED on the other hand does something *like* this by making
the probability of a packet drop of a particular flow proportional to the
traffic generated by the flow; of course, it does so without any explicit
notion of flows. This leads to:

Question 4: Does RED provide *everything* this process-scheduling strategy
would? i.e., how would you compare the two?
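
For reference, a simplified sketch of the RED drop decision as it is
usually described: an exponentially weighted average of the queue length
is compared against two thresholds. The parameter values are illustrative
assumptions, and the refinements of full RED (spacing drops by packet
count, handling idle periods) are left out.

```python
def red_update_avg(avg, queue_len, weight=0.002):
    """Exponentially weighted moving average of the queue length."""
    return (1.0 - weight) * avg + weight * queue_len


def red_drop_probability(avg, min_th=5.0, max_th=15.0, max_p=0.1):
    """Drop probability as a function of the averaged queue length."""
    if avg < min_th:
        return 0.0   # queue is short: never drop
    if avg >= max_th:
        return 1.0   # queue is long: always drop
    # In between, the probability rises linearly from 0 to max_p.
    return max_p * (avg - min_th) / (max_th - min_th)
```

Every arriving packet faces the same probability, so a flow that sends
more packets loses proportionally more of them. That is the only sense in
which RED "knows" about flows; it keeps no per-flow history.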

Well, I guess that completes my question list for now.
Thanks for reading (and replying :-)),
Regards,

-- Asim


* Re: Dynamically classifying flows?
  2005-03-07 17:50 Dynamically classifying flows? Asim Shankar
@ 2005-03-07 20:34 ` Thomas Graf
  2005-03-08  0:10   ` Asim Shankar
  2005-03-07 23:17 ` Jonathan Day
  1 sibling, 1 reply; 6+ messages in thread
From: Thomas Graf @ 2005-03-07 20:34 UTC
  To: Asim Shankar; +Cc: netdev

* Asim Shankar <7bca1cb505030709502316f9b8@mail.gmail.com> 2005-03-07 11:50
> Okay, so qdiscs can be run in various ways - FIFO, Round Robin (SFQ, PRIO),
> HTB etc. Grossly oversimplified, I see all these strategies as allowing
> administrators to statically define packet classes and class priorities, and
> then possibly ensuring fairness amongst packets with equal class priorities.
> 
> This "staticness" of class priorities *may* lead to some problems (well, I'm
> going to ask if they can). Consider a huge, popular file on an HTTP server.
> Due to its popularity, requests for small pages may suffer. Similarly,
> consider an SSH/SFTP server where SFTP traffic for large, popular files may
> choke the SSH terminal connections (especially if the application doesn't set
> the TOS bits or routers along the way ignore them). So we have interactive
> flows (like someone SSHing to do some 'ls'es or many clients viewing a small
> web-page) and bulk flows (downloads). By "flow" I mean a connection, not
> necessarily an explicit TCP connection but a loose definition - say something
> that "ip_conntrack" tracks.

The example of SSH is bad because it behaves very well. A scp/sftp will have
different DSCP flags than "normal" SSH sessions.

> Question 1: Can the number of and speed with which bulk flow packets are
> generated adversely affect the interactive flows - i.e., can too many large
> file downloads make the 'ls' or the small page downloads slow? Is this a
> _likely_ scenario?

It is a likely scenario but usually not a problem because you can classify this
kind of bulk packet by its size. u32 can be used for such things, or the
newly added meta ematch.
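
The idea of classifying by size can be sketched as follows. This is plain
Python rather than u32 or meta ematch syntax, and the SIZE_LIMITS values
are hypothetical thresholds chosen only for illustration.

```python
import bisect

# Hypothetical length limits in bytes separating the bands.
SIZE_LIMITS = [128, 576]


def band_for_length(packet_len):
    """Return 0 for small packets, 1 for medium ones, 2 for large ones."""
    return bisect.bisect_right(SIZE_LIMITS, packet_len)
```

A 64-byte packet lands in band 0 and a 1500-byte packet in band 2. The
rule looks at each packet in isolation, so it cannot tell a slow flow of
large packets from a fast one.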

A much worse scenario is a large number of new, short-lived connections,
because it is hard to classify those on non-static patterns and they pollute
all your queues. A real-world scenario for this is a bunch of BitTorrent users
in your network.

> Question 2: Does the idea of _dynamically_ classifying traffic as interactive
> or bulk make any sense at all? Or does the TOS field work well enough for
> dynamic classification to not be of any practical interest?

Yes it does, and that's exactly the direction the ematch API is going in.

> Question 3: Has work already been done along these lines? If so, any pointers
> would be appreciated.

Quite a few papers but most of them don't work in practice for me.

> Can we use ideas from process scheduling to be kinder to the interactive
> flows? A "process" becomes a "flow", the "CPU" becomes the "NIC" and "time"
> becomes "bytes". Process scheduling tries to keep system responsiveness high
> by dynamically classifying processes as interactive or bulk and then making
> interactive process priorities higher than non-interactive. A similar strategy
> at the qdisc would mean that when the interactive flow has something to send,
> it will get a higher priority. Flows will be dynamically assigned priorities
> based on the history of traffic they generate.

In order to prioritize there must be a queue, and for remote terminal protocols
you want to avoid queues at any cost because they will introduce latency in any
case. Even on overloaded lines with full queues, the queues are rarely big
enough to really apply any sort of priority queueing.

Another problem that arises is that usually you want different queue parameters
depending on the type of traffic it holds. I'm not sure if it would help to
determine those more dynamically, intuitively I'd say it doesn't make too much
sense but feel free to prove me wrong.


* Re: Dynamically classifying flows?
  2005-03-07 17:50 Dynamically classifying flows? Asim Shankar
  2005-03-07 20:34 ` Thomas Graf
@ 2005-03-07 23:17 ` Jonathan Day
  1 sibling, 0 replies; 6+ messages in thread
From: Jonathan Day @ 2005-03-07 23:17 UTC
  To: Asim Shankar, netdev

It depends a bit on what OS you are using, as to the
best strategy to follow.

Linux has Layer 7 packet classification, so it should
be possible to identify the more troublesome
applications and have them in separate queues. I don't
think BSD's ALTQ has that capability, but I could be
wrong. Other Operating Systems are anybody's guess.

If applications are likely to misbehave, and in a
manner that is unpredictable, then something in the
"Fair Queueing" family would seem a good place to
start. Your primary goal is not to impose bounds on
the traffic as much as it is to place bounds on the
penalties imposed, so it's not clear that you'd
necessarily need to know whether something is bulk or
interactive, you'd just need to know if it would cause
problems if it were given greater bandwidth.

RED is good under certain conditions, not so good
under others. There are many members of the RED
family, so you might want to shop around a little.
There may be something that is perfect for your
situation.

I'd suggest ECN, but most applications don't pay
attention to notifications. Worth a look, though - the
OS might understand ECN and throttle the application
on its behalf. (Sounds sadistic.)

If you definitely want classful QoS, then I'd say make
two classes - everything definitely known & needing
high bandwidth, and everything else. Configure it so
that your second class can NEVER steal used bandwidth
from the first, but CAN be loaned it if there's
insufficient interactive traffic to fill the quota it
has been given. That way, you needn't care about
"unknown" stuff, or whatever, because it'll all be
lumped into the second class, unless it is
specifically designated as being in the first class.
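
A rough sketch of that two-class arrangement, assuming bandwidth is
handed out once per interval. The function name allocate and its
parameters are hypothetical; a real classful qdisc works per packet
rather than per interval.

```python
def allocate(link_rate, first_quota, first_demand, second_demand):
    """Split link_rate between two classes for one interval.

    The first class is served up to its quota. The second class gets
    the rest of the link, including any part of the quota that the
    first class left unused, but never bandwidth the first class used.
    """
    first = min(first_demand, first_quota)
    second = min(second_demand, link_rate - first)
    return first, second
```

With a link rate of 100 and a quota of 40, a first class that only needs
10 lends the other 30 to the second class, which may then use up to 90.
Once the first class asks for its full 40, the second class is back to 60.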


* Re: Dynamically classifying flows?
  2005-03-07 20:34 ` Thomas Graf
@ 2005-03-08  0:10   ` Asim Shankar
  2005-03-08  0:25     ` Patrick McHardy
  2005-03-08  1:14     ` Thomas Graf
  0 siblings, 2 replies; 6+ messages in thread
From: Asim Shankar @ 2005-03-08  0:10 UTC
  To: Thomas Graf; +Cc: netdev

> It is a likely scenario but usually not a problem because you can classify this
> kind of bulk packets by their size. u32 can be used for such things or the
> newly added meta ematch.
Filtering by size may not always work. An interactive flow may also
generate big (MTU) size packets, but it is interactive because the
_rate_ at which packets are produced is smaller. Though, if you think
that such cases are purely theoretical and don't create problems in
practice, do let me know.

> > Question 2: Does the idea of _dynamically_ classifying traffic as interactive
> > or bulk make any sense at all? Or does the TOS field work well enough for
> > dynamic classification to not be of any practical interest?
> 
> Yes it does and that's exactly the direction the ematch API is going to.
Can you point me to some details on ematch? Specifically, how it
supports dynamic classification of flows? Seems like this is
something really new you guys are working on, so not much
documentation is available.

> > Can we use ideas from process scheduling to be kinder to the interactive
> > flows? 
> In order to prioritize there must be a queue, and for remote terminal protocols
> you want to avoid queues at any cost because it will introduce latency in any
> case. Even on overloaded lines with full queues, the queues are rarely big
> enough to really apply any sort of priority queues.
Won't we always have queues? After all a qdisc essentially specifies a
de-queuing algorithm. I was thinking along the lines of process
scheduling to be able to avoid having to manually specify flow
priorities. Ideally speaking, it would automatically classify remote
terminal flows such that they see the least possible queueing delays.
Though, you seem to be of the opinion that manual classification is
easy enough and most traffic worth worrying about uses DSCP flags
effectively. Is that correct?

Thanks for your comments,
Regards,

-- Asim


* Re: Dynamically classifying flows?
  2005-03-08  0:10   ` Asim Shankar
@ 2005-03-08  0:25     ` Patrick McHardy
  2005-03-08  1:14     ` Thomas Graf
  1 sibling, 0 replies; 6+ messages in thread
From: Patrick McHardy @ 2005-03-08  0:25 UTC
  To: Asim Shankar; +Cc: Thomas Graf, netdev

Asim Shankar wrote:
>>It is a likely scenario but usually not a problem because you can classify this
>>kind of bulk packets by their size. u32 can be used for such things or the
>>newly added meta ematch.
> 
> Filtering by size may not always work. An interactive flow may also
> generate big (MTU) size packets, but it is interactive because the
> _rate_ at which packets are produced is smaller. Though, if you think
> that such cases are purely theoretical and don't create problems in
> practice, do let me know.

The connbytes and connrate matches from netfilter patch-o-matic can
be used to dynamically reclassify demanding connections. Keep in mind
that reclassification can cause reordering, so you should make sure
it can't happen frequently for single connections.
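
To illustrate the byte-count idea, here is a small sketch. It is not the
netfilter connbytes interface; the class name ByteCountClassifier and the
limit value are hypothetical. Because a byte counter only grows, each
connection changes class at most once, which keeps the reordering risk
to a single transition.

```python
class ByteCountClassifier:
    """Moves a connection to the bulk class once it has sent enough bytes."""

    def __init__(self, limit):
        self.limit = limit  # hypothetical threshold in bytes
        self.seen = {}      # connection id -> total bytes observed

    def classify(self, conn, nbytes):
        """Account for one packet and return the class it belongs to."""
        total = self.seen.get(conn, 0) + nbytes
        self.seen[conn] = total
        return "bulk" if total > self.limit else "default"


clf = ByteCountClassifier(limit=3000)
```

A rate-based rule does not have this property, since a rate can rise and
fall, so it would need some hysteresis to avoid moving a connection back
and forth.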

Regards
Patrick


* Re: Dynamically classifying flows?
  2005-03-08  0:10   ` Asim Shankar
  2005-03-08  0:25     ` Patrick McHardy
@ 2005-03-08  1:14     ` Thomas Graf
  1 sibling, 0 replies; 6+ messages in thread
From: Thomas Graf @ 2005-03-08  1:14 UTC
  To: Asim Shankar; +Cc: netdev

* Asim Shankar <7bca1cb505030716104856fe3@mail.gmail.com> 2005-03-07 18:10
> Filtering by size may not always work. An interactive flow may also
> generate big (MTU) size packets, but it is interactive because the
> _rate_ at which packets are produced is smaller. Though, if you think
> that such cases are purely theoretical and don't create problems in
> practice, do let me know.

It really depends on your needs and the accuracy you need. I guess,
there is not a single truth. I've been experimenting with a classifier
which classifies based on the packet rate over a certain time, relative
to the total packet rate, i.e. the higher the packet rate the more likely
it is supposed to be a bulk data transfer. It is quite successful for
high traffic scenarios where real connection tracking is not possible.
However, this only works for certain scenarios; there are dozens of
other ways to solve this problem, one of them being the recent additions
to netfilter, as Patrick pointed out.
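
The thread gives no details of that classifier, so the following is only
a guess at the general shape of the idea: count packets per key over a
window and compare each key's share with the total. The class name
RelativeRateClassifier, the bulk_share threshold and the use of a fixed
window are all assumptions.

```python
from collections import Counter


class RelativeRateClassifier:
    """Classifies a key as bulk when it dominates the recent packet count."""

    def __init__(self, bulk_share=0.5):
        self.bulk_share = bulk_share  # share of all packets that marks bulk
        self.counts = Counter()       # key -> packets seen in this window
        self.total = 0                # all packets seen in this window

    def observe(self, key):
        """Record one packet for the given key."""
        self.counts[key] += 1
        self.total += 1

    def band(self, key):
        """Return 1 (bulk) or 0 (everything else) for the key."""
        if self.total == 0:
            return 0
        share = self.counts[key] / self.total
        return 1 if share >= self.bulk_share else 0

    def new_window(self):
        """Forget the current window and start counting afresh."""
        self.counts.clear()
        self.total = 0


rates = RelativeRateClassifier(bulk_share=0.5)
for _ in range(9):
    rates.observe("ftp")
rates.observe("ssh")
```

The key could be something much coarser than a tracked connection, for
example a hash of the address pair, which is what would make such a
scheme usable where full connection tracking is too expensive.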

> Can you point me to some details on ematch? Specifically, how it
> supports dynamic classification of flows? Seems like this is
> something really new you guys are working on, so not much
> documentation is available.

The term "dynamic classification" is a bit wide and can be interpreted
in various ways. Ematch is not aiming towards a specific direction
but rather tries to provide an easy-to-use API for everyone to write
their own classification procedures or construct one by combining
existing ematches.

As you mentioned, it's quite new and still experimental and not well
documented but this will change together with the required userspace
additions.

> Won't we always have queues? After all a qdisc essentially specifies a
> de-queuing algorithm.

Yes, but the queues will be empty most of the time unless you
apply some kind of rate limitation, and that is exactly what should
be avoided while handling interactive flows. It would be possible, of
course, to try and pick out all the packets belonging to an interactive
flow, but why bother? Enqueueing them into a separate band is easier and
causes less trouble.
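
A minimal sketch of the separate-band approach, assuming two bands served
in strict priority order. The class name TwoBandQueue is hypothetical and
the classification step that picks the band is left to the caller.

```python
from collections import deque


class TwoBandQueue:
    """Two FIFO bands; band 0 is always drained before band 1."""

    def __init__(self):
        self.bands = (deque(), deque())

    def enqueue(self, packet, band):
        """Append the packet to the chosen band (0 or 1)."""
        self.bands[band].append(packet)

    def dequeue(self):
        """Return the next packet to send, or None if both bands are empty."""
        for band in self.bands:
            if band:
                return band.popleft()
        return None


q = TwoBandQueue()
q.enqueue("bulk1", 1)
q.enqueue("ssh1", 0)
q.enqueue("bulk2", 1)
```

The interactive packet leaves first even though it arrived after bulk1,
and nothing has to search the bulk queue for it.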

> I was thinking along the lines of process
> scheduling to be able to avoid having to manually specify flow
> priorities. Ideally speaking, it would automatically classify remote
> terminal flows such that they see the least possible queueing delays.

This would be nice but really is hard to achieve. The drawback of
any automation is that the worst-case scenarios are often much worse.
The classifier algorithm mentioned above aims in this direction
by automatically classifying packets into different bands based on
their packet rate relative to the total packet rate.

It is an interesting topic and has potential for great ideas and
concepts, but it is harder than it looks at first glance. It
gets much worse when you try to guarantee some kind of QoS policy
because all dynamic approaches I've seen go totally berserk under
extreme conditions.

I'll give you some examples of what I mean:

  Assuming you want to detect and classify interactive flows. Does this
  mean that the connection handshake is included as well? If so, how
  do you distinguish between connection handshakes of interactive flows
  and bulk flows if not with the help of some static pattern? Depending
  on your actual policy you're likely to be forced to drop out of band
  packets, which will make connection establishment very unreliable
  for interactive traffic, which probably interferes with your QoS policy.

  Assuming you either ignore or solved the above problem, what happens
  if certain packets are classified wrongly? Your flows are unreliable
  because they are subject to arbitrary bursts of high latency or even
  packet loss.

  Assuming you use connection tracking and automatically classify all
  packets belonging to the same connection the same way (which is a quite
  popular approach), what happens if parts of your interactive flows
  contain heavy bulky areas in between (e.g. ls -R / over a remote
  terminal connection)? It will pollute all your finely tuned interactive
  queues and affect other unrelated interactive flows.

As you can see it really depends on your needs and the actual problem
you're trying to solve. There is no easy and fully automatic solution
for the above. Actually it's quite hard to solve these with manually
tuned classification too, but it is possible because you can divide
it into various problem domains and solve each independently.

I will agree with you when you say that there is a possibility to
build a qdisc which will handle the typical home end-user setup with
remote terminal traffic, bulk data traffic and possibly some gaming
and I'll assist you in any way if you want to put effort into it
because it would be a good thing (tm).

> Though, you seem to be of the opinion that manual classification is
> easy enough and most traffic worth worrying about uses DSCP flags
> effectively. Is that correct?

I think manually configurable classification tools have a wider
spectrum of usability, but I agree with you that those are harder to
use. I also think that DSCP is doing a good job and is enough for
more jobs than many people can imagine but it has strong limitations
and most problems need a combination of various techniques which is
the reason for the ematches to support logical expressions.
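
The value of logical expressions can be shown with plain predicates.
This is not the ematch API or its syntax; the helper names, the packet
dictionary fields (len, dscp, dport) and the chosen rule are hypothetical.
The DSCP value 46 is the Expedited Forwarding code point.

```python
def all_of(*preds):
    """Match only if every predicate matches the packet."""
    return lambda pkt: all(p(pkt) for p in preds)


def any_of(*preds):
    """Match if at least one predicate matches the packet."""
    return lambda pkt: any(p(pkt) for p in preds)


def negate(pred):
    """Match exactly when the given predicate does not."""
    return lambda pkt: not pred(pkt)


def small_packet(pkt):
    return pkt["len"] < 128      # hypothetical size limit


def marked_ef(pkt):
    return pkt["dscp"] == 46     # Expedited Forwarding


def ftp_data(pkt):
    return pkt["dport"] == 20    # FTP data port as a static bulk pattern


# Interactive: marked EF, or small and not addressed to the FTP data port.
interactive = any_of(marked_ef, all_of(small_packet, negate(ftp_data)))
```

Each predicate is one technique (DSCP, size, static port pattern), and
the combined rule covers cases that none of them handles alone.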

I think there is a big potential for making things easier to configure
but this needs time and a lot of effort. I think ematches are a step
in the right direction but a lot more effort needs to go into the
userspace tools. There is some work going on in the background but the
fact that usability is very time consuming makes it a slow process.

