Re: net: Add network priority cgroup

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: John Fastabend <john.r.fastabend@intel.com>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: Dave Taht <dave.taht@gmail.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"Love, Robert W" <robert.w.love@intel.com>,
	"David S. Miller" <davem@davemloft.net>
Subject: Re: net: Add network priority cgroup
Date: Mon, 14 Nov 2011 08:38:20 -0800	[thread overview]
Message-ID: <4EC143FC.5060704@intel.com> (raw)
In-Reply-To: <20111114144358.GB27284@hmsreliant.think-freely.org>

On 11/14/2011 6:43 AM, Neil Horman wrote:
> On Mon, Nov 14, 2011 at 01:32:04PM +0100, Dave Taht wrote:
>> On Mon, Nov 14, 2011 at 12:47 PM, Neil Horman <nhorman@tuxdriver.com> wrote:
>>> On Wed, Nov 09, 2011 at 02:57:33PM -0500, Neil Horman wrote:
>>>> Data Center Bridging environments are currently somewhat limited in their
>>>> ability to provide a general mechanism for controlling traffic priority.
>>>> Specifically they are unable to administratively control the priority at which
>>>> various types of network traffic are sent.
>>>>
>>>> Currently, the only ways to set the priority of a network buffer are:
>>>>
>>>> 1) Through the use of the SO_PRIORITY socket option
>>>> 2) By using low level hooks, like a tc action
>>>>
>>>> (1) is difficult from an administrative perspective because it requires that the
>>>> application to be coded to not just assume the default priority is sufficient,
>>>> and must expose an administrative interface to allow priority adjustment.  Such
>>>> a solution is not scalable in a DCB environment
>>>>
>>>> (2) is also difficult, as it requires constant administrative oversight of
>>>> applications so as to build appropriate rules to match traffic belonging to
>>>> various classes, so that priority can be appropriately set. It is further
>>>> limiting when DCB enabled hardware is in use, due to the fact that tc rules are
>>>> only run after a root qdisc has been selected (DCB enabled hardware may reserve
>>>> hw queues for various traffic classes and needs the priority to be set prior to
>>>> selecting the root qdisc)
>>>>
>>>>
>>>> I've discussed various solutions with John Fastabend, and we saw a cgroup as
>>>> being a good general solution to this problem.  The network priority cgroup
>>>> allows for a per-interface priority map to be built per cgroup.  Any traffic
>>>> originating from an application in a cgroup, that does not explicitly set its
>>>> priority with SO_PRIORITY will have its priority assigned to the value
>>>> designated for that group on that interface.  This allows a user space daemon,
>>>> when conducting LLDP negotiation with a DCB enabled peer to create a cgroup
>>>> based on the APP_TLV value received and administratively assign applications to
>>>> that priority using the existing cgroup utility infrastructure.
>>>>
>>>> Tested by John and myself, with good results
>>>>
>>>> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
>>>> CC: John Fastabend <john.r.fastabend@intel.com>
>>>> CC: Robert Love <robert.w.love@intel.com>
>>>> CC: "David S. Miller" <davem@davemloft.net>
>>>> --

Acked-by: John Fastabend <john.r.fastabend@intel.com>

>>>
>>> Bump, any other thoughts here?  Dave T. has some reasonable thoughts regarding
>>> the use of skb->priority, but IMO they really seem orthogonal to the purpose of
>>> this change.  Any other reviews would be welcome.
>>
>> Well, in part I've been playing catchup in the hope that lldp and
>> openlldp and/or this dcb netlink layer that I don't know anything
>> about (pointers please?) could help somehow to resolve the semantic
>> mess skb->priority has become in the first place.
>>
>> I liked what was described here.
>>
>> "What if we did at least carve out the DCB functionality away from
>> skb->priority?  Since, AIUI, we're only concerning ourselves with
>> locally generated traffic here, we're talking
>> about skbs that have a socket attached to them.  We could, instead of indexing
>> the prio_tc_map with skb->priority, we could index it with
>> skb->dev->priomap[skb->sk->prioidx] (as provided by this patch).  The cgroup
>> then could be, instead of a strict priority cgroup, a queue_selector cgroup (or
>> something more appropriately named), and we don't have to touch skb->priority at
>> all.  I'd really rather not start down that road until I got more opinions and
>> consensus on that, but it seems like a pretty good solution, one that would
>> allow hardware queue selection in systems that use things like DCB to co-exist
>> with software queueing features."
>>
> I was initially ok with this, but the more I think about it, the more I think
> its just not needed (see further down in this email for my reasoning).  John,
> Rob, do you have any thoughts here?
> 

I agree the original mechanism seems sufficient. skb->priority already has
all the qdisc and netfilter infrastructure in place to be used and using
it to prioritize "steer" packets at queues seems reasonable to me. Using
the skb->priority this way is not new pfifo_fast uses it to pick a band
and sch_prio does some similar prioritization, mqprio is a multiqueue
variant.

>> The piece that still kind of bothered me about the original proposal
>> (and perhaps this one) was that setting SO_PRIORITY in an app means
>> 'give my packets more mojo'.
>>
>> Taking something that took unprioritized packets and assigned them and
>> *them only* to a hardware queue struck me as possibly deprioritizing
>> the 'more mojo wanted' packets in the app(s), as they would end up in
>> some other, possibly overloaded, hardware queue.
>>
> I don't really see what you mean by this at all.  Taking packets with no
> priority and assigning them a priority doesn't really have an effect on
> pre-prioritized packets.  Or rather it shouldn't.  You can certainly create a
> problem by having apps prioritized according to conflicting semantic rules, but
> that strikes me as administrative error.  Garbage in...Garbage out.
> 
>> So a cgroup that moves all of the packets from an application into a
>> given hardware queue, and then gets scheduled normally according to
>> skb->priority and friends (software queue, default of pfifo_fast,
>> etc), seems to make some sense to me. (I wouldn't mind if we had
>> abstractions for software queues, too, like, I need a software queue
>> with these properties, find me a place for it on the hardware - but
>> I'm dreaming)
>>
>> One open question is where do packets generated from other subsystems
>> end up, if you are using a cgroup for the app? arp, dns, etc?
>>
> The overriding rule is the association of an skb to a socket.  If a transmitted
> frame has skb->sk set in dev_queue_xmit, then we interrogate its priority index
> as set when we passed through the sendmsg code at the top of the stack.
> Otherwise its behavior is unchanged from its current standpoint.
> 

Having a queue selection (skb->queue_mapping?) cgroup also would defeat
any hashing across multiple queues. With mqprio we can assign many hardware
queues to a skb->priority.

w.r.t. software queue abstractions don't we already have this? mq and
mqprio enumerate a software qdisc per hardware queue. You can attach
your favorite qdisc to these. This is likely off-topic for this thread though.

[...]

> In the end, I think its just plain old more useful to assign priorty here than
> some new thing.

I agree.

Thanks,
John


> 
> Neil
>  
>>> John, Robert, if you're supportive of these changes, some Acks would be
>>> appreciated.
>>
>>
>>>
>>>
>>> Regards
>>> Neil
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>
>>
>>
>> -- 
>> Dave Täht
>> SKYPE: davetaht
>> US Tel: 1-239-829-5608
>> FR Tel: 0638645374
>> http://www.bufferbloat.net
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>

next prev parent reply	other threads:[~2011-11-14 16:38 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-11-09 19:57 net: Add network priority cgroup Neil Horman
2011-11-09 19:57 ` [RFC PATCH 1/2] net: add network priority cgroup infrastructure Neil Horman
2011-11-09 19:57 ` [RFC PATCH 2/2] net: add documentation for net_prio cgroups Neil Horman
2011-11-09 20:27 ` net: Add network priority cgroup Dave Taht
2011-11-09 21:09   ` Neil Horman
2011-11-09 21:10   ` John Fastabend
2011-11-14 11:47 ` Neil Horman
2011-11-14 12:32   ` Dave Taht
2011-11-14 14:43     ` Neil Horman
2011-11-14 16:38       ` John Fastabend [this message]
2011-11-14 16:44         ` David Täht
2011-11-14 16:43       ` Shyam_Iyer
2011-11-14 17:24         ` Neil Horman
2011-11-14 20:02           ` Shyam_Iyer
2011-11-14 20:56             ` Neil Horman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4EC143FC.5060704@intel.com \
    --to=john.r.fastabend@intel.com \
    --cc=dave.taht@gmail.com \
    --cc=davem@davemloft.net \
    --cc=netdev@vger.kernel.org \
    --cc=nhorman@tuxdriver.com \
    --cc=robert.w.love@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).