From: John Fastabend <john.r.fastabend@intel.com>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: Dave Taht <dave.taht@gmail.com>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"Love, Robert W" <robert.w.love@intel.com>,
"David S. Miller" <davem@davemloft.net>
Subject: Re: net: Add network priority cgroup
Date: Mon, 14 Nov 2011 08:38:20 -0800 [thread overview]
Message-ID: <4EC143FC.5060704@intel.com> (raw)
In-Reply-To: <20111114144358.GB27284@hmsreliant.think-freely.org>
On 11/14/2011 6:43 AM, Neil Horman wrote:
> On Mon, Nov 14, 2011 at 01:32:04PM +0100, Dave Taht wrote:
>> On Mon, Nov 14, 2011 at 12:47 PM, Neil Horman <nhorman@tuxdriver.com> wrote:
>>> On Wed, Nov 09, 2011 at 02:57:33PM -0500, Neil Horman wrote:
>>>> Data Center Bridging environments are currently somewhat limited in their
>>>> ability to provide a general mechanism for controlling traffic priority.
>>>> Specifically they are unable to administratively control the priority at which
>>>> various types of network traffic are sent.
>>>>
>>>> Currently, the only ways to set the priority of a network buffer are:
>>>>
>>>> 1) Through the use of the SO_PRIORITY socket option
>>>> 2) By using low level hooks, like a tc action
>>>>
>>>> (1) is difficult from an administrative perspective because it requires that the
>>>> application to be coded to not just assume the default priority is sufficient,
>>>> and must expose an administrative interface to allow priority adjustment. Such
>>>> a solution is not scalable in a DCB environment
>>>>
>>>> (2) is also difficult, as it requires constant administrative oversight of
>>>> applications so as to build appropriate rules to match traffic belonging to
>>>> various classes, so that priority can be appropriately set. It is further
>>>> limiting when DCB enabled hardware is in use, due to the fact that tc rules are
>>>> only run after a root qdisc has been selected (DCB enabled hardware may reserve
>>>> hw queues for various traffic classes and needs the priority to be set prior to
>>>> selecting the root qdisc)
>>>>
>>>>
>>>> I've discussed various solutions with John Fastabend, and we saw a cgroup as
>>>> being a good general solution to this problem. The network priority cgroup
>>>> allows for a per-interface priority map to be built per cgroup. Any traffic
>>>> originating from an application in a cgroup, that does not explicitly set its
>>>> priority with SO_PRIORITY will have its priority assigned to the value
>>>> designated for that group on that interface. This allows a user space daemon,
>>>> when conducting LLDP negotiation with a DCB enabled peer to create a cgroup
>>>> based on the APP_TLV value received and administratively assign applications to
>>>> that priority using the existing cgroup utility infrastructure.
>>>>
>>>> Tested by John and myself, with good results
>>>>
>>>> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
>>>> CC: John Fastabend <john.r.fastabend@intel.com>
>>>> CC: Robert Love <robert.w.love@intel.com>
>>>> CC: "David S. Miller" <davem@davemloft.net>
>>>> --
Acked-by: John Fastabend <john.r.fastabend@intel.com>
>>>
>>> Bump, any other thoughts here? Dave T. has some reasonable thoughts regarding
>>> the use of skb->priority, but IMO they really seem orthogonal to the purpose of
>>> this change. Any other reviews would be welcome.
>>
>> Well, in part I've been playing catchup in the hope that lldp and
>> openlldp and/or this dcb netlink layer that I don't know anything
>> about (pointers please?) could help somehow to resolve the semantic
>> mess skb->priority has become in the first place.
>>
>> I liked what was described here.
>>
>> "What if we did at least carve out the DCB functionality away from
>> skb->priority? Since, AIUI, we're only concerning ourselves with
>> locally generated traffic here, we're talking
>> about skbs that have a socket attached to them. We could, instead of indexing
>> the prio_tc_map with skb->priority, we could index it with
>> skb->dev->priomap[skb->sk->prioidx] (as provided by this patch). The cgroup
>> then could be, instead of a strict priority cgroup, a queue_selector cgroup (or
>> something more appropriately named), and we don't have to touch skb->priority at
>> all. I'd really rather not start down that road until I got more opinions and
>> consensus on that, but it seems like a pretty good solution, one that would
>> allow hardware queue selection in systems that use things like DCB to co-exist
>> with software queueing features."
>>
> I was initially ok with this, but the more I think about it, the more I think
> its just not needed (see further down in this email for my reasoning). John,
> Rob, do you have any thoughts here?
>
I agree the original mechanism seems sufficient. skb->priority already has
all the qdisc and netfilter infrastructure in place to be used and using
it to prioritize "steer" packets at queues seems reasonable to me. Using
the skb->priority this way is not new pfifo_fast uses it to pick a band
and sch_prio does some similar prioritization, mqprio is a multiqueue
variant.
>> The piece that still kind of bothered me about the original proposal
>> (and perhaps this one) was that setting SO_PRIORITY in an app means
>> 'give my packets more mojo'.
>>
>> Taking something that took unprioritized packets and assigned them and
>> *them only* to a hardware queue struck me as possibly deprioritizing
>> the 'more mojo wanted' packets in the app(s), as they would end up in
>> some other, possibly overloaded, hardware queue.
>>
> I don't really see what you mean by this at all. Taking packets with no
> priority and assigning them a priority doesn't really have an effect on
> pre-prioritized packets. Or rather it shouldn't. You can certainly create a
> problem by having apps prioritized according to conflicting semantic rules, but
> that strikes me as administrative error. Garbage in...Garbage out.
>
>> So a cgroup that moves all of the packets from an application into a
>> given hardware queue, and then gets scheduled normally according to
>> skb->priority and friends (software queue, default of pfifo_fast,
>> etc), seems to make some sense to me. (I wouldn't mind if we had
>> abstractions for software queues, too, like, I need a software queue
>> with these properties, find me a place for it on the hardware - but
>> I'm dreaming)
>>
>> One open question is where do packets generated from other subsystems
>> end up, if you are using a cgroup for the app? arp, dns, etc?
>>
> The overriding rule is the association of an skb to a socket. If a transmitted
> frame has skb->sk set in dev_queue_xmit, then we interrogate its priority index
> as set when we passed through the sendmsg code at the top of the stack.
> Otherwise its behavior is unchanged from its current standpoint.
>
Having a queue selection (skb->queue_mapping?) cgroup also would defeat
any hashing across multiple queues. With mqprio we can assign many hardware
queues to a skb->priority.
w.r.t. software queue abstractions don't we already have this? mq and
mqprio enumerate a software qdisc per hardware queue. You can attach
your favorite qdisc to these. This is likely off-topic for this thread though.
[...]
> In the end, I think its just plain old more useful to assign priorty here than
> some new thing.
I agree.
Thanks,
John
>
> Neil
>
>>> John, Robert, if you're supportive of these changes, some Acks would be
>>> appreciated.
>>
>>
>>>
>>>
>>> Regards
>>> Neil
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>>
>>
>>
>> --
>> Dave Täht
>> SKYPE: davetaht
>> US Tel: 1-239-829-5608
>> FR Tel: 0638645374
>> http://www.bufferbloat.net
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
next prev parent reply other threads:[~2011-11-14 16:38 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-11-09 19:57 net: Add network priority cgroup Neil Horman
2011-11-09 19:57 ` [RFC PATCH 1/2] net: add network priority cgroup infrastructure Neil Horman
2011-11-09 19:57 ` [RFC PATCH 2/2] net: add documentation for net_prio cgroups Neil Horman
2011-11-09 20:27 ` net: Add network priority cgroup Dave Taht
2011-11-09 21:09 ` Neil Horman
2011-11-09 21:10 ` John Fastabend
2011-11-14 11:47 ` Neil Horman
2011-11-14 12:32 ` Dave Taht
2011-11-14 14:43 ` Neil Horman
2011-11-14 16:38 ` John Fastabend [this message]
2011-11-14 16:44 ` David Täht
2011-11-14 16:43 ` Shyam_Iyer
2011-11-14 17:24 ` Neil Horman
2011-11-14 20:02 ` Shyam_Iyer
2011-11-14 20:56 ` Neil Horman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4EC143FC.5060704@intel.com \
--to=john.r.fastabend@intel.com \
--cc=dave.taht@gmail.com \
--cc=davem@davemloft.net \
--cc=netdev@vger.kernel.org \
--cc=nhorman@tuxdriver.com \
--cc=robert.w.love@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.