Re: [PATCH] net: add sysctl allow_so_priority for SO_PRIORITY setsockopt

From: "David Täht" <dave.taht@gmail.com>
To: "Maciej Żenczykowski" <zenczykowski@gmail.com>
Cc: David Miller <davem@davemloft.net>, netdev@vger.kernel.org
Subject: Re: [PATCH] net: add sysctl allow_so_priority for SO_PRIORITY setsockopt
Date: Sat, 22 Oct 2011 11:01:18 +0200
Message-ID: <4EA2865E.2050305@gmail.com>
In-Reply-To: <CAHo-OoyVSbsxb8U3Y5WCNRsxjr00g1O3HJcT1fmu5cmP5i-JsA@mail.gmail.com>

On 10/22/2011 10:27 AM, Maciej Żenczykowski wrote:
>> I also don't see why we'd want to allow disabling this either.

I have been watching this and the other capability patches go by with
interest. My use case is that I would like to run "named" as a
non-root user, while still letting it vary the DSCP (TOS) field on a
per-connection basis.

TCP zone transfers = bulk
TCP/UDP queries = something like interactive | CS5 (this moves DNS
queries into the VI queue on wireless, which can also be done with
SO_PRIORITY)
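
For illustration, a minimal sketch of what per-connection marking could look like from userspace, assuming plain setsockopt(IP_TOS) is available to the daemon. The helper names (dscp_to_tos, set_socket_dscp) and the enum are made up for this example and are not part of named or any library; treating "bulk" as CS1 is also an assumption.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* DSCP class selectors (RFC 2474): CSn is n << 3. */
enum { DSCP_CS0 = 0, DSCP_CS1 = 8, DSCP_CS5 = 40 };

/* The 6-bit DSCP sits in the upper bits of the TOS byte;
 * the low 2 bits are used for ECN. */
int dscp_to_tos(unsigned dscp)
{
    assert(dscp <= 63);
    return (int)(dscp << 2);
}

/* Mark one socket (hypothetical helper).
 * Returns 0 on success, -1 with errno set on failure. */
int set_socket_dscp(int fd, unsigned dscp)
{
    int tos = dscp_to_tos(dscp);

    return setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
}
```

A daemon could then call set_socket_dscp(fd, DSCP_CS1) on a zone-transfer socket and set_socket_dscp(fd, DSCP_CS5) on a query socket.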

Having TOS modification as a grantable capability, and otherwise
restricting it, makes some sense in a world of otherwise unrestricted
user programs in the clouds. However, setting CS1, which reduces
something from best effort to background, should also be allowed
universally.

I note that another way to hammer down someone else's (guest machine,
external router, etc.) TOS settings would be to do it in iptables, but
doing it on a fine-grained basis at present would take up to 63 iptables
rules...
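
To make the rule count concrete, here is a rough sketch that models a DSCP rewrite policy as a 64-entry table and counts the rules it would need, one per codepoint whose value changes. The table layout, the function names, and the choice of the mangle/POSTROUTING chain are assumptions for illustration; only the "-m dscp --dscp" match and the "-j DSCP --set-dscp" target are standard iptables syntax.

```c
#include <assert.h>
#include <stdio.h>

#define DSCP_VALUES 64

/* policy[d] is the codepoint that DSCP d gets rewritten to
 * (hypothetical representation of a site policy). */
int count_rewrite_rules(const unsigned char policy[DSCP_VALUES])
{
    int rules = 0;

    for (int d = 0; d < DSCP_VALUES; d++) {
        assert(policy[d] < DSCP_VALUES);
        if (policy[d] != d)
            rules++;    /* one mangle rule per remapped codepoint */
    }
    return rules;
}

/* Print one iptables command per remapped codepoint. */
void print_rewrite_rules(const unsigned char policy[DSCP_VALUES], FILE *out)
{
    for (int d = 0; d < DSCP_VALUES; d++)
        if (policy[d] != d)
            fprintf(out, "iptables -t mangle -A POSTROUTING -m dscp --dscp %d "
                         "-j DSCP --set-dscp %d\n", d, policy[d]);
}

/* Example policy: every codepoint becomes best effort (0).
 * A blanket squash like this could be done with a single unmatched
 * rule; the table form matters once codepoints get distinct targets. */
void fill_squash_policy(unsigned char policy[DSCP_VALUES])
{
    for (int d = 0; d < DSCP_VALUES; d++)
        policy[d] = 0;
}
```

Codepoint 0 already maps to itself in the example policy, which is why the worst case is 63 rules rather than 64.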

lastly...

The skb->priority field needs some rethinking. In the case of wireless,
it selects a different tx queue based on magic values (see
net/wireless/util.c):

         /* skb->priority values from 256->263 are magic values to
          * directly indicate a specific 802.1d priority.  This is used
          * to allow 802.1d priority to be passed directly in from VLAN
          * tags, etc.
          */
         if (skb->priority >= 256 && skb->priority <= 263)
                 return skb->priority - 256;
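
A simplified userspace model of what that classification amounts to may help. It assumes IPv4 only and that, outside the magic range, the 802.1d priority is taken from the top three bits of the TOS byte; the access-category table is the conventional WMM mapping. Function and enum names are invented for the sketch, and this is not the kernel code itself.

```c
#include <assert.h>
#include <stdint.h>

enum wmm_ac { AC_VO, AC_VI, AC_BE, AC_BK };

/* Simplified model: magic priorities win, otherwise fall back to
 * the IP precedence bits of the TOS byte. */
unsigned classify_8021d(uint32_t skb_priority, uint8_t tos)
{
    if (skb_priority >= 256 && skb_priority <= 263)
        return skb_priority - 256;
    return (tos & 0xfc) >> 5;   /* top three bits of the DSCP */
}

/* Conventional WMM mapping from 802.1d user priority to
 * access category. */
enum wmm_ac up_to_ac(unsigned up)
{
    static const enum wmm_ac map[8] = {
        AC_BE, AC_BK, AC_BK, AC_BE, AC_VI, AC_VI, AC_VO, AC_VO
    };

    assert(up < 8);
    return map[up];
}
```

Under this model a CS5-marked packet (TOS 0xa0) lands in the VI category and a CS1-marked packet (TOS 0x20) in BK, while a magic priority such as 260 bypasses the TOS byte entirely.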


Classification is an Aristotelian rathole!

>> I really hate these patches that offer ways to disable things
>> that normally work, and thus break apps when the non-default
>> is selected.
> Well... the purpose of settings like this is precisely to break functionality
> when the default is not set ;-)
>
>> I kind of have a feeling the kind of situation you're trying to
>> account for, you have some cloud where people run random stuff
>> that you don't control.
> Yes, I have control of the kernel, I have control of root, I have control of
> some daemons that are running on the machine, but I don't really have
> control of the entirety of userspace, some of it I have source code for
> and could audit to guarantee correctness (but I can't really enforce
> that on the users, ultimately they can run any binary),
> and for some of it I don't even have that.  Either way, it's much
> easier to delegate setting policy to
> userspace management daemon(s), and leave enforcing it to the kernel.
> This is just one more such knob.
>
>> But you didn't specify this, and we just have to guess.  Why don't you
>> describe the specific situation where you want to modify this setting?
>> Please do this instead of just talking about what the side effects are
>> inside of the kernel.  That's much less interesting when it comes to
>> patches like this.
> Very well, that's a good point.
>
> Here's an attempt to provide some insight.
>
> I am attempting to allow not-fully-code-audited nor fully trusted apps to run
> in a cgroup containerized environment, with many apps in many
> containers (not 1:1, has hierarchies) on a single kernel.
> The apps are believed not to be actively malicious, but they are
> very likely to be buggy, or written by ill-advised programmers working
> from wrong, outdated, or otherwise incorrect documentation.  I cannot rely
> on unprivileged userspace getting things right.
> I have to have some mechanism to grant these apps permissions to
> utilize specific levels of network fabric priority.  For this I have
> the aforementioned per-cgroup allowed TOS settings.  VLANs are not appropriate
> because a client with high priority net privs is allowed to send a
> request to a server with no special priority permissions.
> (there are further patches to support tcp tos reflection so the server
> can automatically respond with the client's priority)
>
> Multiqueue networking combined with hardware priority queues and xps
> desires to use skb->priority + active cpu for tx queue selection.
> In this particular case TX queue selection should happen based on the
> TOS priority.
> Setting TOS automatically sets sk_priority (and hence skb->priority).
> So all's good, so long as userspace doesn't go and change the
> sk_priority field via SO_PRIORITY and break the mapping.
>
> As a further note:
>
> Some of these apps may be a little more special, a little more
> audited, and a little more trusted.
> Enough so that they might be granted CAP_NET_RAW, but not enough so
> that they can get CAP_NET_ADMIN.
> Hence the general desire for CAP_NET_ADMIN to control general
> machine-global networking state, but not have it control
> per-socket or per-packet settings.  That is, bringing an interface up
> or down affects everyone (hence must be CAP_NET_ADMIN, and much more
> tightly controlled), while spoofing a packet doesn't really negatively
> affect anyone (you can't assume the network is trusted, so there can
> be external sources of spoofing or eavesdropping anyway).
>
> ---
>
> I could attempt to publish the vast majority of our internal
> networking code base (there isn't really anything secret in there),
> but it's based on 2.6.34 and even after two years of attempting to
> clean it up and refactor it (along with a rebase from 2.6.26, and all
> while actively continuing development) I'm still not at the point where
> I would consider this to be a particularly useful course of action
> (there are a lot of bugfixes of bugfixes of crappy patches in there,
> plus hacks, plus tons of backports from upstream, and tons of code
> which is upstream but slightly different than what we have internally,
> because we had it first, and pushed v2 upstream, etc...).  Instead I'm
> trying to get the low-hanging fruit out of the way, rebase our
> patches onto probably 3.2 or 3.3, likely sending some more your way
> during the process, and see where that leaves us.  Basically trying to
> reduce the delta.  We will always have internal only patches, but the
> fewer, the less burden for us, hence I'm trying to get the ones I
> believe to be potentially useful externally upstreamed.  Obviously
> whatever patches you don't accept, we'll still keep around locally.
>
> Maciej


-- 
Dave Täht


Thread overview: 7+ messages
2011-10-21 22:22 [PATCH] net: add sysctl allow_so_priority for SO_PRIORITY setsockopt Maciej Żenczykowski
2011-10-22  4:04 ` David Miller
2011-10-22  6:49   ` Maciej Żenczykowski
2011-10-22  6:58     ` David Miller
2011-10-22  8:27       ` Maciej Żenczykowski
2011-10-22  8:40         ` David Miller
2011-10-22  9:01         ` David Täht [this message]
