Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 1/2] bridge: leave carrier on for empty bridge
From: Stephen Hemminger @ 2011-09-02 22:11 UTC (permalink / raw)
  To: Nicolas de Pesloüan; +Cc: David S. Miller, netdev
In-Reply-To: <4E614CF7.7030700@gmail.com>

On Fri, 02 Sep 2011 23:39:03 +0200
Nicolas de Pesloüan <nicolas.2p.debian@gmail.com> wrote:

> Le 02/09/2011 19:22, Stephen Hemminger a écrit :
> > This resolves a regression seen by some users of bridging.
> > Some users use the bridge like a dummy device.
> > They expect to be able to put an IPv6 address on the device
> > with no ports attached during boot.
> >
> > Note: the bridge still will reflect the state of ports in the
> > bridge if there are any added.
> 
> Doesn't this jeopardize the behavior introduced in 1faa4356a3bd89ea11fb92752d897cff3a20ec0e
> "bridge: control carrier based on ports online"?
> 
> If the user starts the DHCP client before adding the first port to the bridge, the DHCP client will 
> have a carrier and start the autoconfiguration process. This was the old behavior, but you fixed it.
> 
> 	Nicolas.
> 

There is no perfect solution.
If DHCP works then IPv6 breaks?

^ permalink raw reply

* Re: BQL crap and wireless
From: Luis R. Rodriguez @ 2011-09-02 22:03 UTC (permalink / raw)
  To: John W. Linville
  Cc: Jim Gettys, Andrew McGregor, Adrian Chadd, Tom Herbert, Dave Taht,
	linux-wireless, Matt Smith, Kevin Hayes, Derek Smithies,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20110901141304.GA2580-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>

On Thu, Sep 1, 2011 at 7:13 AM, John W. Linville <linville-2XuSBdqkA4SvXiR4WA35Jg@public.gmane.orgm> wrote:
> On Wed, Aug 31, 2011 at 01:50:48PM -0700, Luis R. Rodriguez wrote:
>> On Wed, Aug 31, 2011 at 6:28 AM, Jim Gettys <jg-CC+yJ3UmIYqDUpFQwHEjaQ@public.gmane.org> wrote:
>
>> > such as wireless, or even possibly modern broadband with
>> > PowerBoost, classic RED or similar algorithms that do not take the
>> > buffer drain rate cannot possibly hack it properly.
>>
>> Understood, just curious if anyone has tried a Minstrel approach.
>
> FWIW, eBDP and the related algorithms from Tianji Li's paper are
> philosophically similar to minstrel.

Oh look at that, awesome!!!

>  They depend on measuring recent
> conditions and modifying the current queue length accordingly.
>
>        http://www.hamilton.ie/tianji_li/buffersizing.pdf
>
> The hack I added in debloat-testing is based on my understanding
> of eBDP.  It timestamps the SKBs when they are handed to the driver
> for Tx and then checks the timestamp when the SKB is orphaned.  It is
> a bit crude and is an abuse of the skb_orphan API.

Neat!

>  Also while it
> accounts for the 802.11e queues separately, it doesn't account for
> 802.11n aggregation.

I see..

>  Still, it seems to improve latency w/o hugely
> impacting throughput in at least some environments -- YMMV!

Sweet dude. For aggregation it seems the way to go is to get some
helpers as Andrew has suggested. Andrew, can you elaborate a little on
that? If feasible, then maybe then we can add it to the TODO list
page:

http://wireless.kernel.org/en/developers/todo-list

and when one of us gets to it, we get cranking on it.

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: BQL crap and wireless
From: Luis R. Rodriguez @ 2011-09-02 21:59 UTC (permalink / raw)
  To: Jim Gettys
  Cc: Andrew McGregor, Adrian Chadd, Tom Herbert, Dave Taht,
	linux-wireless, Matt Smith, Kevin Hayes, Derek Smithies,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <4E5FA74D.5000705-CC+yJ3UmIYqDUpFQwHEjaQ@public.gmane.org>

BTW Jim, as you may have found out after sending this reply, vger
hates HTML e-mails so your e-mail only got to those on the CC list but
not to the list. I'm replying below, but since I'm now using ASCII
text the look of the e-mail will look like poo, I'll try to trim it
somehow as I did find valuable information you provided there. I also
make some suggestions below.

On Thu, Sep 1, 2011 at 8:39 AM, Jim Gettys <jg-CC+yJ3UmIYqDUpFQwHEjaQ@public.gmane.org> wrote:
> On 08/31/2011 04:50 PM, Luis R. Rodriguez wrote:
>
> On Wed, Aug 31, 2011 at 6:28 AM, Jim Gettys <jg-CC+yJ3UmIYqDUpFQwHEjaQ@public.gmane.org> wrote:
>
> On 08/30/2011 05:47 PM, Andrew McGregor wrote:
>
> On 31/08/2011, at 1:58 AM, Jim Gettys wrote:
>
> On 08/29/2011 11:42 PM, Adrian Chadd wrote:
>
> On 30 August 2011 11:34, Tom Herbert <therbert-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
>
> C(P) is going to be quite variable - a full frame retransmit of a 4ms
> long aggregate frame is SUM(exponential backoff, grab the air,
> preamble, header, 4ms, etc. for each pass.)
>
> It's not clear to me that doing heroic measures to compute the cost is
> going to be worthwhile due to the rate at which the costs can change on
> wireless; just getting into the rough ballpark may be enough. But
> buffering algorithms and AQM algorithms are going to need an estimate of
> the *time* it will take to transmit data, more than # of bytes or packets.
>
> That's not heroic measures; mac80211 needs all the code to calculate these
> times anyway, it's just a matter of collecting together some things we
> already know and calling the right function.
>
>
> Fine; if it's easy, accurate is better (presuming the costs get
> recalculated when circumstances change). We also will need the amount of
> data being transmitted; it is the rate of transmission (the rate at
> which the buffers are draining) that we'll likely need.
>
> Here's what I've gleaned from reading "RED in a different light",  Van
> Jacobson's Mitre talk and several conversations with Kathleen Nichols
> and Van: AQM algorithms that can handle variable bandwidth environments
> will need to be able to know the rate at which buffers empty.  It's the
> direction they are going with their experiments on a "RED light" algorithm.
>
> The fundamental insight as to why classic RED can't work in the wireless
> environment is that the instantaneous queue length has little actual
> information in it.

> Classic RED is tuned using the queue length as its
> basic parameter.  Their belief as to algorithms that will work is that
> the need to keep track of the running queue length *minimum over time*;
> you want to keep the buffers small on a longer term basis (so they both
> can absorb transients, which is their reason for existence, while
> keeping the latency typically low).

More details explained by Jim:

> TCP will fill any buffer just before the bottleneck in a path.  And, of
> course, other traffic transients can help fill the buffers too.  This is a
> natural consequence of it trying to figure out how fast it can run.
>
> Remember, TCP's job is to figure out just how fast it can go, so in its
> congestion avoidance phase, it increases the rate at which it transmits
> slowly (typical congestion avoidance algorithms will add one packet/ack to
> the bottleneck buffer).  This is why in my traces you can find
> at:
> http://gettys.wordpress.com/2010/12/06/whose-house-is-of-glasse-must-not-throw-stones-at-another/
> you see something that sort of looks like a TCP sawtooth, but not: with
> periodicity of order 10 seconds (given the amount of buffering in that
> modem).
>
> What's happening is that as the buffer fills, the acks come back slower and
> slower

Wait, why is that though?

> (so the buffer fills somewhat more slowly) with time. But as they
> fill, not only do you get additional latency, but the buffers become less
> and less effective at doing what they are intended for: smoothing transients
> of higher rate incoming packets.

Is this bufferbloat or TCP not taking into consideration possible
variations in latency?

> So I ended up with 1.2 seconds of latency on my cable modem.

Jeesh.

> Depending on
> how much buffering is in the device, you get different amounts of latency
> under load.  So at the provisioned bandwidth of those tests, my cable modem
> has at least 10x the "classic" BDP amount of buffering.

And for readers BDP is:

http://en.wikipedia.org/wiki/TCP_tuning#Bandwidth-delay_product_.28BDP.29

The max in-flight possible unacknowledged data in transit.

Just curious -- why would they have 10x the amount of BDP??

> Eventually, the bloated buffer does fill, (which might be a megabyte in
> size, btw, on some cable modems), and you get packet loss.

Why is that though?

> But well before this should have occurred, TCP Cubic has decided the path
> may have changed, and goes into a more aggressive search for a new operating
> point (so you see something that starts like a TCP sawtooth, but then
> switches back to something that looks more like slow start). This causes a
> long period bursty oscillation; the bursts are when multiple packet drops
> are occurring (because the TCP is hammering at the cable modem at way too
> high a speed for a short period).

Got it.

> Buffer bloat has therefore managed to destroy congestion avoidance
> entirely.  Sigh...  It also does a number on slow start, as the amount of
> buffering is way more than it should be and it takes several sets of packet
> loss before TCP even starts congestion avoidance.

So an observable trait is we likely can see some packet loss during
the shifts in throughput? Is that accurate?

For 802.11 this could be a lot of things -- some new noise,
interference, etc, but for Ethernet I'm curious what would cause this
assuming we ware blaming the fact that a lot of buffers are queued, is
the issue that the time it takes to flush out buffers surpasses some
timer on the OS to actually transmit the frames?

> On some home routers, the buffering is even more extreme; so whenever your
> bottleneck is in your home router (easy to do in my house as I have chimneys
> and lots of walls) the buffers in the home router become the problem.
>
> So I caught my home router of that era (it was a DIR 825, IIRC) with 8
> seconds of latency one evening.

Shit.

>  The additional major challenge we
> face that core routers do not is the highly variable traffic of mixed
> mice and elephants.  What actually works only time will tell.

Just FYI for 802.11 we have some effort to help us classify management
frames further into different access categories, typically we just
always use AC_VO to transmit management frames but the work undergoing
in 802.11ae Prioritization of Management frames allows us to assign
different ACs to different types of management frames, and the AP can
actually define these rules. People interested in this problem should
look at that and try out the defined default QoS management frame
profile and see if that helps. That is -- something like this might be
good for the bufferbloat experimental work -- but get it upstream,
don't fork ;)

> All this was well understood in the 1990's.  Most of us *thought* the
> problem was solved by AQM algorithms such as RED.  I was noddingly aware of
> AQM in the 1990's as I was working on HTTP in the IETF in that era, and AQM
> and buffer management in core routers was a very hot topic then.
>
> But the buffer management problem wasn't actually solved, and that result
> never got properly published, for the reasons I went into in:
>
> http://gettys.wordpress.com/2010/12/17/red-in-a-different-light/
>
> The field has languished since, and the problems that variable bandwidth
> present went unrecognised.

Then I stated:

> Thank you for elaborating on this, it helps to further dissect the
> issues. When mixing them together it makes the task harder to address.
> As for 802.11 we'd just need to worry about management frames. Dave
> and Andrew had also pointed out issues with multicast and the low
> rates used for them in comparison to unicast transmissions.

And you said:

> What matters for bufferbloat is how much payload gets moved per unit time:
> that is what is accumulating in the OS packet buffers.
>
>
> So in an environment in which the rate of transmission is highly
> variable,

I had interjected with:

> Not only that, remember we have different possible topologies
> depending on the type of 802.11 service used, IBSS, the typical AP-STA
> environment, Mesh, P2P (AP-STA really), and now with 802.11ad we get
> PBSS. What counts here is we may have variable rate of transmission to
> different possible peers, as such the queue length will depend on all
> of the expected transmissions.

and you had continued with:

> such as wireless, or even possibly modern broadband with
> PowerBoost, classic RED or similar algorithms that do not take the
> buffer drain rate cannot possibly hack it properly.

I had said:

> Understood, just curious if anyone has tried a Minstrel approach.
>
> Lets try to document all of this here:
>
> http://wireless.kernel.org/en/developers/bufferbloat

Then you ended with:

> Please do.  Or work on the material in the bufferbloat.net wiki.  I'm now
> too close to the problem to be good at explaining it any better than I try
> to here and in my other writings.  Thomas Jefferson writing is *hard*.
>
> Probably best is that generic descriptions go into the bufferbloat.net wiki,
> and linux specific stuff on kernel.org; but I don't honestly care all that
> much where it is so long as it does get written down, along with the
> importance of Minstrel like algorithms in dealing with the variable
> bandwidth problem (which isn't in any of the material in my blog or
> bufferbloat.net, having not understood the problem there myself at all
> before a month or two ago when Andrew McGregor explained to me in detail
> what Minstrel is and does and why it needs to exist).

Sure, will work on this.

  Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 1/2] bridge: leave carrier on for empty bridge
From: Nicolas de Pesloüan @ 2011-09-02 21:39 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David S. Miller, netdev
In-Reply-To: <20110902172247.396753508@vyatta.com>

Le 02/09/2011 19:22, Stephen Hemminger a écrit :
> This resolves a regression seen by some users of bridging.
> Some users use the bridge like a dummy device.
> They expect to be able to put an IPv6 address on the device
> with no ports attached during boot.
>
> Note: the bridge still will reflect the state of ports in the
> bridge if there are any added.

Doesn't this jeopardize the behavior introduced in 1faa4356a3bd89ea11fb92752d897cff3a20ec0e
"bridge: control carrier based on ports online"?

If the user starts the DHCP client before adding the first port to the bridge, the DHCP client will 
have a carrier and start the autoconfiguration process. This was the old behavior, but you fixed it.

	Nicolas.

^ permalink raw reply

* RE: [RFC, 1/2] ethtool:  Implement private flags interface for ethtool application.
From: Ben Hutchings @ 2011-09-02 21:36 UTC (permalink / raw)
  To: Wyborny, Carolyn
  Cc: davem@davemloft.net, netdev@vger.kernel.org,
	Michał Mirosław
In-Reply-To: <EDC0E76513226749BFBC9C3FB031318F0173FF8650@orsmsx508.amr.corp.intel.com>

On Fri, 2011-09-02 at 14:33 -0700, Wyborny, Carolyn wrote:
> 
> >-----Original Message-----
> >From: Ben Hutchings [mailto:bhutchings@solarflare.com]
> >Sent: Friday, September 02, 2011 2:27 PM
> >To: Wyborny, Carolyn
> >Cc: davem@davemloft.net; netdev@vger.kernel.org
> >Subject: Re: [RFC, 1/2] ethtool: Implement private flags interface for
> >ethtool application.
> >
> >On Fri, 2011-09-02 at 13:50 -0700, Carolyn Wyborny wrote:
> >> This patch completes the user space implementation of the private
> >> flags inteface in ethtool. Using -b/-B options.
> >[...]
> >
> >Private flags are supposed to be named (string set ETH_SS_PRIV_FLAGS).
> >ethtool should only support getting and setting flags by name, not
> >number.  That way people can actually remember what the flags do and
> >their scripts won't break when the driver changes flag numbers.
> >
> >Ben.
> >
> >--
> >Ben Hutchings, Staff Engineer, Solarflare
> >Not speaking for my employer; that's the marketing department's job.
> >They asked us to note that Solarflare product names are trademarked.
> 
> Ok, makes sense.  Do you want a private flags implementation or do you
> agree with Michal on extending ETHTOOL_[GS]FEATURES?

I'm happy to add support for private flags identified by name.

(And I still haven't worked out with Michal how we're going to handle
the new features interface in ethtool. :-/)

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* RE: [RFC, 1/2] ethtool: Implement private flags interface for ethtool application.
From: Wyborny, Carolyn @ 2011-09-02 21:35 UTC (permalink / raw)
  To: Ben Hutchings, Michał Mirosław
  Cc: David Miller, netdev@vger.kernel.org
In-Reply-To: <1314999272.3419.19.camel@bwh-desktop>



>-----Original Message-----
>From: Ben Hutchings [mailto:bhutchings@solarflare.com]
>Sent: Friday, September 02, 2011 2:35 PM
>To: Michał Mirosław
>Cc: Wyborny, Carolyn; David Miller; netdev@vger.kernel.org
>Subject: Re: [RFC, 1/2] ethtool: Implement private flags interface for
>ethtool application.
>
>On Fri, 2011-09-02 at 23:22 +0200, Michał Mirosław wrote:
>> W dniu 2 września 2011 23:17 użytkownik Michał Mirosław
>> <mirqus@gmail.com> napisał:
>> > 2011/9/2 Wyborny, Carolyn <carolyn.wyborny@intel.com>:
>> >>>-----Original Message-----
>> >>>From: David Miller [mailto:davem@davemloft.net]
>> >>>Sent: Friday, September 02, 2011 1:55 PM
>> >>>To: Wyborny, Carolyn
>> >>>Cc: bhutchings@solarflare.com; netdev@vger.kernel.org
>> >>>Subject: Re: [RFC, 1/2] ethtool: Implement private flags interface
>for
>> >>>ethtool application.
>> >>>
>> >>>From: Carolyn Wyborny <carolyn.wyborny@intel.com>
>> >>>Date: Fri,  2 Sep 2011 13:50:30 -0700
>> >>>
>> >>>> This patch completes the user space implementation of the private
>> >>>> flags inteface in ethtool. Using -b/-B options.
>> >>>>
>> >>>> Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
>> >>>
>> >>>The only use case you show here is something generic which other
>> >>>chips have, Energy Efficient Ethernet.
>> >>>
>> >>>Making an attribute private which is present widely amonst various
>> >>>networking chips makes no sense at all.
>> >>>
>> >>>It deserved a generic ethtool flag.
>> >>
>> >> Fair enough on this particular feature, but does that negate the
>implementation suggestion altogether?  I can send an updated feature
>implementation for the use case using DMA Coalescing if that will help.
>> > I would rather see this as an extension to ETHTOOL_[GS]FEATURES. Its
>> > semantics allow easy expanding to handle device-private flags
>without
>> > changing anything on userspace side.
>>
>> BTW, After pending Intel drivers get converted to ndo_set_features and
>> netdev->features get extended to 64 bits, it would also be possible to
>> use some of the unused bits there for device/driver-private flags
>> almost "for free".
>
>I don't really like the idea of mixing common feature flags with
>driver-specific flags.  It's likely to lead to confusion if you mix
>devices with different drivers in a bridge or a bond.
>
>Ben.
>
>--
>Ben Hutchings, Staff Engineer, Solarflare
>Not speaking for my employer; that's the marketing department's job.
>They asked us to note that Solarflare product names are trademarked.

Ok, I'll keep working on it per your previous feedback.

Thanks,

Carolyn


^ permalink raw reply

* Re: [RFC, 1/2] ethtool: Implement private flags interface for ethtool application.
From: Ben Hutchings @ 2011-09-02 21:34 UTC (permalink / raw)
  To: Michał Mirosław
  Cc: Wyborny, Carolyn, David Miller, netdev@vger.kernel.org
In-Reply-To: <CAHXqBF+9iP6tOmtiaXopGvLyBZvpXmLz05aHj1MrQ2RnAiGGLA@mail.gmail.com>

On Fri, 2011-09-02 at 23:22 +0200, Michał Mirosław wrote:
> W dniu 2 września 2011 23:17 użytkownik Michał Mirosław
> <mirqus@gmail.com> napisał:
> > 2011/9/2 Wyborny, Carolyn <carolyn.wyborny@intel.com>:
> >>>-----Original Message-----
> >>>From: David Miller [mailto:davem@davemloft.net]
> >>>Sent: Friday, September 02, 2011 1:55 PM
> >>>To: Wyborny, Carolyn
> >>>Cc: bhutchings@solarflare.com; netdev@vger.kernel.org
> >>>Subject: Re: [RFC, 1/2] ethtool: Implement private flags interface for
> >>>ethtool application.
> >>>
> >>>From: Carolyn Wyborny <carolyn.wyborny@intel.com>
> >>>Date: Fri,  2 Sep 2011 13:50:30 -0700
> >>>
> >>>> This patch completes the user space implementation of the private
> >>>> flags inteface in ethtool. Using -b/-B options.
> >>>>
> >>>> Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
> >>>
> >>>The only use case you show here is something generic which other
> >>>chips have, Energy Efficient Ethernet.
> >>>
> >>>Making an attribute private which is present widely amonst various
> >>>networking chips makes no sense at all.
> >>>
> >>>It deserved a generic ethtool flag.
> >>
> >> Fair enough on this particular feature, but does that negate the implementation suggestion altogether?  I can send an updated feature implementation for the use case using DMA Coalescing if that will help.
> > I would rather see this as an extension to ETHTOOL_[GS]FEATURES. Its
> > semantics allow easy expanding to handle device-private flags without
> > changing anything on userspace side.
> 
> BTW, After pending Intel drivers get converted to ndo_set_features and
> netdev->features get extended to 64 bits, it would also be possible to
> use some of the unused bits there for device/driver-private flags
> almost "for free".

I don't really like the idea of mixing common feature flags with
driver-specific flags.  It's likely to lead to confusion if you mix
devices with different drivers in a bridge or a bond.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* RE: [RFC, 1/2] ethtool:  Implement private flags interface for ethtool application.
From: Wyborny, Carolyn @ 2011-09-02 21:33 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: davem@davemloft.net, netdev@vger.kernel.org
In-Reply-To: <1314998831.3419.12.camel@bwh-desktop>



>-----Original Message-----
>From: Ben Hutchings [mailto:bhutchings@solarflare.com]
>Sent: Friday, September 02, 2011 2:27 PM
>To: Wyborny, Carolyn
>Cc: davem@davemloft.net; netdev@vger.kernel.org
>Subject: Re: [RFC, 1/2] ethtool: Implement private flags interface for
>ethtool application.
>
>On Fri, 2011-09-02 at 13:50 -0700, Carolyn Wyborny wrote:
>> This patch completes the user space implementation of the private
>> flags inteface in ethtool. Using -b/-B options.
>[...]
>
>Private flags are supposed to be named (string set ETH_SS_PRIV_FLAGS).
>ethtool should only support getting and setting flags by name, not
>number.  That way people can actually remember what the flags do and
>their scripts won't break when the driver changes flag numbers.
>
>Ben.
>
>--
>Ben Hutchings, Staff Engineer, Solarflare
>Not speaking for my employer; that's the marketing department's job.
>They asked us to note that Solarflare product names are trademarked.

Ok, makes sense.  Do you want a private flags implementation or do you agree with Michal on extending ETHTOOL_[GS]FEATURES?

Thanks,

Carolyn


^ permalink raw reply

* Re: [RFC, 2/2] igb:  Implementation of ethtool priv_flags for igb driver.
From: Ben Hutchings @ 2011-09-02 21:31 UTC (permalink / raw)
  To: Carolyn Wyborny; +Cc: davem, netdev
In-Reply-To: <1314996631-4773-2-git-send-email-carolyn.wyborny@intel.com>

On Fri, 2011-09-02 at 13:50 -0700, Carolyn Wyborny wrote:
> This patch adds igb driver support for the ethtool private flags
> interface. Two features are initially configured for private flags
> support as an example for use with the implementation in ethtoool
> application.
> 
> Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
> ---
>  drivers/net/ethernet/intel/igb/igb.h         |    2 +
>  drivers/net/ethernet/intel/igb/igb_ethtool.c |   28 ++++++++++++++++++++++++++
>  drivers/net/ethernet/intel/igb/igb_main.c    |    1 +
>  3 files changed, 31 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
> index bb47ed1..a8be3eb 100644
> --- a/drivers/net/ethernet/intel/igb/igb.h
> +++ b/drivers/net/ethernet/intel/igb/igb.h
> @@ -360,6 +360,7 @@ struct igb_adapter {
>  	u32 rss_queues;
>  	u32 wvbr;
>  	int node;
> +	u32 pflags;
>  };
>  
>  #define IGB_FLAG_HAS_MSI           (1 << 0)
> @@ -367,6 +368,7 @@ struct igb_adapter {
>  #define IGB_FLAG_QUAD_PORT_A       (1 << 2)
>  #define IGB_FLAG_QUEUE_PAIRS       (1 << 3)
>  #define IGB_FLAG_DMAC              (1 << 4)
> +#define IGB_FLAG_EEE               (1 << 5)
>  
>  /* DMA Coalescing defines */
>  #define IGB_MIN_TXPBSIZE           20408
> diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c
> index 64fb4ef..3a251c7 100644
> --- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
> +++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
> @@ -696,6 +696,9 @@ static void igb_get_drvinfo(struct net_device *netdev,
>  	drvinfo->testinfo_len = IGB_TEST_LEN;
>  	drvinfo->regdump_len = igb_get_regs_len(netdev);
>  	drvinfo->eedump_len = igb_get_eeprom_len(netdev);
> +#ifdef ETHTOOL_GPFLAGS

That macro is always defined,

> +	drvinfo->n_priv_flags = 2;

This is not enough.  You need to support ETH_SS_PRIV_FLAGS in the
get_sset_count and get_strings operations.

> +#endif
>  }
>  
>  static void igb_get_ringparam(struct net_device *netdev,
> @@ -2171,6 +2174,29 @@ static void igb_get_strings(struct net_device *netdev, u32 stringset, u8 *data)
>  	}
>  }
>  
> +static int igb_set_pflags(struct net_device *netdev, u32 data)
> +{
> +	u32 supported_flags = IGB_FLAG_EEE;
> +	struct igb_adapter *adapter = netdev_priv(netdev);
> +
> +	if (data & supported_flags) {
[...]

If the number of flags is 2, then the supported flags must be numbered
(1 << 0) and (1 << 1).

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* RE: [RFC, 1/2] ethtool: Implement private flags interface for ethtool application.
From: Wyborny, Carolyn @ 2011-09-02 21:29 UTC (permalink / raw)
  To: Michał Mirosław
  Cc: David Miller, bhutchings@solarflare.com, netdev@vger.kernel.org
In-Reply-To: <CAHXqBF+9iP6tOmtiaXopGvLyBZvpXmLz05aHj1MrQ2RnAiGGLA@mail.gmail.com>



>-----Original Message-----
>From: Michał Mirosław [mailto:mirqus@gmail.com]
>Sent: Friday, September 02, 2011 2:22 PM
>To: Wyborny, Carolyn
>Cc: David Miller; bhutchings@solarflare.com; netdev@vger.kernel.org
>Subject: Re: [RFC, 1/2] ethtool: Implement private flags interface for
>ethtool application.
>
>W dniu 2 września 2011 23:17 użytkownik Michał Mirosław
><mirqus@gmail.com> napisał:
>> 2011/9/2 Wyborny, Carolyn <carolyn.wyborny@intel.com>:
>>>>-----Original Message-----
>>>>From: David Miller [mailto:davem@davemloft.net]
>>>>Sent: Friday, September 02, 2011 1:55 PM
>>>>To: Wyborny, Carolyn
>>>>Cc: bhutchings@solarflare.com; netdev@vger.kernel.org
>>>>Subject: Re: [RFC, 1/2] ethtool: Implement private flags interface
>for
>>>>ethtool application.
>>>>
>>>>From: Carolyn Wyborny <carolyn.wyborny@intel.com>
>>>>Date: Fri,  2 Sep 2011 13:50:30 -0700
>>>>
>>>>> This patch completes the user space implementation of the private
>>>>> flags inteface in ethtool. Using -b/-B options.
>>>>>
>>>>> Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
>>>>
>>>>The only use case you show here is something generic which other
>>>>chips have, Energy Efficient Ethernet.
>>>>
>>>>Making an attribute private which is present widely amonst various
>>>>networking chips makes no sense at all.
>>>>
>>>>It deserved a generic ethtool flag.
>>>
>>> Fair enough on this particular feature, but does that negate the
>implementation suggestion altogether?  I can send an updated feature
>implementation for the use case using DMA Coalescing if that will help.
>> I would rather see this as an extension to ETHTOOL_[GS]FEATURES. Its
>> semantics allow easy expanding to handle device-private flags without
>> changing anything on userspace side.
>
>BTW, After pending Intel drivers get converted to ndo_set_features and
>netdev->features get extended to 64 bits, it would also be possible to
>use some of the unused bits there for device/driver-private flags
>almost "for free".
>
>Best Regards,
>Michał Mirosław

That seems reasonable as then there would be room for them, but is it going to be OK to use those unused bits or are they going to be intended for generic features and not device specific ones?  

Is the intent then to not ever use the priv_flags partial implementation?  

Thanks,

Carolyn

Carolyn Wyborny
Linux Development
LAN Access Division
Intel Corporation

^ permalink raw reply

* Re: [RFC, 1/2] ethtool:  Implement private flags interface for ethtool application.
From: Ben Hutchings @ 2011-09-02 21:27 UTC (permalink / raw)
  To: Carolyn Wyborny; +Cc: davem, netdev
In-Reply-To: <1314996631-4773-1-git-send-email-carolyn.wyborny@intel.com>

On Fri, 2011-09-02 at 13:50 -0700, Carolyn Wyborny wrote:
> This patch completes the user space implementation of the private
> flags inteface in ethtool. Using -b/-B options.
[...]

Private flags are supposed to be named (string set ETH_SS_PRIV_FLAGS).
ethtool should only support getting and setting flags by name, not
number.  That way people can actually remember what the flags do and
their scripts won't break when the driver changes flag numbers.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [RFC, 1/2] ethtool: Implement private flags interface for ethtool application.
From: Ben Hutchings @ 2011-09-02 21:23 UTC (permalink / raw)
  To: David Miller; +Cc: carolyn.wyborny, netdev
In-Reply-To: <20110902.165524.1076389492712310664.davem@davemloft.net>

On Fri, 2011-09-02 at 16:55 -0400, David Miller wrote:
> From: Carolyn Wyborny <carolyn.wyborny@intel.com>
> Date: Fri,  2 Sep 2011 13:50:30 -0700
> 
> > This patch completes the user space implementation of the private
> > flags inteface in ethtool. Using -b/-B options.
> > 
> > Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
> 
> The only use case you show here is something generic which other
> chips have, Energy Efficient Ethernet.
> 
> Making an attribute private which is present widely amonst various
> networking chips makes no sense at all.
> 
> It deserved a generic ethtool flag.

I agree.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [RFC, 1/2] ethtool: Implement private flags interface for ethtool application.
From: Michał Mirosław @ 2011-09-02 21:22 UTC (permalink / raw)
  To: Wyborny, Carolyn
  Cc: David Miller, bhutchings@solarflare.com, netdev@vger.kernel.org
In-Reply-To: <CAHXqBFKU2m6s1fQzZpJRvvR2HatJpXjWXv84=ZYD5E6OPFfUmQ@mail.gmail.com>

W dniu 2 września 2011 23:17 użytkownik Michał Mirosław
<mirqus@gmail.com> napisał:
> 2011/9/2 Wyborny, Carolyn <carolyn.wyborny@intel.com>:
>>>-----Original Message-----
>>>From: David Miller [mailto:davem@davemloft.net]
>>>Sent: Friday, September 02, 2011 1:55 PM
>>>To: Wyborny, Carolyn
>>>Cc: bhutchings@solarflare.com; netdev@vger.kernel.org
>>>Subject: Re: [RFC, 1/2] ethtool: Implement private flags interface for
>>>ethtool application.
>>>
>>>From: Carolyn Wyborny <carolyn.wyborny@intel.com>
>>>Date: Fri,  2 Sep 2011 13:50:30 -0700
>>>
>>>> This patch completes the user space implementation of the private
>>>> flags inteface in ethtool. Using -b/-B options.
>>>>
>>>> Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
>>>
>>>The only use case you show here is something generic which other
>>>chips have, Energy Efficient Ethernet.
>>>
>>>Making an attribute private which is present widely amonst various
>>>networking chips makes no sense at all.
>>>
>>>It deserved a generic ethtool flag.
>>
>> Fair enough on this particular feature, but does that negate the implementation suggestion altogether?  I can send an updated feature implementation for the use case using DMA Coalescing if that will help.
> I would rather see this as an extension to ETHTOOL_[GS]FEATURES. Its
> semantics allow easy expanding to handle device-private flags without
> changing anything on userspace side.

BTW, After pending Intel drivers get converted to ndo_set_features and
netdev->features get extended to 64 bits, it would also be possible to
use some of the unused bits there for device/driver-private flags
almost "for free".

Best Regards,
Michał Mirosław

^ permalink raw reply

* Re: [RFC, 1/2] ethtool: Implement private flags interface for ethtool application.
From: Michał Mirosław @ 2011-09-02 21:17 UTC (permalink / raw)
  To: Wyborny, Carolyn
  Cc: David Miller, bhutchings@solarflare.com, netdev@vger.kernel.org
In-Reply-To: <EDC0E76513226749BFBC9C3FB031318F0173FF8607@orsmsx508.amr.corp.intel.com>

2011/9/2 Wyborny, Carolyn <carolyn.wyborny@intel.com>:
>>-----Original Message-----
>>From: David Miller [mailto:davem@davemloft.net]
>>Sent: Friday, September 02, 2011 1:55 PM
>>To: Wyborny, Carolyn
>>Cc: bhutchings@solarflare.com; netdev@vger.kernel.org
>>Subject: Re: [RFC, 1/2] ethtool: Implement private flags interface for
>>ethtool application.
>>
>>From: Carolyn Wyborny <carolyn.wyborny@intel.com>
>>Date: Fri,  2 Sep 2011 13:50:30 -0700
>>
>>> This patch completes the user space implementation of the private
>>> flags inteface in ethtool. Using -b/-B options.
>>>
>>> Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
>>
>>The only use case you show here is something generic which other
>>chips have, Energy Efficient Ethernet.
>>
>>Making an attribute private which is present widely amonst various
>>networking chips makes no sense at all.
>>
>>It deserved a generic ethtool flag.
>
> Fair enough on this particular feature, but does that negate the implementation suggestion altogether?  I can send an updated feature implementation for the use case using DMA Coalescing if that will help.

I would rather see this as an extension to ETHTOOL_[GS]FEATURES. Its
semantics allow easy expanding to handle device-private flags without
changing anything on userspace side.

Best Regards,
Michał Mirosław

^ permalink raw reply

* RE: [RFC, 1/2] ethtool: Implement private flags interface for ethtool application.
From: Wyborny, Carolyn @ 2011-09-02 21:11 UTC (permalink / raw)
  To: David Miller; +Cc: bhutchings@solarflare.com, netdev@vger.kernel.org
In-Reply-To: <20110902.165524.1076389492712310664.davem@davemloft.net>



>-----Original Message-----
>From: David Miller [mailto:davem@davemloft.net]
>Sent: Friday, September 02, 2011 1:55 PM
>To: Wyborny, Carolyn
>Cc: bhutchings@solarflare.com; netdev@vger.kernel.org
>Subject: Re: [RFC, 1/2] ethtool: Implement private flags interface for
>ethtool application.
>
>From: Carolyn Wyborny <carolyn.wyborny@intel.com>
>Date: Fri,  2 Sep 2011 13:50:30 -0700
>
>> This patch completes the user space implementation of the private
>> flags inteface in ethtool. Using -b/-B options.
>>
>> Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
>
>The only use case you show here is something generic which other
>chips have, Energy Efficient Ethernet.
>
>Making an attribute private which is present widely amonst various
>networking chips makes no sense at all.
>
>It deserved a generic ethtool flag.

Fair enough on this particular feature, but does that negate the implementation suggestion altogether?  I can send an updated feature implementation for the use case using DMA Coalescing if that will help.

Thanks,

Carolyn

Carolyn Wyborny
Linux Development
LAN Access Division
Intel Corporation

^ permalink raw reply

* Re: [RFC, 1/2] ethtool: Implement private flags interface for ethtool application.
From: David Miller @ 2011-09-02 20:55 UTC (permalink / raw)
  To: carolyn.wyborny; +Cc: bhutchings, netdev
In-Reply-To: <1314996631-4773-1-git-send-email-carolyn.wyborny@intel.com>

From: Carolyn Wyborny <carolyn.wyborny@intel.com>
Date: Fri,  2 Sep 2011 13:50:30 -0700

> This patch completes the user space implementation of the private
> flags inteface in ethtool. Using -b/-B options.
> 
> Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>

The only use case you show here is something generic which other
chips have, Energy Efficient Ethernet.

Making an attribute private which is present widely amonst various
networking chips makes no sense at all.

It deserved a generic ethtool flag.

^ permalink raw reply

* [RFC, 2/2] igb:  Implementation of ethtool priv_flags for igb driver.
From: Carolyn Wyborny @ 2011-09-02 20:50 UTC (permalink / raw)
  To: bhutchings, davem; +Cc: netdev
In-Reply-To: <1314996631-4773-1-git-send-email-carolyn.wyborny@intel.com>

This patch adds igb driver support for the ethtool private flags
interface. Two features are initially configured for private flags
support as an example for use with the implementation in ethtoool
application.

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
---
 drivers/net/ethernet/intel/igb/igb.h         |    2 +
 drivers/net/ethernet/intel/igb/igb_ethtool.c |   28 ++++++++++++++++++++++++++
 drivers/net/ethernet/intel/igb/igb_main.c    |    1 +
 3 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb.h b/drivers/net/ethernet/intel/igb/igb.h
index bb47ed1..a8be3eb 100644
--- a/drivers/net/ethernet/intel/igb/igb.h
+++ b/drivers/net/ethernet/intel/igb/igb.h
@@ -360,6 +360,7 @@ struct igb_adapter {
 	u32 rss_queues;
 	u32 wvbr;
 	int node;
+	u32 pflags;
 };
 
 #define IGB_FLAG_HAS_MSI           (1 << 0)
@@ -367,6 +368,7 @@ struct igb_adapter {
 #define IGB_FLAG_QUAD_PORT_A       (1 << 2)
 #define IGB_FLAG_QUEUE_PAIRS       (1 << 3)
 #define IGB_FLAG_DMAC              (1 << 4)
+#define IGB_FLAG_EEE               (1 << 5)
 
 /* DMA Coalescing defines */
 #define IGB_MIN_TXPBSIZE           20408
diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c
index 64fb4ef..3a251c7 100644
--- a/drivers/net/ethernet/intel/igb/igb_ethtool.c
+++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c
@@ -696,6 +696,9 @@ static void igb_get_drvinfo(struct net_device *netdev,
 	drvinfo->testinfo_len = IGB_TEST_LEN;
 	drvinfo->regdump_len = igb_get_regs_len(netdev);
 	drvinfo->eedump_len = igb_get_eeprom_len(netdev);
+#ifdef ETHTOOL_GPFLAGS
+	drvinfo->n_priv_flags = 2;
+#endif
 }
 
 static void igb_get_ringparam(struct net_device *netdev,
@@ -2171,6 +2174,29 @@ static void igb_get_strings(struct net_device *netdev, u32 stringset, u8 *data)
 	}
 }
 
+static int igb_set_pflags(struct net_device *netdev, u32 data)
+{
+	u32 supported_flags = IGB_FLAG_EEE;
+	struct igb_adapter *adapter = netdev_priv(netdev);
+
+	if (data & supported_flags) {
+		adapter->pflags = data;
+	} else {
+		printk(KERN_INFO, "set_pflags:flag not supported..");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static u32 igb_get_pflags(struct net_device *netdev)
+{
+	struct igb_adapter *adapter = netdev_priv(netdev);
+
+	return adapter->pflags;
+
+}
+
 static const struct ethtool_ops igb_ethtool_ops = {
 	.get_settings           = igb_get_settings,
 	.set_settings           = igb_set_settings,
@@ -2197,6 +2223,8 @@ static const struct ethtool_ops igb_ethtool_ops = {
 	.get_ethtool_stats      = igb_get_ethtool_stats,
 	.get_coalesce           = igb_get_coalesce,
 	.set_coalesce           = igb_set_coalesce,
+	.get_priv_flags         = igb_get_pflags,
+	.set_priv_flags         = igb_set_pflags,
 };
 
 void igb_set_ethtool_ops(struct net_device *netdev)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 289861c..a534f32 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2175,6 +2175,7 @@ static int __devinit igb_probe(struct pci_dev *pdev,
 	switch (hw->mac.type) {
 	case e1000_i350:
 		igb_set_eee_i350(hw);
+		adapter->pflags |= IGB_FLAG_EEE;
 		break;
 	default:
 		break;
-- 
1.7.4.4

^ permalink raw reply related

* [RFC, 1/2] ethtool:  Implement private flags interface for ethtool application.
From: Carolyn Wyborny @ 2011-09-02 20:50 UTC (permalink / raw)
  To: bhutchings, davem; +Cc: netdev

This patch completes the user space implementation of the private
flags inteface in ethtool. Using -b/-B options.

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
---
 ethtool.8.in |   16 +++++++++
 ethtool.c    |   98 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 113 insertions(+), 1 deletions(-)

diff --git a/ethtool.8.in b/ethtool.8.in
index 7a0bd43..ebcb233 100644
--- a/ethtool.8.in
+++ b/ethtool.8.in
@@ -134,6 +134,13 @@ ethtool \- query or control network driver and hardware settings
 .B2 rx on off
 .B2 tx on off
 .HP
+.B ethtool \-b|\-\-show\-priv\-flags
+.I ethX
+.HP
+.B ethtool \-B|\-\-set\-priv\-flags
+.I ethX
+.BN bit
+.HP
 .B ethtool \-c|\-\-show\-coalesce
 .I ethX
 .HP
@@ -342,6 +349,15 @@ Specifies whether RX pause should be enabled.
 .A2 tx on off
 Specifies whether TX pause should be enabled.
 .TP
+.B \-b \-\-show\-priv_flags
+Queries the specified network device for private flags set.
+.TP
+.B \-B \-\-set\-priv_flags
+Changes the private flags of the specified network device.
+.TP
+.BI bit \ N
+Sets the specific bit number in the private flagsg field.
+.TP
 .B \-c \-\-show\-coalesce
 Queries the specified network device for coalescing information.
 .TP
diff --git a/ethtool.c b/ethtool.c
index 943dfb7..4702d04 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -99,6 +99,8 @@ static int do_flash(int fd, struct ifreq *ifr);
 static int do_permaddr(int fd, struct ifreq *ifr);
 static int do_getfwdump(int fd, struct ifreq *ifr);
 static int do_setfwdump(int fd, struct ifreq *ifr);
+static int do_gpflags(int fd, struct ifreq *ifr);
+static int do_spflags(int fd, struct ifreq *ifr);
 
 static int send_ioctl(int fd, struct ifreq *ifr);
 
@@ -133,6 +135,8 @@ static enum {
 	MODE_PERMADDR,
 	MODE_SET_DUMP,
 	MODE_GET_DUMP,
+	MODE_GET_PFLAGS,
+	MODE_SET_PFLAGS,
 } mode = MODE_GSET;
 
 static struct option {
@@ -157,6 +161,9 @@ static struct option {
       "		[ autoneg on|off ]\n"
       "		[ rx on|off ]\n"
       "		[ tx on|off ]\n" },
+    { "-b", "--show-pflags", MODE_GET_PFLAGS, "Show private flags set." },
+    { "-B", "--set-pflags", MODE_SET_PFLAGS, "Set private flags.",
+		"		[value N | bit N]\n" },
     { "-c", "--show-coalesce", MODE_GCOALESCE, "Show coalesce options" },
     { "-C", "--coalesce", MODE_SCOALESCE, "Set coalesce options",
 		"		[adaptive-rx on|off]\n"
@@ -405,6 +412,11 @@ static int rx_class_rule_del = -1;
 static int rx_class_rule_added = 0;
 static struct ethtool_rx_flow_spec rx_rule_fs;
 
+static u32 priv_flags;
+static int priv_flags_changed;
+static u32 priv_flags_val_wanted;
+static u8 priv_flags_bit_wanted;
+
 static enum {
 	ONLINE=0,
 	OFFLINE,
@@ -552,6 +564,10 @@ static struct cmdline_info cmdline_msglvl[] = {
 	  NETIF_MSG_WOL, &msglvl_mask },
 };
 
+static struct cmdline_info cmdline_pflags[] = {
+	{ "value", CMDL_U32, &priv_flags_val_wanted, &priv_flags },
+};
+
 static long long
 get_int_range(char *str, int base, long long min, long long max)
 {
@@ -800,7 +816,9 @@ static void parse_cmdline(int argc, char **argp)
 			    (mode == MODE_FLASHDEV) ||
 			    (mode == MODE_PERMADDR) ||
 			    (mode == MODE_SET_DUMP) ||
-			    (mode == MODE_GET_DUMP)) {
+			    (mode == MODE_GET_DUMP) ||
+			    (mode == MODE_GET_PFLAGS) ||
+			    (mode == MODE_SET_PFLAGS)) {
 				devname = argp[i];
 				break;
 			}
@@ -884,6 +902,28 @@ static void parse_cmdline(int argc, char **argp)
 				i = argc;
 				break;
 			}
+			if (mode == MODE_SET_PFLAGS) {
+				if (!strcmp(argp[i], "value")) {
+					priv_flags_changed = 1;
+					i++;
+					if (i >= argc)
+						exit_bad_args();
+					priv_flags_val_wanted =
+						get_u32(argp[i], 0);
+				} else if (!strcmp(argp[i], "bit")) {
+					priv_flags_changed = 1;
+					i++;
+					if (i >= argc)
+						exit_bad_args();
+					priv_flags_bit_wanted =
+						get_int(argp[i], 0);
+					priv_flags_val_wanted =
+						(1 >> priv_flags_bit_wanted);
+				} else
+					exit_bad_args();
+				i = argc;
+				break;
+			}
 			if (mode == MODE_SCLSRULE) {
 				if (!strcmp(argp[i], "flow-type")) {
 					i += 1;
@@ -1957,6 +1997,10 @@ static int doit(void)
 		return do_getfwdump(fd, &ifr);
 	} else if (mode == MODE_SET_DUMP) {
 		return do_setfwdump(fd, &ifr);
+	} else if (mode == MODE_GET_PFLAGS) {
+		return do_gpflags(fd, &ifr);
+	} else if (mode == MODE_SET_PFLAGS) {
+		return do_spflags(fd, &ifr);
 	}
 
 	return 69;
@@ -3346,6 +3390,58 @@ static int do_setfwdump(int fd, struct ifreq *ifr)
 	return 0;
 }
 
+static int do_gpflags(int fd, struct ifreq *ifr)
+{
+	int err;
+	struct ethtool_value eval;
+
+	eval.cmd = ETHTOOL_GPFLAGS;
+	ifr->ifr_data = (caddr_t)&eval;
+	err = send_ioctl(fd, ifr);
+	if (err) {
+		perror("Cannot get device private flags");
+	} else {
+		fprintf(stdout, "	Current flags set: 0x%08x (%d)\n"
+			"			       ",
+			eval.data, eval.data);
+		print_flags(cmdline_pflags, ARRAY_SIZE(cmdline_pflags),
+			    eval.data);
+		fprintf(stdout, "\n");
+	}
+	return err;
+}
+
+static int do_spflags(int fd, struct ifreq *ifr)
+{
+	struct ethtool_value eval;
+	int err, changed = 0;
+	eval.cmd = ETHTOOL_GPFLAGS;
+	ifr->ifr_data = (caddr_t)&eval;
+	err = send_ioctl(fd, ifr);
+	if (err) {
+		perror("Cannot get device private flags");
+		return err;
+	}
+
+	priv_flags = eval.data;
+	do_generic_set(cmdline_pflags, ARRAY_SIZE(cmdline_pflags),
+		&changed);
+	if (!changed) {
+		fprintf(stderr, "No private flags have changed, aborting\n");
+			return 1;
+	} else {
+		eval.cmd = ETHTOOL_SPFLAGS;
+		eval.data = priv_flags_val_wanted;
+		ifr->ifr_data = (caddr_t)&eval;
+		err = send_ioctl(fd, ifr);
+		if (err) {
+			perror("Cannot set device private flags.\n");
+			return err;
+		}
+		return 0;
+	}
+}
+
 static int send_ioctl(int fd, struct ifreq *ifr)
 {
 	return ioctl(fd, SIOCETHTOOL, ifr);
-- 
1.7.4.4

^ permalink raw reply related

* Re: [patch net-next-2.6 v2] net: consolidate and fix ethtool_ops->get_settings calling
From: Jiri Pirko @ 2011-09-02 20:42 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: netdev, ralf, fubar, andy, kaber, bprakash, JBottomley,
	robert.w.love, davem, shemminger, decot, mirq-linux,
	alexander.h.duyck, amit.salecha, eric.dumazet, therbert, paulmck,
	laijs, xiaosuo, greearb, loke.chetan, linux-mips, linux-scsi,
	devel, bridge
In-Reply-To: <1314989161.3419.5.camel@bwh-desktop>

Fri, Sep 02, 2011 at 08:46:01PM CEST, bhutchings@solarflare.com wrote:
>On Fri, 2011-09-02 at 14:26 +0200, Jiri Pirko wrote:
>> This patch does several things:
>> - introduces __ethtool_get_settings which is called from ethtool code and
>>   from dev_ethtool_get_settings() as well.
>> - dev_ethtool_get_settings() becomes rtnl wrapper for
>>   __ethtool_get_settings()
>[...]
>
>I don't like this locking change.  Most other dev_*() functions require
>the caller to hold RTNL, and it will break any OOT module calling
>dev_ethtool_get_settings() without producing any warning at compile
>time.  Why not put an ASSERT_RTNL() in it instead?

Hmm. Okay, then I would remove dev_ethtool_get_settings() from
net/core/dev.c and only put __ethtool_get_settings() to
net/core/ethtool.c. Makes more sense to me to have it there...
ASSERT_RTNL woudl be good there as well.

>
>The rest of this looks fine.
>
>Ben. 
>
>-- 
>Ben Hutchings, Staff Engineer, Solarflare
>Not speaking for my employer; that's the marketing department's job.
>They asked us to note that Solarflare product names are trademarked.
>

^ permalink raw reply

* [PATCH 15/15] make kernel/signal.c user ns safe (v2)
From: Serge Hallyn @ 2011-09-02 19:56 UTC (permalink / raw)
  To: akpm, segooon, linux-kernel, netdev, containers, dhowells,
	ebiederm, rdunlap
  Cc: Serge Hallyn
In-Reply-To: <1314993400-6910-1-git-send-email-serge@hallyn.com>

From: Serge Hallyn <serge.hallyn@canonical.com>

Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
---
 kernel/signal.c |   26 ++++++++++++++++++++++----
 1 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/kernel/signal.c b/kernel/signal.c
index 291c970..c07b970 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -27,6 +27,7 @@
 #include <linux/capability.h>
 #include <linux/freezer.h>
 #include <linux/pid_namespace.h>
+#include <linux/user_namespace.h>
 #include <linux/nsproxy.h>
 #define CREATE_TRACE_POINTS
 #include <trace/events/signal.h>
@@ -1073,7 +1074,8 @@ static int __send_signal(int sig, struct siginfo *info, struct task_struct *t,
 			q->info.si_code = SI_USER;
 			q->info.si_pid = task_tgid_nr_ns(current,
 							task_active_pid_ns(t));
-			q->info.si_uid = current_uid();
+			q->info.si_uid = user_ns_map_uid(task_cred_xxx(t, user_ns),
+					current_cred(), current_uid());
 			break;
 		case (unsigned long) SEND_SIG_PRIV:
 			q->info.si_signo = sig;
@@ -1363,6 +1365,11 @@ int kill_pid_info_as_uid(int sig, struct siginfo *info, struct pid *pid,
 		goto out_unlock;
 	}
 	pcred = __task_cred(p);
+	/*
+	 * this is called (only) from drivers/usb/core/devio.c.
+	 * Do we need to add user_ns to urb and usb_device, so
+	 * we can pass them in here?
+	 */
 	if (si_fromuser(info) &&
 	    euid != pcred->suid && euid != pcred->uid &&
 	    uid  != pcred->suid && uid  != pcred->uid) {
@@ -1618,7 +1625,8 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
 	 */
 	rcu_read_lock();
 	info.si_pid = task_pid_nr_ns(tsk, tsk->parent->nsproxy->pid_ns);
-	info.si_uid = __task_cred(tsk)->uid;
+	info.si_uid = user_ns_map_uid(task_cred_xxx(tsk->parent, user_ns),
+				      __task_cred(tsk), __task_cred(tsk)->uid);
 	rcu_read_unlock();
 
 	info.si_utime = cputime_to_clock_t(cputime_add(tsk->utime,
@@ -1688,6 +1696,7 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 	unsigned long flags;
 	struct task_struct *parent;
 	struct sighand_struct *sighand;
+	const struct cred *cred;
 
 	if (for_ptracer) {
 		parent = tsk->parent;
@@ -1703,7 +1712,9 @@ static void do_notify_parent_cldstop(struct task_struct *tsk,
 	 */
 	rcu_read_lock();
 	info.si_pid = task_pid_nr_ns(tsk, parent->nsproxy->pid_ns);
-	info.si_uid = __task_cred(tsk)->uid;
+	cred = __task_cred(tsk);
+	info.si_uid = user_ns_map_uid(task_cred_xxx(parent, user_ns),
+				cred, cred->uid);
 	rcu_read_unlock();
 
 	info.si_utime = cputime_to_clock_t(tsk->utime);
@@ -2122,7 +2133,10 @@ static int ptrace_signal(int signr, siginfo_t *info,
 		info->si_errno = 0;
 		info->si_code = SI_USER;
 		info->si_pid = task_pid_vnr(current->parent);
-		info->si_uid = task_uid(current->parent);
+		/* we can cache cred if this performs poorly */
+		info->si_uid = user_ns_map_uid(current_user_ns(),
+			__task_cred(current->parent),
+			task_uid(current->parent));
 	}
 
 	/* If the (new) signal is now blocked, requeue it.  */
@@ -2552,6 +2566,10 @@ SYSCALL_DEFINE2(rt_sigpending, sigset_t __user *, set, size_t, sigsetsize)
 
 #ifndef HAVE_ARCH_COPY_SIGINFO_TO_USER
 
+/*
+ * send_signal has converted the sender's uid to the receiver
+ * task user namespace, so no need to convert here
+ */
 int copy_siginfo_to_user(siginfo_t __user *to, siginfo_t *from)
 {
 	int err;
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH 14/15] net: pass user_ns to cap_netlink_recv()
From: Serge Hallyn @ 2011-09-02 19:56 UTC (permalink / raw)
  To: akpm, segooon, linux-kernel, netdev, containers, dhowells,
	ebiederm, rdunlap
  Cc: Serge E. Hallyn
In-Reply-To: <1314993400-6910-1-git-send-email-serge@hallyn.com>

From: "Serge E. Hallyn" <serge.hallyn@canonical.com>

and make cap_netlink_recv() userns-aware

cap_netlink_recv() was granting privilege if a capability is in
current_cap(), regardless of the user namespace.  Fix that by
targeting the capability check against the user namespace which
owns the skb.

Because sock_net is static inline defined in net/sock.h, which we
don't want to #include at the cap_netlink_recv function (commoncap.h).

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 drivers/scsi/scsi_netlink.c     |    3 ++-
 include/linux/security.h        |   14 +++++++++-----
 kernel/audit.c                  |    6 ++++--
 net/core/rtnetlink.c            |    3 ++-
 net/decnet/netfilter/dn_rtmsg.c |    3 ++-
 net/ipv4/netfilter/ip_queue.c   |    3 ++-
 net/ipv6/netfilter/ip6_queue.c  |    3 ++-
 net/netfilter/nfnetlink.c       |    2 +-
 net/netlink/genetlink.c         |    2 +-
 net/xfrm/xfrm_user.c            |    2 +-
 security/commoncap.c            |    6 ++----
 security/security.c             |    4 ++--
 security/selinux/hooks.c        |    5 +++--
 13 files changed, 33 insertions(+), 23 deletions(-)

diff --git a/drivers/scsi/scsi_netlink.c b/drivers/scsi/scsi_netlink.c
index 26a8a45..0aa2e57 100644
--- a/drivers/scsi/scsi_netlink.c
+++ b/drivers/scsi/scsi_netlink.c
@@ -111,7 +111,8 @@ scsi_nl_rcv_msg(struct sk_buff *skb)
 			goto next_msg;
 		}
 
-		if (security_netlink_recv(skb, CAP_SYS_ADMIN)) {
+		if (security_netlink_recv(skb, CAP_SYS_ADMIN,
+					  sock_net(skb->sk)->user_ns)) {
 			err = -EPERM;
 			goto next_msg;
 		}
diff --git a/include/linux/security.h b/include/linux/security.h
index ebd2a53..cfa1f47 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -95,7 +95,8 @@ struct xfrm_user_sec_ctx;
 struct seq_file;
 
 extern int cap_netlink_send(struct sock *sk, struct sk_buff *skb);
-extern int cap_netlink_recv(struct sk_buff *skb, int cap);
+extern int cap_netlink_recv(struct sk_buff *skb, int cap,
+			    struct user_namespace *ns);
 
 void reset_security_ops(void);
 
@@ -797,6 +798,7 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts)
  *	@skb.
  *	@skb contains the sk_buff structure for the netlink message.
  *	@cap indicates the capability required
+ *	@ns is the user namespace which owns skb
  *	Return 0 if permission is granted.
  *
  * Security hooks for Unix domain networking.
@@ -1557,7 +1559,8 @@ struct security_operations {
 			  struct sembuf *sops, unsigned nsops, int alter);
 
 	int (*netlink_send) (struct sock *sk, struct sk_buff *skb);
-	int (*netlink_recv) (struct sk_buff *skb, int cap);
+	int (*netlink_recv) (struct sk_buff *skb, int cap,
+			     struct user_namespace *ns);
 
 	void (*d_instantiate) (struct dentry *dentry, struct inode *inode);
 
@@ -1806,7 +1809,7 @@ void security_d_instantiate(struct dentry *dentry, struct inode *inode);
 int security_getprocattr(struct task_struct *p, char *name, char **value);
 int security_setprocattr(struct task_struct *p, char *name, void *value, size_t size);
 int security_netlink_send(struct sock *sk, struct sk_buff *skb);
-int security_netlink_recv(struct sk_buff *skb, int cap);
+int security_netlink_recv(struct sk_buff *skb, int cap, struct user_namespace *ns);
 int security_secid_to_secctx(u32 secid, char **secdata, u32 *seclen);
 int security_secctx_to_secid(const char *secdata, u32 seclen, u32 *secid);
 void security_release_secctx(char *secdata, u32 seclen);
@@ -2498,9 +2501,10 @@ static inline int security_netlink_send(struct sock *sk, struct sk_buff *skb)
 	return cap_netlink_send(sk, skb);
 }
 
-static inline int security_netlink_recv(struct sk_buff *skb, int cap)
+static inline int security_netlink_recv(struct sk_buff *skb, int cap,
+					struct user_namespace *ns)
 {
-	return cap_netlink_recv(skb, cap);
+	return cap_netlink_recv(skb, cap, ns);
 }
 
 static inline int security_secid_to_secctx(u32 secid, char **secdata, u32 *seclen)
diff --git a/kernel/audit.c b/kernel/audit.c
index 0a1355c..48144c4 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -601,13 +601,15 @@ static int audit_netlink_ok(struct sk_buff *skb, u16 msg_type)
 	case AUDIT_TTY_SET:
 	case AUDIT_TRIM:
 	case AUDIT_MAKE_EQUIV:
-		if (security_netlink_recv(skb, CAP_AUDIT_CONTROL))
+		if (security_netlink_recv(skb, CAP_AUDIT_CONTROL,
+					  sock_net(skb->sk)->user_ns))
 			err = -EPERM;
 		break;
 	case AUDIT_USER:
 	case AUDIT_FIRST_USER_MSG ... AUDIT_LAST_USER_MSG:
 	case AUDIT_FIRST_USER_MSG2 ... AUDIT_LAST_USER_MSG2:
-		if (security_netlink_recv(skb, CAP_AUDIT_WRITE))
+		if (security_netlink_recv(skb, CAP_AUDIT_WRITE,
+					  sock_net(skb->sk)->user_ns))
 			err = -EPERM;
 		break;
 	default:  /* bad msg */
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 99d9e95..4a444de 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1931,7 +1931,8 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 	sz_idx = type>>2;
 	kind = type&3;
 
-	if (kind != 2 && security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (kind != 2 && security_netlink_recv(skb, CAP_NET_ADMIN,
+					       net->user_ns))
 		return -EPERM;
 
 	if (kind == 2 && nlh->nlmsg_flags&NLM_F_DUMP) {
diff --git a/net/decnet/netfilter/dn_rtmsg.c b/net/decnet/netfilter/dn_rtmsg.c
index 69975e0..2d052ab 100644
--- a/net/decnet/netfilter/dn_rtmsg.c
+++ b/net/decnet/netfilter/dn_rtmsg.c
@@ -108,7 +108,8 @@ static inline void dnrmg_receive_user_skb(struct sk_buff *skb)
 	if (nlh->nlmsg_len < sizeof(*nlh) || skb->len < nlh->nlmsg_len)
 		return;
 
-	if (security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (security_netlink_recv(skb, CAP_NET_ADMIN,
+	    sock_net(skb->sk)->user_ns))
 		RCV_SKB_FAIL(-EPERM);
 
 	/* Eventually we might send routing messages too */
diff --git a/net/ipv4/netfilter/ip_queue.c b/net/ipv4/netfilter/ip_queue.c
index 5c9b9d9..51d7c52 100644
--- a/net/ipv4/netfilter/ip_queue.c
+++ b/net/ipv4/netfilter/ip_queue.c
@@ -432,7 +432,8 @@ __ipq_rcv_skb(struct sk_buff *skb)
 	if (type <= IPQM_BASE)
 		return;
 
-	if (security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (security_netlink_recv(skb, CAP_NET_ADMIN,
+				  sock_net(skb->sk)->user_ns))
 		RCV_SKB_FAIL(-EPERM);
 
 	spin_lock_bh(&queue_lock);
diff --git a/net/ipv6/netfilter/ip6_queue.c b/net/ipv6/netfilter/ip6_queue.c
index 2493948..8206bf3 100644
--- a/net/ipv6/netfilter/ip6_queue.c
+++ b/net/ipv6/netfilter/ip6_queue.c
@@ -433,7 +433,8 @@ __ipq_rcv_skb(struct sk_buff *skb)
 	if (type <= IPQM_BASE)
 		return;
 
-	if (security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (security_netlink_recv(skb, CAP_NET_ADMIN,
+				  sock_net(skb->sk)->user_ns))
 		RCV_SKB_FAIL(-EPERM);
 
 	spin_lock_bh(&queue_lock);
diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index 1905976..bcaff9d 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -130,7 +130,7 @@ static int nfnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 	const struct nfnetlink_subsystem *ss;
 	int type, err;
 
-	if (security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (security_netlink_recv(skb, CAP_NET_ADMIN, net->user_ns))
 		return -EPERM;
 
 	/* All the messages must at least contain nfgenmsg */
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index 482fa57..00a101c 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -516,7 +516,7 @@ static int genl_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 		return -EOPNOTSUPP;
 
 	if ((ops->flags & GENL_ADMIN_PERM) &&
-	    security_netlink_recv(skb, CAP_NET_ADMIN))
+	    security_netlink_recv(skb, CAP_NET_ADMIN, net->user_ns))
 		return -EPERM;
 
 	if (nlh->nlmsg_flags & NLM_F_DUMP) {
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 0256b8a..1808e1e 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2290,7 +2290,7 @@ static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 	link = &xfrm_dispatch[type];
 
 	/* All operations require privileges, even GET */
-	if (security_netlink_recv(skb, CAP_NET_ADMIN))
+	if (security_netlink_recv(skb, CAP_NET_ADMIN, net->user_ns))
 		return -EPERM;
 
 	if ((type == (XFRM_MSG_GETSA - XFRM_MSG_BASE) ||
diff --git a/security/commoncap.c b/security/commoncap.c
index a93b3b7..1e48e6a 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -56,11 +56,9 @@ int cap_netlink_send(struct sock *sk, struct sk_buff *skb)
 	return 0;
 }
 
-int cap_netlink_recv(struct sk_buff *skb, int cap)
+int cap_netlink_recv(struct sk_buff *skb, int cap, struct user_namespace *ns)
 {
-	if (!cap_raised(current_cap(), cap))
-		return -EPERM;
-	return 0;
+	return security_capable(ns, current_cred(), cap);
 }
 EXPORT_SYMBOL(cap_netlink_recv);
 
diff --git a/security/security.c b/security/security.c
index 0e4fccf..0a1453e 100644
--- a/security/security.c
+++ b/security/security.c
@@ -941,9 +941,9 @@ int security_netlink_send(struct sock *sk, struct sk_buff *skb)
 	return security_ops->netlink_send(sk, skb);
 }
 
-int security_netlink_recv(struct sk_buff *skb, int cap)
+int security_netlink_recv(struct sk_buff *skb, int cap, struct user_namespace *ns)
 {
-	return security_ops->netlink_recv(skb, cap);
+	return security_ops->netlink_recv(skb, cap, ns);
 }
 EXPORT_SYMBOL(security_netlink_recv);
 
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 266a229..fe290bb 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -4723,13 +4723,14 @@ static int selinux_netlink_send(struct sock *sk, struct sk_buff *skb)
 	return selinux_nlmsg_perm(sk, skb);
 }
 
-static int selinux_netlink_recv(struct sk_buff *skb, int capability)
+static int selinux_netlink_recv(struct sk_buff *skb, int capability,
+				struct user_namespace *ns)
 {
 	int err;
 	struct common_audit_data ad;
 	u32 sid;
 
-	err = cap_netlink_recv(skb, capability);
+	err = cap_netlink_recv(skb, capability, ns);
 	if (err)
 		return err;
 
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH 12/15] user_ns: target af_key capability check
From: Serge Hallyn @ 2011-09-02 19:56 UTC (permalink / raw)
  To: akpm-3NddpPZAyC0, segooon-Re5JQEeQqe8AvxtiuMwx3w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	dhowells-H+wXaHxf7aLQT0dZR+AlfA, ebiederm-aS9lmoZGLiVWk0Htik3J/w
In-Reply-To: <1314993400-6910-1-git-send-email-serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>

From: "Serge E. Hallyn" <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

This presumes that it really is complete wrt network namespaces.  Looking
at the code it appears to be.

Signed-off-by: Serge E. Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Cc: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 net/key/af_key.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/key/af_key.c b/net/key/af_key.c
index 1e733e9..1f90f4e 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -141,7 +141,7 @@ static int pfkey_create(struct net *net, struct socket *sock, int protocol,
 	struct sock *sk;
 	int err;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 	if (sock->type != SOCK_RAW)
 		return -ESOCKTNOSUPPORT;
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH 11/15] userns: make some net-sysfs capable calls targeted
From: Serge Hallyn @ 2011-09-02 19:56 UTC (permalink / raw)
  To: akpm-3NddpPZAyC0, segooon-Re5JQEeQqe8AvxtiuMwx3w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	dhowells-H+wXaHxf7aLQT0dZR+AlfA, ebiederm-aS9lmoZGLiVWk0Htik3J/w
In-Reply-To: <1314993400-6910-1-git-send-email-serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org>

From: "Serge E. Hallyn" <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>

Changelog: jul 1: fix compilation errors (net_device != net)

Signed-off-by: Serge E. Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Cc: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 net/core/net-sysfs.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 1683e5d..876915b 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -76,7 +76,7 @@ static ssize_t netdev_store(struct device *dev, struct device_attribute *attr,
 	unsigned long new;
 	int ret = -EINVAL;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(dev_net(net)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	new = simple_strtoul(buf, &endp, 0);
@@ -261,7 +261,7 @@ static ssize_t store_ifalias(struct device *dev, struct device_attribute *attr,
 	size_t count = len;
 	ssize_t ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(dev_net(netdev)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	/* ignore trailing newline */
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH 10/15] net/core/scm.c: target capable() calls to user_ns owning the net_ns
From: Serge Hallyn @ 2011-09-02 19:56 UTC (permalink / raw)
  To: akpm, segooon, linux-kernel, netdev, containers, dhowells,
	ebiederm, rdunlap
  Cc: Serge E. Hallyn
In-Reply-To: <1314993400-6910-1-git-send-email-serge@hallyn.com>

From: "Serge E. Hallyn" <serge.hallyn@canonical.com>

The uid/gid comparisons don't have to be pulled out.  This just seemed
more easily proved correct.

Changelog:
   mark struct cred arg const

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 net/core/scm.c |   41 ++++++++++++++++++++++++++++++++++-------
 1 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/net/core/scm.c b/net/core/scm.c
index 811b53f..4f376bf 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -43,17 +43,44 @@
  *	setu(g)id.
  */
 
-static __inline__ int scm_check_creds(struct ucred *creds)
+static __inline__ bool uidequiv(const struct cred *src, struct ucred *tgt,
+			       struct user_namespace *ns)
+{
+	if (src->user_ns != ns)
+		goto check_capable;
+	if (src->uid == tgt->uid || src->euid == tgt->uid ||
+	    src->suid == tgt->uid)
+		return true;
+check_capable:
+	if (ns_capable(ns, CAP_SETUID))
+		return true;
+	return false;
+}
+
+static __inline__ bool gidequiv(const struct cred *src, struct ucred *tgt,
+			       struct user_namespace *ns)
+{
+	if (src->user_ns != ns)
+		goto check_capable;
+	if (src->gid == tgt->gid || src->egid == tgt->gid ||
+	    src->sgid == tgt->gid)
+		return true;
+check_capable:
+	if (ns_capable(ns, CAP_SETGID))
+		return true;
+	return false;
+}
+
+static __inline__ int scm_check_creds(struct ucred *creds, struct socket *sock)
 {
 	const struct cred *cred = current_cred();
+	struct user_namespace *ns = sock_net(sock->sk)->user_ns;
 
-	if ((creds->pid == task_tgid_vnr(current) || capable(CAP_SYS_ADMIN)) &&
-	    ((creds->uid == cred->uid   || creds->uid == cred->euid ||
-	      creds->uid == cred->suid) || capable(CAP_SETUID)) &&
-	    ((creds->gid == cred->gid   || creds->gid == cred->egid ||
-	      creds->gid == cred->sgid) || capable(CAP_SETGID))) {
+	if ((creds->pid == task_tgid_vnr(current) || ns_capable(ns, CAP_SYS_ADMIN)) &&
+	     uidequiv(cred, creds, ns) && gidequiv(cred, creds, ns)) {
 	       return 0;
 	}
+
 	return -EPERM;
 }
 
@@ -169,7 +196,7 @@ int __scm_send(struct socket *sock, struct msghdr *msg, struct scm_cookie *p)
 			if (cmsg->cmsg_len != CMSG_LEN(sizeof(struct ucred)))
 				goto error;
 			memcpy(&p->creds, CMSG_DATA(cmsg), sizeof(struct ucred));
-			err = scm_check_creds(&p->creds);
+			err = scm_check_creds(&p->creds, sock);
 			if (err)
 				goto error;
 
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH 09/15] user ns: convert ipv6 to targeted capabilities
From: Serge Hallyn @ 2011-09-02 19:56 UTC (permalink / raw)
  To: akpm, segooon, linux-kernel, netdev, containers, dhowells,
	ebiederm, rdunlap
  Cc: Serge E. Hallyn
In-Reply-To: <1314993400-6910-1-git-send-email-serge@hallyn.com>

From: "Serge E. Hallyn" <serge.hallyn@canonical.com>

Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---
 net/ipv6/addrconf.c             |    4 ++--
 net/ipv6/af_inet6.c             |    6 ++++--
 net/ipv6/datagram.c             |    6 +++---
 net/ipv6/ip6_flowlabel.c        |   24 ++++++++++++++----------
 net/ipv6/ip6_tunnel.c           |    4 ++--
 net/ipv6/ip6mr.c                |    2 +-
 net/ipv6/ipv6_sockglue.c        |    7 ++++---
 net/ipv6/netfilter/ip6_tables.c |    8 ++++----
 net/ipv6/route.c                |    2 +-
 net/ipv6/sit.c                  |   10 +++++-----
 10 files changed, 40 insertions(+), 33 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index f012ebd..871e5cf 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2230,7 +2230,7 @@ int addrconf_add_ifaddr(struct net *net, void __user *arg)
 	struct in6_ifreq ireq;
 	int err;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (copy_from_user(&ireq, arg, sizeof(struct in6_ifreq)))
@@ -2249,7 +2249,7 @@ int addrconf_del_ifaddr(struct net *net, void __user *arg)
 	struct in6_ifreq ireq;
 	int err;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (copy_from_user(&ireq, arg, sizeof(struct in6_ifreq)))
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 3b5669a..1854ffe 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -160,7 +160,8 @@ lookup_protocol:
 	}
 
 	err = -EPERM;
-	if (sock->type == SOCK_RAW && !kern && !capable(CAP_NET_RAW))
+	if (sock->type == SOCK_RAW && !kern &&
+	    !ns_capable(net->user_ns, CAP_NET_RAW))
 		goto out_rcu_unlock;
 
 	sock->ops = answer->ops;
@@ -281,7 +282,8 @@ int inet6_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 		return -EINVAL;
 
 	snum = ntohs(addr->sin6_port);
-	if (snum && snum < PROT_SOCK && !capable(CAP_NET_BIND_SERVICE))
+	if (snum && snum < PROT_SOCK &&
+	    !ns_capable(sock_net(sk)->user_ns, CAP_NET_BIND_SERVICE))
 		return -EACCES;
 
 	lock_sock(sk);
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 9ef1831..33b1b0f 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -701,7 +701,7 @@ int datagram_send_ctl(struct net *net,
 				err = -EINVAL;
 				goto exit_f;
 			}
-			if (!capable(CAP_NET_RAW)) {
+			if (!ns_capable(net->user_ns, CAP_NET_RAW)) {
 				err = -EPERM;
 				goto exit_f;
 			}
@@ -721,7 +721,7 @@ int datagram_send_ctl(struct net *net,
 				err = -EINVAL;
 				goto exit_f;
 			}
-			if (!capable(CAP_NET_RAW)) {
+			if (!ns_capable(net->user_ns, CAP_NET_RAW)) {
 				err = -EPERM;
 				goto exit_f;
 			}
@@ -746,7 +746,7 @@ int datagram_send_ctl(struct net *net,
 				err = -EINVAL;
 				goto exit_f;
 			}
-			if (!capable(CAP_NET_RAW)) {
+			if (!ns_capable(net->user_ns, CAP_NET_RAW)) {
 				err = -EPERM;
 				goto exit_f;
 			}
diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c
index f3caf1b..4726c02 100644
--- a/net/ipv6/ip6_flowlabel.c
+++ b/net/ipv6/ip6_flowlabel.c
@@ -294,21 +294,22 @@ struct ipv6_txoptions *fl6_merge_options(struct ipv6_txoptions * opt_space,
 	return opt_space;
 }
 
-static unsigned long check_linger(unsigned long ttl)
+static unsigned long check_linger(unsigned long ttl, struct user_namespace *ns)
 {
 	if (ttl < FL_MIN_LINGER)
 		return FL_MIN_LINGER*HZ;
-	if (ttl > FL_MAX_LINGER && !capable(CAP_NET_ADMIN))
+	if (ttl > FL_MAX_LINGER && !ns_capable(ns, CAP_NET_ADMIN))
 		return 0;
 	return ttl*HZ;
 }
 
-static int fl6_renew(struct ip6_flowlabel *fl, unsigned long linger, unsigned long expires)
+static int fl6_renew(struct ip6_flowlabel *fl, unsigned long linger,
+		     unsigned long expires, struct user_namespace *ns)
 {
-	linger = check_linger(linger);
+	linger = check_linger(linger, ns);
 	if (!linger)
 		return -EPERM;
-	expires = check_linger(expires);
+	expires = check_linger(expires, ns);
 	if (!expires)
 		return -EPERM;
 	fl->lastuse = jiffies;
@@ -375,7 +376,7 @@ fl_create(struct net *net, struct in6_flowlabel_req *freq, char __user *optval,
 
 	fl->fl_net = hold_net(net);
 	fl->expires = jiffies;
-	err = fl6_renew(fl, freq->flr_linger, freq->flr_expires);
+	err = fl6_renew(fl, freq->flr_linger, freq->flr_expires, net->user_ns);
 	if (err)
 		goto done;
 	fl->share = freq->flr_share;
@@ -425,7 +426,7 @@ static int mem_check(struct sock *sk)
 	if (room <= 0 ||
 	    ((count >= FL_MAX_PER_SOCK ||
 	      (count > 0 && room < FL_MAX_SIZE/2) || room < FL_MAX_SIZE/4) &&
-	     !capable(CAP_NET_ADMIN)))
+	     !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)))
 		return -ENOBUFS;
 
 	return 0;
@@ -507,17 +508,20 @@ int ipv6_flowlabel_opt(struct sock *sk, char __user *optval, int optlen)
 		read_lock_bh(&ip6_sk_fl_lock);
 		for (sfl = np->ipv6_fl_list; sfl; sfl = sfl->next) {
 			if (sfl->fl->label == freq.flr_label) {
-				err = fl6_renew(sfl->fl, freq.flr_linger, freq.flr_expires);
+				err = fl6_renew(sfl->fl, freq.flr_linger, freq.flr_expires,
+						net->user_ns);
 				read_unlock_bh(&ip6_sk_fl_lock);
 				return err;
 			}
 		}
 		read_unlock_bh(&ip6_sk_fl_lock);
 
-		if (freq.flr_share == IPV6_FL_S_NONE && capable(CAP_NET_ADMIN)) {
+		if (freq.flr_share == IPV6_FL_S_NONE &&
+		    ns_capable(net->user_ns, CAP_NET_ADMIN)) {
 			fl = fl_lookup(net, freq.flr_label);
 			if (fl) {
-				err = fl6_renew(fl, freq.flr_linger, freq.flr_expires);
+				err = fl6_renew(fl, freq.flr_linger, freq.flr_expires,
+						net->user_ns);
 				fl_release(fl);
 				return err;
 			}
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 0bc9888..c430d69 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1269,7 +1269,7 @@ ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			break;
 		err = -EFAULT;
 		if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof (p)))
@@ -1304,7 +1304,7 @@ ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 		break;
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			break;
 
 		if (dev == ip6n->fb_tnl_dev) {
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index 705c828..1649ccd 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -1582,7 +1582,7 @@ int ip6_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, uns
 		return -ENOENT;
 
 	if (optname != MRT6_INIT) {
-		if (sk != mrt->mroute6_sk && !capable(CAP_NET_ADMIN))
+		if (sk != mrt->mroute6_sk && !ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EACCES;
 	}
 
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index 147ede38..485e181 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -343,7 +343,7 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 		break;
 
 	case IPV6_TRANSPARENT:
-		if (!capable(CAP_NET_ADMIN)) {
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) {
 			retv = -EPERM;
 			break;
 		}
@@ -381,7 +381,8 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 
 		/* hop-by-hop / destination options are privileged option */
 		retv = -EPERM;
-		if (optname != IPV6_RTHDR && !capable(CAP_NET_RAW))
+		if (optname != IPV6_RTHDR &&
+		    !ns_capable(net->user_ns, CAP_NET_RAW))
 			break;
 
 		opt = ipv6_renew_options(sk, np->opt, optname,
@@ -725,7 +726,7 @@ done:
 	case IPV6_IPSEC_POLICY:
 	case IPV6_XFRM_POLICY:
 		retv = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			break;
 		retv = xfrm_user_policy(sk, optname, optval, optlen);
 		break;
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 94874b0..7fce7d8 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -1869,7 +1869,7 @@ compat_do_ip6t_set_ctl(struct sock *sk, int cmd, void __user *user,
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1984,7 +1984,7 @@ compat_do_ip6t_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -2006,7 +2006,7 @@ do_ip6t_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -2031,7 +2031,7 @@ do_ip6t_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 9e69eb0..f00c18d 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1938,7 +1938,7 @@ int ipv6_route_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	switch(cmd) {
 	case SIOCADDRT:		/* Add a route */
 	case SIOCDELRT:		/* Delete a route */
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 		err = copy_from_user(&rtmsg, arg,
 				     sizeof(struct in6_rtmsg));
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 00b15ac..7438711 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -308,7 +308,7 @@ static int ipip6_tunnel_get_prl(struct ip_tunnel *t,
 	/* For simple GET or for root users,
 	 * we try harder to allocate.
 	 */
-	kp = (cmax <= 1 || capable(CAP_NET_ADMIN)) ?
+	kp = (cmax <= 1 || ns_capable(dev_net(t->dev)->user_ns, CAP_NET_ADMIN)) ?
 		kcalloc(cmax, sizeof(*kp), GFP_KERNEL) :
 		NULL;
 
@@ -929,7 +929,7 @@ ipip6_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		err = -EFAULT;
@@ -988,7 +988,7 @@ ipip6_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		if (dev == sitn->fb_tunnel_dev) {
@@ -1021,7 +1021,7 @@ ipip6_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCDELPRL:
 	case SIOCCHGPRL:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			goto done;
 		err = -EINVAL;
 		if (dev == sitn->fb_tunnel_dev)
@@ -1050,7 +1050,7 @@ ipip6_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCCHG6RD:
 	case SIOCDEL6RD:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(dev_net(dev)->user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		err = -EFAULT;
-- 
1.7.5.4

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox