Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 0/4] Implement persistent grant in xen-netfront/netback
From: Ian Campbell @ 2012-11-16  9:57 UTC (permalink / raw)
  To: Annie Li
  Cc: xen-devel@lists.xensource.com, netdev@vger.kernel.org,
	konrad.wilk@oracle.com
In-Reply-To: <1352962987-541-1-git-send-email-annie.li@oracle.com>

On Thu, 2012-11-15 at 07:03 +0000, Annie Li wrote:
> This patch implements persistent grants for xen-netfront/netback.

Hang on a sec. It has just occurred to me that netfront/netback in the
current mainline kernels don't currently use grant maps at all, they use
grant copy on both the tx and rx paths.

The supposed benefit of persistent grants is to avoid the TLB shootdowns
on grant unmap, but in the current code there should be exactly zero of
those.

If I understand correctly this patch goes from using grant copy
operations to persistently mapping frames and then using memcpy on those
buffers to copy in/out to local buffers. I'm finding it hard to think of
a reason why this should perform any better, do you have a theory which
explains it? (my best theory is that it has a beneficial impact on where
the cache locality of the data, but netperf doesn't typically actually
access the data so I'm not sure why that would matter)

Also AIUI this is also doing persistent grants for both Tx and Rx
directions?

For guest Rx does this mean it now copies twice, in dom0 from the DMA
buffer to the guest provided buffer and then again in the guest from the
granted buffer to a normal one?

For guest Tx how do you handle the lifecycle of the grant mapped pages
which are being sent up into the dom0 network stack? Or are you also now
copying twice in this case? (i.e. guest copies into a granted buffer and
dom0 copies out into a local buffer?)

Did you do measurement of the Tx and Rx cases independently? Do you know
that they both benefit from this change (rather than for example an
improvement in one direction masking a regression in the other). Were
the numbers you previously posted in one particular direction or did you
measure both?

Ian.

^ permalink raw reply

* [PATCH v2 3.7.0-rc4] of/net/mdio-gpio: Fix pdev->id issue when using devicetrees.
From: Srinivas KANDAGATLA @ 2012-11-16 10:33 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA, davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	srinivas.kandagatla-qxv4g6HH51o
In-Reply-To: <1352816773-17837-1-git-send-email-srinivas.kandagatla-qxv4g6HH51o@public.gmane.org>

From: Srinivas Kandagatla <srinivas.kandagatla-qxv4g6HH51o@public.gmane.org>

When the mdio-gpio driver is probed via device trees, the platform
device id is set as -1, However the pdev->id is re-used as bus-id for
while creating mdio gpio bus.
So
For device tree case the mdio-gpio bus name appears as "gpio-ffffffff"
where as
for non-device tree case the bus name appears as "gpio-<bus-num>"

Which means the bus_id is fixed in device tree case, so we can't have
two mdio gpio buses via device trees. Assigning a logical bus number
via device tree solves the problem and the bus name is much consistent
with non-device tree bus name.

Without this patch
1. we can't support two mdio-gpio buses via device trees.
2. we should always pass gpio-ffffffff as bus name to phy_connect, very
different to non-device tree bus name.

So, setting up the bus_id via aliases from device tree is the right
solution and other drivers do similar thing.

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla-qxv4g6HH51o@public.gmane.org>
---
 .../devicetree/bindings/net/mdio-gpio.txt          |    9 ++++++++-
 drivers/net/phy/mdio-gpio.c                        |   11 +++++++----
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/mdio-gpio.txt b/Documentation/devicetree/bindings/net/mdio-gpio.txt
index bc95495..c79bab0 100644
--- a/Documentation/devicetree/bindings/net/mdio-gpio.txt
+++ b/Documentation/devicetree/bindings/net/mdio-gpio.txt
@@ -8,9 +8,16 @@ gpios property as described in section VIII.1 in the following order:
 
 MDC, MDIO.
 
+Note: Each gpio-mdio bus should have an alias correctly numbered in "aliases"
+node.
+
 Example:
 
-mdio {
+aliases {
+	mdio-gpio0 = <&mdio0>;
+};
+
+mdio0: mdio {
 	compatible = "virtual,mdio-gpio";
 	#address-cells = <1>;
 	#size-cells = <0>;
diff --git a/drivers/net/phy/mdio-gpio.c b/drivers/net/phy/mdio-gpio.c
index 899274f..2ed1140 100644
--- a/drivers/net/phy/mdio-gpio.c
+++ b/drivers/net/phy/mdio-gpio.c
@@ -185,17 +185,20 @@ static int __devinit mdio_gpio_probe(struct platform_device *pdev)
 {
 	struct mdio_gpio_platform_data *pdata;
 	struct mii_bus *new_bus;
-	int ret;
+	int ret, bus_id;
 
-	if (pdev->dev.of_node)
+	if (pdev->dev.of_node) {
 		pdata = mdio_gpio_of_get_data(pdev);
-	else
+		bus_id = of_alias_get_id(pdev->dev.of_node, "mdio-gpio");
+	} else {
 		pdata = pdev->dev.platform_data;
+		bus_id = pdev->id;
+	}
 
 	if (!pdata)
 		return -ENODEV;
 
-	new_bus = mdio_gpio_bus_init(&pdev->dev, pdata, pdev->id);
+	new_bus = mdio_gpio_bus_init(&pdev->dev, pdata, bus_id);
 	if (!new_bus)
 		return -ENODEV;
 
-- 
1.7.0.4

^ permalink raw reply related

* Re: [PATCH 4/5] ath6kl/wmi.c: eliminate possible double free
From: Kalle Valo @ 2012-11-16 11:17 UTC (permalink / raw)
  To: Julia Lawall
  Cc: kernel-janitors, John W. Linville, linux-wireless, netdev,
	linux-kernel
In-Reply-To: <1350816727-1381-5-git-send-email-Julia.Lawall@lip6.fr>

Hi Julia,

On 10/21/2012 01:52 PM, Julia Lawall wrote:
> From: Julia Lawall <Julia.Lawall@lip6.fr>
> 
> This makes two changes.  In ath6kl_wmi_cmd_send, a call to dev_kfree_skb on
> the skb argument is added to the initial sanity check to more completely
> establish the invariant that ath6kl_wmi_cmd_send owns its skb argument.
> Then, in ath6kl_wmi_sync_point, on failure of the call to
> ath6kl_wmi_cmd_send, the clearing of the local skb variable is moved up, so
> that the error-handling code at the end of the function does not free it
> again.
> 
> A simplified version of the semantic match that finds this problem is as
> follows: (http://coccinelle.lip6.fr/)
> 
> // <smpl>
> @r@
> identifier f,free,a;
> parameter list[n] ps;
> type T;
> expression e;
> @@
> 
> f(ps,T a,...) {
>   ... when any
>       when != a = e
>   if(...) { ... free(a); ... return ...; }
>   ... when any
> }
> 
> @@
> identifier r.f,r.free;
> expression x,a;
> expression list[r.n] xs;
> @@
> 
> * x = f(xs,a,...);
>   if (...) { ... free(a); ... return ...; }
> // </smpl>
> 
> Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>

I think this patch which is commited to ath6kl.git has fixed this.

commit 0616dc1f2bef563d7916c0dcedbb1bff7d9bd80b
Author: Vasanthakumar Thiagarajan <vthiagar@qca.qualcomm.com>
Date:   Tue Aug 14 10:10:33 2012 +0530

    ath6kl: Fix potential skb double free in ath6kl_wmi_sync_point()

    skb given to ath6kl_control_tx() is owned by ath6kl_control_tx().
    Calling function should not free the skb for error cases.
    This is found during code review.

    kvalo: fix a checkpatch warning in ath6kl_wmi_cmd_send()

    Signed-off-by: Vasanthakumar Thiagarajan <vthiagar@qca.qualcomm.com>
    Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

https://github.com/kvalo/ath6kl/commit/0616dc1f2bef563d7916c0dcedbb1bff7d9bd80b

If you have the time, I would appreciate if you could take a look and
confirm.

Kalle

^ permalink raw reply

* Re: [Xen-devel] [PATCH 1/4] xen/netback: implements persistent grant with one page pool.
From: ANNIE LI @ 2012-11-16 11:34 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Roger Pau Monne, xen-devel@lists.xensource.com,
	netdev@vger.kernel.org, konrad.wilk@oracle.com
In-Reply-To: <1353058363.3499.171.camel@zakaz.uk.xensource.com>



On 2012-11-16 17:32, Ian Campbell wrote:
> On Fri, 2012-11-16 at 02:49 +0000, ANNIE LI wrote:
>>> Take a look at the following functions from blkback; foreach_grant,
>>> add_persistent_gnt and get_persistent_gnt. They are generic functions to
>>> deal with persistent grants.
>> Ok, thanks.
>> Or moving those functions into a separate common file?
> Please put them somewhere common.

Ok.

>>> This is highly inefficient, one of the points of using gnttab_set_map_op
>>> is that you can queue a bunch of grants, and then map them at the same
>>> time using gnttab_map_refs, but here you are using it to map a single
>>> grant at a time. You should instead see how much grants you need to map
>>> to complete the request and map them all at the same time.
>> Yes, it is inefficient here. But this is limited by current netback
>> implementation. Current netback is not per-VIF based(not like blkback
>> does). After combining persistent grant and non persistent grant
>> together, every vif request in the queue may/may not support persistent
>> grant. I have to judge whether every vif in the queue supports
>> persistent grant or not. If it support, memcpy is used, if not,
>> grantcopy is used.
> You could (and should) still batch all the grant copies into one
> hypercall, e.g. walk the list either doing memcpy or queuing up copyops
> as appropriate, then at the end if the queue is non-zero length issue
> the hypercall.

This still connects with netback per-VIF implementation.

> I'd expect this lack of batching here and in the other case I just
> spotted to have a detrimental affect on guests running with this patch
> but not using persistent grants. Did you benchmark that case?

I did some test before.
But I'd better to create more detailed result under in different case.

>> After making netback per-VIF works, this issue can be fixed.
> You've mentioned improvements which are conditional on this work a few
> times I think, perhaps it makes sense to make that change first?

Yes, I did consider to implement per-VIF first before persistent grant. 
But thinking of it is part of wei's patch and combined with other 
patches, and decided to implement it later. But making the change first 
would make things more clear.

Thanks
Annie
> Ian.
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message tomajordomo@vger.kernel.org
> More majordomo info athttp://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 0/4] Implement persistent grant in xen-netfront/netback
From: ANNIE LI @ 2012-11-16 11:37 UTC (permalink / raw)
  To: Ian Campbell
  Cc: xen-devel@lists.xensource.com, netdev@vger.kernel.org,
	konrad.wilk@oracle.com
In-Reply-To: <1353059821.3499.190.camel@zakaz.uk.xensource.com>



On 2012-11-16 17:57, Ian Campbell wrote:
> On Thu, 2012-11-15 at 07:03 +0000, Annie Li wrote:
>> This patch implements persistent grants for xen-netfront/netback.
> Hang on a sec. It has just occurred to me that netfront/netback in the
> current mainline kernels don't currently use grant maps at all, they use
> grant copy on both the tx and rx paths.

Ah, this patch is based on v3.4-rc3.
Current mainline kernel does not pass the netperf/netserver case. As I 
mentioned earlier, I hit BUG_ON with your debug patch too when testing 
mainline kernel with netperf/netserver.
This is interesting, I should have check the latest code.

>
> The supposed benefit of persistent grants is to avoid the TLB shootdowns
> on grant unmap, but in the current code there should be exactly zero of
> those.

Is there any performance document about current grant copy code in 
mainline kernel?

>
> If I understand correctly this patch goes from using grant copy
> operations to persistently mapping frames and then using memcpy on those
> buffers to copy in/out to local buffers. I'm finding it hard to think of
> a reason why this should perform any better, do you have a theory which
> explains it?

This patch is aiming to fix spin lock issue of grant operations, it 
comes out to avoid possible grant operations(including grant map and copy).

> (my best theory is that it has a beneficial impact on where
> the cache locality of the data, but netperf doesn't typically actually
> access the data so I'm not sure why that would matter)
>
> Also AIUI this is also doing persistent grants for both Tx and Rx
> directions?

Yes.

>
> For guest Rx does this mean it now copies twice, in dom0 from the DMA
> buffer to the guest provided buffer and then again in the guest from the
> granted buffer to a normal one?

Yes.

>
> For guest Tx how do you handle the lifecycle of the grant mapped pages
> which are being sent up into the dom0 network stack? Or are you also now
> copying twice in this case? (i.e. guest copies into a granted buffer and
> dom0 copies out into a local buffer?)

Copy twice: guest copies into a granted buffer and dom0 copies out into 
a local buffer.

>
> Did you do measurement of the Tx and Rx cases independently?

No.

> Do you know
> that they both benefit from this change (rather than for example an
> improvement in one direction masking a regression in the other).

On theory, this implementation avoid spinlock issue of grant operation, 
so they should both benefit from it.

> Were
> the numbers you previously posted in one particular direction or did you
> measure both?

One particular direction, one runs as server, the other runs as client.

Thanks
Annie
>
> Ian.
>

^ permalink raw reply

* Re: [PATCH 8/8] drivers/net/wireless/ath/ath6kl/hif.c: drop if around WARN_ON
From: Kalle Valo @ 2012-11-16 11:39 UTC (permalink / raw)
  To: Julia Lawall
  Cc: kernel-janitors-u79uwXL29TY76Z2rM5mHXA, John W. Linville,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1351974625-10282-9-git-send-email-Julia.Lawall-L2FTfq7BK8M@public.gmane.org>

On 11/03/2012 10:30 PM, Julia Lawall wrote:
> From: Julia Lawall <Julia.Lawall-L2FTfq7BK8M@public.gmane.org>
> 
> Just use WARN_ON rather than an if containing only WARN_ON(1).
> 
> A simplified version of the semantic patch that makes this transformation
> is as follows: (http://coccinelle.lip6.fr/)
> 
> // <smpl>
> @@
> expression e;
> @@
> - if (e) WARN_ON(1);
> + WARN_ON(e);
> // </smpl>
> 
> Signed-off-by: Julia Lawall <Julia.Lawall-L2FTfq7BK8M@public.gmane.org>

Thanks, applied to ath6kl.git.

Kalle
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 0/4] Implement persistent grant in xen-netfront/netback
From: Ian Campbell @ 2012-11-16 11:46 UTC (permalink / raw)
  To: ANNIE LI
  Cc: xen-devel@lists.xensource.com, netdev@vger.kernel.org,
	konrad.wilk@oracle.com
In-Reply-To: <50A6255C.10108@oracle.com>

On Fri, 2012-11-16 at 11:37 +0000, ANNIE LI wrote:
> 
> On 2012-11-16 17:57, Ian Campbell wrote:
> > On Thu, 2012-11-15 at 07:03 +0000, Annie Li wrote:
> >> This patch implements persistent grants for xen-netfront/netback.
> > Hang on a sec. It has just occurred to me that netfront/netback in the
> > current mainline kernels don't currently use grant maps at all, they use
> > grant copy on both the tx and rx paths.
> 
> Ah, this patch is based on v3.4-rc3.

Nothing has changed in more recent kernels in this regard.

> >
> > The supposed benefit of persistent grants is to avoid the TLB shootdowns
> > on grant unmap, but in the current code there should be exactly zero of
> > those.
> 
> Is there any performance document about current grant copy code in 
> mainline kernel?

Not AFAIK.

> > If I understand correctly this patch goes from using grant copy
> > operations to persistently mapping frames and then using memcpy on those
> > buffers to copy in/out to local buffers. I'm finding it hard to think of
> > a reason why this should perform any better, do you have a theory which
> > explains it?
> 
> This patch is aiming to fix spin lock issue of grant operations, it 
> comes out to avoid possible grant operations(including grant map and copy).

Makes sense. This is the sort of thing which ought to feature
prominently in commit messages and/or introductory mails.

> > Do you know
> > that they both benefit from this change (rather than for example an
> > improvement in one direction masking a regression in the other).
> 
> On theory, this implementation avoid spinlock issue of grant operation, 
> so they should both benefit from it.

It seems like having netfront simply allocate itself a pool of grant
references which it reuses would give equivalent benefits whilst being a
smaller patch, with no protocol change and avoiding double copying. In
fact by avoiding the double copy I'd expect it to be even better.

> > Were
> > the numbers you previously posted in one particular direction or did you
> > measure both?
> 
> One particular direction, one runs as server, the other runs as client.

I think you need to measure both dom0->domU and domU->dom0 to get the
full picture since AIUI netperf sends the bulk data in only one
direction with just ACKs coming back the other way.

Ian.

^ permalink raw reply

* [PATCH 0/4] netfilter updates for nf-next (try 2)
From: pablo @ 2012-11-16 12:40 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev

From: Pablo Neira Ayuso <pablo@netfilter.org>

Hi David,

This is the second try to include the following four patches that contain
updates for your net-next tree, they are:

* Little cleanup for IPVS the use of a strange notation to assign the
  conntrack object, from Alan Cox.

* Another little cleanup for nf_nat to save a couple of lines by using
  PTR_RET, from Wu Fengguan

* getsockopt support to obtain the original IPv6 address after NAT,
  similar to the one that IPv4 provides, from Florian Westphal.

* Follow-up patch pointed out by YOSHIFUJI Hideaki to only provide
  the scope_id in case that link is local, again from Florian Westphal.

You can pull these changes from:

git://1984.lsi.us.es/nf-next master

Thanks!

Alan Cox (1):
  ipvs: remove silly double assignment

Florian Westphal (2):
  netfilter: ipv6: add getsockopt to retrieve origdst
  netfilter: ipv6: only provide sk_bound_dev_if for link-local addr

Wu Fengguang (1):
  netfilter: nf_nat: use PTR_RET

 include/uapi/linux/in6.h                       |    1 +
 include/uapi/linux/netfilter_ipv6/ip6_tables.h |    3 ++
 net/ipv4/netfilter/iptable_nat.c               |    4 +-
 net/ipv6/netfilter/ip6table_nat.c              |    4 +-
 net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c |   66 ++++++++++++++++++++++++
 net/netfilter/ipvs/ip_vs_nfct.c                |    2 +-
 net/netfilter/ipvs/ip_vs_xmit.c                |    8 +--
 7 files changed, 77 insertions(+), 11 deletions(-)

-- 
1.7.10.4

^ permalink raw reply

* [PATCH 4/4] netfilter: ipv6: only provide sk_bound_dev_if for link-local addr
From: pablo @ 2012-11-16 12:40 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1353069653-3231-1-git-send-email-pablo@netfilter.org>

From: Florian Westphal <fw@strlen.de>

yoshfuji points out that sk_bound_dev_if should only be provided
for link-local addresses.

IPv6 getpeer/sockname also has this test, i.e. we will now
only set sin6_scope_id if the original(!) destination
was a link-local address.

Reported-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
index 02dcafd..e5f6cf7 100644
--- a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
+++ b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
@@ -334,9 +334,14 @@ ipv6_getorigdst(struct sock *sk, int optval, void __user *user, int *len)
 	memcpy(&sin6.sin6_addr,
 		&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.u3.in6,
 					sizeof(sin6.sin6_addr));
-	sin6.sin6_scope_id = sk->sk_bound_dev_if;
 
 	nf_ct_put(ct);
+
+	if (ipv6_addr_type(&sin6.sin6_addr) & IPV6_ADDR_LINKLOCAL)
+		sin6.sin6_scope_id = sk->sk_bound_dev_if;
+	else
+		sin6.sin6_scope_id = 0;
+
 	return copy_to_user(user, &sin6, sizeof(sin6)) ? -EFAULT : 0;
 }
 
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 1/4] ipvs: remove silly double assignment
From: pablo @ 2012-11-16 12:40 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1353069653-3231-1-git-send-email-pablo@netfilter.org>

From: Alan Cox <alan@linux.intel.com>

I don't even want to think what the C spec says for this 8)

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/ipvs/ip_vs_nfct.c |    2 +-
 net/netfilter/ipvs/ip_vs_xmit.c |    8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_nfct.c b/net/netfilter/ipvs/ip_vs_nfct.c
index 022e77e..c8beafd 100644
--- a/net/netfilter/ipvs/ip_vs_nfct.c
+++ b/net/netfilter/ipvs/ip_vs_nfct.c
@@ -82,7 +82,7 @@ void
 ip_vs_update_conntrack(struct sk_buff *skb, struct ip_vs_conn *cp, int outin)
 {
 	enum ip_conntrack_info ctinfo;
-	struct nf_conn *ct = ct = nf_ct_get(skb, &ctinfo);
+	struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
 	struct nf_conntrack_tuple new_tuple;
 
 	if (ct == NULL || nf_ct_is_confirmed(ct) || nf_ct_is_untracked(ct) ||
diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index 12008b4..ee6b7a9 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -594,7 +594,7 @@ ip_vs_nat_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 #if IS_ENABLED(CONFIG_NF_CONNTRACK)
 	if (cp->flags & IP_VS_CONN_F_SYNC && local) {
 		enum ip_conntrack_info ctinfo;
-		struct nf_conn *ct = ct = nf_ct_get(skb, &ctinfo);
+		struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
 
 		if (ct && !nf_ct_is_untracked(ct)) {
 			IP_VS_DBG_RL_PKT(10, AF_INET, pp, skb, 0,
@@ -710,7 +710,7 @@ ip_vs_nat_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 #if IS_ENABLED(CONFIG_NF_CONNTRACK)
 	if (cp->flags & IP_VS_CONN_F_SYNC && local) {
 		enum ip_conntrack_info ctinfo;
-		struct nf_conn *ct = ct = nf_ct_get(skb, &ctinfo);
+		struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
 
 		if (ct && !nf_ct_is_untracked(ct)) {
 			IP_VS_DBG_RL_PKT(10, AF_INET6, pp, skb, 0,
@@ -1235,7 +1235,7 @@ ip_vs_icmp_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 #if IS_ENABLED(CONFIG_NF_CONNTRACK)
 	if (cp->flags & IP_VS_CONN_F_SYNC && local) {
 		enum ip_conntrack_info ctinfo;
-		struct nf_conn *ct = ct = nf_ct_get(skb, &ctinfo);
+		struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
 
 		if (ct && !nf_ct_is_untracked(ct)) {
 			IP_VS_DBG(10, "%s(): "
@@ -1356,7 +1356,7 @@ ip_vs_icmp_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 #if IS_ENABLED(CONFIG_NF_CONNTRACK)
 	if (cp->flags & IP_VS_CONN_F_SYNC && local) {
 		enum ip_conntrack_info ctinfo;
-		struct nf_conn *ct = ct = nf_ct_get(skb, &ctinfo);
+		struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
 
 		if (ct && !nf_ct_is_untracked(ct)) {
 			IP_VS_DBG(10, "%s(): "
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 2/4] netfilter: nf_nat: use PTR_RET
From: pablo @ 2012-11-16 12:40 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1353069653-3231-1-git-send-email-pablo@netfilter.org>

From: Wu Fengguang <fengguang.wu@intel.com>

Use PTR_RET rather than if(IS_ERR(...)) + PTR_ERR

Generated by: coccinelle/api/ptr_ret.cocci

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/ipv4/netfilter/iptable_nat.c  |    4 +---
 net/ipv6/netfilter/ip6table_nat.c |    4 +---
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/netfilter/iptable_nat.c b/net/ipv4/netfilter/iptable_nat.c
index 9e0ffaf..8d65b74 100644
--- a/net/ipv4/netfilter/iptable_nat.c
+++ b/net/ipv4/netfilter/iptable_nat.c
@@ -274,9 +274,7 @@ static int __net_init iptable_nat_net_init(struct net *net)
 		return -ENOMEM;
 	net->ipv4.nat_table = ipt_register_table(net, &nf_nat_ipv4_table, repl);
 	kfree(repl);
-	if (IS_ERR(net->ipv4.nat_table))
-		return PTR_ERR(net->ipv4.nat_table);
-	return 0;
+	return PTR_RET(net->ipv4.nat_table);
 }
 
 static void __net_exit iptable_nat_net_exit(struct net *net)
diff --git a/net/ipv6/netfilter/ip6table_nat.c b/net/ipv6/netfilter/ip6table_nat.c
index e418bd6..4c8219e 100644
--- a/net/ipv6/netfilter/ip6table_nat.c
+++ b/net/ipv6/netfilter/ip6table_nat.c
@@ -275,9 +275,7 @@ static int __net_init ip6table_nat_net_init(struct net *net)
 		return -ENOMEM;
 	net->ipv6.ip6table_nat = ip6t_register_table(net, &nf_nat_ipv6_table, repl);
 	kfree(repl);
-	if (IS_ERR(net->ipv6.ip6table_nat))
-		return PTR_ERR(net->ipv6.ip6table_nat);
-	return 0;
+	return PTR_RET(net->ipv6.ip6table_nat);
 }
 
 static void __net_exit ip6table_nat_net_exit(struct net *net)
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH 3/4] netfilter: ipv6: add getsockopt to retrieve origdst
From: pablo @ 2012-11-16 12:40 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <1353069653-3231-1-git-send-email-pablo@netfilter.org>

From: Florian Westphal <fw@strlen.de>

userspace can query the original ipv4 destination address of a REDIRECTed
connection via
getsockopt(m_sock, SOL_IP, SO_ORIGINAL_DST, &m_server_addr, &addrsize)

but for ipv6 no such option existed.

This adds getsockopt(..., IPPROTO_IPV6, IP6T_SO_ORIGINAL_DST, ...).

Without this, userspace needs to parse /proc or use ctnetlink, which
appears to be overkill.

This uses option number 80 for IP6T_SO_ORIGINAL_DST, which is spare,
to use the same number we use in the IPv4 socket option SO_ORIGINAL_DST.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/uapi/linux/in6.h                       |    1 +
 include/uapi/linux/netfilter_ipv6/ip6_tables.h |    3 ++
 net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c |   61 ++++++++++++++++++++++++
 3 files changed, 65 insertions(+)

diff --git a/include/uapi/linux/in6.h b/include/uapi/linux/in6.h
index 1e31599..f79c372 100644
--- a/include/uapi/linux/in6.h
+++ b/include/uapi/linux/in6.h
@@ -240,6 +240,7 @@ struct in6_flowlabel_req {
  *
  * IP6T_SO_GET_REVISION_MATCH	68
  * IP6T_SO_GET_REVISION_TARGET	69
+ * IP6T_SO_ORIGINAL_DST		80
  */
 
 /* RFC5014: Source address selection */
diff --git a/include/uapi/linux/netfilter_ipv6/ip6_tables.h b/include/uapi/linux/netfilter_ipv6/ip6_tables.h
index bf1ef65..649c680 100644
--- a/include/uapi/linux/netfilter_ipv6/ip6_tables.h
+++ b/include/uapi/linux/netfilter_ipv6/ip6_tables.h
@@ -178,6 +178,9 @@ struct ip6t_error {
 #define IP6T_SO_GET_REVISION_TARGET	(IP6T_BASE_CTL + 5)
 #define IP6T_SO_GET_MAX			IP6T_SO_GET_REVISION_TARGET
 
+/* obtain original address if REDIRECT'd connection */
+#define IP6T_SO_ORIGINAL_DST            80
+
 /* ICMP matching stuff */
 struct ip6t_icmp {
 	__u8 type;				/* type to match */
diff --git a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
index 8860d23..02dcafd 100644
--- a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
+++ b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
@@ -21,6 +21,7 @@
 
 #include <linux/netfilter_bridge.h>
 #include <linux/netfilter_ipv6.h>
+#include <linux/netfilter_ipv6/ip6_tables.h>
 #include <net/netfilter/nf_conntrack.h>
 #include <net/netfilter/nf_conntrack_helper.h>
 #include <net/netfilter/nf_conntrack_l4proto.h>
@@ -295,6 +296,50 @@ static struct nf_hook_ops ipv6_conntrack_ops[] __read_mostly = {
 	},
 };
 
+static int
+ipv6_getorigdst(struct sock *sk, int optval, void __user *user, int *len)
+{
+	const struct inet_sock *inet = inet_sk(sk);
+	const struct ipv6_pinfo *inet6 = inet6_sk(sk);
+	const struct nf_conntrack_tuple_hash *h;
+	struct sockaddr_in6 sin6;
+	struct nf_conntrack_tuple tuple = { .src.l3num = NFPROTO_IPV6 };
+	struct nf_conn *ct;
+
+	tuple.src.u3.in6 = inet6->rcv_saddr;
+	tuple.src.u.tcp.port = inet->inet_sport;
+	tuple.dst.u3.in6 = inet6->daddr;
+	tuple.dst.u.tcp.port = inet->inet_dport;
+	tuple.dst.protonum = sk->sk_protocol;
+
+	if (sk->sk_protocol != IPPROTO_TCP && sk->sk_protocol != IPPROTO_SCTP)
+		return -ENOPROTOOPT;
+
+	if (*len < 0 || (unsigned int) *len < sizeof(sin6))
+		return -EINVAL;
+
+	h = nf_conntrack_find_get(sock_net(sk), NF_CT_DEFAULT_ZONE, &tuple);
+	if (!h) {
+		pr_debug("IP6T_SO_ORIGINAL_DST: Can't find %pI6c/%u-%pI6c/%u.\n",
+			 &tuple.src.u3.ip6, ntohs(tuple.src.u.tcp.port),
+			 &tuple.dst.u3.ip6, ntohs(tuple.dst.u.tcp.port));
+		return -ENOENT;
+	}
+
+	ct = nf_ct_tuplehash_to_ctrack(h);
+
+	sin6.sin6_family = AF_INET6;
+	sin6.sin6_port = ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.u.tcp.port;
+	sin6.sin6_flowinfo = inet6->flow_label & IPV6_FLOWINFO_MASK;
+	memcpy(&sin6.sin6_addr,
+		&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.u3.in6,
+					sizeof(sin6.sin6_addr));
+	sin6.sin6_scope_id = sk->sk_bound_dev_if;
+
+	nf_ct_put(ct);
+	return copy_to_user(user, &sin6, sizeof(sin6)) ? -EFAULT : 0;
+}
+
 #if defined(CONFIG_NF_CT_NETLINK) || defined(CONFIG_NF_CT_NETLINK_MODULE)
 
 #include <linux/netfilter/nfnetlink.h>
@@ -359,6 +404,14 @@ MODULE_ALIAS("nf_conntrack-" __stringify(AF_INET6));
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Yasuyuki KOZAKAI @USAGI <yasuyuki.kozakai@toshiba.co.jp>");
 
+static struct nf_sockopt_ops so_getorigdst6 = {
+	.pf		= NFPROTO_IPV6,
+	.get_optmin	= IP6T_SO_ORIGINAL_DST,
+	.get_optmax	= IP6T_SO_ORIGINAL_DST + 1,
+	.get		= ipv6_getorigdst,
+	.owner		= THIS_MODULE,
+};
+
 static int ipv6_net_init(struct net *net)
 {
 	int ret = 0;
@@ -425,6 +478,12 @@ static int __init nf_conntrack_l3proto_ipv6_init(void)
 	need_conntrack();
 	nf_defrag_ipv6_enable();
 
+	ret = nf_register_sockopt(&so_getorigdst6);
+	if (ret < 0) {
+		pr_err("Unable to register netfilter socket option\n");
+		return ret;
+	}
+
 	ret = register_pernet_subsys(&ipv6_net_ops);
 	if (ret < 0)
 		goto cleanup_pernet;
@@ -440,6 +499,7 @@ static int __init nf_conntrack_l3proto_ipv6_init(void)
  cleanup_ipv6:
 	unregister_pernet_subsys(&ipv6_net_ops);
  cleanup_pernet:
+	nf_unregister_sockopt(&so_getorigdst6);
 	return ret;
 }
 
@@ -448,6 +508,7 @@ static void __exit nf_conntrack_l3proto_ipv6_fini(void)
 	synchronize_net();
 	nf_unregister_hooks(ipv6_conntrack_ops, ARRAY_SIZE(ipv6_conntrack_ops));
 	unregister_pernet_subsys(&ipv6_net_ops);
+	nf_unregister_sockopt(&so_getorigdst6);
 }
 
 module_init(nf_conntrack_l3proto_ipv6_init);
-- 
1.7.10.4

^ permalink raw reply related

* [PATCH net-next 0/17] Make the network stack usable by userns root
From: Eric W. Biederman @ 2012-11-16 13:01 UTC (permalink / raw)
  To: David Miller; +Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Linux Containers


In a secondary user namespace the root user only has CAP_NET_ADMIN,
CAP_NET_RAW and CAP_NET_BIND_SERVICE with respect to the secondary user
namespace.  The test "capable(CAP_NET_ADMIN)" tests for capabilities in
the initial user namespace.

The following set of patches goes through the networking stack.  First
pushing the capable(CAP_NET_ADMIN) admin calls down farther in the stack
so individual instances can be changed.  Then where I have I it appears
safe I have relaxed the permission checks.

The code is available in git from:
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git netns-v73

The netns-v73 branch is against v3.7-rc3 and merges cleanly with net-next.

In my user namespace tree I am working to allow unprivileged users to
create user namespace, and to allow the user namespace root able to
create network namespaces.  Making these patches really about allowing
unprivileged users able to use the networking stack (not that they will
be able to talk to anyone).

David I have some small dependencies on the first two patches of this
series in my later user namespace work.  So after these changes have
been reviewed if you can pull my netns-v73 branch (which is just these
patches) into net-next that will help me avoid unnecessary conflicts.

Eric

Eric W. Biederman (16):
      netns: Deduplicate and fix copy_net_ns when !CONFIG_NET_NS
      userns: make each net (net_ns) belong to a user_ns
      sysctl: Pass useful parameters to sysctl permissions
      net: Don't export sysctls to unprivileged users
      net: Push capable(CAP_NET_ADMIN) into the rtnl methods
      net: Update the per network namespace sysctls to be available to the network namespace owner
      net: Allow userns root to force the scm creds
      net: Allow userns root control of the core of the network stack.
      net: Allow userns root to control ipv4
      net: Allow userns root to control ipv6
      net: Allow userns root to control llc, netfilter, netlink, packet, and xfrm
      net: Allow userns root to control the network bridge code.
      net: Allow the userns root to control vlans.
      net: Enable some sysctls that are safe for the userns root
      net: Enable a userns root rtnl calls that are safe for unprivilged users
      net: Make CAP_NET_BIND_SERVICE per user namespace

Zhao Hongjiang (1):
      user_ns: get rid of duplicate code in net_ctl_permissions

 fs/proc/proc_sysctl.c                   |    9 +++++----
 include/linux/sysctl.h                  |    3 +--
 include/net/net_namespace.h             |   24 ++++++++++++++++--------
 kernel/nsproxy.c                        |    2 +-
 net/8021q/vlan.c                        |   12 ++++++------
 net/bridge/br_ioctl.c                   |   25 +++++++++++++------------
 net/bridge/br_sysfs_br.c                |   10 +++++-----
 net/bridge/br_sysfs_if.c                |    2 +-
 net/can/gw.c                            |    6 ++++++
 net/core/dev.c                          |   17 +++++++++++++----
 net/core/ethtool.c                      |    2 +-
 net/core/neighbour.c                    |    4 ++++
 net/core/net-sysfs.c                    |   15 ++++++++++-----
 net/core/net_namespace.c                |   23 ++++++++++++-----------
 net/core/rtnetlink.c                    |   12 +++++++++++-
 net/core/scm.c                          |    6 +++---
 net/core/sock.c                         |    7 ++++---
 net/core/sysctl_net_core.c              |    5 +++++
 net/dcb/dcbnl.c                         |    3 +++
 net/decnet/dn_dev.c                     |    6 ++++++
 net/decnet/dn_fib.c                     |    6 ++++++
 net/ipv4/af_inet.c                      |    9 ++++++---
 net/ipv4/arp.c                          |    2 +-
 net/ipv4/devinet.c                      |    4 ++--
 net/ipv4/fib_frontend.c                 |    2 +-
 net/ipv4/ip_fragment.c                  |    4 ++++
 net/ipv4/ip_gre.c                       |    4 ++--
 net/ipv4/ip_options.c                   |    6 +++---
 net/ipv4/ip_sockglue.c                  |    5 +++--
 net/ipv4/ip_vti.c                       |    4 ++--
 net/ipv4/ipip.c                         |    4 ++--
 net/ipv4/ipmr.c                         |    2 +-
 net/ipv4/netfilter/arp_tables.c         |    8 ++++----
 net/ipv4/netfilter/ip_tables.c          |    8 ++++----
 net/ipv4/route.c                        |    4 ++++
 net/ipv4/sysctl_net_ipv4.c              |    3 +++
 net/ipv4/tcp.c                          |    2 +-
 net/ipv4/tcp_cong.c                     |    3 ++-
 net/ipv6/addrconf.c                     |    4 ++--
 net/ipv6/af_inet6.c                     |    5 +++--
 net/ipv6/anycast.c                      |    2 +-
 net/ipv6/datagram.c                     |    6 +++---
 net/ipv6/ip6_flowlabel.c                |    3 ++-
 net/ipv6/ip6_gre.c                      |    4 ++--
 net/ipv6/ip6_tunnel.c                   |    4 ++--
 net/ipv6/ip6mr.c                        |    2 +-
 net/ipv6/ipv6_sockglue.c                |    7 ++++---
 net/ipv6/netfilter/ip6_tables.c         |    8 ++++----
 net/ipv6/reassembly.c                   |    4 ++++
 net/ipv6/route.c                        |    6 +++++-
 net/ipv6/sit.c                          |    8 ++++----
 net/key/af_key.c                        |    2 +-
 net/llc/af_llc.c                        |    2 +-
 net/netfilter/ipset/ip_set_core.c       |    2 +-
 net/netfilter/ipvs/ip_vs_ctl.c          |    8 ++++++--
 net/netfilter/ipvs/ip_vs_lblc.c         |    7 ++++++-
 net/netfilter/ipvs/ip_vs_lblcr.c        |    4 ++++
 net/netfilter/nf_conntrack_acct.c       |    4 ++++
 net/netfilter/nf_conntrack_ecache.c     |    4 ++++
 net/netfilter/nf_conntrack_helper.c     |    4 ++++
 net/netfilter/nf_conntrack_proto_dccp.c |    8 ++++++--
 net/netfilter/nf_conntrack_standalone.c |    4 ++++
 net/netfilter/nf_conntrack_timestamp.c  |    4 ++++
 net/netfilter/nfnetlink.c               |    2 +-
 net/netlink/af_netlink.c                |    2 +-
 net/packet/af_packet.c                  |    2 +-
 net/phonet/pn_netlink.c                 |    6 ++++++
 net/sched/act_api.c                     |    3 +++
 net/sched/cls_api.c                     |    2 ++
 net/sched/sch_api.c                     |    9 +++++++++
 net/sctp/socket.c                       |    8 +++++---
 net/sysctl_net.c                        |   15 ++++++++++++---
 net/unix/sysctl_net_unix.c              |    4 ++++
 net/xfrm/xfrm_sysctl.c                  |    4 ++++
 net/xfrm/xfrm_user.c                    |    2 +-
 75 files changed, 308 insertions(+), 140 deletions(-)

^ permalink raw reply

* [PATCH net-next 01/17] netns: Deduplicate and fix copy_net_ns when !CONFIG_NET_NS
From: Eric W. Biederman @ 2012-11-16 13:02 UTC (permalink / raw)
  To: David Miller
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Linux Containers,
	Eric W. Biederman
In-Reply-To: <87d2zd8zwn.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

From: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

The copy of copy_net_ns used when the network stack is not
built is broken as it does not return -EINVAL when attempting
to create a new network namespace.  We don't even have
a previous network namespace.

Since we need a copy of copy_net_ns in net/net_namespace.h that is
available when the networking stack is not built at all move the
correct version of copy_net_ns from net_namespace.c into net_namespace.h
Leaving us with just 2 versions of copy_net_ns.  One version for when
we compile in network namespace suport and another stub for all other
occasions.

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 include/net/net_namespace.h |   15 +++++++++------
 net/core/net_namespace.c    |    7 -------
 2 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 95e6466..32dcb60 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -126,16 +126,19 @@ struct net {
 /* Init's network namespace */
 extern struct net init_net;
 
-#ifdef CONFIG_NET
+#ifdef CONFIG_NET_NS
 extern struct net *copy_net_ns(unsigned long flags, struct net *net_ns);
 
-#else /* CONFIG_NET */
-static inline struct net *copy_net_ns(unsigned long flags, struct net *net_ns)
+#else /* CONFIG_NET_NS */
+#include <linux/sched.h>
+#include <linux/nsproxy.h>
+static inline struct net *copy_net_ns(unsigned long flags, struct net *old_net)
 {
-	/* There is nothing to copy so this is a noop */
-	return net_ns;
+	if (flags & CLONE_NEWNET)
+		return ERR_PTR(-EINVAL);
+	return old_net;
 }
-#endif /* CONFIG_NET */
+#endif /* CONFIG_NET_NS */
 
 
 extern struct list_head net_namespace_list;
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 42f1e1c..2c1c590 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -347,13 +347,6 @@ struct net *get_net_ns_by_fd(int fd)
 }
 
 #else
-struct net *copy_net_ns(unsigned long flags, struct net *old_net)
-{
-	if (flags & CLONE_NEWNET)
-		return ERR_PTR(-EINVAL);
-	return old_net;
-}
-
 struct net *get_net_ns_by_fd(int fd)
 {
 	return ERR_PTR(-EINVAL);
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH net-next 03/17] sysctl: Pass useful parameters to sysctl permissions
From: Eric W. Biederman @ 2012-11-16 13:02 UTC (permalink / raw)
  To: David Miller
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Linux Containers,
	Eric W. Biederman
In-Reply-To: <1353070992-5552-1-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

From: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

- Current is implicitly avaiable so passing current->nsproxy isn't useful.
- The ctl_table_header is needed to find how the sysctl table is connected
  to the rest of sysctl.
- ctl_table_root is avaiable in the ctl_table_header so no need to it.

With these changes it becomes possible to write a version of
net_sysctl_permission that takes into account the network namespace of
the sysctl table, an important feature in extending the user namespace.

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 fs/proc/proc_sysctl.c  |    9 +++++----
 include/linux/sysctl.h |    3 +--
 net/sysctl_net.c       |    3 +--
 3 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index a781bdf..701580d 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -378,12 +378,13 @@ static int test_perm(int mode, int op)
 	return -EACCES;
 }
 
-static int sysctl_perm(struct ctl_table_root *root, struct ctl_table *table, int op)
+static int sysctl_perm(struct ctl_table_header *head, struct ctl_table *table, int op)
 {
+	struct ctl_table_root *root = head->root;
 	int mode;
 
 	if (root->permissions)
-		mode = root->permissions(root, current->nsproxy, table);
+		mode = root->permissions(head, table);
 	else
 		mode = table->mode;
 
@@ -491,7 +492,7 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
 	 * and won't be until we finish.
 	 */
 	error = -EPERM;
-	if (sysctl_perm(head->root, table, write ? MAY_WRITE : MAY_READ))
+	if (sysctl_perm(head, table, write ? MAY_WRITE : MAY_READ))
 		goto out;
 
 	/* if that can happen at all, it should be -EINVAL, not -EISDIR */
@@ -717,7 +718,7 @@ static int proc_sys_permission(struct inode *inode, int mask)
 	if (!table) /* global root - r-xr-xr-x */
 		error = mask & MAY_WRITE ? -EACCES : 0;
 	else /* Use the permissions on the sysctl table entry */
-		error = sysctl_perm(head->root, table, mask & ~MAY_NOT_BLOCK);
+		error = sysctl_perm(head, table, mask & ~MAY_NOT_BLOCK);
 
 	sysctl_head_finish(head);
 	return error;
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index cd844a6..14a8ff2 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -158,8 +158,7 @@ struct ctl_table_root {
 	struct ctl_table_set default_set;
 	struct ctl_table_set *(*lookup)(struct ctl_table_root *root,
 					   struct nsproxy *namespaces);
-	int (*permissions)(struct ctl_table_root *root,
-			struct nsproxy *namespaces, struct ctl_table *table);
+	int (*permissions)(struct ctl_table_header *head, struct ctl_table *table);
 };
 
 /* struct ctl_path describes where in the hierarchy a table is added */
diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index e3a6e37..e98f393 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -38,8 +38,7 @@ static int is_seen(struct ctl_table_set *set)
 }
 
 /* Return standard mode bits for table entry. */
-static int net_ctl_permissions(struct ctl_table_root *root,
-			       struct nsproxy *nsproxy,
+static int net_ctl_permissions(struct ctl_table_header *head,
 			       struct ctl_table *table)
 {
 	/* Allow network administrator to have same access as root. */
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH net-next 04/17] net: Don't export sysctls to unprivileged users
From: Eric W. Biederman @ 2012-11-16 13:02 UTC (permalink / raw)
  To: David Miller
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Linux Containers,
	Eric W. Biederman
In-Reply-To: <1353070992-5552-1-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

From: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

In preparation for supporting the creation of network namespaces
by unprivileged users, modify all of the per net sysctl exports
and refuse to allow them to unprivileged users.

This makes it safe for unprivileged users in general to access
per net sysctls, and allows sysctls to be exported to unprivileged
users on an individual basis as they are deemed safe.

Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 net/core/neighbour.c                    |    4 ++++
 net/core/sysctl_net_core.c              |    5 +++++
 net/ipv4/devinet.c                      |    8 ++++++++
 net/ipv4/ip_fragment.c                  |    4 ++++
 net/ipv4/route.c                        |    4 ++++
 net/ipv4/sysctl_net_ipv4.c              |    3 +++
 net/ipv6/addrconf.c                     |    4 ++++
 net/ipv6/icmp.c                         |    7 ++++++-
 net/ipv6/reassembly.c                   |    4 ++++
 net/ipv6/route.c                        |    4 ++++
 net/ipv6/sysctl_net_ipv6.c              |    4 ++++
 net/netfilter/ipvs/ip_vs_ctl.c          |    4 ++++
 net/netfilter/ipvs/ip_vs_lblc.c         |    7 ++++++-
 net/netfilter/ipvs/ip_vs_lblcr.c        |    4 ++++
 net/netfilter/nf_conntrack_acct.c       |    4 ++++
 net/netfilter/nf_conntrack_ecache.c     |    4 ++++
 net/netfilter/nf_conntrack_helper.c     |    4 ++++
 net/netfilter/nf_conntrack_proto_dccp.c |    8 ++++++--
 net/netfilter/nf_conntrack_standalone.c |    4 ++++
 net/netfilter/nf_conntrack_timestamp.c  |    4 ++++
 net/unix/sysctl_net_unix.c              |    4 ++++
 net/xfrm/xfrm_sysctl.c                  |    4 ++++
 22 files changed, 98 insertions(+), 4 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 2257148..f1c0c2e 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2987,6 +2987,10 @@ int neigh_sysctl_register(struct net_device *dev, struct neigh_parms *p,
 		t->neigh_vars[NEIGH_VAR_BASE_REACHABLE_TIME_MS].extra1 = dev;
 	}
 
+	/* Don't export sysctls to unprivileged users */
+	if (neigh_parms_net(p)->user_ns != &init_user_ns)
+		t->neigh_vars[0].procname = NULL;
+
 	snprintf(neigh_path, sizeof(neigh_path), "net/%s/neigh/%s",
 		p_name, dev_name_source);
 	t->sysctl_header =
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index a7c3684..d1b0804 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -216,6 +216,11 @@ static __net_init int sysctl_core_net_init(struct net *net)
 			goto err_dup;
 
 		tbl[0].data = &net->core.sysctl_somaxconn;
+
+		/* Don't export any sysctls to unprivileged users */
+		if (net->user_ns != &init_user_ns) {
+			tbl[0].procname = NULL;
+		}
 	}
 
 	net->core.sysctl_hdr = register_net_sysctl(net, "net/core", tbl);
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 2a6abc1..66df8d1 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1637,6 +1637,10 @@ static int __devinet_sysctl_register(struct net *net, char *dev_name,
 		t->devinet_vars[i].extra2 = net;
 	}
 
+	/* Don't export sysctls to unprivileged users */
+	if (net->user_ns != &init_user_ns)
+		t->devinet_vars[0].procname = NULL;
+
 	snprintf(path, sizeof(path), "net/ipv4/conf/%s", dev_name);
 
 	t->sysctl_header = register_net_sysctl(net, path, t->devinet_vars);
@@ -1722,6 +1726,10 @@ static __net_init int devinet_init_net(struct net *net)
 		tbl[0].data = &all->data[IPV4_DEVCONF_FORWARDING - 1];
 		tbl[0].extra1 = all;
 		tbl[0].extra2 = net;
+
+		/* Don't export sysctls to unprivileged users */
+		if (net->user_ns != &init_user_ns)
+			tbl[0].procname = NULL;
 #endif
 	}
 
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 448e685..1cf6a76 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -802,6 +802,10 @@ static int __net_init ip4_frags_ns_ctl_register(struct net *net)
 		table[0].data = &net->ipv4.frags.high_thresh;
 		table[1].data = &net->ipv4.frags.low_thresh;
 		table[2].data = &net->ipv4.frags.timeout;
+
+		/* Don't export sysctls to unprivileged users */
+		if (net->user_ns != &init_user_ns)
+			table[0].procname = NULL;
 	}
 
 	hdr = register_net_sysctl(net, "net/ipv4", table);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index a8c6512..5b58788 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2493,6 +2493,10 @@ static __net_init int sysctl_route_net_init(struct net *net)
 		tbl = kmemdup(tbl, sizeof(ipv4_route_flush_table), GFP_KERNEL);
 		if (tbl == NULL)
 			goto err_dup;
+
+		/* Don't export sysctls to unprivileged users */
+		if (net->user_ns != &init_user_ns)
+			tbl[0].procname = NULL;
 	}
 	tbl[0].extra1 = net;
 
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 63d4ecc..d84400b 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -883,6 +883,9 @@ static __net_init int ipv4_sysctl_init_net(struct net *net)
 		table[6].data =
 			&net->ipv4.sysctl_ping_group_range;
 
+		/* Don't export sysctls to unprivileged users */
+		if (net->user_ns != &init_user_ns)
+			table[0].procname = NULL;
 	}
 
 	/*
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 0424e4e..6378be4 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4582,6 +4582,10 @@ static int __addrconf_sysctl_register(struct net *net, char *dev_name,
 		t->addrconf_vars[i].extra2 = net;
 	}
 
+	/* Don't export sysctls to unprivileged users */
+	if (net->user_ns != &init_user_ns)
+		t->addrconf_vars[0].procname = NULL;
+
 	snprintf(path, sizeof(path), "net/ipv6/conf/%s", dev_name);
 
 	t->sysctl_header = register_net_sysctl(net, path, t->addrconf_vars);
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 24d69db..db9df8a 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -967,9 +967,14 @@ struct ctl_table * __net_init ipv6_icmp_sysctl_init(struct net *net)
 			sizeof(ipv6_icmp_table_template),
 			GFP_KERNEL);
 
-	if (table)
+	if (table) {
 		table[0].data = &net->ipv6.sysctl.icmpv6_time;
 
+		/* Don't export sysctls to unprivileged users */
+		if (net->user_ns != &init_user_ns)
+			table[0].procname = NULL;
+	}
+
 	return table;
 }
 #endif
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index da8a4e3..e5253ec 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -616,6 +616,10 @@ static int __net_init ip6_frags_ns_sysctl_register(struct net *net)
 		table[0].data = &net->ipv6.frags.high_thresh;
 		table[1].data = &net->ipv6.frags.low_thresh;
 		table[2].data = &net->ipv6.frags.timeout;
+
+		/* Don't export sysctls to unprivileged users */
+		if (net->user_ns != &init_user_ns)
+			table[0].procname = NULL;
 	}
 
 	hdr = register_net_sysctl(net, "net/ipv6", table);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index b1e6cf0..551fb82 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2873,6 +2873,10 @@ struct ctl_table * __net_init ipv6_route_sysctl_init(struct net *net)
 		table[7].data = &net->ipv6.sysctl.ip6_rt_mtu_expires;
 		table[8].data = &net->ipv6.sysctl.ip6_rt_min_advmss;
 		table[9].data = &net->ipv6.sysctl.ip6_rt_gc_min_interval;
+
+		/* Don't export sysctls to unprivileged users */
+		if (net->user_ns != &init_user_ns)
+			table[0].procname = NULL;
 	}
 
 	return table;
diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index e85c48b..b06fd07 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -52,6 +52,10 @@ static int __net_init ipv6_sysctl_net_init(struct net *net)
 		goto out;
 	ipv6_table[0].data = &net->ipv6.sysctl.bindv6only;
 
+	/* Don't export sysctls to unprivileged users */
+	if (net->user_ns != &init_user_ns)
+		ipv6_table[0].procname = NULL;
+
 	ipv6_route_table = ipv6_route_sysctl_init(net);
 	if (!ipv6_route_table)
 		goto out_ipv6_table;
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index c4ee437..c6cebd5 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -3699,6 +3699,10 @@ static int __net_init ip_vs_control_net_init_sysctl(struct net *net)
 		tbl = kmemdup(vs_vars, sizeof(vs_vars), GFP_KERNEL);
 		if (tbl == NULL)
 			return -ENOMEM;
+
+		/* Don't export sysctls to unprivileged users */
+		if (net->user_ns != &init_user_ns)
+			tbl[0].procname = NULL;
 	} else
 		tbl = vs_vars;
 	/* Initialize sysctl defaults */
diff --git a/net/netfilter/ipvs/ip_vs_lblc.c b/net/netfilter/ipvs/ip_vs_lblc.c
index df646cc..42ec368 100644
--- a/net/netfilter/ipvs/ip_vs_lblc.c
+++ b/net/netfilter/ipvs/ip_vs_lblc.c
@@ -560,6 +560,11 @@ static int __net_init __ip_vs_lblc_init(struct net *net)
 						GFP_KERNEL);
 		if (ipvs->lblc_ctl_table == NULL)
 			return -ENOMEM;
+
+		/* Don't export sysctls to unprivileged users */
+		if (net->user_ns != &init_user_ns)
+			ipvs->lblc_ctl_table[0].procname = NULL;
+
 	} else
 		ipvs->lblc_ctl_table = vs_vars_table;
 	ipvs->sysctl_lblc_expiration = DEFAULT_EXPIRATION;
@@ -569,7 +574,7 @@ static int __net_init __ip_vs_lblc_init(struct net *net)
 		register_net_sysctl(net, "net/ipv4/vs", ipvs->lblc_ctl_table);
 	if (!ipvs->lblc_ctl_header) {
 		if (!net_eq(net, &init_net))
-			kfree(ipvs->lblc_ctl_table);
+			kfree(ipvs->lblc_ctl_table);\
 		return -ENOMEM;
 	}
 
diff --git a/net/netfilter/ipvs/ip_vs_lblcr.c b/net/netfilter/ipvs/ip_vs_lblcr.c
index 570e31e..2c54a30 100644
--- a/net/netfilter/ipvs/ip_vs_lblcr.c
+++ b/net/netfilter/ipvs/ip_vs_lblcr.c
@@ -754,6 +754,10 @@ static int __net_init __ip_vs_lblcr_init(struct net *net)
 						GFP_KERNEL);
 		if (ipvs->lblcr_ctl_table == NULL)
 			return -ENOMEM;
+
+		/* Don't export sysctls to unprivileged users */
+		if (net->user_ns != &init_user_ns)
+			ipvs->lblcr_ctl_table[0].procname = NULL;
 	} else
 		ipvs->lblcr_ctl_table = vs_vars_table;
 	ipvs->sysctl_lblcr_expiration = DEFAULT_EXPIRATION;
diff --git a/net/netfilter/nf_conntrack_acct.c b/net/netfilter/nf_conntrack_acct.c
index d61e078..7df424e 100644
--- a/net/netfilter/nf_conntrack_acct.c
+++ b/net/netfilter/nf_conntrack_acct.c
@@ -69,6 +69,10 @@ static int nf_conntrack_acct_init_sysctl(struct net *net)
 
 	table[0].data = &net->ct.sysctl_acct;
 
+	/* Don't export sysctls to unprivileged users */
+	if (net->user_ns != &init_user_ns)
+		table[0].procname = NULL;
+
 	net->ct.acct_sysctl_header = register_net_sysctl(net, "net/netfilter",
 							 table);
 	if (!net->ct.acct_sysctl_header) {
diff --git a/net/netfilter/nf_conntrack_ecache.c b/net/netfilter/nf_conntrack_ecache.c
index de9781b..faa978f 100644
--- a/net/netfilter/nf_conntrack_ecache.c
+++ b/net/netfilter/nf_conntrack_ecache.c
@@ -196,6 +196,10 @@ static int nf_conntrack_event_init_sysctl(struct net *net)
 	table[0].data = &net->ct.sysctl_events;
 	table[1].data = &net->ct.sysctl_events_retry_timeout;
 
+	/* Don't export sysctls to unprivileged users */
+	if (net->user_ns != &init_user_ns)
+		table[0].procname = NULL;
+
 	net->ct.event_sysctl_header =
 		register_net_sysctl(net, "net/netfilter", table);
 	if (!net->ct.event_sysctl_header) {
diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
index c4bc637..884f2b3 100644
--- a/net/netfilter/nf_conntrack_helper.c
+++ b/net/netfilter/nf_conntrack_helper.c
@@ -64,6 +64,10 @@ static int nf_conntrack_helper_init_sysctl(struct net *net)
 
 	table[0].data = &net->ct.sysctl_auto_assign_helper;
 
+	/* Don't export sysctls to unprivileged users */
+	if (net->user_ns != &init_user_ns)
+		table[0].procname = NULL;
+
 	net->ct.helper_sysctl_header =
 		register_net_sysctl(net, "net/netfilter", table);
 
diff --git a/net/netfilter/nf_conntrack_proto_dccp.c b/net/netfilter/nf_conntrack_proto_dccp.c
index 6535326..a8ae287 100644
--- a/net/netfilter/nf_conntrack_proto_dccp.c
+++ b/net/netfilter/nf_conntrack_proto_dccp.c
@@ -815,7 +815,7 @@ static struct ctl_table dccp_sysctl_table[] = {
 };
 #endif /* CONFIG_SYSCTL */
 
-static int dccp_kmemdup_sysctl_table(struct nf_proto_net *pn,
+static int dccp_kmemdup_sysctl_table(struct net *net, struct nf_proto_net *pn,
 				     struct dccp_net *dn)
 {
 #ifdef CONFIG_SYSCTL
@@ -836,6 +836,10 @@ static int dccp_kmemdup_sysctl_table(struct nf_proto_net *pn,
 	pn->ctl_table[5].data = &dn->dccp_timeout[CT_DCCP_CLOSING];
 	pn->ctl_table[6].data = &dn->dccp_timeout[CT_DCCP_TIMEWAIT];
 	pn->ctl_table[7].data = &dn->dccp_loose;
+
+	/* Don't export sysctls to unprivileged users */
+	if (net->user_ns != &init_user_ns)
+		pn->ctl_table[0].procname = NULL;
 #endif
 	return 0;
 }
@@ -857,7 +861,7 @@ static int dccp_init_net(struct net *net, u_int16_t proto)
 		dn->dccp_timeout[CT_DCCP_TIMEWAIT]	= 2 * DCCP_MSL;
 	}
 
-	return dccp_kmemdup_sysctl_table(pn, dn);
+	return dccp_kmemdup_sysctl_table(net, pn, dn);
 }
 
 static struct nf_conntrack_l4proto dccp_proto4 __read_mostly = {
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index 9b39432..363285d 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -489,6 +489,10 @@ static int nf_conntrack_standalone_init_sysctl(struct net *net)
 	table[3].data = &net->ct.sysctl_checksum;
 	table[4].data = &net->ct.sysctl_log_invalid;
 
+	/* Don't export sysctls to unprivileged users */
+	if (net->user_ns != &init_user_ns)
+		table[0].procname = NULL;
+
 	net->ct.sysctl_header = register_net_sysctl(net, "net/netfilter", table);
 	if (!net->ct.sysctl_header)
 		goto out_unregister_netfilter;
diff --git a/net/netfilter/nf_conntrack_timestamp.c b/net/netfilter/nf_conntrack_timestamp.c
index dbb364f..7ea8026 100644
--- a/net/netfilter/nf_conntrack_timestamp.c
+++ b/net/netfilter/nf_conntrack_timestamp.c
@@ -51,6 +51,10 @@ static int nf_conntrack_tstamp_init_sysctl(struct net *net)
 
 	table[0].data = &net->ct.sysctl_tstamp;
 
+	/* Don't export sysctls to unprivileged users */
+	if (net->user_ns != &init_user_ns)
+		table[0].procname = NULL;
+
 	net->ct.tstamp_sysctl_header = register_net_sysctl(net,	"net/netfilter",
 							   table);
 	if (!net->ct.tstamp_sysctl_header) {
diff --git a/net/unix/sysctl_net_unix.c b/net/unix/sysctl_net_unix.c
index b34b5b9..8800604 100644
--- a/net/unix/sysctl_net_unix.c
+++ b/net/unix/sysctl_net_unix.c
@@ -34,6 +34,10 @@ int __net_init unix_sysctl_register(struct net *net)
 	if (table == NULL)
 		goto err_alloc;
 
+	/* Don't export sysctls to unprivileged users */
+	if (net->user_ns != &init_user_ns)
+		table[0].procname = NULL;
+
 	table[0].data = &net->unx.sysctl_max_dgram_qlen;
 	net->unx.ctl = register_net_sysctl(net, "net/unix", table);
 	if (net->unx.ctl == NULL)
diff --git a/net/xfrm/xfrm_sysctl.c b/net/xfrm/xfrm_sysctl.c
index 380976f..05a6e3d 100644
--- a/net/xfrm/xfrm_sysctl.c
+++ b/net/xfrm/xfrm_sysctl.c
@@ -54,6 +54,10 @@ int __net_init xfrm_sysctl_init(struct net *net)
 	table[2].data = &net->xfrm.sysctl_larval_drop;
 	table[3].data = &net->xfrm.sysctl_acq_expires;
 
+	/* Don't export sysctls to unprivileged users */
+	if (net->user_ns != &init_user_ns)
+		table[0].procname = NULL;
+
 	net->xfrm.sysctl_hdr = register_net_sysctl(net, "net/core", table);
 	if (!net->xfrm.sysctl_hdr)
 		goto out_register;
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH net-next 05/17] net: Push capable(CAP_NET_ADMIN) into the rtnl methods
From: Eric W. Biederman @ 2012-11-16 13:03 UTC (permalink / raw)
  To: David Miller
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Linux Containers,
	Eric W. Biederman
In-Reply-To: <1353070992-5552-1-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

From: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

- In rtnetlink_rcv_msg convert the capable(CAP_NET_ADMIN) check
  to ns_capable(net->user-ns, CAP_NET_ADMIN).  Allowing unprivileged
  users to make netlink calls to modify their local network
  namespace.

- In the rtnetlink doit methods add capable(CAP_NET_ADMIN) so
  that calls that are not safe for unprivileged users are still
  protected.

Later patches will remove the extra capable calls from methods
that are safe for unprivilged users.

Acked-by: Serge Hallyn <serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 net/bridge/br_netlink.c |    3 +++
 net/can/gw.c            |    6 ++++++
 net/core/fib_rules.c    |    6 ++++++
 net/core/neighbour.c    |    9 +++++++++
 net/core/rtnetlink.c    |   17 ++++++++++++++++-
 net/dcb/dcbnl.c         |    3 +++
 net/decnet/dn_dev.c     |    6 ++++++
 net/decnet/dn_fib.c     |    6 ++++++
 net/ipv4/devinet.c      |    6 ++++++
 net/ipv4/fib_frontend.c |    6 ++++++
 net/ipv6/addrconf.c     |    6 ++++++
 net/ipv6/addrlabel.c    |    3 +++
 net/ipv6/route.c        |    6 ++++++
 net/phonet/pn_netlink.c |    6 ++++++
 net/sched/act_api.c     |    3 +++
 net/sched/cls_api.c     |    2 ++
 net/sched/sch_api.c     |    9 +++++++++
 17 files changed, 102 insertions(+), 1 deletions(-)

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 093f527..251d558 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -153,6 +153,9 @@ static int br_rtm_setlink(struct sk_buff *skb,  struct nlmsghdr *nlh, void *arg)
 	struct net_bridge_port *p;
 	u8 new_state;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	if (nlmsg_len(nlh) < sizeof(*ifm))
 		return -EINVAL;
 
diff --git a/net/can/gw.c b/net/can/gw.c
index 1f5c978..574dda7 100644
--- a/net/can/gw.c
+++ b/net/can/gw.c
@@ -751,6 +751,9 @@ static int cgw_create_job(struct sk_buff *skb,  struct nlmsghdr *nlh,
 	struct cgw_job *gwj;
 	int err = 0;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	if (nlmsg_len(nlh) < sizeof(*r))
 		return -EINVAL;
 
@@ -839,6 +842,9 @@ static int cgw_remove_job(struct sk_buff *skb,  struct nlmsghdr *nlh, void *arg)
 	struct can_can_gw ccgw;
 	int err = 0;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	if (nlmsg_len(nlh) < sizeof(*r))
 		return -EINVAL;
 
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 58a4ba2..bf5b5b8 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -275,6 +275,9 @@ static int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
 	struct nlattr *tb[FRA_MAX+1];
 	int err = -EINVAL, unresolved = 0;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*frh)))
 		goto errout;
 
@@ -424,6 +427,9 @@ static int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr* nlh, void *arg)
 	struct nlattr *tb[FRA_MAX+1];
 	int err = -EINVAL;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*frh)))
 		goto errout;
 
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index f1c0c2e..7adcdaf 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1620,6 +1620,9 @@ static int neigh_delete(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 	struct net_device *dev = NULL;
 	int err = -EINVAL;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	ASSERT_RTNL();
 	if (nlmsg_len(nlh) < sizeof(*ndm))
 		goto out;
@@ -1684,6 +1687,9 @@ static int neigh_add(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 	struct net_device *dev = NULL;
 	int err;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	ASSERT_RTNL();
 	err = nlmsg_parse(nlh, sizeof(*ndm), tb, NDA_MAX, NULL);
 	if (err < 0)
@@ -1962,6 +1968,9 @@ static int neightbl_set(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 	struct nlattr *tb[NDTA_MAX+1];
 	int err;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	err = nlmsg_parse(nlh, sizeof(*ndtmsg), tb, NDTA_MAX,
 			  nl_neightbl_policy);
 	if (err < 0)
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 76d4c2c..5d55c30 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1547,6 +1547,9 @@ static int rtnl_setlink(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 	struct nlattr *tb[IFLA_MAX+1];
 	char ifname[IFNAMSIZ];
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFLA_MAX, ifla_policy);
 	if (err < 0)
 		goto errout;
@@ -1590,6 +1593,9 @@ static int rtnl_dellink(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 	int err;
 	LIST_HEAD(list_kill);
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFLA_MAX, ifla_policy);
 	if (err < 0)
 		return err;
@@ -1720,6 +1726,9 @@ static int rtnl_newlink(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 	struct nlattr *linkinfo[IFLA_INFO_MAX+1];
 	int err;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 #ifdef CONFIG_MODULES
 replay:
 #endif
@@ -2057,6 +2066,9 @@ static int rtnl_fdb_add(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 	u8 *addr;
 	int err;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	err = nlmsg_parse(nlh, sizeof(*ndm), tb, NDA_MAX, NULL);
 	if (err < 0)
 		return err;
@@ -2123,6 +2135,9 @@ static int rtnl_fdb_del(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 	int err = -EINVAL;
 	__u8 *addr;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	if (nlmsg_len(nlh) < sizeof(*ndm))
 		return -EINVAL;
 
@@ -2282,7 +2297,7 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 	sz_idx = type>>2;
 	kind = type&3;
 
-	if (kind != 2 && !capable(CAP_NET_ADMIN))
+	if (kind != 2 && !ns_capable(net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (kind == 2 && nlh->nlmsg_flags&NLM_F_DUMP) {
diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c
index 70989e6..b07c75d 100644
--- a/net/dcb/dcbnl.c
+++ b/net/dcb/dcbnl.c
@@ -1662,6 +1662,9 @@ static int dcb_doit(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 	struct nlmsghdr *reply_nlh = NULL;
 	const struct reply_func *fn;
 
+	if ((nlh->nlmsg_type == RTM_SETDCB) && !capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	if (!net_eq(net, &init_net))
 		return -EINVAL;
 
diff --git a/net/decnet/dn_dev.c b/net/decnet/dn_dev.c
index 7b7e561..e47ba9f 100644
--- a/net/decnet/dn_dev.c
+++ b/net/decnet/dn_dev.c
@@ -573,6 +573,9 @@ static int dn_nl_deladdr(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 	struct dn_ifaddr __rcu **ifap;
 	int err = -EINVAL;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	if (!net_eq(net, &init_net))
 		goto errout;
 
@@ -614,6 +617,9 @@ static int dn_nl_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 	struct dn_ifaddr *ifa;
 	int err;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	if (!net_eq(net, &init_net))
 		return -EINVAL;
 
diff --git a/net/decnet/dn_fib.c b/net/decnet/dn_fib.c
index 102d610..e36614e 100644
--- a/net/decnet/dn_fib.c
+++ b/net/decnet/dn_fib.c
@@ -520,6 +520,9 @@ static int dn_fib_rtm_delroute(struct sk_buff *skb, struct nlmsghdr *nlh, void *
 	struct rtattr **rta = arg;
 	struct rtmsg *r = NLMSG_DATA(nlh);
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	if (!net_eq(net, &init_net))
 		return -EINVAL;
 
@@ -540,6 +543,9 @@ static int dn_fib_rtm_newroute(struct sk_buff *skb, struct nlmsghdr *nlh, void *
 	struct rtattr **rta = arg;
 	struct rtmsg *r = NLMSG_DATA(nlh);
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	if (!net_eq(net, &init_net))
 		return -EINVAL;
 
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index 66df8d1..ae95f64 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -538,6 +538,9 @@ static int inet_rtm_deladdr(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg
 
 	ASSERT_RTNL();
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFA_MAX, ifa_ipv4_policy);
 	if (err < 0)
 		goto errout;
@@ -645,6 +648,9 @@ static int inet_rtm_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg
 
 	ASSERT_RTNL();
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	ifa = rtm_to_ifaddr(net, nlh);
 	if (IS_ERR(ifa))
 		return PTR_ERR(ifa);
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 825c608..bce4541 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -613,6 +613,9 @@ static int inet_rtm_delroute(struct sk_buff *skb, struct nlmsghdr *nlh, void *ar
 	struct fib_table *tb;
 	int err;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	err = rtm_to_fib_config(net, skb, nlh, &cfg);
 	if (err < 0)
 		goto errout;
@@ -635,6 +638,9 @@ static int inet_rtm_newroute(struct sk_buff *skb, struct nlmsghdr *nlh, void *ar
 	struct fib_table *tb;
 	int err;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	err = rtm_to_fib_config(net, skb, nlh, &cfg);
 	if (err < 0)
 		goto errout;
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 6378be4..47c591d 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3369,6 +3369,9 @@ inet6_rtm_deladdr(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 	struct in6_addr *pfx;
 	int err;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFA_MAX, ifa_ipv6_policy);
 	if (err < 0)
 		return err;
@@ -3439,6 +3442,9 @@ inet6_rtm_newaddr(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg)
 	u8 ifa_flags;
 	int err;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	err = nlmsg_parse(nlh, sizeof(*ifm), tb, IFA_MAX, ifa_ipv6_policy);
 	if (err < 0)
 		return err;
diff --git a/net/ipv6/addrlabel.c b/net/ipv6/addrlabel.c
index ff76eec..b106f80 100644
--- a/net/ipv6/addrlabel.c
+++ b/net/ipv6/addrlabel.c
@@ -425,6 +425,9 @@ static int ip6addrlbl_newdel(struct sk_buff *skb, struct nlmsghdr *nlh,
 	u32 label;
 	int err = 0;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	err = nlmsg_parse(nlh, sizeof(*ifal), tb, IFAL_MAX, ifal_policy);
 	if (err < 0)
 		return err;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 551fb82..2a98397 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2336,6 +2336,9 @@ static int inet6_rtm_delroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *a
 	struct fib6_config cfg;
 	int err;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	err = rtm_to_fib6_config(skb, nlh, &cfg);
 	if (err < 0)
 		return err;
@@ -2348,6 +2351,9 @@ static int inet6_rtm_newroute(struct sk_buff *skb, struct nlmsghdr* nlh, void *a
 	struct fib6_config cfg;
 	int err;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	err = rtm_to_fib6_config(skb, nlh, &cfg);
 	if (err < 0)
 		return err;
diff --git a/net/phonet/pn_netlink.c b/net/phonet/pn_netlink.c
index 83a8389..0193630 100644
--- a/net/phonet/pn_netlink.c
+++ b/net/phonet/pn_netlink.c
@@ -70,6 +70,9 @@ static int addr_doit(struct sk_buff *skb, struct nlmsghdr *nlh, void *attr)
 	int err;
 	u8 pnaddr;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
@@ -230,6 +233,9 @@ static int route_doit(struct sk_buff *skb, struct nlmsghdr *nlh, void *attr)
 	int err;
 	u8 dst;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 102761d..65d240c 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -987,6 +987,9 @@ static int tc_ctl_action(struct sk_buff *skb, struct nlmsghdr *n, void *arg)
 	u32 portid = skb ? NETLINK_CB(skb).portid : 0;
 	int ret = 0, ovr = 0;
 
+	if ((n->nlmsg_type != RTM_GETACTION) && !capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	ret = nlmsg_parse(n, sizeof(struct tcamsg), tca, TCA_ACT_MAX, NULL);
 	if (ret < 0)
 		return ret;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 7ae0289..ff55ed6 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -139,6 +139,8 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n, void *arg)
 	int err;
 	int tp_created = 0;
 
+	if ((n->nlmsg_type != RTM_GETTFILTER) && !capable(CAP_NET_ADMIN))
+		return -EPERM;
 replay:
 	t = nlmsg_data(n);
 	protocol = TC_H_MIN(t->tcm_info);
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index a18d975..7dae9d0 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -981,6 +981,9 @@ static int tc_get_qdisc(struct sk_buff *skb, struct nlmsghdr *n, void *arg)
 	struct Qdisc *p = NULL;
 	int err;
 
+	if ((n->nlmsg_type != RTM_GETQDISC) && !capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	dev = __dev_get_by_index(net, tcm->tcm_ifindex);
 	if (!dev)
 		return -ENODEV;
@@ -1044,6 +1047,9 @@ static int tc_modify_qdisc(struct sk_buff *skb, struct nlmsghdr *n, void *arg)
 	struct Qdisc *q, *p;
 	int err;
 
+	if (!capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 replay:
 	/* Reinit, just in case something touches this. */
 	tcm = nlmsg_data(n);
@@ -1380,6 +1386,9 @@ static int tc_ctl_tclass(struct sk_buff *skb, struct nlmsghdr *n, void *arg)
 	u32 qid = TC_H_MAJ(clid);
 	int err;
 
+	if ((n->nlmsg_type != RTM_GETTCLASS) && !capable(CAP_NET_ADMIN))
+		return -EPERM;
+
 	dev = __dev_get_by_index(net, tcm->tcm_ifindex);
 	if (!dev)
 		return -ENODEV;
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH net-next 06/17] net: Update the per network namespace sysctls to be available to the network namespace owner
From: Eric W. Biederman @ 2012-11-16 13:03 UTC (permalink / raw)
  To: David Miller
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Linux Containers,
	Eric W. Biederman
In-Reply-To: <1353070992-5552-1-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

From: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

- Allow anyone with CAP_NET_ADMIN rights in the user namespace of the
  the netowrk namespace to change sysctls.
- Allow anyone the uid of the user namespace root the same
  permissions over the network namespace sysctls as the global root.
- Allow anyone with gid of the user namespace root group the same
  permissions over the network namespace sysctl as the global root group.

Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 net/sysctl_net.c |   12 +++++++++++-
 1 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index e98f393..6410436 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -41,11 +41,21 @@ static int is_seen(struct ctl_table_set *set)
 static int net_ctl_permissions(struct ctl_table_header *head,
 			       struct ctl_table *table)
 {
+	struct net *net = container_of(head->set, struct net, sysctls);
+	kuid_t root_uid = make_kuid(net->user_ns, 0);
+	kgid_t root_gid = make_kgid(net->user_ns, 0);
+
 	/* Allow network administrator to have same access as root. */
-	if (capable(CAP_NET_ADMIN)) {
+	if (ns_capable(net->user_ns, CAP_NET_ADMIN) ||
+	    uid_eq(root_uid, current_uid())) {
 		int mode = (table->mode >> 6) & 7;
 		return (mode << 6) | (mode << 3) | mode;
 	}
+	/* Allow netns root group to have the same assess as the root group */
+	if (gid_eq(root_gid, current_gid())) {
+		int mode = (table->mode >> 3) & 7;
+		return (mode << 3) | (mode << 3) | mode;
+	}
 	return table->mode;
 }
 
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH net-next 07/17] user_ns: get rid of duplicate code in net_ctl_permissions
From: Eric W. Biederman @ 2012-11-16 13:03 UTC (permalink / raw)
  To: David Miller
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Linux Containers,
	Eric W. Biederman
In-Reply-To: <1353070992-5552-1-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

From: Zhao Hongjiang <zhaohongjiang-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

Get rid of duplicate code in net_ctl_permissions and fix the comment.

Signed-off-by: Zhao Hongjiang <zhaohongjiang-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Eric W. Biederman <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 net/sysctl_net.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index 6410436..9bc6db0 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -51,10 +51,10 @@ static int net_ctl_permissions(struct ctl_table_header *head,
 		int mode = (table->mode >> 6) & 7;
 		return (mode << 6) | (mode << 3) | mode;
 	}
-	/* Allow netns root group to have the same assess as the root group */
+	/* Allow netns root group to have the same access as the root group */
 	if (gid_eq(root_gid, current_gid())) {
 		int mode = (table->mode >> 3) & 7;
-		return (mode << 3) | (mode << 3) | mode;
+		return (mode << 3) | mode;
 	}
 	return table->mode;
 }
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH net-next 08/17] net: Allow userns root to force the scm creds
From: Eric W. Biederman @ 2012-11-16 13:03 UTC (permalink / raw)
  To: David Miller
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Linux Containers,
	Eric W. Biederman
In-Reply-To: <1353070992-5552-1-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

From: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

If the user calling sendmsg has the appropriate privieleges
in their user namespace allow them to set the uid, gid, and
pid in the SCM_CREDENTIALS control message to any valid value.

Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 net/core/scm.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/core/scm.c b/net/core/scm.c
index ab57084..57fb1ee 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -51,11 +51,11 @@ static __inline__ int scm_check_creds(struct ucred *creds)
 	if (!uid_valid(uid) || !gid_valid(gid))
 		return -EINVAL;
 
-	if ((creds->pid == task_tgid_vnr(current) || capable(CAP_SYS_ADMIN)) &&
+	if ((creds->pid == task_tgid_vnr(current) || nsown_capable(CAP_SYS_ADMIN)) &&
 	    ((uid_eq(uid, cred->uid)   || uid_eq(uid, cred->euid) ||
-	      uid_eq(uid, cred->suid)) || capable(CAP_SETUID)) &&
+	      uid_eq(uid, cred->suid)) || nsown_capable(CAP_SETUID)) &&
 	    ((gid_eq(gid, cred->gid)   || gid_eq(gid, cred->egid) ||
-	      gid_eq(gid, cred->sgid)) || capable(CAP_SETGID))) {
+	      gid_eq(gid, cred->sgid)) || nsown_capable(CAP_SETGID))) {
 	       return 0;
 	}
 	return -EPERM;
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH net-next 10/17] net: Allow userns root to control ipv4
From: Eric W. Biederman @ 2012-11-16 13:03 UTC (permalink / raw)
  To: David Miller
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Linux Containers,
	Eric W. Biederman
In-Reply-To: <1353070992-5552-1-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

From: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Allow an unpriviled user who has created a user namespace, and then
created a network namespace to effectively use the new network
namespace, by reducing capable(CAP_NET_ADMIN) and
capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.

Settings that merely control a single network device are allowed.
Either the network device is a logical network device where
restrictions make no difference or the network device is hardware NIC
that has been explicity moved from the initial network namespace.

In general policy and network stack state changes are allowed
while resource control is left unchanged.

Allow creating raw sockets.
Allow the SIOCSARP ioctl to control the arp cache.
Allow the SIOCSIFFLAG ioctl to allow setting network device flags.
Allow the SIOCSIFADDR ioctl to allow setting a netdevice ipv4 address.
Allow the SIOCSIFBRDADDR ioctl to allow setting a netdevice ipv4 broadcast address.
Allow the SIOCSIFDSTADDR ioctl to allow setting a netdevice ipv4 destination address.
Allow the SIOCSIFNETMASK ioctl to allow setting a netdevice ipv4 netmask.
Allow the SIOCADDRT and SIOCDELRT ioctls to allow adding and deleting ipv4 routes.

Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
adding, changing and deleting gre tunnels.

Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
adding, changing and deleting ipip tunnels.

Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
adding, changing and deleting ipsec virtual tunnel interfaces.

Allow setting the MRT_INIT, MRT_DONE, MRT_ADD_VIF, MRT_DEL_VIF, MRT_ADD_MFC,
MRT_DEL_MFC, MRT_ASSERT, MRT_PIM, MRT_TABLE socket options on multicast routing
sockets.

Allow setting and receiving IPOPT_CIPSO, IP_OPT_SEC, IP_OPT_SID and
arbitrary ip options.

Allow setting IP_SEC_POLICY/IP_XFRM_POLICY ipv4 socket option.
Allow setting the IP_TRANSPARENT ipv4 socket option.
Allow setting the TCP_REPAIR socket option.
Allow setting the TCP_CONGESTION socket option.

Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 net/ipv4/af_inet.c              |    3 ++-
 net/ipv4/arp.c                  |    2 +-
 net/ipv4/devinet.c              |    4 ++--
 net/ipv4/fib_frontend.c         |    2 +-
 net/ipv4/ip_gre.c               |    4 ++--
 net/ipv4/ip_options.c           |    6 +++---
 net/ipv4/ip_sockglue.c          |    5 +++--
 net/ipv4/ip_vti.c               |    4 ++--
 net/ipv4/ipip.c                 |    4 ++--
 net/ipv4/ipmr.c                 |    2 +-
 net/ipv4/netfilter/arp_tables.c |    8 ++++----
 net/ipv4/netfilter/ip_tables.c  |    8 ++++----
 net/ipv4/tcp.c                  |    2 +-
 net/ipv4/tcp_cong.c             |    3 ++-
 14 files changed, 30 insertions(+), 27 deletions(-)

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 766c596..7449bcf 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -346,7 +346,8 @@ lookup_protocol:
 	}
 
 	err = -EPERM;
-	if (sock->type == SOCK_RAW && !kern && !capable(CAP_NET_RAW))
+	if (sock->type == SOCK_RAW && !kern &&
+	    !ns_capable(net->user_ns, CAP_NET_RAW))
 		goto out_rcu_unlock;
 
 	err = -EAFNOSUPPORT;
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 4780045..ce6fbdf 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -1161,7 +1161,7 @@ int arp_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	switch (cmd) {
 	case SIOCDARP:
 	case SIOCSARP:
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 	case SIOCGARP:
 		err = copy_from_user(&r, arg, sizeof(struct arpreq));
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index ae95f64..f75f4f6 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -729,7 +729,7 @@ int devinet_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 
 	case SIOCSIFFLAGS:
 		ret = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			goto out;
 		break;
 	case SIOCSIFADDR:	/* Set interface address (and family) */
@@ -737,7 +737,7 @@ int devinet_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	case SIOCSIFDSTADDR:	/* Set the destination address */
 	case SIOCSIFNETMASK: 	/* Set the netmask for the interface */
 		ret = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			goto out;
 		ret = -EINVAL;
 		if (sin->sin_family != AF_INET)
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index bce4541..784716a 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -488,7 +488,7 @@ int ip_rt_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	switch (cmd) {
 	case SIOCADDRT:		/* Add a route */
 	case SIOCDELRT:		/* Delete a route */
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 
 		if (copy_from_user(&rt, arg, sizeof(rt)))
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 7240f8e..6acd774 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -1082,7 +1082,7 @@ ipgre_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		err = -EFAULT;
@@ -1157,7 +1157,7 @@ ipgre_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		if (dev == ign->fb_tunnel_dev) {
diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c
index 1dc01f9..f6289bf 100644
--- a/net/ipv4/ip_options.c
+++ b/net/ipv4/ip_options.c
@@ -409,7 +409,7 @@ int ip_options_compile(struct net *net,
 					optptr[2] += 8;
 					break;
 				      default:
-					if (!skb && !capable(CAP_NET_RAW)) {
+					if (!skb && !ns_capable(net->user_ns, CAP_NET_RAW)) {
 						pp_ptr = optptr + 3;
 						goto error;
 					}
@@ -445,7 +445,7 @@ int ip_options_compile(struct net *net,
 				opt->router_alert = optptr - iph;
 			break;
 		      case IPOPT_CIPSO:
-			if ((!skb && !capable(CAP_NET_RAW)) || opt->cipso) {
+			if ((!skb && !ns_capable(net->user_ns, CAP_NET_RAW)) || opt->cipso) {
 				pp_ptr = optptr;
 				goto error;
 			}
@@ -458,7 +458,7 @@ int ip_options_compile(struct net *net,
 		      case IPOPT_SEC:
 		      case IPOPT_SID:
 		      default:
-			if (!skb && !capable(CAP_NET_RAW)) {
+			if (!skb && !ns_capable(net->user_ns, CAP_NET_RAW)) {
 				pp_ptr = optptr;
 				goto error;
 			}
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 5eea4a8..796c2ae 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -980,13 +980,14 @@ mc_msf_out:
 	case IP_IPSEC_POLICY:
 	case IP_XFRM_POLICY:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 			break;
 		err = xfrm_user_policy(sk, optname, optval, optlen);
 		break;
 
 	case IP_TRANSPARENT:
-		if (!!val && !capable(CAP_NET_RAW) && !capable(CAP_NET_ADMIN)) {
+		if (!!val && !ns_capable(sock_net(sk)->user_ns, CAP_NET_RAW) &&
+		    !ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)) {
 			err = -EPERM;
 			break;
 		}
diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c
index 1831092..d14d93a 100644
--- a/net/ipv4/ip_vti.c
+++ b/net/ipv4/ip_vti.c
@@ -497,7 +497,7 @@ vti_tunnel_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		err = -EFAULT;
@@ -562,7 +562,7 @@ vti_tunnel_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		if (dev == ipn->fb_tunnel_dev) {
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index e15b452..692ff04 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -664,7 +664,7 @@ ipip_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		err = -EFAULT;
@@ -724,7 +724,7 @@ ipip_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		if (dev == ipn->fb_tunnel_dev) {
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 6168c4d..adf3d34 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -1213,7 +1213,7 @@ int ip_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, unsi
 
 	if (optname != MRT_INIT) {
 		if (sk != rcu_access_pointer(mrt->mroute_sk) &&
-		    !capable(CAP_NET_ADMIN))
+		    !ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EACCES;
 	}
 
diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 97e61ea..3ea4127 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -1533,7 +1533,7 @@ static int compat_do_arpt_set_ctl(struct sock *sk, int cmd, void __user *user,
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1677,7 +1677,7 @@ static int compat_do_arpt_get_ctl(struct sock *sk, int cmd, void __user *user,
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1698,7 +1698,7 @@ static int do_arpt_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1722,7 +1722,7 @@ static int do_arpt_get_ctl(struct sock *sk, int cmd, void __user *user, int *len
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 170b1fd..17c5e06 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -1846,7 +1846,7 @@ compat_do_ipt_set_ctl(struct sock *sk,	int cmd, void __user *user,
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1961,7 +1961,7 @@ compat_do_ipt_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1983,7 +1983,7 @@ do_ipt_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -2008,7 +2008,7 @@ do_ipt_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 197c000..e9f55a5 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2303,7 +2303,7 @@ void tcp_sock_destruct(struct sock *sk)
 
 static inline bool tcp_can_repair_sock(const struct sock *sk)
 {
-	return capable(CAP_NET_ADMIN) &&
+	return ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN) &&
 		((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_ESTABLISHED));
 }
 
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index 1432cdb..baf2861 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -259,7 +259,8 @@ int tcp_set_congestion_control(struct sock *sk, const char *name)
 	if (!ca)
 		err = -ENOENT;
 
-	else if (!((ca->flags & TCP_CONG_NON_RESTRICTED) || capable(CAP_NET_ADMIN)))
+	else if (!((ca->flags & TCP_CONG_NON_RESTRICTED) ||
+		   ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)))
 		err = -EPERM;
 
 	else if (!try_module_get(ca->owner))
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH net-next 11/17] net: Allow userns root to control ipv6
From: Eric W. Biederman @ 2012-11-16 13:03 UTC (permalink / raw)
  To: David Miller
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Linux Containers,
	Eric W. Biederman
In-Reply-To: <1353070992-5552-1-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

From: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Allow an unpriviled user who has created a user namespace, and then
created a network namespace to effectively use the new network
namespace, by reducing capable(CAP_NET_ADMIN) and
capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.

Settings that merely control a single network device are allowed.
Either the network device is a logical network device where
restrictions make no difference or the network device is hardware NIC
that has been explicity moved from the initial network namespace.

In general policy and network stack state changes are allowed while
resource control is left unchanged.

Allow the SIOCSIFADDR ioctl to add ipv6 addresses.
Allow the SIOCDIFADDR ioctl to delete ipv6 addresses.
Allow the SIOCADDRT ioctl to add ipv6 routes.
Allow the SIOCDELRT ioctl to delete ipv6 routes.

Allow creation of ipv6 raw sockets.

Allow setting the IPV6_JOIN_ANYCAST socket option.
Allow setting the IPV6_FL_A_RENEW parameter of the IPV6_FLOWLABEL_MGR
socket option.

Allow setting the IPV6_TRANSPARENT socket option.
Allow setting the IPV6_HOPOPTS socket option.
Allow setting the IPV6_RTHDRDSTOPTS socket option.
Allow setting the IPV6_DSTOPTS socket option.
Allow setting the IPV6_IPSEC_POLICY socket option.
Allow setting the IPV6_XFRM_POLICY socket option.

Allow sending packets with the IPV6_2292HOPOPTS control message.
Allow sending packets with the IPV6_2292DSTOPTS control message.
Allow sending packets with the IPV6_RTHDRDSTOPTS control message.

Allow setting the multicast routing socket options on non multicast
routing sockets.

Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL, and SIOCDELTUNNEL ioctls for
setting up, changing and deleting tunnels over ipv6.

Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL, SIOCDELTUNNEL ioctls for
setting up, changing and deleting ipv6 over ipv4 tunnels.

Allow the SIOCADDPRL, SIOCDELPRL, SIOCCHGPRL ioctls for adding,
deleting, and changing the potential router list for ISATAP tunnels.

Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 net/ipv6/addrconf.c             |    4 ++--
 net/ipv6/af_inet6.c             |    3 ++-
 net/ipv6/anycast.c              |    2 +-
 net/ipv6/datagram.c             |    6 +++---
 net/ipv6/ip6_flowlabel.c        |    3 ++-
 net/ipv6/ip6_gre.c              |    4 ++--
 net/ipv6/ip6_tunnel.c           |    4 ++--
 net/ipv6/ip6mr.c                |    2 +-
 net/ipv6/ipv6_sockglue.c        |    7 ++++---
 net/ipv6/netfilter/ip6_tables.c |    8 ++++----
 net/ipv6/route.c                |    2 +-
 net/ipv6/sit.c                  |    8 ++++----
 12 files changed, 28 insertions(+), 25 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 47c591d..b8e0a62 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2268,7 +2268,7 @@ int addrconf_add_ifaddr(struct net *net, void __user *arg)
 	struct in6_ifreq ireq;
 	int err;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (copy_from_user(&ireq, arg, sizeof(struct in6_ifreq)))
@@ -2287,7 +2287,7 @@ int addrconf_del_ifaddr(struct net *net, void __user *arg)
 	struct in6_ifreq ireq;
 	int err;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (copy_from_user(&ireq, arg, sizeof(struct in6_ifreq)))
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index a974247..19f68b2 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -160,7 +160,8 @@ lookup_protocol:
 	}
 
 	err = -EPERM;
-	if (sock->type == SOCK_RAW && !kern && !capable(CAP_NET_RAW))
+	if (sock->type == SOCK_RAW && !kern &&
+	    !ns_capable(net->user_ns, CAP_NET_RAW))
 		goto out_rcu_unlock;
 
 	sock->ops = answer->ops;
diff --git a/net/ipv6/anycast.c b/net/ipv6/anycast.c
index cdf02be..0808df6 100644
--- a/net/ipv6/anycast.c
+++ b/net/ipv6/anycast.c
@@ -64,7 +64,7 @@ int ipv6_sock_ac_join(struct sock *sk, int ifindex, const struct in6_addr *addr)
 	int	ishost = !net->ipv6.devconf_all->forwarding;
 	int	err = 0;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 	if (ipv6_addr_is_multicast(addr))
 		return -EINVAL;
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index be2b67d6..2d84f49 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -701,7 +701,7 @@ int datagram_send_ctl(struct net *net, struct sock *sk,
 				err = -EINVAL;
 				goto exit_f;
 			}
-			if (!capable(CAP_NET_RAW)) {
+			if (!ns_capable(net->user_ns, CAP_NET_RAW)) {
 				err = -EPERM;
 				goto exit_f;
 			}
@@ -721,7 +721,7 @@ int datagram_send_ctl(struct net *net, struct sock *sk,
 				err = -EINVAL;
 				goto exit_f;
 			}
-			if (!capable(CAP_NET_RAW)) {
+			if (!ns_capable(net->user_ns, CAP_NET_RAW)) {
 				err = -EPERM;
 				goto exit_f;
 			}
@@ -746,7 +746,7 @@ int datagram_send_ctl(struct net *net, struct sock *sk,
 				err = -EINVAL;
 				goto exit_f;
 			}
-			if (!capable(CAP_NET_RAW)) {
+			if (!ns_capable(net->user_ns, CAP_NET_RAW)) {
 				err = -EPERM;
 				goto exit_f;
 			}
diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c
index 90bbefb..29124b7 100644
--- a/net/ipv6/ip6_flowlabel.c
+++ b/net/ipv6/ip6_flowlabel.c
@@ -519,7 +519,8 @@ int ipv6_flowlabel_opt(struct sock *sk, char __user *optval, int optlen)
 		}
 		read_unlock_bh(&ip6_sk_fl_lock);
 
-		if (freq.flr_share == IPV6_FL_S_NONE && capable(CAP_NET_ADMIN)) {
+		if (freq.flr_share == IPV6_FL_S_NONE &&
+		    ns_capable(net->user_ns, CAP_NET_ADMIN)) {
 			fl = fl_lookup(net, freq.flr_label);
 			if (fl) {
 				err = fl6_renew(fl, freq.flr_linger, freq.flr_expires);
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 0185679..f822527 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -1161,7 +1161,7 @@ static int ip6gre_tunnel_ioctl(struct net_device *dev,
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		err = -EFAULT;
@@ -1209,7 +1209,7 @@ static int ip6gre_tunnel_ioctl(struct net_device *dev,
 
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		if (dev == ign->fb_tunnel_dev) {
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index cb7e2de..9140def 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1325,7 +1325,7 @@ ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			break;
 		err = -EFAULT;
 		if (copy_from_user(&p, ifr->ifr_ifru.ifru_data, sizeof (p)))
@@ -1362,7 +1362,7 @@ ip6_tnl_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 		break;
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			break;
 
 		if (dev == ip6n->fb_tnl_dev) {
diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index f7c7c63..d7330f8 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -1583,7 +1583,7 @@ int ip6_mroute_setsockopt(struct sock *sk, int optname, char __user *optval, uns
 		return -ENOENT;
 
 	if (optname != MRT6_INIT) {
-		if (sk != mrt->mroute6_sk && !capable(CAP_NET_ADMIN))
+		if (sk != mrt->mroute6_sk && !ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EACCES;
 	}
 
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index ba6d13d..2429889 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -343,7 +343,8 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 		break;
 
 	case IPV6_TRANSPARENT:
-		if (valbool && !capable(CAP_NET_ADMIN) && !capable(CAP_NET_RAW)) {
+		if (valbool && !ns_capable(net->user_ns, CAP_NET_ADMIN) &&
+		    !ns_capable(net->user_ns, CAP_NET_RAW)) {
 			retv = -EPERM;
 			break;
 		}
@@ -381,7 +382,7 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname,
 
 		/* hop-by-hop / destination options are privileged option */
 		retv = -EPERM;
-		if (optname != IPV6_RTHDR && !capable(CAP_NET_RAW))
+		if (optname != IPV6_RTHDR && !ns_capable(net->user_ns, CAP_NET_RAW))
 			break;
 
 		opt = ipv6_renew_options(sk, np->opt, optname,
@@ -754,7 +755,7 @@ done:
 	case IPV6_IPSEC_POLICY:
 	case IPV6_XFRM_POLICY:
 		retv = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			break;
 		retv = xfrm_user_policy(sk, optname, optval, optlen);
 		break;
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index d7cb045..636a296 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -1856,7 +1856,7 @@ compat_do_ip6t_set_ctl(struct sock *sk, int cmd, void __user *user,
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1971,7 +1971,7 @@ compat_do_ip6t_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -1993,7 +1993,7 @@ do_ip6t_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
@@ -2018,7 +2018,7 @@ do_ip6t_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 {
 	int ret;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	switch (cmd) {
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 2a98397..be2c173 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1987,7 +1987,7 @@ int ipv6_route_ioctl(struct net *net, unsigned int cmd, void __user *arg)
 	switch(cmd) {
 	case SIOCADDRT:		/* Add a route */
 	case SIOCDELRT:		/* Delete a route */
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			return -EPERM;
 		err = copy_from_user(&rtmsg, arg,
 				     sizeof(struct in6_rtmsg));
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 3ed54ff..df92301 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -966,7 +966,7 @@ ipip6_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCADDTUNNEL:
 	case SIOCCHGTUNNEL:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		err = -EFAULT;
@@ -1025,7 +1025,7 @@ ipip6_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 
 	case SIOCDELTUNNEL:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		if (dev == sitn->fb_tunnel_dev) {
@@ -1058,7 +1058,7 @@ ipip6_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCDELPRL:
 	case SIOCCHGPRL:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			goto done;
 		err = -EINVAL;
 		if (dev == sitn->fb_tunnel_dev)
@@ -1087,7 +1087,7 @@ ipip6_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 	case SIOCCHG6RD:
 	case SIOCDEL6RD:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			goto done;
 
 		err = -EFAULT;
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH net-next 12/17] net: Allow userns root to control llc, netfilter, netlink, packet, and xfrm
From: Eric W. Biederman @ 2012-11-16 13:03 UTC (permalink / raw)
  To: David Miller
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Linux Containers,
	Eric W. Biederman
In-Reply-To: <1353070992-5552-1-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

From: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Allow an unpriviled user who has created a user namespace, and then
created a network namespace to effectively use the new network
namespace, by reducing capable(CAP_NET_ADMIN) and
capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.

Allow creation of af_key sockets.
Allow creation of llc sockets.
Allow creation of af_packet sockets.

Allow sending xfrm netlink control messages.

Allow binding to netlink multicast groups.
Allow sending to netlink multicast groups.
Allow adding and dropping netlink multicast groups.
Allow sending to all netlink multicast groups and port ids.

Allow reading the netfilter SO_IP_SET socket option.
Allow sending netfilter netlink messages.
Allow setting and getting ip_vs netfilter socket options.

Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 net/key/af_key.c                  |    2 +-
 net/llc/af_llc.c                  |    2 +-
 net/netfilter/ipset/ip_set_core.c |    2 +-
 net/netfilter/ipvs/ip_vs_ctl.c    |    4 ++--
 net/netfilter/nfnetlink.c         |    2 +-
 net/netlink/af_netlink.c          |    2 +-
 net/packet/af_packet.c            |    2 +-
 net/xfrm/xfrm_user.c              |    2 +-
 8 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/net/key/af_key.c b/net/key/af_key.c
index 08897a3..5b426a6 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -141,7 +141,7 @@ static int pfkey_create(struct net *net, struct socket *sock, int protocol,
 	struct sock *sk;
 	int err;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 	if (sock->type != SOCK_RAW)
 		return -ESOCKTNOSUPPORT;
diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index c219000..8870988 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -160,7 +160,7 @@ static int llc_ui_create(struct net *net, struct socket *sock, int protocol,
 	struct sock *sk;
 	int rc = -ESOCKTNOSUPPORT;
 
-	if (!capable(CAP_NET_RAW))
+	if (!ns_capable(net->user_ns, CAP_NET_RAW))
 		return -EPERM;
 
 	if (!net_eq(net, &init_net))
diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index 778465f..fed899f 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -1643,7 +1643,7 @@ ip_set_sockfn_get(struct sock *sk, int optval, void __user *user, int *len)
 	void *data;
 	int copylen = *len, ret = 0;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 	if (optval != SO_IP_SET)
 		return -EBADF;
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index c6cebd5..ec664cb 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -2339,7 +2339,7 @@ do_ip_vs_set_ctl(struct sock *sk, int cmd, void __user *user, unsigned int len)
 	struct ip_vs_dest_user_kern udest;
 	struct netns_ipvs *ipvs = net_ipvs(net);
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (cmd < IP_VS_BASE_CTL || cmd > IP_VS_SO_SET_MAX)
@@ -2632,7 +2632,7 @@ do_ip_vs_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 	struct netns_ipvs *ipvs = net_ipvs(net);
 
 	BUG_ON(!net);
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if (cmd < IP_VS_BASE_CTL || cmd > IP_VS_SO_GET_MAX)
diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index ffb92c0..58a09b7 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -138,7 +138,7 @@ static int nfnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 	const struct nfnetlink_subsystem *ss;
 	int type, err;
 
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	/* All the messages must at least contain nfgenmsg */
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 4da797f..c8a1eb6 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -612,7 +612,7 @@ retry:
 static inline int netlink_capable(const struct socket *sock, unsigned int flag)
 {
 	return (nl_table[sock->sk->sk_protocol].flags & flag) ||
-	       capable(CAP_NET_ADMIN);
+		ns_capable(sock_net(sock->sk)->user_ns, CAP_NET_ADMIN);
 }
 
 static void
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 94060ed..6d95278 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2478,7 +2478,7 @@ static int packet_create(struct net *net, struct socket *sock, int protocol,
 	__be16 proto = (__force __be16)protocol; /* weird, but documented */
 	int err;
 
-	if (!capable(CAP_NET_RAW))
+	if (!ns_capable(net->user_ns, CAP_NET_RAW))
 		return -EPERM;
 	if (sock->type != SOCK_DGRAM && sock->type != SOCK_RAW &&
 	    sock->type != SOCK_PACKET)
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 421f984..eb872b2 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2349,7 +2349,7 @@ static int xfrm_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
 	link = &xfrm_dispatch[type];
 
 	/* All operations require privileges, even GET */
-	if (!capable(CAP_NET_ADMIN))
+	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 		return -EPERM;
 
 	if ((type == (XFRM_MSG_GETSA - XFRM_MSG_BASE) ||
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH net-next 14/17] net: Allow the userns root to control vlans.
From: Eric W. Biederman @ 2012-11-16 13:03 UTC (permalink / raw)
  To: David Miller
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Linux Containers,
	Eric W. Biederman
In-Reply-To: <1353070992-5552-1-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

From: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

Allow an unpriviled user who has created a user namespace, and then
created a network namespace to effectively use the new network
namespace, by reducing capable(CAP_NET_ADMIN) and
capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.

Allow the vlan ioctls:
SET_VLAN_INGRESS_PRIORITY_CMD
SET_VLAN_EGRESS_PRIORITY_CMD
SET_VLAN_FLAG_CMD
SET_VLAN_NAME_TYPE_CMD
ADD_VLAN_CMD
DEL_VLAN_CMD

Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 net/8021q/vlan.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index ee07072..4734260 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -529,7 +529,7 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 	switch (args.cmd) {
 	case SET_VLAN_INGRESS_PRIORITY_CMD:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			break;
 		vlan_dev_set_ingress_priority(dev,
 					      args.u.skb_priority,
@@ -539,7 +539,7 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 
 	case SET_VLAN_EGRESS_PRIORITY_CMD:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			break;
 		err = vlan_dev_set_egress_priority(dev,
 						   args.u.skb_priority,
@@ -548,7 +548,7 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 
 	case SET_VLAN_FLAG_CMD:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			break;
 		err = vlan_dev_change_flags(dev,
 					    args.vlan_qos ? args.u.flag : 0,
@@ -557,7 +557,7 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 
 	case SET_VLAN_NAME_TYPE_CMD:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			break;
 		if ((args.u.name_type >= 0) &&
 		    (args.u.name_type < VLAN_NAME_TYPE_HIGHEST)) {
@@ -573,14 +573,14 @@ static int vlan_ioctl_handler(struct net *net, void __user *arg)
 
 	case ADD_VLAN_CMD:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			break;
 		err = register_vlan_device(dev, args.u.VID);
 		break;
 
 	case DEL_VLAN_CMD:
 		err = -EPERM;
-		if (!capable(CAP_NET_ADMIN))
+		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
 			break;
 		unregister_vlan_dev(dev, NULL);
 		err = 0;
-- 
1.7.5.4

^ permalink raw reply related

* [PATCH net-next 15/17] net: Enable some sysctls that are safe for the userns root
From: Eric W. Biederman @ 2012-11-16 13:03 UTC (permalink / raw)
  To: David Miller
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, Linux Containers,
	Eric W. Biederman
In-Reply-To: <1353070992-5552-1-git-send-email-ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

From: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>

- Enable the per device ipv4 sysctls:
   net/ipv4/conf/<if>/forwarding
   net/ipv4/conf/<if>/mc_forwarding
   net/ipv4/conf/<if>/accept_redirects
   net/ipv4/conf/<if>/secure_redirects
   net/ipv4/conf/<if>/shared_media
   net/ipv4/conf/<if>/rp_filter
   net/ipv4/conf/<if>/send_redirects
   net/ipv4/conf/<if>/accept_source_route
   net/ipv4/conf/<if>/accept_local
   net/ipv4/conf/<if>/src_valid_mark
   net/ipv4/conf/<if>/proxy_arp
   net/ipv4/conf/<if>/medium_id
   net/ipv4/conf/<if>/bootp_relay
   net/ipv4/conf/<if>/log_martians
   net/ipv4/conf/<if>/tag
   net/ipv4/conf/<if>/arp_filter
   net/ipv4/conf/<if>/arp_announce
   net/ipv4/conf/<if>/arp_ignore
   net/ipv4/conf/<if>/arp_accept
   net/ipv4/conf/<if>/arp_notify
   net/ipv4/conf/<if>/proxy_arp_pvlan
   net/ipv4/conf/<if>/disable_xfrm
   net/ipv4/conf/<if>/disable_policy
   net/ipv4/conf/<if>/force_igmp_version
   net/ipv4/conf/<if>/promote_secondaries
   net/ipv4/conf/<if>/route_localnet

- Enable the global ipv4 sysctl:
   net/ipv4/ip_forward

- Enable the per device ipv6 sysctls:
   net/ipv6/conf/<if>/forwarding
   net/ipv6/conf/<if>/hop_limit
   net/ipv6/conf/<if>/mtu
   net/ipv6/conf/<if>/accept_ra
   net/ipv6/conf/<if>/accept_redirects
   net/ipv6/conf/<if>/autoconf
   net/ipv6/conf/<if>/dad_transmits
   net/ipv6/conf/<if>/router_solicitations
   net/ipv6/conf/<if>/router_solicitation_interval
   net/ipv6/conf/<if>/router_solicitation_delay
   net/ipv6/conf/<if>/force_mld_version
   net/ipv6/conf/<if>/use_tempaddr
   net/ipv6/conf/<if>/temp_valid_lft
   net/ipv6/conf/<if>/temp_prefered_lft
   net/ipv6/conf/<if>/regen_max_retry
   net/ipv6/conf/<if>/max_desync_factor
   net/ipv6/conf/<if>/max_addresses
   net/ipv6/conf/<if>/accept_ra_defrtr
   net/ipv6/conf/<if>/accept_ra_pinfo
   net/ipv6/conf/<if>/accept_ra_rtr_pref
   net/ipv6/conf/<if>/router_probe_interval
   net/ipv6/conf/<if>/accept_ra_rt_info_max_plen
   net/ipv6/conf/<if>/proxy_ndp
   net/ipv6/conf/<if>/accept_source_route
   net/ipv6/conf/<if>/optimistic_dad
   net/ipv6/conf/<if>/mc_forwarding
   net/ipv6/conf/<if>/disable_ipv6
   net/ipv6/conf/<if>/accept_dad
   net/ipv6/conf/<if>/force_tllao

- Enable the global ipv6 sysctls:
   net/ipv6/bindv6only
   net/ipv6/icmp/ratelimit

Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
---
 net/ipv4/devinet.c         |    8 --------
 net/ipv6/addrconf.c        |    4 ----
 net/ipv6/icmp.c            |    7 +------
 net/ipv6/sysctl_net_ipv6.c |    4 ----
 4 files changed, 1 insertions(+), 22 deletions(-)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index f75f4f6..446b1b9 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1643,10 +1643,6 @@ static int __devinet_sysctl_register(struct net *net, char *dev_name,
 		t->devinet_vars[i].extra2 = net;
 	}
 
-	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
-		t->devinet_vars[0].procname = NULL;
-
 	snprintf(path, sizeof(path), "net/ipv4/conf/%s", dev_name);
 
 	t->sysctl_header = register_net_sysctl(net, path, t->devinet_vars);
@@ -1732,10 +1728,6 @@ static __net_init int devinet_init_net(struct net *net)
 		tbl[0].data = &all->data[IPV4_DEVCONF_FORWARDING - 1];
 		tbl[0].extra1 = all;
 		tbl[0].extra2 = net;
-
-		/* Don't export sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns)
-			tbl[0].procname = NULL;
 #endif
 	}
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index b8e0a62..5f1967b 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4588,10 +4588,6 @@ static int __addrconf_sysctl_register(struct net *net, char *dev_name,
 		t->addrconf_vars[i].extra2 = net;
 	}
 
-	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
-		t->addrconf_vars[0].procname = NULL;
-
 	snprintf(path, sizeof(path), "net/ipv6/conf/%s", dev_name);
 
 	t->sysctl_header = register_net_sysctl(net, path, t->addrconf_vars);
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index db9df8a..24d69db 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -967,14 +967,9 @@ struct ctl_table * __net_init ipv6_icmp_sysctl_init(struct net *net)
 			sizeof(ipv6_icmp_table_template),
 			GFP_KERNEL);
 
-	if (table) {
+	if (table)
 		table[0].data = &net->ipv6.sysctl.icmpv6_time;
 
-		/* Don't export sysctls to unprivileged users */
-		if (net->user_ns != &init_user_ns)
-			table[0].procname = NULL;
-	}
-
 	return table;
 }
 #endif
diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index b06fd07..e85c48b 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -52,10 +52,6 @@ static int __net_init ipv6_sysctl_net_init(struct net *net)
 		goto out;
 	ipv6_table[0].data = &net->ipv6.sysctl.bindv6only;
 
-	/* Don't export sysctls to unprivileged users */
-	if (net->user_ns != &init_user_ns)
-		ipv6_table[0].procname = NULL;
-
 	ipv6_route_table = ipv6_route_sysctl_init(net);
 	if (!ipv6_route_table)
 		goto out_ipv6_table;
-- 
1.7.5.4

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox