Netdev List


* Re: module loading with CAP_NET_ADMIN
From: Vasiliy Kulikov @ 2011-02-25 15:57 UTC (permalink / raw)
  To: Michael Tokarev
  Cc: Ben Hutchings, netdev, linux-kernel, Kees Cook, Eugene Teo,
	Dan Rosenberg, David S. Miller
In-Reply-To: <4D67CAD7.7060408@msgid.tls.msk.ru>

On Fri, Feb 25, 2011 at 18:29 +0300, Michael Tokarev wrote:
> 25.02.2011 15:30, Vasiliy Kulikov wrote:
> > On Thu, Feb 24, 2011 at 16:34 +0000, Ben Hutchings wrote:
> >> On Thu, 2011-02-24 at 18:12 +0300, Vasiliy Kulikov wrote:
> >>> My proposal is changing request_module("%s", name) to something like
> >>> request_module("netdev-%s", name) inside of dev_load() and adding
> >>> aliases to related drivers.
> 
> It is not the kernel patching which we should worry about, kernel
> part is trivial.
> 
> What is not trivial is to patch all the systems out there who
> autoloads network drivers based on /etc/modprobe.d/network-aliases.conf
> (some local file), ie, numerous working setups which already
> uses this mechanism since stone age.  And patching these is
> not trivial at all, unfortunately.
> 
> Somewhat weird setups (one can load the modules explicitly, and
> nowadays this all is handled by udev anyway), but this change
> will break some working systems.
> 
> Maybe the cost (some pain for some users) isn't large enough
> but the outcome is good, and I think it _is_ good, but it needs
> some wider discussion first, imho.
> 
> I can't think of a way to handle this without breaking stuff.

Linux is slowly moving in the direction of rootless systems.  This
definitely needs proper restrictions on the power of CAP_*.  A network
admin has nothing to do with general modules.  It _has_ to break
something one day, because the old assumptions about permissions don't
conform to the CAP_* model and are closely tied to just about everything.

I'm not sure how this particular CAP_NET_ADMIN misuse should be fixed;
maybe distributions should supply a script to upgrade modprobe configs.
Also note that the change s/CAP_SYS_MODULE/CAP_NET_ADMIN/ was made in
2.6.32, so it is possible that the set of affected distributions
(those that don't use udev) is very small.
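
A minimal user-space sketch of the aliasing idea quoted at the top of
this message, assuming two hypothetical helpers, netdev_alias() and
alias_matches().  It is not the kernel's dev_load() or request_module();
it only illustrates how a fixed "netdev-" prefix keeps an interface name
from resolving to a module that has not declared a matching alias.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not kernel code: build the alias that a prefixed
 * request would ask modprobe for, e.g. "sit0" -> "netdev-sit0".
 * Returns 0 on success, -1 if the result does not fit in buf. */
static int netdev_alias(char *buf, size_t len, const char *ifname)
{
    int n = snprintf(buf, len, "netdev-%s", ifname);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical check: does an alias declared by a module satisfy the
 * prefixed request for ifname?  A plain module name never matches. */
static int alias_matches(const char *declared, const char *ifname)
{
    char want[64];

    if (netdev_alias(want, sizeof(want), ifname) != 0)
        return 0;
    return strcmp(declared, want) == 0;
}
```

With this scheme a request for "vfat" becomes "netdev-vfat", which an
unrelated module would not match unless it carried that alias itself.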

Thanks for your input,

-- 
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments


* Occasional link flap on Intel 82599 on boot in XAUI mode
From: Brent Cook @ 2011-02-25 16:11 UTC (permalink / raw)
  To: netdev

We have a custom system with dual 82599's in XAUI mode. One has its pair of ports connected to a 10G switch, the other has its pair of ports connected to an FPGA.

Occasionally, on any of the interfaces, we will see the links flapping up and down when the system initially boots. This happens maybe one in 20 boots.

Feb 23 14:58:10 mfg kernel: [  594.254977] ixgbe: eth1 NIC Link is Down
Feb 23 14:58:12 mfg kernel: [  596.230039] ixgbe: eth1 NIC Link is Up 10 Gbps, Flow Control: RX/TX
Feb 23 14:58:12 mfg kernel: [  596.256537] ixgbe: eth1 NIC Link is Down
Feb 23 14:58:16 mfg kernel: [  600.228096] ixgbe: eth1 NIC Link is Up 10 Gbps, Flow Control: RX/TX
Feb 23 14:58:16 mfg kernel: [  600.240135] ixgbe: eth1 NIC Link is Down
Feb 23 14:58:18 mfg kernel: [  602.227047] ixgbe: eth1 NIC Link is Up 10 Gbps, Flow Control: RX/TX

Simply forcing a down/up on the interface seems to correct the problem:

ip link set eth1 down
ip link set eth1 up

Has anyone else using XAUI mode seen this? Here is the kernel information for the system we are currently using:

Linux sprint.labnet.local 2.6.32.28-bps #1 SMP PREEMPT Mon Jan 31 16:05:50 CST 2011 x86_64 GNU/Linux

Feb 23 14:48:24 mfg kernel: [    3.488664] Intel(R) PRO/1000 Network Driver - version 7.3.21-k5-NAPI
Feb 23 14:48:24 mfg kernel: [    3.495082] Copyright (c) 1999-2006 Intel Corporation.
Feb 23 14:48:24 mfg kernel: [    3.500238] e1000e: Intel(R) PRO/1000 Network Driver - 1.0.2-k2
Feb 23 14:48:24 mfg kernel: [    3.506138] e1000e: Copyright (c) 1999-2008 Intel Corporation.
Feb 23 14:48:24 mfg kernel: [    3.511988] Intel(R) Gigabit Ethernet Network Driver - version 1.3.16-k2
Feb 23 14:48:24 mfg kernel: [    3.518666] Copyright (c) 2007-2009 Intel Corporation.
Feb 23 14:48:24 mfg kernel: [    3.523817] ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver - version 2.0.44-k2
Feb 23 14:48:24 mfg kernel: [    3.531622] ixgbe: Copyright (c) 1999-2009 Intel Corporation.
Feb 23 14:48:24 mfg kernel: [    3.537365] ixgbe 0000:01:00.0: PCI INT A -> GSI 24 (level, low) -> IRQ 24
Feb 23 14:48:24 mfg kernel: [    3.632934] ixgbe: 0000:01:00.0: ixgbe_init_interrupt_scheme: Multiqueue Enabled: Rx Queue count = 8, Tx Queue count = 8
Feb 23 14:48:24 mfg kernel: [    3.643769] ixgbe 0000:01:00.0: (PCI Express:5.0Gb/s:Width x8) 00:12:34:56:78:67
Feb 23 14:48:24 mfg kernel: [    3.651221] ixgbe 0000:01:00.0: MAC: 2, PHY: 0, PBA No: ffffff-0ff
Feb 23 14:48:24 mfg kernel: [    3.669390] ixgbe 0000:01:00.0: Intel(R) 10 Gigabit Network Connection
Feb 23 14:48:24 mfg kernel: [    3.675907] ixgbe 0000:01:00.1: PCI INT B -> GSI 47 (level, low) -> IRQ 47
Feb 23 14:48:24 mfg kernel: [    3.771863] ixgbe: 0000:01:00.1: ixgbe_init_interrupt_scheme: Multiqueue Enabled: Rx Queue count = 8, Tx Queue count = 8
Feb 23 14:48:24 mfg kernel: [    3.782699] ixgbe 0000:01:00.1: (PCI Express:5.0Gb/s:Width x8) 00:12:34:56:78:64
Feb 23 14:48:24 mfg kernel: [    3.790151] ixgbe 0000:01:00.1: MAC: 2, PHY: 0, PBA No: ffffff-0ff
Feb 23 14:48:24 mfg kernel: [    3.808314] ixgbe 0000:01:00.1: Intel(R) 10 Gigabit Network Connection
Feb 23 14:48:24 mfg kernel: [    3.814831] ixgbe 0000:02:00.0: PCI INT A -> GSI 35 (level, low) -> IRQ 35
Feb 23 14:48:24 mfg kernel: [    3.910802] ixgbe: 0000:02:00.0: ixgbe_init_interrupt_scheme: Multiqueue Enabled: Rx Queue count = 8, Tx Queue count = 8
Feb 23 14:48:24 mfg kernel: [    3.921636] ixgbe 0000:02:00.0: (PCI Express:5.0Gb/s:Width x8) 00:12:34:56:78:52
Feb 23 14:48:24 mfg kernel: [    3.929087] ixgbe 0000:02:00.0: MAC: 2, PHY: 0, PBA No: ffffff-0ff
Feb 23 14:48:24 mfg kernel: [    3.947246] ixgbe 0000:02:00.0: Intel(R) 10 Gigabit Network Connection
Feb 23 14:48:24 mfg kernel: [    3.953761] ixgbe 0000:02:00.1: PCI INT B -> GSI 36 (level, low) -> IRQ 36
Feb 23 14:48:24 mfg kernel: [    4.049731] ixgbe: 0000:02:00.1: ixgbe_init_interrupt_scheme: Multiqueue Enabled: Rx Queue count = 8, Tx Queue count = 8
Feb 23 14:48:24 mfg kernel: [    4.060566] ixgbe 0000:02:00.1: (PCI Express:5.0Gb/s:Width x8) 00:12:34:56:78:53
Feb 23 14:48:24 mfg kernel: [    4.068018] ixgbe 0000:02:00.1: MAC: 2, PHY: 0, PBA No: ffffff-0ff
Feb 23 14:48:24 mfg kernel: [    4.086179] ixgbe 0000:02:00.1: Intel(R) 10 Gigabit Network Connection


* Re: [PATCH] udp: avoid searching when no ports are available
From: Daniel Baluta @ 2011-02-25 16:45 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, davem, Rohan Chitradurga
In-Reply-To: <1298635575.2659.65.camel@edumazet-laptop>

On Fri, Feb 25, 2011 at 2:06 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le vendredi 25 février 2011 à 13:35 +0200, Daniel Baluta a écrit :
>> udp_lib_get_port uses a bitmap to mark used ports.
>>
>> When no ports are available we spend a lot of time, searching
>> for a port while holding hslot lock. Avoid this by checking if
>> bitmap is full.
>>
>>
>> Signed-off-by: Rohan Chitradurga <rohan@ixiacom.com>
>> Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
>> ---
>>  net/ipv4/udp.c |    6 ++++++
>>  1 files changed, 6 insertions(+), 0 deletions(-)
>>
>> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
>> index d37baaa..3e3592d 100644
>> --- a/net/ipv4/udp.c
>> +++ b/net/ipv4/udp.c
>> @@ -225,6 +225,12 @@ int udp_lib_get_port(struct sock *sk, unsigned short snum,
>>                       udp_lib_lport_inuse(net, snum, hslot, bitmap, sk,
>>                                           saddr_comp, udptable->log);
>>
>> +                     /* avoid searching when no ports are available */
>> +                     if (bitmap_full(bitmap, PORTS_PER_CHAIN)) {
>> +                             spin_unlock_bh(&hslot->lock);
>> +                             break;
>> +                     }
>> +
>>                       snum = first;
>>                       /*
>>                        * Iterate on all possible values of snum for this hash.
>
> Really ? I wonder how you got your performance numbers then.
>
> First, PORTS_PER_CHAIN is wrong here, since its value is the max
> possible value (256 bits)

You are right. I have been working/testing on 2.6.32 where:

#define PORTS_PER_CHAIN (65536 / UDP_HTABLE_SIZE)

and I assumed that PORTS_PER_CHAIN still had the same meaning
in the latest kernel.

>
> #define UDP_HTABLE_SIZE_MIN              (CONFIG_BASE_SMALL ? 128 : 256)
> #define MAX_UDP_PORTS 65536
> #define PORTS_PER_CHAIN (MAX_UDP_PORTS / UDP_HTABLE_SIZE_MIN)   -> 256
>
> As soon as your machine (and most current machines have) has enough
> memory, UDP hash table size is not 256, but 1024 or 2048
>
> dmesg | grep "UDP hash"
> [    1.735203] UDP hash table entries: 2048 (order: 6, 327680 bytes)
>
> So real bitmap size is 64 or 32 bits.
>
> Your call to bitmap_full() always return false.

I guess now, the correct bitmap size is MAX_UDP_PORTS / (udptable->mask + 1)
or  MAX_UDP_PORTS >> udptable->log,  right?
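
A minimal user-space sketch of the sizes discussed here, assuming the
table has (1 << log) slots.  ports_per_chain() and bitmap_full_n() are
hypothetical stand-ins written for illustration, not the kernel's own
helpers.  The sketch shows why testing a fixed 256 bits reports "not
full" even when every bit that is actually in use is set.

```c
#include <limits.h>

#define MAX_UDP_PORTS  65536u
#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Bits actually used per hash chain for a table of (1 << log) slots. */
static unsigned int ports_per_chain(unsigned int log)
{
    return MAX_UDP_PORTS >> log;
}

/* Stand-in for a bitmap-full test: true if the low nbits are all set. */
static int bitmap_full_n(const unsigned long *map, unsigned int nbits)
{
    unsigned int i;

    for (i = 0; i < nbits; i++)
        if (!(map[i / BITS_PER_ULONG] & (1UL << (i % BITS_PER_ULONG))))
            return 0;
    return 1;
}
```

For a 2048-entry table (log = 11) only 32 bits are meaningful, so a
check over 256 bits can never succeed; the check has to use the size
derived from the table.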

> I dont like this patch. If you have special UDP needs on a small
> machine, just add to kernel boot param "uhash_entries=8192", so that the
> bitmap has 8 bits only.

We don't have special needs on a small machine. We just want bind calls
to fail faster when all UDP ports are exhausted.

I will be back with tests on latest kernel.

thanks,
Daniel.


* Re: [PATCH] udp: avoid searching when no ports are available
From: Eric Dumazet @ 2011-02-25 16:55 UTC (permalink / raw)
  To: Daniel Baluta; +Cc: netdev, davem, Rohan Chitradurga
In-Reply-To: <AANLkTimodVFC8969md-f7iebS_9WvVahV9fPFFLtup73@mail.gmail.com>

Le vendredi 25 février 2011 à 18:45 +0200, Daniel Baluta a écrit :
> I guess now, the correct bitmap size is MAX_UDP_PORTS / (udptable->mask + 1)
> or  MAX_UDP_PORTS >> udptable->log,  right?
> 


Yes, but using bitmap_zero(bitmap, PORTS_PER_CHAIN) is faster.

It generates 4 machine instructions,
	movq   $0x0,(%r10)
	movq   $0x0,0x8(%r10)
	movq   $0x0,0x10(%r10)
	movq   $0x0,0x18(%r10)

while bitmap_zero(bitmap, some_non_constant_expression) is more
expensive (it calls an out of line function)


> We don't have special needs on a small machine. We just want bind calls
> to fail faster when all UDP ports are exhausted.
> 
> I will be back with tests on latest kernel.

Hmm, please always use the latest kernel before sending patches.

Thanks




* Re: [PATCH] don't allow CAP_NET_ADMIN to load non-netdev kernel modules
From: Valdis.Kletnieks @ 2011-02-25 17:25 UTC (permalink / raw)
  To: Vasiliy Kulikov
  Cc: David S. Miller, netdev, linux-kernel, Alexey Kuznetsov,
	Pekka Savola (ipv6), James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy, Eric Dumazet, Tom Herbert, Changli Gao,
	Jesse Gross
In-Reply-To: <20110225151414.GA5211@albatros>


On Fri, 25 Feb 2011 18:14:14 +0300, Vasiliy Kulikov said:
> Since a8f80e8ff94ecba629542d9b4b5f5a8ee3eb565c any process with
> CAP_NET_ADMIN may load any module from /lib/modules/.  This doesn't mean
> that CAP_NET_ADMIN is a superset of CAP_SYS_MODULE as modules are limited
> to /lib/modules/**.  However, CAP_NET_ADMIN capability shouldn't allow
> anybody load any module not related to networking.
> 
> This patch restricts an ability of autoloading modules to netdev modules
> with explicit aliases.  Currently there are only three users of the
> feature: ipip, ip_gre and sit.

And you stop an attacker from simply recompiling the module with a suitable
MODULE_ALIAS line added, how, exactly?  This patch may make sense down the
road, but not while it's still trivial for a malicious root user to drop stuff
into /lib/modules.

And if you're going the route "but SELinux/SMACK/Tomoyo will prevent a malicious
root user from doing that", then the obvious reply is "this should be part of those
subsystems rather than something done one-off like this (especially as it has a chance
of breaking legitimate setups that use the current scheme)".



* Re: [RFC] be2net: add rxhash support
From: Ajit Khaparde @ 2011-02-25 17:44 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

> -----Original Message-----
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Sent: Thursday, February 24, 2011 2:25 PM
> To: Khaparde, Ajit
> Cc: netdev@vger.kernel.org
> Subject: [RFC] be2net: add rxhash support
> 
> Ajit, it seems be2net provides RSS hash value in rx compl descriptor ?
> 
> Could we feed skb->rxhash with it ?
> 
> Thanks !
Thanks Eric. Sure.
This is a long-pending change that fell through the cracks.
However, because hashing is enabled in the device only when the
number of Rx queues is > 1, I would suggest the following patch.

I am not sure of the exact conventions, so I have already added the Signed-off-by lines to the patch.

Thanks

-----
[PATCH net-next] be2net: add rxhash support

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
---
 drivers/net/benet/be_main.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index 26f9c56..8c4b782 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -1038,6 +1038,10 @@ static void be_rx_compl_process(struct be_adapter *adapter,
 
 	skb->truesize = skb->len + sizeof(struct sk_buff);
 	skb->protocol = eth_type_trans(skb, adapter->netdev);
+	if (adapter->netdev->features & NETIF_F_RXHASH)
+		skb->rxhash = AMAP_GET_BITS(struct amap_eth_rx_compl,
+					rsshash, rxcp);
+
 
 	vlanf = AMAP_GET_BITS(struct amap_eth_rx_compl, vtp, rxcp);
 	vtm = AMAP_GET_BITS(struct amap_eth_rx_compl, vtm, rxcp);
@@ -1099,6 +1103,10 @@ static void be_rx_compl_process_gro(struct be_adapter *adapter,
 		return;
 	}
 
+	if (adapter->netdev->features & NETIF_F_RXHASH)
+		skb->rxhash = AMAP_GET_BITS(struct amap_eth_rx_compl,
+						rsshash, rxcp);
+
 	remaining = pkt_size;
 	for (i = 0, j = -1; i < num_rcvd; i++) {
 		page_info = get_rx_page_info(adapter, rxo, rxq_idx);
@@ -2619,6 +2627,9 @@ static void be_netdev_init(struct net_device *netdev)
 		NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
 		NETIF_F_GRO | NETIF_F_TSO6;
 
+	if (be_multi_rxq(adapter))
+		netdev->features |= NETIF_F_RXHASH;
+
 	netdev->vlan_features |= NETIF_F_SG | NETIF_F_TSO |
 		NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
 
-- 
1.7.1


* Re: [PATCH] don't allow CAP_NET_ADMIN to load non-netdev kernel modules
From: Vasiliy Kulikov @ 2011-02-25 17:47 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: David S. Miller, netdev, linux-kernel, Alexey Kuznetsov,
	Pekka Savola (ipv6), James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy, Eric Dumazet, Tom Herbert, Changli Gao,
	Jesse Gross
In-Reply-To: <135187.1298654740@localhost>

On Fri, Feb 25, 2011 at 12:25 -0500, Valdis.Kletnieks@vt.edu wrote:
> And you stop an attacker from simply recompiling the module with a suitable
> MODULE_ALIAS line added, how, exactly?  This patch may make sense down the
> road, but not while it's still trivial for a malicious root user to drop stuff
> into /lib/modules.

The threat is not a malicious root user, but a non-root process with
CAP_NET_ADMIN.  It is hardly possible to load an arbitrary module into
the kernel with CAP_NET_ADMIN alone, without other capabilities.

> And if you're going the route "but SELinux/SMACK/Tomoyo will prevent a malicious
> root user from doing that", then the obvious reply is "this should be part of those
> subsystems rather than something done one-off like this (especially as it has a chance
> of breaking legitimate setups that use the current scheme).

No, I don't want to add anything about LSMs at all.


Thanks,

-- 
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments


* Re: [PATCH] don't allow CAP_NET_ADMIN to load non-netdev kernel modules
From: Ben Hutchings @ 2011-02-25 17:48 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Vasiliy Kulikov, David S. Miller, netdev, linux-kernel,
	Alexey Kuznetsov, Pekka Savola (ipv6), James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Eric Dumazet, Tom Herbert,
	Changli Gao, Jesse Gross
In-Reply-To: <135187.1298654740@localhost>

On Fri, 2011-02-25 at 12:25 -0500, Valdis.Kletnieks@vt.edu wrote:
> On Fri, 25 Feb 2011 18:14:14 +0300, Vasiliy Kulikov said:
> > Since a8f80e8ff94ecba629542d9b4b5f5a8ee3eb565c any process with
> > CAP_NET_ADMIN may load any module from /lib/modules/.  This doesn't mean
> > that CAP_NET_ADMIN is a superset of CAP_SYS_MODULE as modules are limited
> > to /lib/modules/**.  However, CAP_NET_ADMIN capability shouldn't allow
> > anybody load any module not related to networking.
> > 
> > This patch restricts an ability of autoloading modules to netdev modules
> > with explicit aliases.  Currently there are only three users of the
> > feature: ipip, ip_gre and sit.
> 
> And you stop an attacker from simply recompiling the module with a suitable
> MODULE_ALIAS line added, how, exactly?  This patch may make sense down the
> road, but not while it's still trivial for a malicious root user to drop stuff
> into /lib/modules.

A process running as root normally has CAP_NET_ADMIN, but not every
process with CAP_NET_ADMIN will be running as root.

> And if you're going the route "but SELinux/SMACK/Tomoyo will prevent a malicious
> root user from doing that", then the obvious reply is "this should be part of those
> subsystems rather than something done one-off like this (especially as it has a chance
> of breaking legitimate setups that use the current scheme).

The notional attacker has CAP_NET_ADMIN, perhaps through a vulnerable
service or a vulnerable set-capability executable.  They do not yet have
full root access and so cannot install a module, even in the absence of
an LSM.

So long as the attacker is able to load arbitrary modules, however, they
could exploit a vulnerability in any installed (not loaded) module.
Again, LSMs are irrelevant to this as they do not protect against kernel
bugs.
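
A minimal sketch of the attacker profile described above, assuming the
effective capability set is available as a 64-bit mask (for example the
CapEff value from /proc/<pid>/status).  The capability numbers are the
ones defined in <linux/capability.h>; the two function names are
hypothetical and chosen only for this illustration.

```c
#include <stdint.h>

/* Capability numbers as defined in <linux/capability.h>. */
#define CAP_NET_ADMIN_NR  12
#define CAP_SYS_MODULE_NR 16

/* True if capability number cap is set in an effective-capability mask. */
static int mask_has_cap(uint64_t capeff, int cap)
{
    return (int)((capeff >> cap) & 1u);
}

/* The case under discussion: the process holds CAP_NET_ADMIN but lacks
 * CAP_SYS_MODULE, so module loading is reachable only indirectly. */
static int net_admin_without_sys_module(uint64_t capeff)
{
    return mask_has_cap(capeff, CAP_NET_ADMIN_NR) &&
           !mask_has_cap(capeff, CAP_SYS_MODULE_NR);
}
```

A process with a full capability set has both bits and falls outside
this case; the concern is the process for which the first bit is set
and the second is not.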

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* RE: Occasional link flap on Intel 82599 on boot in XAUI mode
From: Skidmore, Donald C @ 2011-02-25 18:11 UTC (permalink / raw)
  To: Brent Cook, netdev@vger.kernel.org
In-Reply-To: <201102251011.42400.bcook@breakingpoint.com>

>-----Original Message-----
>From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
>On Behalf Of Brent Cook
>Sent: Friday, February 25, 2011 8:12 AM
>To: netdev@vger.kernel.org
>Subject: Occasional link flap on Intel 82599 on boot in XAUI mode
>
>We have a custom system with dual 82599's in XAUI mode. One has its pair
>of ports connected to a 10G switch, the other has its pair of ports
>connected to an FPGA.
>
>Occasionally, on any of the interfaces, we will see the links flapping
>up and down when the system initially boots. This happens maybe one in
>20 boots.
>
>Feb 23 14:58:10 mfg kernel: [  594.254977] ixgbe: eth1 NIC Link is Down
>Feb 23 14:58:12 mfg kernel: [  596.230039] ixgbe: eth1 NIC Link is Up 10
>Gbps, Flow Control: RX/TX
>Feb 23 14:58:12 mfg kernel: [  596.256537] ixgbe: eth1 NIC Link is Down
>Feb 23 14:58:16 mfg kernel: [  600.228096] ixgbe: eth1 NIC Link is Up 10
>Gbps, Flow Control: RX/TX
>Feb 23 14:58:16 mfg kernel: [  600.240135] ixgbe: eth1 NIC Link is Down
>Feb 23 14:58:18 mfg kernel: [  602.227047] ixgbe: eth1 NIC Link is Up 10
>Gbps, Flow Control: RX/TX
>
>Simply forcing a down/up on the interface seems to correct the problem:
>
>ip link set eth1 down
>ip link set eth1 up
>
>Is anyone else using XAUI mode and has seen this? Here is the kernel
>information that we are using currently:
>
>Linux sprint.labnet.local 2.6.32.28-bps #1 SMP PREEMPT Mon Jan 31
>16:05:50 CST 2011 x86_64 GNU/Linux
>
>Feb 23 14:48:24 mfg kernel: [    3.488664] Intel(R) PRO/1000 Network
>Driver - version 7.3.21-k5-NAPI
>Feb 23 14:48:24 mfg kernel: [    3.495082] Copyright (c) 1999-2006 Intel
>Corporation.
>Feb 23 14:48:24 mfg kernel: [    3.500238] e1000e: Intel(R) PRO/1000
>Network Driver - 1.0.2-k2
>Feb 23 14:48:24 mfg kernel: [    3.506138] e1000e: Copyright (c) 1999-
>2008 Intel Corporation.
>Feb 23 14:48:24 mfg kernel: [    3.511988] Intel(R) Gigabit Ethernet
>Network Driver - version 1.3.16-k2
>Feb 23 14:48:24 mfg kernel: [    3.518666] Copyright (c) 2007-2009 Intel
>Corporation.
>Feb 23 14:48:24 mfg kernel: [    3.523817] ixgbe: Intel(R) 10 Gigabit
>PCI Express Network Driver - version 2.0.44-k2
>Feb 23 14:48:24 mfg kernel: [    3.531622] ixgbe: Copyright (c) 1999-
>2009 Intel Corporation.
>Feb 23 14:48:24 mfg kernel: [    3.537365] ixgbe 0000:01:00.0: PCI INT A
>-> GSI 24 (level, low) -> IRQ 24
>Feb 23 14:48:24 mfg kernel: [    3.632934] ixgbe: 0000:01:00.0:
>ixgbe_init_interrupt_scheme: Multiqueue Enabled: Rx Queue count = 8, Tx
>Queue count = 8
>Feb 23 14:48:24 mfg kernel: [    3.643769] ixgbe 0000:01:00.0: (PCI
>Express:5.0Gb/s:Width x8) 00:12:34:56:78:67
>Feb 23 14:48:24 mfg kernel: [    3.651221] ixgbe 0000:01:00.0: MAC: 2,
>PHY: 0, PBA No: ffffff-0ff
>Feb 23 14:48:24 mfg kernel: [    3.669390] ixgbe 0000:01:00.0: Intel(R)
>10 Gigabit Network Connection
>Feb 23 14:48:24 mfg kernel: [    3.675907] ixgbe 0000:01:00.1: PCI INT B
>-> GSI 47 (level, low) -> IRQ 47
>Feb 23 14:48:24 mfg kernel: [    3.771863] ixgbe: 0000:01:00.1:
>ixgbe_init_interrupt_scheme: Multiqueue Enabled: Rx Queue count = 8, Tx
>Queue count = 8
>Feb 23 14:48:24 mfg kernel: [    3.782699] ixgbe 0000:01:00.1: (PCI
>Express:5.0Gb/s:Width x8) 00:12:34:56:78:64
>Feb 23 14:48:24 mfg kernel: [    3.790151] ixgbe 0000:01:00.1: MAC: 2,
>PHY: 0, PBA No: ffffff-0ff
>Feb 23 14:48:24 mfg kernel: [    3.808314] ixgbe 0000:01:00.1: Intel(R)
>10 Gigabit Network Connection
>Feb 23 14:48:24 mfg kernel: [    3.814831] ixgbe 0000:02:00.0: PCI INT A
>-> GSI 35 (level, low) -> IRQ 35
>Feb 23 14:48:24 mfg kernel: [    3.910802] ixgbe: 0000:02:00.0:
>ixgbe_init_interrupt_scheme: Multiqueue Enabled: Rx Queue count = 8, Tx
>Queue count = 8
>Feb 23 14:48:24 mfg kernel: [    3.921636] ixgbe 0000:02:00.0: (PCI
>Express:5.0Gb/s:Width x8) 00:12:34:56:78:52
>Feb 23 14:48:24 mfg kernel: [    3.929087] ixgbe 0000:02:00.0: MAC: 2,
>PHY: 0, PBA No: ffffff-0ff
>Feb 23 14:48:24 mfg kernel: [    3.947246] ixgbe 0000:02:00.0: Intel(R)
>10 Gigabit Network Connection
>Feb 23 14:48:24 mfg kernel: [    3.953761] ixgbe 0000:02:00.1: PCI INT B
>-> GSI 36 (level, low) -> IRQ 36
>Feb 23 14:48:24 mfg kernel: [    4.049731] ixgbe: 0000:02:00.1:
>ixgbe_init_interrupt_scheme: Multiqueue Enabled: Rx Queue count = 8, Tx
>Queue count = 8
>Feb 23 14:48:24 mfg kernel: [    4.060566] ixgbe 0000:02:00.1: (PCI
>Express:5.0Gb/s:Width x8) 00:12:34:56:78:53
>Feb 23 14:48:24 mfg kernel: [    4.068018] ixgbe 0000:02:00.1: MAC: 2,
>PHY: 0, PBA No: ffffff-0ff
>Feb 23 14:48:24 mfg kernel: [    4.086179] ixgbe 0000:02:00.1: Intel(R)
>10 Gigabit Network Connection

By any chance, have you tried it with a newer driver?  The latest on SourceForge is 3.2.9.

Also, do you see this link flap only on boot, or can you recreate it by unloading and reloading the ixgbe module?

Thanks,
-Don Skidmore <donald.c.skidmore@intel.com>


* Re: [RFC] be2net: add rxhash support
From: Eric Dumazet @ 2011-02-25 18:21 UTC (permalink / raw)
  To: Ajit Khaparde; +Cc: netdev
In-Reply-To: <20110225174425.GA11203@akhaparde-VBox>

Le vendredi 25 février 2011 à 11:44 -0600, Ajit Khaparde a écrit :
> > -----Original Message-----
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Sent: Thursday, February 24, 2011 2:25 PM
> > To: Khaparde, Ajit
> > Cc: netdev@vger.kernel.org
> > Subject: [RFC] be2net: add rxhash support
> > 
> > Ajit, it seems be2net provides RSS hash value in rx compl descriptor ?
> > 
> > Could we feed skb->rxhash with it ?
> > 
> > Thanks !
> Thanks Eric. Sure.
> This is a long pending change which fell through the cracks.
> But then because hashing is enabled in the device only when
> Number of Rx Queues is > 1, I would suggest the following patch.
> 
> Unaware of exact conventions, I have added signed-off-by to the patch already.
> 
> Thanks
> 
> -----
> [PATCH net-next] be2net: add rxhash support
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
> ---
>  drivers/net/benet/be_main.c |   11 +++++++++++
>  1 files changed, 11 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
> index 26f9c56..8c4b782 100644
> --- a/drivers/net/benet/be_main.c
> +++ b/drivers/net/benet/be_main.c
> @@ -1038,6 +1038,10 @@ static void be_rx_compl_process(struct be_adapter *adapter,
>  
>  	skb->truesize = skb->len + sizeof(struct sk_buff);
>  	skb->protocol = eth_type_trans(skb, adapter->netdev);
> +	if (adapter->netdev->features & NETIF_F_RXHASH)
> +		skb->rxhash = AMAP_GET_BITS(struct amap_eth_rx_compl,
> +					rsshash, rxcp);
> +
>  
>  	vlanf = AMAP_GET_BITS(struct amap_eth_rx_compl, vtp, rxcp);
>  	vtm = AMAP_GET_BITS(struct amap_eth_rx_compl, vtm, rxcp);
> @@ -1099,6 +1103,10 @@ static void be_rx_compl_process_gro(struct be_adapter *adapter,
>  		return;
>  	}
>  
> +	if (adapter->netdev->features & NETIF_F_RXHASH)
> +		skb->rxhash = AMAP_GET_BITS(struct amap_eth_rx_compl,
> +						rsshash, rxcp);
> +
>  	remaining = pkt_size;
>  	for (i = 0, j = -1; i < num_rcvd; i++) {
>  		page_info = get_rx_page_info(adapter, rxo, rxq_idx);
> @@ -2619,6 +2627,9 @@ static void be_netdev_init(struct net_device *netdev)
>  		NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
>  		NETIF_F_GRO | NETIF_F_TSO6;
>  
> +	if (be_multi_rxq(adapter))
> +		netdev->features |= NETIF_F_RXHASH;
> +
>  	netdev->vlan_features |= NETIF_F_SG | NETIF_F_TSO |
>  		NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
>  


I added some traces, and I am not sure it's OK:

With one active TCP flow, I got different rxhash values:

[ 1064.674253] rxhash=bbd37952 rsshp=1 bank=1
[ 1064.738104] rxhash=37acd31d rsshp=1 bank=1
[ 1064.741684] rxhash=bbd37952 rsshp=1 bank=1
[ 1064.874283] rxhash=bbd37952 rsshp=1 bank=1
[ 1064.940201] rxhash=bbd37952 rsshp=1 bank=1
[ 1064.955278] rxhash=b668ace2 rsshp=1 bank=1
[ 1065.080028] rxhash=bbd37952 rsshp=1 bank=1
[ 1065.153360] rxhash=bbd37952 rsshp=1 bank=1
[ 1065.293164] rxhash=bbd37952 rsshp=1 bank=1
[ 1065.401862] rxhash=bbd37952 rsshp=1 bank=1
[ 1065.460506] rxhash=bbd37952 rsshp=1 bank=1
[ 1065.519980] rxhash=bbd37952 rsshp=1 bank=1
[ 1065.650160] rxhash=bbd37952 rsshp=1 bank=1
[ 1065.717585] rxhash=bbd37952 rsshp=1 bank=1
[ 1065.730909] rxhash=37acd31d rsshp=1 bank=1
[ 1065.840350] rxhash=bbd37952 rsshp=1 bank=1
[ 1065.900704] rxhash=bbd37952 rsshp=1 bank=1
[ 1065.931526] rxhash=b668ace2 rsshp=1 bank=1
[ 1066.503657] rxhash=bbd37952 rsshp=1 bank=1
[ 1066.570138] rxhash=bbd37952 rsshp=1 bank=1

How is this possible?

(I have a VLAN config on top of a bonding)
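
A small sketch for summarising such a trace, assuming the logged rxhash
values have been collected into an array.  count_distinct() is a
hypothetical helper; for a single flow hashed over the same fields one
would expect it to return 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Count distinct values in a short array of logged hash values.
 * Quadratic, which is fine for a handful of trace lines. */
static size_t count_distinct(const uint32_t *v, size_t n)
{
    size_t i, j, distinct = 0;

    for (i = 0; i < n; i++) {
        for (j = 0; j < i; j++)
            if (v[j] == v[i])
                break;
        if (j == i)     /* no earlier occurrence of v[i] */
            distinct++;
    }
    return distinct;
}
```

Fed the values from the trace above, it reports three distinct hashes
(bbd37952, 37acd31d and b668ace2).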





* Re: [PATCH ref0] net: add Faraday FTMAC100 10/100 Ethernet driver
From: David Miller @ 2011-02-25 18:34 UTC (permalink / raw)
  To: eric.dumazet
  Cc: ratbert.chuang, netdev, linux-kernel, bhutchings, joe, dilinger,
	mirqus, ratbert
In-Reply-To: <1298631127.2659.22.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 25 Feb 2011 11:52:07 +0100

> Le vendredi 25 février 2011 à 17:45 +0800, Po-Yu Chuang a écrit :
> 
>> It's a little faster than v5 now. Thanks.
>> I will submit the current version later.
>> 
>> One more question just curious, why 128 bytes?
> 
> Probably its best for NIU hardware specs
> 
> You could try 64, as it should be enough for most IP/TCP/UDP processing.

IPV6.
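
One reading of the terse reply above, sketched as header arithmetic.
The sizes are the standard fixed header lengths; the enum names and the
helper headers_fit() are hypothetical, and VLAN tags, IP options and
IPv6 extension headers are deliberately left out.

```c
enum {
    ETH_HDR_LEN  = 14,  /* Ethernet header, no VLAN tag */
    IPV4_HDR_MIN = 20,  /* IPv4 header without options */
    IPV6_HDR_LEN = 40,  /* fixed IPv6 header, no extension headers */
    TCP_HDR_MIN  = 20,  /* TCP header without options */
    TCP_HDR_MAX  = 60   /* TCP header with the maximum 40 bytes of options */
};

/* True if the L2 + L3 + L4 headers fit in the first copy_len bytes. */
static int headers_fit(int copy_len, int l2, int l3, int l4)
{
    return l2 + l3 + l4 <= copy_len;
}
```

Ethernet + IPv4 + minimal TCP is 54 bytes and fits in 64, but Ethernet
+ IPv6 + minimal TCP is already 74 bytes, while even a maximal TCP
header over IPv6 (114 bytes) fits in 128.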


* Re: via-rhine -- VT6105M and checksum offloading
From: Jan Ceuleers @ 2011-02-25 18:35 UTC (permalink / raw)
  To: Roger Luethi; +Cc: Benjamin LaHaise, David Miller, netdev
In-Reply-To: <20110225075303.GA8748@core.hellgate.ch>

On 25/02/11 08:53, Roger Luethi wrote:
> I have a patch to enable hw checksumming (the ethtool hooks are done, but I
> somehow missed the NETIF_F_GRO bit). Care to give it a whirl?

Can you post that, preferably rebased to net-next? Even if Benjamin 
doesn't get 'round to implementing all of the improvements Dave proposes 
perhaps Dave will be clement enough to apply it as-is if it proves to be 
a net positive?

Thanks, Jan


* Re: [PATCH ref0] net: add Faraday FTMAC100 10/100 Ethernet driver
From: Eric Dumazet @ 2011-02-25 18:45 UTC (permalink / raw)
  To: David Miller
  Cc: ratbert.chuang, netdev, linux-kernel, bhutchings, joe, dilinger,
	mirqus, ratbert, Ajit Khaparde
In-Reply-To: <20110225.103456.226779149.davem@davemloft.net>

Le vendredi 25 février 2011 à 10:34 -0800, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Fri, 25 Feb 2011 11:52:07 +0100
> 
> > Le vendredi 25 février 2011 à 17:45 +0800, Po-Yu Chuang a écrit :
> > 
> >> It's a little faster than v5 now. Thanks.
> >> I will submit the current version later.
> >> 
> >> One more question just curious, why 128 bytes?
> > 
> > Probably its best for NIU hardware specs
> > 
> > You could try 64, as it should be enough for most IP/TCP/UDP processing.
> 
> IPV6.

drivers/net/benet/be.h:70:#define BE_HDR_LEN            64

Maybe we should have a comment somewhere.

CC Ajit Khaparde <ajit.khaparde@emulex.com>


* Re: [PATCH] don't allow CAP_NET_ADMIN to load non-netdev kernel modules
From: David Miller @ 2011-02-25 18:47 UTC (permalink / raw)
  To: segoon
  Cc: netdev, linux-kernel, kuznet, pekkas, jmorris, yoshfuji, kaber,
	eric.dumazet, therbert, xiaosuo, jesse
In-Reply-To: <20110225151414.GA5211@albatros>

From: Vasiliy Kulikov <segoon@openwall.com>
Date: Fri, 25 Feb 2011 18:14:14 +0300

> Since a8f80e8ff94ecba629542d9b4b5f5a8ee3eb565c any process with
> CAP_NET_ADMIN may load any module from /lib/modules/.  This doesn't mean
> that CAP_NET_ADMIN is a superset of CAP_SYS_MODULE as modules are limited
> to /lib/modules/**.  However, CAP_NET_ADMIN capability shouldn't allow
> anybody load any module not related to networking.

Why go through this naming change, which does break things, instead of
simply adding a capability mask tag or similar to modules somehow?  You
could stick it into a special ELF section or similar.

Doesn't that make tons more sense than this?


* Re: [PATCH ref0] net: add Faraday FTMAC100 10/100 Ethernet driver
From: Eric Dumazet @ 2011-02-25 18:47 UTC (permalink / raw)
  To: David Miller
  Cc: ratbert.chuang, netdev, linux-kernel, bhutchings, joe, dilinger,
	mirqus, ratbert, Ajit Khaparde
In-Reply-To: <1298659538.2659.103.camel@edumazet-laptop>

Le vendredi 25 février 2011 à 19:45 +0100, Eric Dumazet a écrit :
> Le vendredi 25 février 2011 à 10:34 -0800, David Miller a écrit :
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Fri, 25 Feb 2011 11:52:07 +0100
> > 
> > > Le vendredi 25 février 2011 à 17:45 +0800, Po-Yu Chuang a écrit :
> > > 
> > >> It's a little faster than v5 now. Thanks.
> > >> I will submit the current version later.
> > >> 
> > >> One more question just curious, why 128 bytes?
> > > 
> > > Probably its best for NIU hardware specs
> > > 
> > > You could try 64, as it should be enough for most IP/TCP/UDP processing.
> > 
> > IPV6.
> 
> drivers/net/benet/be.h:70:#define BE_HDR_LEN            64
> 
> Maybe we should have a comment somewhere.
> 
> CC Ajit Khaparde <ajit.khaparde@emulex.com>
> 


A compromise would be to use 128 for the allocation, but only copy 64
bytes.
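
A minimal userspace sketch of that compromise, using malloc() in place of
the driver's real buffer allocation. The names (demo_rx_buf,
demo_rx_pull_header, DEMO_RX_*) are made up for the example; the point is
only that the linear area is sized for 128 bytes while the copy is capped
at 64, presumably so that longer header chains can still be pulled in
later without reallocating.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_RX_ALLOC_LEN 128	/* size of the linear header area */
#define DEMO_RX_COPY_LEN   64	/* bytes copied up front */

struct demo_rx_buf {
	unsigned char *head;	/* linear area, DEMO_RX_ALLOC_LEN bytes */
	size_t len;		/* bytes currently copied into head */
};

/* Copy at most DEMO_RX_COPY_LEN bytes of the frame into a 128-byte area.
 * Returns 0 on success, -1 if the allocation fails. */
static int demo_rx_pull_header(struct demo_rx_buf *buf,
			       const unsigned char *frame, size_t frame_len)
{
	size_t copy = frame_len < DEMO_RX_COPY_LEN ? frame_len
						    : DEMO_RX_COPY_LEN;

	assert(buf != NULL && frame != NULL);

	buf->head = malloc(DEMO_RX_ALLOC_LEN);
	if (buf->head == NULL)
		return -1;

	memcpy(buf->head, frame, copy);
	buf->len = copy;
	return 0;
}

/* Release the linear area. */
static void demo_rx_free(struct demo_rx_buf *buf)
{
	free(buf->head);
	buf->head = NULL;
	buf->len = 0;
}
```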

^ permalink raw reply

* RFC v1: sysctl: add sysctl header cookie, share tables between nets
From: Lucian Adrian Grijincu @ 2011-02-25 18:52 UTC (permalink / raw)
  To: David S. Miller, Alexey Dobriyan, Eric W. Biederman,
	Octavian Purdila, netdev

This is a new approach to the "share sysctl tables" RFC series I
posted earlier this month.

In previous patches I proposed deriving 'struct net*' from the parent
ctl_entry's ->extra1 field, but that has seen opposition due to mixing
in information from the dentry cache/fs layers.

In this version, the ctl_table_header is extended to hold a cookie at
creation time and pass it to the handlers. By default every
ctl_table_header that is netns specific will store the 'struct net*'
in the cookie.
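
The idea can be modelled in a few lines of userspace C. This is only a
simplified sketch, not the code from the patches: the demo_* types and
functions are invented here, and the handler signature is trimmed down.
It shows one shared table registered twice, each time with a different
cookie, and the dispatcher handing that cookie to the handler.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for 'struct net': one tunable per namespace. */
struct demo_net {
	int somaxconn;
};

struct demo_table;

/* Simplified handler type: the last argument is the header's cookie. */
typedef int demo_handler(struct demo_table *table, int write,
			 int *value, void *cookie);

/* Shared, read-only description of a sysctl entry. */
struct demo_table {
	const char *procname;
	demo_handler *handler;
};

/* Per-registration header: points at the shared table, owns the cookie. */
struct demo_header {
	struct demo_table *table;
	void *cookie;
};

/* Store the cookie at creation time. */
static struct demo_header demo_register(struct demo_table *table,
					void *cookie)
{
	struct demo_header header = { .table = table, .cookie = cookie };
	return header;
}

/* Dispatch: pass the header's cookie through to the handler. */
static int demo_call_handler(struct demo_header *head, int write, int *value)
{
	return head->table->handler(head->table, write, value, head->cookie);
}

/* The handler finds its namespace through the cookie, not the table. */
static int demo_somaxconn_handler(struct demo_table *table, int write,
				  int *value, void *cookie)
{
	struct demo_net *net = cookie;

	(void)table;
	if (write)
		net->somaxconn = *value;
	else
		*value = net->somaxconn;
	return 0;
}

static struct demo_table demo_shared_table = {
	"somaxconn", demo_somaxconn_handler
};
```

Because the table itself never changes per namespace, it no longer has to
be duplicated for every new net.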

I could go on with the patch series and share other ctl_tables between
network namespaces in the same manner, but I stopped here so as not to
waste time on a solution that you might not consider applying for
reasons I don't see now.

If you like this, I'll post a full patch series:
* change proc_handler to accept a cookie
* change all proc_handler functions in the kernel to accept a cookie
* apply sysctl table sharing to other tables. Candidates would be:
  nf_conntrack_acct_init_sysctl, nf_conntrack_standalone_init_sysctl,
  unix_sysctl_register, but there may be others I'm not seeing now.

This series is against Linus's 2.6.38-rc6 (plus a few other patches).

 fs/proc/proc_sysctl.c       |   11 +++++++-
 include/linux/sysctl.h      |    8 +++++-
 include/net/ipv6.h          |    6 +---
 include/net/net_namespace.h |   26 ++++++++++++++++++
 kernel/sysctl.c             |   12 +++++---
 net/core/sysctl_net_core.c  |   28 ++-----------------
 net/ipv4/ip_fragment.c      |   34 ++++-------------------
 net/ipv4/route.c            |   36 +++++--------------------
 net/ipv4/sysctl_net_ipv4.c  |   53 ++++++-------------------------------
 net/ipv6/icmp.c             |   17 +----------
 net/ipv6/reassembly.c       |   34 ++++-------------------
 net/ipv6/route.c            |   54 ++++++++++---------------------------
 net/ipv6/sysctl_net_ipv6.c  |   61 +++++-------------------------------------
 net/sysctl_net.c            |   37 ++++++++++++++++++++++++--
 14 files changed, 143 insertions(+), 274 deletions(-)

 * [PATCH 1/9] sysctl: add ctl_header_cookie
 * [PATCH 2/9] sysctl: use ctl_header_cookie in proc_handler
 * [PATCH 3/9] sysctl: add netns_proc_dointvec and similar handlers
 * [PATCH 4/9] sysctl: ipv4: ipfrag: share ip4_frags_ns_ctl_table between nets
 * [PATCH 5/9] sysctl: net: share netns_core_table between nets
 * [PATCH 6/9] sysctl: route: share ipv4_route_flush_table between nets
 * [PATCH 7/9] sysctl: ipv4: share ipv4_net_table between nets
 * [PATCH 8/9] sysctl: ipv6: share ip6_frags_ns_ctl_table between nets
 * [PATCH 9/9] sysctl: ipv6: share ip6_ctl_table, ipv6_icmp_table and ipv6_route_table between nets

^ permalink raw reply

* [PATCH 1/9] sysctl: add ctl_header_cookie
From: Lucian Adrian Grijincu @ 2011-02-25 18:52 UTC (permalink / raw)
  To: David S. Miller, Alexey Dobriyan, Eric W. Biederman,
	Octavian Purdila, netdev
  Cc: Lucian Adrian Grijincu
In-Reply-To: <1298659961-23863-1-git-send-email-lucian.grijincu@gmail.com>

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 include/linux/sysctl.h |    5 ++++-
 kernel/sysctl.c        |   12 ++++++++----
 net/sysctl_net.c       |    6 +++---
 3 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 7bb5cb6..43fed29 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -1048,6 +1048,9 @@ struct ctl_table_header
 	struct ctl_table *attached_by;
 	struct ctl_table *attached_to;
 	struct ctl_table_header *parent;
+	/* Pointer to data that outlives this ctl_table_header.
+	 * The caller is responsible for freeing the cookie. */
+	void *ctl_header_cookie;
 };
 
 /* struct ctl_path describes where in the hierarchy a table is added */
@@ -1058,7 +1061,7 @@ struct ctl_path {
 void register_sysctl_root(struct ctl_table_root *root);
 struct ctl_table_header *__register_sysctl_paths(
 	struct ctl_table_root *root, struct nsproxy *namespaces,
-	const struct ctl_path *path, struct ctl_table *table);
+	const struct ctl_path *path, struct ctl_table *table, void *cookie);
 struct ctl_table_header *register_sysctl_table(struct ctl_table * table);
 struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
 						struct ctl_table *table);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 0f1bd83..31fd587 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -199,6 +199,7 @@ static struct ctl_table_header root_table_header = {
 	.ctl_entry = LIST_HEAD_INIT(sysctl_table_root.default_set.list),
 	.root = &sysctl_table_root,
 	.set = &sysctl_table_root.default_set,
+	.ctl_header_cookie = NULL,
 };
 static struct ctl_table_root sysctl_table_root = {
 	.root_list = LIST_HEAD_INIT(sysctl_table_root.root_list),
@@ -1774,6 +1775,9 @@ static void try_attach(struct ctl_table_header *p, struct ctl_table_header *q)
  * @namespaces: Data to compute which lists of sysctl entries are visible
  * @path: The path to the directory the sysctl table is in.
  * @table: the top-level table structure
+ * @cookie: Pointer to user provided data that must be accessible
+ *  until unregister_sysctl_table. This cookie will be passed to the
+ *  proc_handler.
  *
  * Register a sysctl table hierarchy. @table should be a filled in ctl_table
  * array. A completely 0 filled entry terminates the table.
@@ -1822,9 +1826,8 @@ static void try_attach(struct ctl_table_header *p, struct ctl_table_header *q)
  * to the table header on success.
  */
 struct ctl_table_header *__register_sysctl_paths(
-	struct ctl_table_root *root,
-	struct nsproxy *namespaces,
-	const struct ctl_path *path, struct ctl_table *table)
+	struct ctl_table_root *root, struct nsproxy *namespaces,
+	const struct ctl_path *path, struct ctl_table *table, void *cookie)
 {
 	struct ctl_table_header *header;
 	struct ctl_table *new, **prevp;
@@ -1871,6 +1874,7 @@ struct ctl_table_header *__register_sysctl_paths(
 	header->root = root;
 	sysctl_set_parent(NULL, header->ctl_table);
 	header->count = 1;
+	header->ctl_header_cookie = cookie;
 #ifdef CONFIG_SYSCTL_SYSCALL_CHECK
 	if (sysctl_check_table(namespaces, header->ctl_table)) {
 		kfree(header);
@@ -1911,7 +1915,7 @@ struct ctl_table_header *register_sysctl_paths(const struct ctl_path *path,
 						struct ctl_table *table)
 {
 	return __register_sysctl_paths(&sysctl_table_root, current->nsproxy,
-					path, table);
+				       path, table, NULL);
 }
 
 /**
diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index ca84212..9dadd17 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -109,8 +109,8 @@ struct ctl_table_header *register_net_sysctl_table(struct net *net,
 	struct nsproxy namespaces;
 	namespaces = *current->nsproxy;
 	namespaces.net_ns = net;
-	return __register_sysctl_paths(&net_sysctl_root,
-					&namespaces, path, table);
+	return __register_sysctl_paths(&net_sysctl_root, &namespaces, path,
+				       table, NULL);
 }
 EXPORT_SYMBOL_GPL(register_net_sysctl_table);
 
@@ -118,7 +118,7 @@ struct ctl_table_header *register_net_sysctl_rotable(const
 		struct ctl_path *path, struct ctl_table *table)
 {
 	return __register_sysctl_paths(&net_sysctl_ro_root,
-			&init_nsproxy, path, table);
+				       &init_nsproxy, path, table, NULL);
 }
 EXPORT_SYMBOL_GPL(register_net_sysctl_rotable);
 
-- 
1.7.4.rc1.7.g2cf08.dirty

^ permalink raw reply related

* [PATCH 2/9] sysctl: use ctl_header_cookie in proc_handler
From: Lucian Adrian Grijincu @ 2011-02-25 18:52 UTC (permalink / raw)
  To: David S. Miller, Alexey Dobriyan, Eric W. Biederman,
	Octavian Purdila, netdev
  Cc: Lucian Adrian Grijincu
In-Reply-To: <1298659961-23863-1-git-send-email-lucian.grijincu@gmail.com>

TODO: if this patch series gets positive feedback, this patch will be
extended with a kernel-wide change of each proc_handler to add a
'cookie' argument.

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 fs/proc/proc_sysctl.c  |   11 ++++++++++-
 include/linux/sysctl.h |    3 +++
 2 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index 09a1f92..85b6b75 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -135,6 +135,7 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
 	struct inode *inode = filp->f_path.dentry->d_inode;
 	struct ctl_table_header *head = grab_header(inode);
 	struct ctl_table *table = PROC_I(inode)->sysctl_entry;
+	proc_handler_cookie *phc = (proc_handler_cookie *) table->proc_handler;
 	ssize_t error;
 	size_t res;
 
@@ -156,7 +157,15 @@ static ssize_t proc_sys_call_handler(struct file *filp, void __user *buf,
 
 	/* careful: calling conventions are nasty here */
 	res = count;
-	error = table->proc_handler(table, write, buf, &res, ppos);
+	/*XXX Most handlers only use the first 5 arguments (without
+	 *XXX @cookie). Changing all handlers is too much work,
+	 *XXX as this is only an RFC patch at the moment.
+	 *XXX
+	 *XXX This is just a HACK for now. I did it this way to avoid
+	 *XXX wasting time changing all the handlers; in the final version
+	 *XXX I'll change all the handlers if there's no other solution.
+	 */
+	error = phc(table, write, buf, &res, ppos, head->ctl_header_cookie);
 	if (!error)
 		error = res;
 out:
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 43fed29..3d21832 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -963,6 +963,9 @@ typedef struct ctl_table ctl_table;
 
 typedef int proc_handler (struct ctl_table *ctl, int write,
 			  void __user *buffer, size_t *lenp, loff_t *ppos);
+typedef int proc_handler_cookie(struct ctl_table *ctl, int write,
+				void __user *buffer, size_t *lenp,
+				loff_t *ppos, void *ctl_header_cookie);
 
 extern int proc_dostring(struct ctl_table *, int,
 			 void __user *, size_t *, loff_t *);
-- 
1.7.4.rc1.7.g2cf08.dirty

^ permalink raw reply related

* [PATCH 3/9] sysctl: add netns_proc_dointvec and similar handlers
From: Lucian Adrian Grijincu @ 2011-02-25 18:52 UTC (permalink / raw)
  To: David S. Miller, Alexey Dobriyan, Eric W. Biederman,
	Octavian Purdila, netdev
  Cc: Lucian Adrian Grijincu
In-Reply-To: <1298659961-23863-1-git-send-email-lucian.grijincu@gmail.com>

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 include/net/net_namespace.h |   26 ++++++++++++++++++++++++++
 net/sysctl_net.c            |   31 +++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+), 0 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 1bf812b..0b7d37d 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -276,4 +276,30 @@ extern struct ctl_table_header *register_net_sysctl_rotable(
 	const struct ctl_path *path, struct ctl_table *table);
 extern void unregister_net_sysctl_table(struct ctl_table_header *header);
 
+/* similar to the versions without 'netns', with these remarks:
+ * - these handlers receive as cookie a 'struct net*'
+ * - the data field of ctl_table* must be of the form
+ *    &init_net.member1.member2..memberN
+ * - these handlers will call their equivalent handler with a
+ *   ctl_table with data of the form: net->member1.member2..memberN
+ */
+extern int netns_proc_dostring(struct ctl_table *,
+		int, void __user *, size_t *, loff_t *, void *net);
+extern int netns_proc_dointvec(struct ctl_table *, int,
+		void __user *, size_t *, loff_t *, void *net);
+extern int netns_proc_dointvec_minmax(struct ctl_table *, int,
+		void __user *, size_t *, loff_t *, void *net);
+extern int netns_proc_dointvec_jiffies(struct ctl_table *, int,
+		void __user *, size_t *, loff_t *, void *net);
+extern int netns_proc_dointvec_userhz_jiffies(struct ctl_table *, int,
+		void __user *, size_t *, loff_t *, void *net);
+extern int netns_proc_dointvec_ms_jiffies(struct ctl_table *, int,
+		void __user *, size_t *, loff_t *, void *net);
+extern int netns_proc_doulongvec_minmax(struct ctl_table *, int,
+		void __user *, size_t *, loff_t *, void *net);
+extern int netns_proc_doulongvec_ms_jiffies_minmax(struct ctl_table *table, int,
+		void __user *, size_t *, loff_t *, void *net);
+extern int netns_proc_do_large_bitmap(struct ctl_table *, int,
+		void __user *, size_t *, loff_t *, void *net);
+
 #endif /* __NET_NET_NAMESPACE_H */
diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index 9dadd17..60b36ad 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -127,3 +127,34 @@ void unregister_net_sysctl_table(struct ctl_table_header *header)
 	unregister_sysctl_table(header);
 }
 EXPORT_SYMBOL_GPL(unregister_net_sysctl_table);
+
+
+
+static int netns_proc_wrapper(struct ctl_table *table, int write,
+			      void __user *buffer, size_t *lenp,
+			      loff_t *ppos, void *net, proc_handler proc_handler)
+{
+	struct ctl_table tmp = *table;
+	tmp.data += (char *)net - (char *)&init_net;
+	return ((proc_handler_cookie*) proc_handler)(&tmp, write, buffer, lenp, ppos, NULL);
+}
+
+
+#define NETNS_PROC_WRAP(handler_name)					\
+	int netns_##handler_name(struct ctl_table *table, int write,	\
+				 void __user *buffer, size_t *lenp,	\
+				 loff_t *ppos, void *net)		\
+	{								\
+		return netns_proc_wrapper(table, write, buffer, lenp,	\
+					  ppos, net, handler_name);	\
+	}								\
+	EXPORT_SYMBOL_GPL(netns_##handler_name);
+
+NETNS_PROC_WRAP(proc_dointvec);
+NETNS_PROC_WRAP(proc_dointvec_minmax);
+NETNS_PROC_WRAP(proc_dointvec_jiffies);
+NETNS_PROC_WRAP(proc_dointvec_userhz_jiffies);
+NETNS_PROC_WRAP(proc_dointvec_ms_jiffies);
+NETNS_PROC_WRAP(proc_doulongvec_minmax);
+NETNS_PROC_WRAP(proc_doulongvec_ms_jiffies_minmax);
+NETNS_PROC_WRAP(proc_do_large_bitmap);
-- 
1.7.4.rc1.7.g2cf08.dirty

^ permalink raw reply related

* [PATCH 4/9] sysctl: ipv4: ipfrag: share ip4_frags_ns_ctl_table between nets
From: Lucian Adrian Grijincu @ 2011-02-25 18:52 UTC (permalink / raw)
  To: David S. Miller, Alexey Dobriyan, Eric W. Biederman,
	Octavian Purdila, netdev
  Cc: Lucian Adrian Grijincu
In-Reply-To: <1298659961-23863-1-git-send-email-lucian.grijincu@gmail.com>

The only reason we were creating a copy of this table was to set
->data to point to data from within the newly created net. The
netns_proc_* handlers do this dynamically.
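
As a userspace illustration of that dynamic adjustment (a sketch only; the
demo_* names and the numeric values are made up): the shared table keeps
pointing into the initial namespace, and at call time the pointer is
rebased onto the namespace passed in. The offset is computed here from
the member's position inside the initial namespace, which should be
equivalent to adding the distance between the two namespace objects as
the wrapper in patch 3/9 does.

```c
#include <assert.h>
#include <stddef.h>

struct demo_frags {
	int high_thresh;
	int low_thresh;
	int timeout;
};

/* Stand-in for 'struct net'. */
struct demo_net {
	int id;
	struct demo_frags frags;
};

/* Stand-in for init_net; the values are arbitrary. */
static struct demo_net demo_init_net = { 0, { 300, 200, 30 } };

struct demo_ctl {
	const char *procname;
	void *data;		/* always points into demo_init_net */
};

/* One table shared by every namespace. */
static struct demo_ctl demo_frags_table[] = {
	{ "ipfrag_high_thresh", &demo_init_net.frags.high_thresh },
	{ "ipfrag_low_thresh",  &demo_init_net.frags.low_thresh },
	{ "ipfrag_time",        &demo_init_net.frags.timeout },
};

/* Translate an entry's ->data into the same member of another namespace. */
static void *demo_rebase(const struct demo_ctl *ctl, struct demo_net *net)
{
	ptrdiff_t off = (char *)ctl->data - (char *)&demo_init_net;

	/* The entry must point inside the initial namespace object. */
	assert(off >= 0 && (size_t)off < sizeof(*net));
	return (char *)net + off;
}
```

This only works for entries whose ->data really lives inside the initial
namespace object; an entry pointing at some other global would be rebased
to a meaningless address.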

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 net/ipv4/ip_fragment.c |   34 ++++++----------------------------
 net/sysctl_net.c       |    2 +-
 2 files changed, 7 insertions(+), 29 deletions(-)

diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index a1151b8..ffca3cc 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -677,21 +677,21 @@ static struct ctl_table ip4_frags_ns_ctl_table[] = {
 		.data		= &init_net.ipv4.frags.high_thresh,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec
+		.proc_handler	= (proc_handler *) netns_proc_dointvec
 	},
 	{
 		.procname	= "ipfrag_low_thresh",
 		.data		= &init_net.ipv4.frags.low_thresh,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec
+		.proc_handler	= (proc_handler *) netns_proc_dointvec
 	},
 	{
 		.procname	= "ipfrag_time",
 		.data		= &init_net.ipv4.frags.timeout,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec_jiffies,
+		.proc_handler	= (proc_handler *) netns_proc_dointvec_jiffies,
 	},
 	{ }
 };
@@ -717,41 +717,19 @@ static struct ctl_table ip4_frags_ctl_table[] = {
 
 static int __net_init ip4_frags_ns_ctl_register(struct net *net)
 {
-	struct ctl_table *table;
 	struct ctl_table_header *hdr;
-
-	table = ip4_frags_ns_ctl_table;
-	if (!net_eq(net, &init_net)) {
-		table = kmemdup(table, sizeof(ip4_frags_ns_ctl_table), GFP_KERNEL);
-		if (table == NULL)
-			goto err_alloc;
-
-		table[0].data = &net->ipv4.frags.high_thresh;
-		table[1].data = &net->ipv4.frags.low_thresh;
-		table[2].data = &net->ipv4.frags.timeout;
-	}
-
-	hdr = register_net_sysctl_table(net, net_ipv4_ctl_path, table);
+	hdr = register_net_sysctl_table(net, net_ipv4_ctl_path,
+					ip4_frags_ns_ctl_table);
 	if (hdr == NULL)
-		goto err_reg;
+		return -ENOMEM;
 
 	net->ipv4.frags_hdr = hdr;
 	return 0;
-
-err_reg:
-	if (!net_eq(net, &init_net))
-		kfree(table);
-err_alloc:
-	return -ENOMEM;
 }
 
 static void __net_exit ip4_frags_ns_ctl_unregister(struct net *net)
 {
-	struct ctl_table *table;
-
-	table = net->ipv4.frags_hdr->ctl_table_arg;
 	unregister_net_sysctl_table(net->ipv4.frags_hdr);
-	kfree(table);
 }
 
 static void ip4_frags_ctl_register(void)
diff --git a/net/sysctl_net.c b/net/sysctl_net.c
index 60b36ad..d80e9c4 100644
--- a/net/sysctl_net.c
+++ b/net/sysctl_net.c
@@ -110,7 +110,7 @@ struct ctl_table_header *register_net_sysctl_table(struct net *net,
 	namespaces = *current->nsproxy;
 	namespaces.net_ns = net;
 	return __register_sysctl_paths(&net_sysctl_root, &namespaces, path,
-				       table, NULL);
+				       table, net);
 }
 EXPORT_SYMBOL_GPL(register_net_sysctl_table);
 
-- 
1.7.4.rc1.7.g2cf08.dirty

^ permalink raw reply related

* [PATCH 6/9] sysctl: route: share ipv4_route_flush_table between nets
From: Lucian Adrian Grijincu @ 2011-02-25 18:52 UTC (permalink / raw)
  To: David S. Miller, Alexey Dobriyan, Eric W. Biederman,
	Octavian Purdila, netdev
  Cc: Lucian Adrian Grijincu
In-Reply-To: <1298659961-23863-1-git-send-email-lucian.grijincu@gmail.com>

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 net/ipv4/route.c |   36 +++++++-----------------------------
 1 files changed, 7 insertions(+), 29 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 6ed6603..8fd0208 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -3038,19 +3038,18 @@ void ip_rt_multicast_event(struct in_device *in_dev)
 
 #ifdef CONFIG_SYSCTL
 static int ipv4_sysctl_rtcache_flush(ctl_table *__ctl, int write,
-					void __user *buffer,
-					size_t *lenp, loff_t *ppos)
+				     void __user *buffer,
+				     size_t *lenp, loff_t *ppos, void *cookie)
 {
 	if (write) {
 		int flush_delay;
 		ctl_table ctl;
-		struct net *net;
+		struct net *net = (struct net *) cookie;
 
 		memcpy(&ctl, __ctl, sizeof(ctl));
 		ctl.data = &flush_delay;
 		proc_dointvec(&ctl, write, buffer, lenp, ppos);
 
-		net = (struct net *)__ctl->extra1;
 		rt_cache_flush(net, flush_delay);
 		return 0;
 	}
@@ -3191,7 +3190,7 @@ static struct ctl_table ipv4_route_flush_table[] = {
 		.procname	= "flush",
 		.maxlen		= sizeof(int),
 		.mode		= 0200,
-		.proc_handler	= ipv4_sysctl_rtcache_flush,
+		.proc_handler	= (proc_handler *) ipv4_sysctl_rtcache_flush,
 	},
 	{ },
 };
@@ -3205,37 +3204,16 @@ static __net_initdata struct ctl_path ipv4_route_path[] = {
 
 static __net_init int sysctl_route_net_init(struct net *net)
 {
-	struct ctl_table *tbl;
-
-	tbl = ipv4_route_flush_table;
-	if (!net_eq(net, &init_net)) {
-		tbl = kmemdup(tbl, sizeof(ipv4_route_flush_table), GFP_KERNEL);
-		if (tbl == NULL)
-			goto err_dup;
-	}
-	tbl[0].extra1 = net;
-
-	net->ipv4.route_hdr =
-		register_net_sysctl_table(net, ipv4_route_path, tbl);
+	net->ipv4.route_hdr = register_net_sysctl_table(net,
+				ipv4_route_path, ipv4_route_flush_table);
 	if (net->ipv4.route_hdr == NULL)
-		goto err_reg;
+		return -ENOMEM;
 	return 0;
-
-err_reg:
-	if (tbl != ipv4_route_flush_table)
-		kfree(tbl);
-err_dup:
-	return -ENOMEM;
 }
 
 static __net_exit void sysctl_route_net_exit(struct net *net)
 {
-	struct ctl_table *tbl;
-
-	tbl = net->ipv4.route_hdr->ctl_table_arg;
 	unregister_net_sysctl_table(net->ipv4.route_hdr);
-	BUG_ON(tbl == ipv4_route_flush_table);
-	kfree(tbl);
 }
 
 static __net_initdata struct pernet_operations sysctl_route_ops = {
-- 
1.7.4.rc1.7.g2cf08.dirty

^ permalink raw reply related

* [PATCH 7/9] sysctl: ipv4: share ipv4_net_table between nets
From: Lucian Adrian Grijincu @ 2011-02-25 18:52 UTC (permalink / raw)
  To: David S. Miller, Alexey Dobriyan, Eric W. Biederman,
	Octavian Purdila, netdev
  Cc: Lucian Adrian Grijincu
In-Reply-To: <1298659961-23863-1-git-send-email-lucian.grijincu@gmail.com>

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 net/ipv4/sysctl_net_ipv4.c |   53 +++++++------------------------------------
 1 files changed, 9 insertions(+), 44 deletions(-)

diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 1a45665..6fd3279 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -636,49 +636,49 @@ static struct ctl_table ipv4_net_table[] = {
 		.data		= &init_net.ipv4.sysctl_icmp_echo_ignore_all,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec
+		.proc_handler	= (proc_handler *) netns_proc_dointvec
 	},
 	{
 		.procname	= "icmp_echo_ignore_broadcasts",
 		.data		= &init_net.ipv4.sysctl_icmp_echo_ignore_broadcasts,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec
+		.proc_handler	= (proc_handler *) netns_proc_dointvec
 	},
 	{
 		.procname	= "icmp_ignore_bogus_error_responses",
 		.data		= &init_net.ipv4.sysctl_icmp_ignore_bogus_error_responses,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec
+		.proc_handler	= (proc_handler *) netns_proc_dointvec
 	},
 	{
 		.procname	= "icmp_errors_use_inbound_ifaddr",
 		.data		= &init_net.ipv4.sysctl_icmp_errors_use_inbound_ifaddr,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec
+		.proc_handler	= (proc_handler *) netns_proc_dointvec
 	},
 	{
 		.procname	= "icmp_ratelimit",
 		.data		= &init_net.ipv4.sysctl_icmp_ratelimit,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec_ms_jiffies,
+		.proc_handler	= (proc_handler *) netns_proc_dointvec_ms_jiffies,
 	},
 	{
 		.procname	= "icmp_ratemask",
 		.data		= &init_net.ipv4.sysctl_icmp_ratemask,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec
+		.proc_handler	= (proc_handler *) netns_proc_dointvec
 	},
 	{
 		.procname	= "rt_cache_rebuild_count",
 		.data		= &init_net.ipv4.sysctl_rt_cache_rebuild_count,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec
+		.proc_handler	= (proc_handler *) netns_proc_dointvec
 	},
 	{ }
 };
@@ -692,53 +692,18 @@ EXPORT_SYMBOL_GPL(net_ipv4_ctl_path);
 
 static __net_init int ipv4_sysctl_init_net(struct net *net)
 {
-	struct ctl_table *table;
-
-	table = ipv4_net_table;
-	if (!net_eq(net, &init_net)) {
-		table = kmemdup(table, sizeof(ipv4_net_table), GFP_KERNEL);
-		if (table == NULL)
-			goto err_alloc;
-
-		table[0].data =
-			&net->ipv4.sysctl_icmp_echo_ignore_all;
-		table[1].data =
-			&net->ipv4.sysctl_icmp_echo_ignore_broadcasts;
-		table[2].data =
-			&net->ipv4.sysctl_icmp_ignore_bogus_error_responses;
-		table[3].data =
-			&net->ipv4.sysctl_icmp_errors_use_inbound_ifaddr;
-		table[4].data =
-			&net->ipv4.sysctl_icmp_ratelimit;
-		table[5].data =
-			&net->ipv4.sysctl_icmp_ratemask;
-		table[6].data =
-			&net->ipv4.sysctl_rt_cache_rebuild_count;
-	}
-
 	net->ipv4.sysctl_rt_cache_rebuild_count = 4;
 
 	net->ipv4.ipv4_hdr = register_net_sysctl_table(net,
-			net_ipv4_ctl_path, table);
+			net_ipv4_ctl_path, ipv4_net_table);
 	if (net->ipv4.ipv4_hdr == NULL)
-		goto err_reg;
-
+		return -ENOMEM;
 	return 0;
-
-err_reg:
-	if (!net_eq(net, &init_net))
-		kfree(table);
-err_alloc:
-	return -ENOMEM;
 }
 
 static __net_exit void ipv4_sysctl_exit_net(struct net *net)
 {
-	struct ctl_table *table;
-
-	table = net->ipv4.ipv4_hdr->ctl_table_arg;
 	unregister_net_sysctl_table(net->ipv4.ipv4_hdr);
-	kfree(table);
 }
 
 static __net_initdata struct pernet_operations ipv4_sysctl_ops = {
-- 
1.7.4.rc1.7.g2cf08.dirty

^ permalink raw reply related

* [PATCH 8/9] sysctl: ipv6: share ip6_frags_ns_ctl_table between nets
From: Lucian Adrian Grijincu @ 2011-02-25 18:52 UTC (permalink / raw)
  To: David S. Miller, Alexey Dobriyan, Eric W. Biederman,
	Octavian Purdila, netdev
  Cc: Lucian Adrian Grijincu
In-Reply-To: <1298659961-23863-1-git-send-email-lucian.grijincu@gmail.com>

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 net/ipv6/reassembly.c |   34 ++++++----------------------------
 1 files changed, 6 insertions(+), 28 deletions(-)

diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 07beeb0..868cbd5 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -600,21 +600,21 @@ static struct ctl_table ip6_frags_ns_ctl_table[] = {
 		.data		= &init_net.ipv6.frags.high_thresh,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec
+		.proc_handler	= (proc_handler *) netns_proc_dointvec,
 	},
 	{
 		.procname	= "ip6frag_low_thresh",
 		.data		= &init_net.ipv6.frags.low_thresh,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec
+		.proc_handler	= (proc_handler *) netns_proc_dointvec,
 	},
 	{
 		.procname	= "ip6frag_time",
 		.data		= &init_net.ipv6.frags.timeout,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec_jiffies,
+		.proc_handler	= (proc_handler *) netns_proc_dointvec_jiffies,
 	},
 	{ }
 };
@@ -632,42 +632,20 @@ static struct ctl_table ip6_frags_ctl_table[] = {
 
 static int __net_init ip6_frags_ns_sysctl_register(struct net *net)
 {
-	struct ctl_table *table;
 	struct ctl_table_header *hdr;
 
-	table = ip6_frags_ns_ctl_table;
-	if (!net_eq(net, &init_net)) {
-		table = kmemdup(table, sizeof(ip6_frags_ns_ctl_table), GFP_KERNEL);
-		if (table == NULL)
-			goto err_alloc;
-
-		table[0].data = &net->ipv6.frags.high_thresh;
-		table[1].data = &net->ipv6.frags.low_thresh;
-		table[2].data = &net->ipv6.frags.timeout;
-	}
-
-	hdr = register_net_sysctl_table(net, net_ipv6_ctl_path, table);
+	hdr = register_net_sysctl_table(net, net_ipv6_ctl_path,
+					ip6_frags_ns_ctl_table);
 	if (hdr == NULL)
-		goto err_reg;
+		return -ENOMEM;
 
 	net->ipv6.sysctl.frags_hdr = hdr;
 	return 0;
-
-err_reg:
-	if (!net_eq(net, &init_net))
-		kfree(table);
-err_alloc:
-	return -ENOMEM;
 }
 
 static void __net_exit ip6_frags_ns_sysctl_unregister(struct net *net)
 {
-	struct ctl_table *table;
-
-	table = net->ipv6.sysctl.frags_hdr->ctl_table_arg;
 	unregister_net_sysctl_table(net->ipv6.sysctl.frags_hdr);
-	if (!net_eq(net, &init_net))
-		kfree(table);
 }
 
 static struct ctl_table_header *ip6_ctl_header;
-- 
1.7.4.rc1.7.g2cf08.dirty

^ permalink raw reply related

* [PATCH 5/9] sysctl: net: share netns_core_table between nets
From: Lucian Adrian Grijincu @ 2011-02-25 18:52 UTC (permalink / raw)
  To: David S. Miller, Alexey Dobriyan, Eric W. Biederman,
	Octavian Purdila, netdev
  Cc: Lucian Adrian Grijincu
In-Reply-To: <1298659961-23863-1-git-send-email-lucian.grijincu@gmail.com>

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 net/core/sysctl_net_core.c |   28 +++-------------------------
 1 files changed, 3 insertions(+), 25 deletions(-)

diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 385b609..e5a1544 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -182,7 +182,7 @@ static struct ctl_table netns_core_table[] = {
 		.data		= &init_net.core.sysctl_somaxconn,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec
+		.proc_handler	= (proc_handler *) netns_proc_dointvec
 	},
 	{ }
 };
@@ -195,41 +195,19 @@ __net_initdata struct ctl_path net_core_path[] = {
 
 static __net_init int sysctl_core_net_init(struct net *net)
 {
-	struct ctl_table *tbl;
-
 	net->core.sysctl_somaxconn = SOMAXCONN;
 
-	tbl = netns_core_table;
-	if (!net_eq(net, &init_net)) {
-		tbl = kmemdup(tbl, sizeof(netns_core_table), GFP_KERNEL);
-		if (tbl == NULL)
-			goto err_dup;
-
-		tbl[0].data = &net->core.sysctl_somaxconn;
-	}
-
 	net->core.sysctl_hdr = register_net_sysctl_table(net,
-			net_core_path, tbl);
+			net_core_path, netns_core_table);
 	if (net->core.sysctl_hdr == NULL)
-		goto err_reg;
+		return -ENOMEM;
 
 	return 0;
-
-err_reg:
-	if (tbl != netns_core_table)
-		kfree(tbl);
-err_dup:
-	return -ENOMEM;
 }
 
 static __net_exit void sysctl_core_net_exit(struct net *net)
 {
-	struct ctl_table *tbl;
-
-	tbl = net->core.sysctl_hdr->ctl_table_arg;
 	unregister_net_sysctl_table(net->core.sysctl_hdr);
-	BUG_ON(tbl == netns_core_table);
-	kfree(tbl);
 }
 
 static __net_initdata struct pernet_operations sysctl_core_ops = {
-- 
1.7.4.rc1.7.g2cf08.dirty


^ permalink raw reply related

* [PATCH 9/9] sysctl: ipv6: share ip6_ctl_table, ipv6_icmp_table and ipv6_route_table between nets
From: Lucian Adrian Grijincu @ 2011-02-25 18:52 UTC (permalink / raw)
  To: David S. Miller, Alexey Dobriyan, Eric W. Biederman,
	Octavian Purdila, netdev
  Cc: Lucian Adrian Grijincu
In-Reply-To: <1298659961-23863-1-git-send-email-lucian.grijincu@gmail.com>

This patch includes another implementation of the patch from [1]. This
patch will not apply cleanly if that one has been applied.

[1] http://thread.gmane.org/gmane.linux.network/187273

Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
---
 include/net/ipv6.h         |    6 +---
 net/ipv6/icmp.c            |   17 +-----------
 net/ipv6/route.c           |   54 +++++++++++----------------------------
 net/ipv6/sysctl_net_ipv6.c |   61 ++++++--------------------------------------
 4 files changed, 27 insertions(+), 111 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 96e50e0..1526ed6 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -652,11 +652,9 @@ static inline int snmp6_unregister_dev(struct inet6_dev *idev) { return 0; }
 #endif
 
 #ifdef CONFIG_SYSCTL
-extern ctl_table ipv6_route_table_template[];
-extern ctl_table ipv6_icmp_table_template[];
+extern ctl_table ipv6_route_table[];
+extern ctl_table ipv6_icmp_table[];
 
-extern struct ctl_table *ipv6_icmp_sysctl_init(struct net *net);
-extern struct ctl_table *ipv6_route_sysctl_init(struct net *net);
 extern int ipv6_sysctl_register(void);
 extern void ipv6_sysctl_unregister(void);
 extern int ipv6_static_sysctl_register(void);
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 03e62f9..924cb36 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -938,29 +938,16 @@ int icmpv6_err_convert(u8 type, u8 code, int *err)
 EXPORT_SYMBOL(icmpv6_err_convert);
 
 #ifdef CONFIG_SYSCTL
-ctl_table ipv6_icmp_table_template[] = {
+ctl_table ipv6_icmp_table[] = {
 	{
 		.procname	= "ratelimit",
 		.data		= &init_net.ipv6.sysctl.icmpv6_time,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec_ms_jiffies,
+		.proc_handler	= (proc_handler *) netns_proc_dointvec_ms_jiffies,
 	},
 	{ },
 };
 
-struct ctl_table * __net_init ipv6_icmp_sysctl_init(struct net *net)
-{
-	struct ctl_table *table;
-
-	table = kmemdup(ipv6_icmp_table_template,
-			sizeof(ipv6_icmp_table_template),
-			GFP_KERNEL);
-
-	if (table)
-		table[0].data = &net->ipv6.sysctl.icmpv6_time;
-
-	return table;
-}
 #endif
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index a998db6..29e05ca 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2553,11 +2553,11 @@ static const struct file_operations rt6_stats_seq_fops = {
 
 #ifdef CONFIG_SYSCTL
 
-static
-int ipv6_sysctl_rtcache_flush(ctl_table *ctl, int write,
-			      void __user *buffer, size_t *lenp, loff_t *ppos)
+static int netns_ipv6_sysctl_rtcache_flush(ctl_table *ctl, int write,
+					   void __user *buffer, size_t *lenp,
+					   loff_t *ppos, void *cookie)
 {
-	struct net *net = current->nsproxy->net_ns;
+	struct net *net = (struct net *) cookie;
 	int delay = net->ipv6.sysctl.flush_delay;
 	if (write) {
 		proc_dointvec(ctl, write, buffer, lenp, ppos);
@@ -2567,103 +2567,79 @@ int ipv6_sysctl_rtcache_flush(ctl_table *ctl, int write,
 		return -EINVAL;
 }
 
-ctl_table ipv6_route_table_template[] = {
+ctl_table ipv6_route_table[] = {
 	{
 		.procname	=	"flush",
 		.data		=	&init_net.ipv6.sysctl.flush_delay,
 		.maxlen		=	sizeof(int),
 		.mode		=	0200,
-		.proc_handler	=	ipv6_sysctl_rtcache_flush
+		.proc_handler	=	(proc_handler *) netns_ipv6_sysctl_rtcache_flush
 	},
 	{
 		.procname	=	"gc_thresh",
 		.data		=	&ip6_dst_ops_template.gc_thresh,
 		.maxlen		=	sizeof(int),
 		.mode		=	0644,
-		.proc_handler	=	proc_dointvec,
+		.proc_handler	=	(proc_handler *) netns_proc_dointvec,
 	},
 	{
 		.procname	=	"max_size",
 		.data		=	&init_net.ipv6.sysctl.ip6_rt_max_size,
 		.maxlen		=	sizeof(int),
 		.mode		=	0644,
-		.proc_handler	=	proc_dointvec,
+		.proc_handler	=	(proc_handler *) netns_proc_dointvec,
 	},
 	{
 		.procname	=	"gc_min_interval",
 		.data		=	&init_net.ipv6.sysctl.ip6_rt_gc_min_interval,
 		.maxlen		=	sizeof(int),
 		.mode		=	0644,
-		.proc_handler	=	proc_dointvec_jiffies,
+		.proc_handler	=	(proc_handler *) netns_proc_dointvec_jiffies,
 	},
 	{
 		.procname	=	"gc_timeout",
 		.data		=	&init_net.ipv6.sysctl.ip6_rt_gc_timeout,
 		.maxlen		=	sizeof(int),
 		.mode		=	0644,
-		.proc_handler	=	proc_dointvec_jiffies,
+		.proc_handler	=	(proc_handler *) netns_proc_dointvec_jiffies,
 	},
 	{
 		.procname	=	"gc_interval",
 		.data		=	&init_net.ipv6.sysctl.ip6_rt_gc_interval,
 		.maxlen		=	sizeof(int),
 		.mode		=	0644,
-		.proc_handler	=	proc_dointvec_jiffies,
+		.proc_handler	=	(proc_handler *) netns_proc_dointvec_jiffies,
 	},
 	{
 		.procname	=	"gc_elasticity",
 		.data		=	&init_net.ipv6.sysctl.ip6_rt_gc_elasticity,
 		.maxlen		=	sizeof(int),
 		.mode		=	0644,
-		.proc_handler	=	proc_dointvec,
+		.proc_handler	=	(proc_handler *) netns_proc_dointvec,
 	},
 	{
 		.procname	=	"mtu_expires",
 		.data		=	&init_net.ipv6.sysctl.ip6_rt_mtu_expires,
 		.maxlen		=	sizeof(int),
 		.mode		=	0644,
-		.proc_handler	=	proc_dointvec_jiffies,
+		.proc_handler	=	(proc_handler *) netns_proc_dointvec_jiffies,
 	},
 	{
 		.procname	=	"min_adv_mss",
 		.data		=	&init_net.ipv6.sysctl.ip6_rt_min_advmss,
 		.maxlen		=	sizeof(int),
 		.mode		=	0644,
-		.proc_handler	=	proc_dointvec,
+		.proc_handler	=	(proc_handler *) netns_proc_dointvec,
 	},
 	{
 		.procname	=	"gc_min_interval_ms",
 		.data		=	&init_net.ipv6.sysctl.ip6_rt_gc_min_interval,
 		.maxlen		=	sizeof(int),
 		.mode		=	0644,
-		.proc_handler	=	proc_dointvec_ms_jiffies,
+		.proc_handler	=	(proc_handler *) netns_proc_dointvec_ms_jiffies,
 	},
 	{ }
 };
-
-struct ctl_table * __net_init ipv6_route_sysctl_init(struct net *net)
-{
-	struct ctl_table *table;
-
-	table = kmemdup(ipv6_route_table_template,
-			sizeof(ipv6_route_table_template),
-			GFP_KERNEL);
-
-	if (table) {
-		table[0].data = &net->ipv6.sysctl.flush_delay;
-		table[1].data = &net->ipv6.ip6_dst_ops.gc_thresh;
-		table[2].data = &net->ipv6.sysctl.ip6_rt_max_size;
-		table[3].data = &net->ipv6.sysctl.ip6_rt_gc_min_interval;
-		table[4].data = &net->ipv6.sysctl.ip6_rt_gc_timeout;
-		table[5].data = &net->ipv6.sysctl.ip6_rt_gc_interval;
-		table[6].data = &net->ipv6.sysctl.ip6_rt_gc_elasticity;
-		table[7].data = &net->ipv6.sysctl.ip6_rt_mtu_expires;
-		table[8].data = &net->ipv6.sysctl.ip6_rt_min_advmss;
-		table[9].data = &net->ipv6.sysctl.ip6_rt_gc_min_interval;
-	}
-
-	return table;
-}
 #endif
 
 static int __net_init ip6_route_net_init(struct net *net)
diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index 7cb65ef..cd15483 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -17,25 +17,25 @@
 
 static struct ctl_table empty[1];
 
-static ctl_table ipv6_table_template[] = {
+static ctl_table ipv6_table[] = {
 	{
 		.procname	= "route",
 		.maxlen		= 0,
 		.mode		= 0555,
-		.child		= ipv6_route_table_template
+		.child		= ipv6_route_table
 	},
 	{
 		.procname	= "icmp",
 		.maxlen		= 0,
 		.mode		= 0555,
-		.child		= ipv6_icmp_table_template
+		.child		= ipv6_icmp_table
 	},
 	{
 		.procname	= "bindv6only",
 		.data		= &init_net.ipv6.sysctl.bindv6only,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec
+		.proc_handler	= (proc_handler *) netns_proc_dointvec
 	},
 	{
 		.procname	= "neigh",
@@ -66,62 +66,17 @@ EXPORT_SYMBOL_GPL(net_ipv6_ctl_path);
 
 static int __net_init ipv6_sysctl_net_init(struct net *net)
 {
-	struct ctl_table *ipv6_table;
-	struct ctl_table *ipv6_route_table;
-	struct ctl_table *ipv6_icmp_table;
-	int err;
-
-	err = -ENOMEM;
-	ipv6_table = kmemdup(ipv6_table_template, sizeof(ipv6_table_template),
-			     GFP_KERNEL);
-	if (!ipv6_table)
-		goto out;
-
-	ipv6_route_table = ipv6_route_sysctl_init(net);
-	if (!ipv6_route_table)
-		goto out_ipv6_table;
-	ipv6_table[0].child = ipv6_route_table;
-
-	ipv6_icmp_table = ipv6_icmp_sysctl_init(net);
-	if (!ipv6_icmp_table)
-		goto out_ipv6_route_table;
-	ipv6_table[1].child = ipv6_icmp_table;
-
-	ipv6_table[2].data = &net->ipv6.sysctl.bindv6only;
-
-	net->ipv6.sysctl.table = register_net_sysctl_table(net, net_ipv6_ctl_path,
-							   ipv6_table);
+	net->ipv6.sysctl.table = register_net_sysctl_table(net,
+				   net_ipv6_ctl_path, ipv6_table);
 	if (!net->ipv6.sysctl.table)
-		goto out_ipv6_icmp_table;
-
-	err = 0;
-out:
-	return err;
+		return -ENOMEM;
 
-out_ipv6_icmp_table:
-	kfree(ipv6_icmp_table);
-out_ipv6_route_table:
-	kfree(ipv6_route_table);
-out_ipv6_table:
-	kfree(ipv6_table);
-	goto out;
+	return 0;
 }
 
 static void __net_exit ipv6_sysctl_net_exit(struct net *net)
 {
-	struct ctl_table *ipv6_table;
-	struct ctl_table *ipv6_route_table;
-	struct ctl_table *ipv6_icmp_table;
-
-	ipv6_table = net->ipv6.sysctl.table->ctl_table_arg;
-	ipv6_route_table = ipv6_table[0].child;
-	ipv6_icmp_table = ipv6_table[1].child;
-
 	unregister_net_sysctl_table(net->ipv6.sysctl.table);
-
-	kfree(ipv6_table);
-	kfree(ipv6_route_table);
-	kfree(ipv6_icmp_table);
 }
 
 static struct pernet_operations ipv6_sysctl_net_ops = {
-- 
1.7.4.rc1.7.g2cf08.dirty

