Netdev List
 help / color / mirror / Atom feed
* Re: Maximum no of bytes Ethernet can transfer at a time ??
From: Ajit @ 2011-04-29  8:53 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20110428110634.GA4165@hmsreliant.think-freely.org>

Neil Horman <nhorman <at> tuxdriver.com> writes:

I forgot to mention one more thing,
In the sender code you need to edit the mac of the target machine.

you will find it at the start of the code.

Also take care of eth0 to eth1 in the receiver code as per the need.

Thank you.




^ permalink raw reply

* Re: [PATCH] bridge: Module use count must be updated as bridges are created/destroyed
From: David Miller @ 2011-04-29  8:44 UTC (permalink / raw)
  To: JBeulich; +Cc: shemminger, bridge, jeffm, netdev
In-Reply-To: <4DBA937F020000780003ED96@vpn.id2.novell.com>

From: "Jan Beulich" <JBeulich@novell.com>
Date: Fri, 29 Apr 2011 09:31:27 +0100

>>>> On 29.04.11 at 10:10, David Miller <davem@davemloft.net> wrote:
>> From: "Jan Beulich" <JBeulich@novell.com>
>> Date: Fri, 29 Apr 2011 08:41:10 +0100
>> 
>>> You talk of rmmod on the very module, but the issue is about
>>> modprobe -r on a dependent module. I cannot believe you consider
>>> it correct that *implicit* unloading of bridge.ko should happen when
>>> bridges are configured.
>> 
>> Which module in particular depends upon bridge and causes the
>> problem?
> 
> The problem was observed (a long time ago) with ebtable_broute,
> and I cannot see how this would have changed meanwhile.

Well your change makes it so that someone who actually _wants_ to
unload the bridge module, regardless of configuration, cannot do so.

I think that's a worse problem than this ebtables thing.

Nothing on the system should be hitting modules with unload requests
unless the user explicitly asked for that specific module to be
unloaded.  At least not by default.

So the me the problem is perhaps that "modprobe -r" does this auto
dependency unloading thing by default.

When we first fixed network device drivers so that they now properly
always run with no module refcount at all, people complained because
there were some distributions that ran some daemon that periodically
looked for "unreferenced" modules and "helped" the user by
automatically unloaded them.

We killed that foolish daemon, and we can fix "modprobe -r" too.

Does "rmmod" have this behavior too?  If not, and it does the right
thing by only unloaded what the user asked for, then people should
use that.

I really don't in any way want to block people from being able to
cleanly unload the bridge module, regardless of configuration, if
that's what they want so your patch as written is not going to be
considered for inclusion.


^ permalink raw reply

* Re: Maximum no of bytes Ethernet can transfer at a time ??
From: Ajit @ 2011-04-29  8:39 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20110428110634.GA4165@hmsreliant.think-freely.org>

Neil Horman <nhorman <at> tuxdriver.com> writes:


> > 
> Can you just post a link to the code?
> Neil
> 
> > Thank you. 
> > 
> > 

here are the links to the code..

sender code : http://pastebin.com/se17Xn96

The command line arguments are like ./pgm_name eth0 /root/abc

where abc is the file name.

receiver code : http://pastebin.com/AKiUWzwR

just run it.

In the receiver code I have assigned offset to buffer[15], it was just for
debugging purpose. Actually offset must have 16 bits of buffer[15] and
buffer[16], but for files of around 100kb-200kb it will easily work..no need to
modify.

Thank you.


^ permalink raw reply

* [PATCH] usbnet: runtime pm: fix out of memory
From: tom.leiming-Re5JQEeQqe8AvxtiuMwx3w @ 2011-04-29  8:37 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA, linux-usb-u79uwXL29TY76Z2rM5mHXA,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: Ming Lei

From: Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

This patch makes use of the EVENT_DEV_OPEN flag introduced recently to
fix one out of memory issue, which can be reproduced on omap3/4 based
pandaboard/beagle XM easily with steps below:

	- enable runtime pm
	echo auto > /sys/devices/platform/usbhs-omap.0/ehci-omap.0/usb1/1-1/1-1.1/power/control

	- ifconfig eth0 up

	- then out of memroy happened, see [1] for kernel message.

Follows my analysis:
	- 'ifconfig eth0 up' brings eth0 out of suspend, and usbnet_resume
	is called to schedule dev->bh, then rx urbs are submited to prepare for
	recieving data;

	- some usbnet devices will produce garbage rx packets flood if
	info->reset is not called in usbnet_open.

	- so there is no enough chances for usbnet_bh to handle and release
	recieved skb buffers since many rx interrupts consumes cpu, so out of memory
	for atomic allocation in rx_submit happened.

This patch fixes the issue by simply not allowing schedule of usbnet_bh until device
is opened.

[1], dmesg
[  234.712005] smsc95xx 1-1.1:1.0: rpm_resume flags 0x4
[  234.712066] usb 1-1.1: rpm_resume flags 0x0
[  234.712066] usb 1-1: rpm_resume flags 0x0
[  234.712097] usb usb1: rpm_resume flags 0x0
[  234.712127] usb usb1: usb auto-resume
[  234.712158] ehci-omap ehci-omap.0: resume root hub
[  234.754028] hub 1-0:1.0: hub_resume
[  234.754821] hub 1-0:1.0: port 1: status 0507 change 0000
[  234.756011] hub 1-0:1.0: state 7 ports 3 chg 0000 evt 0000
[  234.756042] hub 1-0:1.0: rpm_resume flags 0x4
[  234.756072] usb usb1: rpm_resume flags 0x0
[  234.756164] usb usb1: rpm_resume returns 1
[  234.756195] hub 1-0:1.0: rpm_resume returns 0
[  234.756195] hub 1-0:1.0: rpm_suspend flags 0x4
[  234.756225] hub 1-0:1.0: rpm_suspend returns 0
[  234.756256] usb usb1: rpm_resume returns 0
[  234.757141] usb 1-1: usb auto-resume
[  234.793151] ehci-omap ehci-omap.0: GetStatus port:1 status 001005 0  ACK POWER sig=se0 PE CONNECT
[  234.816558] usb 1-1: finish resume
[  234.817871] hub 1-1:1.0: hub_resume
[  234.818420] hub 1-1:1.0: port 1: status 0507 change 0000
[  234.820495] ehci-omap ehci-omap.0: reused qh eec50220 schedule
[  234.820495] usb 1-1: link qh256-0001/eec50220 start 1 [1/0 us]
[  234.820587] usb 1-1: rpm_resume returns 0
[  234.820800] hub 1-1:1.0: state 7 ports 5 chg 0000 evt 0000
[  234.820800] hub 1-1:1.0: rpm_resume flags 0x4
[  234.820831] hub 1-1:1.0: rpm_resume returns 0
[  234.820861] hub 1-1:1.0: rpm_suspend flags 0x4
[  234.820861] hub 1-1:1.0: rpm_suspend returns 0
[  234.821777] usb 1-1.1: usb auto-resume
[  234.868591] hub 1-1:1.0: state 7 ports 5 chg 0000 evt 0002
[  234.868591] hub 1-1:1.0: rpm_resume flags 0x4
[  234.868621] hub 1-1:1.0: rpm_resume returns 0
[  234.868652] hub 1-1:1.0: rpm_suspend flags 0x4
[  234.868652] hub 1-1:1.0: rpm_suspend returns 0
[  234.879486] usb 1-1.1: finish resume
[  234.880279] usb 1-1.1: rpm_resume returns 0
[  234.880310] smsc95xx 1-1.1:1.0: rpm_resume returns 0
[  238.880187] ksoftirqd/0: page allocation failure. order:0, mode:0x20
[  238.880218] Backtrace:
[  238.880249] [<c01b9800>] (dump_backtrace+0x0/0xf8) from [<c065e1dc>] (dump_stack+0x18/0x1c)
[  238.880249]  r6:00000000 r5:00000000 r4:00000020 r3:00000002
[  238.880310] [<c065e1c4>] (dump_stack+0x0/0x1c) from [<c026ece4>] (__alloc_pages_nodemask+0x620/0x724)
[  238.880340] [<c026e6c4>] (__alloc_pages_nodemask+0x0/0x724) from [<c02986d4>] (kmem_getpages.clone.34+0x34/0xc8)
[  238.880371] [<c02986a0>] (kmem_getpages.clone.34+0x0/0xc8) from [<c02988f8>] (cache_grow.clone.42+0x84/0x154)
[  238.880371]  r6:ef871aa4 r5:ef871a80 r4:ef81fd40 r3:00000020
[  238.880401] [<c0298874>] (cache_grow.clone.42+0x0/0x154) from [<c0298b64>] (cache_alloc_refill+0x19c/0x1f0)
[  238.880432] [<c02989c8>] (cache_alloc_refill+0x0/0x1f0) from [<c0299804>] (kmem_cache_alloc+0x90/0x190)
[  238.880462] [<c0299774>] (kmem_cache_alloc+0x0/0x190) from [<c052e260>] (__alloc_skb+0x34/0xe8)
[  238.880493] [<c052e22c>] (__alloc_skb+0x0/0xe8) from [<bf0509f4>] (rx_submit+0x2c/0x1d4 [usbnet])
[  238.880523] [<bf0509c8>] (rx_submit+0x0/0x1d4 [usbnet]) from [<bf050d38>] (rx_complete+0x19c/0x1b0 [usbnet])
[  238.880737] [<bf050b9c>] (rx_complete+0x0/0x1b0 [usbnet]) from [<bf006fd0>] (usb_hcd_giveback_urb+0xa8/0xf4 [usbcore])
[  238.880737]  r8:eeeced34 r7:eeecec00 r6:eeecec00 r5:00000000 r4:eec2dd20
[  238.880767] r3:bf050b9c
[  238.880859] [<bf006f28>] (usb_hcd_giveback_urb+0x0/0xf4 [usbcore]) from [<bf03c8f8>] (ehci_urb_done+0xb0/0xbc [ehci_hcd])
[  238.880859]  r6:00000000 r5:eec2dd20 r4:eeeced44 r3:eec2dd34
[  238.880920] [<bf03c848>] (ehci_urb_done+0x0/0xbc [ehci_hcd]) from [<bf040204>] (qh_completions+0x308/0x3bc [ehci_hcd])
[  238.880920]  r7:00000000 r6:eeda21a0 r5:ffdfe3c0 r4:eeda21ac
[  238.880981] [<bf03fefc>] (qh_completions+0x0/0x3bc [ehci_hcd]) from [<bf040ef8>] (scan_async+0xb0/0x16c [ehci_hcd])
[  238.881011] [<bf040e48>] (scan_async+0x0/0x16c [ehci_hcd]) from [<bf040fec>] (ehci_work+0x38/0x90 [ehci_hcd])
[  238.881042] [<bf040fb4>] (ehci_work+0x0/0x90 [ehci_hcd]) from [<bf042940>] (ehci_irq+0x300/0x34c [ehci_hcd])
[  238.881072]  r4:eeeced34 r3:00000001
[  238.881134] [<bf042640>] (ehci_irq+0x0/0x34c [ehci_hcd]) from [<bf006828>] (usb_hcd_irq+0x40/0xac [usbcore])
[  238.881195] [<bf0067e8>] (usb_hcd_irq+0x0/0xac [usbcore]) from [<c0239764>] (handle_irq_event_percpu+0xb8/0x240)
[  238.881225]  r6:eec504e0 r5:0000006d r4:eec504e0 r3:bf0067e8
[  238.881256] [<c02396ac>] (handle_irq_event_percpu+0x0/0x240) from [<c0239930>] (handle_irq_event+0x44/0x64)
[  238.881256] [<c02398ec>] (handle_irq_event+0x0/0x64) from [<c023bbd0>] (handle_level_irq+0xe0/0x114)
[  238.881286]  r6:0000006d r5:c080c14c r4:c080c100 r3:00020000
[  238.881317] [<c023baf0>] (handle_level_irq+0x0/0x114) from [<c01ab090>] (asm_do_IRQ+0x90/0xd0)
[  238.881317]  r5:00000000 r4:0000006d
[  238.881347] [<c01ab000>] (asm_do_IRQ+0x0/0xd0) from [<c06624d0>] (__irq_svc+0x50/0x134)
[  238.881378] Exception stack(0xef837e20 to 0xef837e68)
[  238.881378] 7e20: 00000001 00185610 016cc000 c00490c0 eb380000 ef800540 00000020 00004ae0
[  238.881408] 7e40: 00000020 bf0509f4 60000013 ef837e9c ef837e40 ef837e68 c0226f0c c0298ca0
[  238.881408] 7e60: 20000013 ffffffff
[  238.881408]  r5:fa240100 r4:ffffffff
[  238.881439] [<c0298bb8>] (__kmalloc_track_caller+0x0/0x1d0) from [<c052e284>] (__alloc_skb+0x58/0xe8)
[  238.881469] [<c052e22c>] (__alloc_skb+0x0/0xe8) from [<bf0509f4>] (rx_submit+0x2c/0x1d4 [usbnet])
[  238.881500] [<bf0509c8>] (rx_submit+0x0/0x1d4 [usbnet]) from [<bf0513d8>] (usbnet_bh+0x1b4/0x250 [usbnet])
[  238.881530] [<bf051224>] (usbnet_bh+0x0/0x250 [usbnet]) from [<c01f912c>] (tasklet_action+0xb0/0x1f8)
[  238.881530]  r6:00000000 r5:ef9757f0 r4:ef9757ec r3:bf051224
[  238.881561] [<c01f907c>] (tasklet_action+0x0/0x1f8) from [<c01f97ac>] (__do_softirq+0x140/0x290)
[  238.881561]  r8:00000006 r7:00000101 r6:00000000 r5:c0806098 r4:00000001
[  238.881591] r3:c01f907c
[  238.881622] [<c01f966c>] (__do_softirq+0x0/0x290) from [<c01f99cc>] (run_ksoftirqd+0xd0/0x1f4)
[  238.881622] [<c01f98fc>] (run_ksoftirqd+0x0/0x1f4) from [<c02113b0>] (kthread+0x90/0x98)
[  238.881652]  r7:00000013 r6:c01f98fc r5:00000000 r4:ef831efc
[  238.881683] [<c0211320>] (kthread+0x0/0x98) from [<c01f62f4>] (do_exit+0x0/0x374)
[  238.881713]  r6:c01f62f4 r5:c0211320 r4:ef831efc
[  238.881713] Mem-info:
[  238.881744] Normal per-cpu:
[  238.881744] CPU    0: hi:  186, btch:  31 usd:  38
[  238.881744] CPU    1: hi:  186, btch:  31 usd: 169
[  238.881774] HighMem per-cpu:
[  238.881774] CPU    0: hi:   90, btch:  15 usd:  66
[  238.881774] CPU    1: hi:   90, btch:  15 usd:  86
[  238.881805] active_anon:544 inactive_anon:71 isolated_anon:0
[  238.881805]  active_file:926 inactive_file:2538 isolated_file:0
[  238.881805]  unevictable:0 dirty:10 writeback:0 unstable:0
[  238.881805]  free:57782 slab_reclaimable:864 slab_unreclaimable:186898
[  238.881805]  mapped:632 shmem:144 pagetables:50 bounce:0
[  238.881835] Normal free:1328kB min:3532kB low:4412kB high:5296kB active_anon:0kB inactive_anon:0kB active_file:880kB inactive_file:848kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:780288kB mlocked:0kB dirty:36kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:3456kB slab_unreclaimable:747592kB kernel_stack:392kB pagetables:200kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[  238.881866] lowmem_reserve[]: 0 1904 1904
[  238.881896] HighMem free:229800kB min:236kB low:508kB high:784kB active_anon:2176kB inactive_anon:284kB active_file:2824kB inactive_file:9304kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:243712kB mlocked:0kB dirty:4kB writeback:0kB mapped:2528kB shmem:576kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[  238.881927] lowmem_reserve[]: 0 0 0
[  238.881958] Normal: 0*4kB 4*8kB 6*16kB 0*32kB 1*64kB 1*128kB 0*256kB 2*512kB 0*1024kB 0*2048kB 0*4096kB = 1344kB
[  238.882019] HighMem: 6*4kB 2*8kB 4*16kB 4*32kB 1*64kB 1*128kB 0*256kB 2*512kB 3*1024kB 0*2048kB 55*4096kB = 229800kB
[  238.882080] 3610 total pagecache pages
[  238.882080] 0 pages in swap cache
[  238.882080] Swap cache stats: add 0, delete 0, find 0/0
[  238.882110] Free swap  = 0kB
[  238.882110] Total swap = 0kB
[  238.933776] 262144 pages of RAM
[  238.933776] 58240 free pages
[  238.933776] 10503 reserved pages
[  238.933776] 187773 slab pages
[  238.933807] 2475 pages shared
[  238.933807] 0 pages swap cached

Cc: David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
Signed-off-by: Ming Lei <tom.leiming-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
 drivers/net/usb/usbnet.c |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index 009bba3..9ab439d 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -645,6 +645,7 @@ int usbnet_stop (struct net_device *net)
 	struct driver_info	*info = dev->driver_info;
 	int			retval;
 
+	clear_bit(EVENT_DEV_OPEN, &dev->flags);
 	netif_stop_queue (net);
 
 	netif_info(dev, ifdown, dev->net,
@@ -1524,9 +1525,12 @@ int usbnet_resume (struct usb_interface *intf)
 		smp_mb();
 		clear_bit(EVENT_DEV_ASLEEP, &dev->flags);
 		spin_unlock_irq(&dev->txq.lock);
-		if (!(dev->txq.qlen >= TX_QLEN(dev)))
-			netif_start_queue(dev->net);
-		tasklet_schedule (&dev->bh);
+
+		if (test_bit(EVENT_DEV_OPEN, &dev->flags)) {
+			if (!(dev->txq.qlen >= TX_QLEN(dev)))
+				netif_start_queue(dev->net);
+			tasklet_schedule (&dev->bh);
+		}
 	}
 	return 0;
 }
-- 
1.7.4.1

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCH] bridge: Module use count must be updated as bridges are created/destroyed
From: Jan Beulich @ 2011-04-29  8:31 UTC (permalink / raw)
  To: David Miller; +Cc: shemminger, bridge, jeffm, netdev
In-Reply-To: <20110429.011051.183061494.davem@davemloft.net>

>>> On 29.04.11 at 10:10, David Miller <davem@davemloft.net> wrote:
> From: "Jan Beulich" <JBeulich@novell.com>
> Date: Fri, 29 Apr 2011 08:41:10 +0100
> 
>> You talk of rmmod on the very module, but the issue is about
>> modprobe -r on a dependent module. I cannot believe you consider
>> it correct that *implicit* unloading of bridge.ko should happen when
>> bridges are configured.
> 
> Which module in particular depends upon bridge and causes the
> problem?

The problem was observed (a long time ago) with ebtable_broute,
and I cannot see how this would have changed meanwhile.

Jan


^ permalink raw reply

* Re: Re: [v2 PATCH 0/6] IPVS: init and cleanup.
From: Hans Schillstrom @ 2011-04-29  8:17 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: horms, ebiederm, lvs-devel, netdev, netfilter-devel,
	hans.schillstrom

Hello
>
>Hello,
>
>On Fri, 29 Apr 2011, Hans Schillstrom wrote:
>
>> This patch series handles exit from a network name space.
>> 
>> REVISION
>> 
>> This is version 3
>
>	Great! Only one missing lock:

Ooops..

>
>Patch 3:
>	Missing write_[un]lock_bh(&__ip_vs_svc_lock) in ip_vs_flush,
>	now we use ip_vs_unlink_service_nolock.
>
>As you are going to create new version you can look also at these:
>
>Patch 5:
>	Can you rename 'goto exit;' to 'goto cleanup;' or
>	'goto out;'
>
>Patch 6:
>	Add at least one comma after 'enabled' here:
>	netns(%d) enabled first service added
>
I'll send a new set.

Thanks
Hans



^ permalink raw reply

* Re: [PATCH] bridge: Module use count must be updated as bridges are created/destroyed
From: David Miller @ 2011-04-29  8:10 UTC (permalink / raw)
  To: JBeulich; +Cc: shemminger, bridge, jeffm, netdev
In-Reply-To: <4DBA87B6020000780003ED74@vpn.id2.novell.com>

From: "Jan Beulich" <JBeulich@novell.com>
Date: Fri, 29 Apr 2011 08:41:10 +0100

> You talk of rmmod on the very module, but the issue is about
> modprobe -r on a dependent module. I cannot believe you consider
> it correct that *implicit* unloading of bridge.ko should happen when
> bridges are configured.

Which module in particular depends upon bridge and causes the
problem?

^ permalink raw reply

* IPv6 PREFSRC behaves different than IPv4 PREFSRC?
From: Florian Westphal @ 2011-04-29  8:09 UTC (permalink / raw)
  To: netdev; +Cc: Daniel Walter

I tested todays net-next and tried to use the new ipv6 prefsrc
patch that went in this month.

But as far as I can say it does not work, or at least
the behaviour I see is unexpected.

Test setup:
eth0 has addresses 192.168.10.100/24 and dead:1::100/64

With IPv4,
I get this behaviour:

$ ip addr add  192.168.10.101 dev eth0
$ ip route get 192.168.10.1
192.168.10.1 dev eth0  src 192.168.10.100
    cache
$ ip route add 192.168.10.1 src 192.168.10.101 dev eth0
$ ip route get 192.168.10.1
192.168.10.1 dev eth0  src 192.168.10.101
     cache
$ ip route get  192.168.10.2
192.168.10.2 dev eth0  src 192.168.10.100
    cache

And that is what I would have expected... But with ipv6 the behaviour
differs:

ipv6
$ ip route get dead:1::1
dead:1::1 via dead:1::1 dev eth0  src dead:1::100  metric 0
    cache
$ ip addr add dead:1::101/64 dev eth0
$ ip route add dead:1::1 src dead:1::101
$ ip route get dead:1::1
dead:1::1 via dead:1::1 dev eth0  src dead:1::101  metric 0
    cache

$ ip route add dead:1::2 src dead:1::100
$ ip route get dead:1::2
dead:1::2 via dead:1::2 dev eth0  src dead:1::101  metric 0

-> src is dead:1:101, and not 100 as I had expected...

This behaviour is rather irritating.
What am I doing wrong?

Thanks,
Florian

^ permalink raw reply

* Re: [PATCH] bridge: Module use count must be updated as bridges are created/destroyed
From: Jan Beulich @ 2011-04-29  7:41 UTC (permalink / raw)
  To: David Miller; +Cc: shemminger, bridge, jeffm, netdev
In-Reply-To: <20110429.002530.112581952.davem@davemloft.net>

>>> On 29.04.11 at 09:25, David Miller <davem@davemloft.net> wrote:
> From: "Jan Beulich" <JBeulich@novell.com>
> Date: Fri, 29 Apr 2011 08:21:14 +0100
> 
>> Otherwise 'modprobe -r' on a module having a dependency on bridge will
>> implicitly unload bridge, bringing down all connectivity that was using
>> bridges.
>> 
>> Signed-off-by: Jan Beulich <jbeulich@novell.com>
>> Cc: Jeff Mahoney <jeffm@suse.com>
> 
> All network device drivers behave exactly the same way, when you rmmod
> the thing we unconfigure all the routes, addresses, etc. going through
> that device and let you unload it.
> 
> And this behavior is very much intentional.
> 
> Don't add an exception here.

You talk of rmmod on the very module, but the issue is about
modprobe -r on a dependent module. I cannot believe you consider
it correct that *implicit* unloading of bridge.ko should happen when
bridges are configured.

If the solution proposed isn't satisfactory, can you suggest a better
alternative still serving the purpose?

Jan


^ permalink raw reply

* Re: [v2 PATCH 0/6] IPVS: init and cleanup.
From: Julian Anastasov @ 2011-04-29  7:29 UTC (permalink / raw)
  To: Hans Schillstrom
  Cc: horms, ebiederm, lvs-devel, netdev, netfilter-devel,
	hans.schillstrom
In-Reply-To: <1304058334-17594-1-git-send-email-hans@schillstrom.com>


	Hello,

On Fri, 29 Apr 2011, Hans Schillstrom wrote:

> This patch series handles exit from a network name space.
> 
> REVISION
> 
> This is version 3

	Great! Only one missing lock:

Patch 3:
	Missing write_[un]lock_bh(&__ip_vs_svc_lock) in ip_vs_flush,
	now we use ip_vs_unlink_service_nolock.

As you are going to create new version you can look also at these:

Patch 5:
	Can you rename 'goto exit;' to 'goto cleanup;' or
	'goto out;'

Patch 6:
	Add at least one comma after 'enabled' here:
	netns(%d) enabled first service added

> OVERVIEW
> Basically there was three faults in the netns implementation.
> - Kernel threads hold devices and preventing an exit.
> - dst cache holds references to devices.
> - Services was not always released.
> 
> Patch 1 & 3 contains the functionality
>       4 renames funcctions
>       5 removes empty functions
>       6 Debuging.
> 
> IMPLEMENTATION
> - Avoid to increment the usage counter for kernel threads.
>   this is done in the first patch.
> - Patch 3 tries to restore the cleanup order.
>   Add NETDEV_UNREGISTER notification for dst_reset
> 
> Revision 3 
> Residies in patch 3
>         Throttle renamed to enable.
>         Comments from Julian implemented
>         Check enable in ip_vs_in, ip_vs_out and ip_vs_forward_icmp*
>         Remove in ip_vs_in_icmp*.
>         ip_vs_svc_reset() moved into ip_vs_dst_event().
>         ip_vs_service_cleanup() uses ip_vs_flush and mutex lock.
>         ip_vs_unlink_service_nolock() added.

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* Re: [PATCH] bridge: Module use count must be updated as bridges are created/destroyed
From: David Miller @ 2011-04-29  7:25 UTC (permalink / raw)
  To: JBeulich; +Cc: shemminger, bridge, jeffm, netdev
In-Reply-To: <4DBA830A020000780003ED5D@vpn.id2.novell.com>

From: "Jan Beulich" <JBeulich@novell.com>
Date: Fri, 29 Apr 2011 08:21:14 +0100

> Otherwise 'modprobe -r' on a module having a dependency on bridge will
> implicitly unload bridge, bringing down all connectivity that was using
> bridges.
> 
> Signed-off-by: Jan Beulich <jbeulich@novell.com>
> Cc: Jeff Mahoney <jeffm@suse.com>

All network device drivers behave exactly the same way, when you rmmod
the thing we unconfigure all the routes, addresses, etc. going through
that device and let you unload it.

And this behavior is very much intentional.

Don't add an exception here.


^ permalink raw reply

* [PATCH] bridge: Module use count must be updated as bridges are created/destroyed
From: Jan Beulich @ 2011-04-29  7:21 UTC (permalink / raw)
  To: shemminger; +Cc: bridge, Jeff Mahoney, netdev

Otherwise 'modprobe -r' on a module having a dependency on bridge will
implicitly unload bridge, bringing down all connectivity that was using
bridges.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Jeff Mahoney <jeffm@suse.com>

---
 net/bridge/br_if.c |    9 +++++++++
 1 file changed, 9 insertions(+)

--- 2.6.39-rc5/net/bridge/br_if.c
+++ 2.6.39-rc5-bridge-module-get-put/net/bridge/br_if.c
@@ -290,6 +290,11 @@ int br_add_bridge(struct net *net, const
 	if (!dev)
 		return -ENOMEM;
 
+	if (!try_module_get(THIS_MODULE)) {
+		free_netdev(dev);
+		return -ENOENT;
+	}
+
 	rtnl_lock();
 	if (strchr(dev->name, '%')) {
 		ret = dev_alloc_name(dev, dev->name);
@@ -308,6 +313,8 @@ int br_add_bridge(struct net *net, const
 		unregister_netdevice(dev);
  out:
 	rtnl_unlock();
+	if (ret)
+		module_put(THIS_MODULE);
 	return ret;
 
 out_free:
@@ -339,6 +346,8 @@ int br_del_bridge(struct net *net, const
 		del_br(netdev_priv(dev), NULL);
 
 	rtnl_unlock();
+	if (ret == 0)
+		module_put(THIS_MODULE);
 	return ret;
 }
 




^ permalink raw reply

* Re: [PATCH 1/2] wireless: Make struct ieee80211_channel const where possible
From: Johannes Berg @ 2011-04-29  7:11 UTC (permalink / raw)
  To: Joe Perches
  Cc: David S. Miller, John W. Linville, linux-wireless, netdev,
	linux-kernel
In-Reply-To: <b953f08a496209fe0fb69166c5175ff270b7efc5.1304054543.git.joe@perches.com>

On Thu, 2011-04-28 at 22:25 -0700, Joe Perches wrote:
> Useful for declarations of static const.

I don't mind marking lots of places const, but it isn't actually as
useful as you think since you can't declare any channels const as they
are modified at runtime for regulatory enforcement.

johannes

^ permalink raw reply

* Re: [v2 PATCH 0/6] IPVS: init and cleanup.
From: Simon Horman @ 2011-04-29  6:42 UTC (permalink / raw)
  To: Hans Schillstrom
  Cc: ja, ebiederm, lvs-devel, netdev, netfilter-devel,
	hans.schillstrom
In-Reply-To: <1304058334-17594-1-git-send-email-hans@schillstrom.com>

On Fri, Apr 29, 2011 at 08:25:28AM +0200, Hans Schillstrom wrote:
> This patch series handles exit from a network name space.

Hi Hans,

thanks for your work on this.
I've looked over the patches and nothing catches my eye -
thanks for fixing up the things that I commented on previously.

I think that patches 1 and 2 should be targeted at 2.6.39,
while the remainder look like 2.6.40 material. Does that
sound reasonable to you?

Julian and Eric,

can you take a look over these?
I would appreciate your review.


^ permalink raw reply

* [v2 PATCH 0/6] IPVS: init and cleanup.
From: Hans Schillstrom @ 2011-04-29  6:25 UTC (permalink / raw)
  To: ja, horms, ebiederm, lvs-devel, netdev, netfilter-devel
  Cc: hans.schillstrom, Hans Schillstrom

This patch series handles exit from a network name space.

REVISION

This is version 3

OVERVIEW
Basically there was three faults in the netns implementation.
- Kernel threads hold devices and preventing an exit.
- dst cache holds references to devices.
- Services was not always released.

Patch 1 & 3 contains the functionality
      4 renames funcctions
      5 removes empty functions
      6 Debuging.

IMPLEMENTATION
- Avoid to increment the usage counter for kernel threads.
  this is done in the first patch.
- Patch 3 tries to restore the cleanup order.
  Add NETDEV_UNREGISTER notification for dst_reset

Revision 3 
Residies in patch 3
        Throttle renamed to enable.
        Comments from Julian implemented
        Check enable in ip_vs_in, ip_vs_out and ip_vs_forward_icmp*
        Remove in ip_vs_in_icmp*.
        ip_vs_svc_reset() moved into ip_vs_dst_event().
        ip_vs_service_cleanup() uses ip_vs_flush and mutex lock.
        ip_vs_unlink_service_nolock() added.
 
An netns exit could look like this
IPVS: Enter: __ip_vs_dev_cleanup, net/netfilter/ipvs/ip_vs_core.c
IPVS: stopping master sync thread 1286 ...
IPVS: stopping backup sync thread 1294 ...
IPVS: Leave: __ip_vs_dev_cleanup, net/netfilter/ipvs/ip_vs_core.c
IPVS: Enter: ip_vs_dst_event, net/netfilter/ipvs/ip_vs_ctl.c line
IPVS: Leave: ip_vs_dst_event, net/netfilter/ipvs/ip_vs_ctl.c line
...
IPVS: Enter: ip_vs_dst_event, net/netfilter/ipvs/ip_vs_ctl.c line
IPVS: Leave: ip_vs_dst_event, net/netfilter/ipvs/ip_vs_ctl.c line
IPVS: Enter: ip_vs_service_net_cleanup, net/netfilter/ipvs/ip_vs_ctl.c
IPVS: __ip_vs_del_service: enter
IPVS: Moving dest 192.168.1.6:0 into trash, dest->refcnt=43450
...
IPVS: Moving dest 192.168.1.3:0 into trash, dest->refcnt=43449
IPVS: __ip_vs_del_service: enter
IPVS: Removing destination 0/[2003:0000:0000:0000:0000:0002:0000:0006]:80
...
IPVS: Removing destination 0/[2003:0000:0000:0000:0000:0002:0000:0003]:80
IPVS: Removing service 0/[2003:0000:0000:0000:0000:0002:0004:0100]:80 usecnt=0
IPVS: Leave: ip_vs_service_net_cleanup, net/netfilter/ipvs/ip_vs_ctl.c
IPVS: Enter: ip_vs_control_net_cleanup, net/netfilter/ipvs/ip_vs_ctl.c
IPVS: Removing service 80/0.0.0.0:0 usecnt=0
IPVS: Leave: ip_vs_control_net_cleanup, net/netfilter/ipvs/ip_vs_ctl.c
IPVS: ipvs netns 8 released

PATCH SET
This patch set is based upon net-next-2.6 (2.6.39-rc2)


SUMMARY

 include/net/ip_vs.h              |   23 ++++--
 net/netfilter/ipvs/ip_vs_app.c   |   23 +-----
 net/netfilter/ipvs/ip_vs_conn.c  |   14 +---
 net/netfilter/ipvs/ip_vs_core.c  |  132 +++++++++++++++++++++--------
 net/netfilter/ipvs/ip_vs_ctl.c   |  178 ++++++++++++++++++++++++++++++--------
 net/netfilter/ipvs/ip_vs_est.c   |   21 +----
 net/netfilter/ipvs/ip_vs_proto.c |   11 +--
 net/netfilter/ipvs/ip_vs_sync.c  |   70 ++++++++--------
 8 files changed, 299 insertions(+), 173 deletions(-)


^ permalink raw reply

* [v3 PATCH 1/6] IPVS: Change of socket usage to enable name space exit.
From: Hans Schillstrom @ 2011-04-29  6:25 UTC (permalink / raw)
  To: ja, horms, ebiederm, lvs-devel, netdev, netfilter-devel
  Cc: hans.schillstrom, Hans Schillstrom
In-Reply-To: <1304058334-17594-1-git-send-email-hans@schillstrom.com>

To work this patch also needs the other patches in this series.

VERSION: 2

DESCRIPTION

If the sync daemons run in a name space while it crashes
or get killed, there is no way to stop them except for a reboot.
When all patches are there, ip_vs_core will handle register_pernet_(),
i.e. ip_vs_sync_init() and ip_vs_sync_cleanup() will be removed.

Kernel threads should not increment the use count of a socket.
By calling sk_change_net() after creating a socket this is avoided.
sock_release cant be used intead sk_release_kernel() should be used.

Thanks Eric W Biederman for your advices.

This patch is based on net-next-2.6  ver 2.6.39-rc2

CHANGES
 Rev 2
   sock_create_kern() used instead of __sock_create()
   return codes of kthread_stop() used.

Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 net/netfilter/ipvs/ip_vs_core.c |    2 +-
 net/netfilter/ipvs/ip_vs_sync.c |   58 +++++++++++++++++++++++++--------------
 2 files changed, 38 insertions(+), 22 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 07accf6..a0791dc 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -1896,7 +1896,7 @@ static int __net_init __ip_vs_init(struct net *net)
 
 static void __net_exit __ip_vs_cleanup(struct net *net)
 {
-	IP_VS_DBG(10, "ipvs netns %d released\n", net_ipvs(net)->gen);
+	IP_VS_DBG(2, "ipvs netns %d released\n", net_ipvs(net)->gen);
 }
 
 static struct pernet_operations ipvs_core_ops = {
diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index 3e7961e..c187823 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -1303,13 +1303,18 @@ static struct socket *make_send_sock(struct net *net)
 	struct socket *sock;
 	int result;
 
-	/* First create a socket */
-	result = __sock_create(net, PF_INET, SOCK_DGRAM, IPPROTO_UDP, &sock, 1);
+	/* First create a socket move it to right name space later */
+	result = sock_create_kern(PF_INET, SOCK_DGRAM, IPPROTO_UDP, &sock);
 	if (result < 0) {
 		pr_err("Error during creation of socket; terminating\n");
 		return ERR_PTR(result);
 	}
-
+	/*
+	 * Kernel sockets that are a part of a namespace, should not
+	 * hold a reference to a namespace in order to allow to stop it.
+	 * After sk_change_net should be released using sk_release_kernel.
+	 */
+	sk_change_net(sock->sk, net);
 	result = set_mcast_if(sock->sk, ipvs->master_mcast_ifn);
 	if (result < 0) {
 		pr_err("Error setting outbound mcast interface\n");
@@ -1334,8 +1339,8 @@ static struct socket *make_send_sock(struct net *net)
 
 	return sock;
 
-  error:
-	sock_release(sock);
+error:
+	sk_release_kernel(sock->sk);
 	return ERR_PTR(result);
 }
 
@@ -1350,12 +1355,17 @@ static struct socket *make_receive_sock(struct net *net)
 	int result;
 
 	/* First create a socket */
-	result = __sock_create(net, PF_INET, SOCK_DGRAM, IPPROTO_UDP, &sock, 1);
+	result = sock_create_kern(PF_INET, SOCK_DGRAM, IPPROTO_UDP, &sock);
 	if (result < 0) {
 		pr_err("Error during creation of socket; terminating\n");
 		return ERR_PTR(result);
 	}
-
+	/*
+	 * Kernel sockets that are a part of a namespace, should not
+	 * hold a reference to a namespace in order to allow to stop it.
+	 * After sk_change_net should be released using sk_release_kernel.
+	 */
+	sk_change_net(sock->sk, net);
 	/* it is equivalent to the REUSEADDR option in user-space */
 	sock->sk->sk_reuse = 1;
 
@@ -1377,8 +1387,8 @@ static struct socket *make_receive_sock(struct net *net)
 
 	return sock;
 
-  error:
-	sock_release(sock);
+error:
+	sk_release_kernel(sock->sk);
 	return ERR_PTR(result);
 }
 
@@ -1473,7 +1483,7 @@ static int sync_thread_master(void *data)
 		ip_vs_sync_buff_release(sb);
 
 	/* release the sending multicast socket */
-	sock_release(tinfo->sock);
+	sk_release_kernel(tinfo->sock->sk);
 	kfree(tinfo);
 
 	return 0;
@@ -1513,7 +1523,7 @@ static int sync_thread_backup(void *data)
 	}
 
 	/* release the sending multicast socket */
-	sock_release(tinfo->sock);
+	sk_release_kernel(tinfo->sock->sk);
 	kfree(tinfo->buf);
 	kfree(tinfo);
 
@@ -1601,7 +1611,7 @@ outtinfo:
 outbuf:
 	kfree(buf);
 outsocket:
-	sock_release(sock);
+	sk_release_kernel(sock->sk);
 out:
 	return result;
 }
@@ -1610,6 +1620,7 @@ out:
 int stop_sync_thread(struct net *net, int state)
 {
 	struct netns_ipvs *ipvs = net_ipvs(net);
+	int retc = -EINVAL;
 
 	IP_VS_DBG(7, "%s(): pid %d\n", __func__, task_pid_nr(current));
 
@@ -1629,7 +1640,7 @@ int stop_sync_thread(struct net *net, int state)
 		spin_lock_bh(&ipvs->sync_lock);
 		ipvs->sync_state &= ~IP_VS_STATE_MASTER;
 		spin_unlock_bh(&ipvs->sync_lock);
-		kthread_stop(ipvs->master_thread);
+		retc = kthread_stop(ipvs->master_thread);
 		ipvs->master_thread = NULL;
 	} else if (state == IP_VS_STATE_BACKUP) {
 		if (!ipvs->backup_thread)
@@ -1639,16 +1650,14 @@ int stop_sync_thread(struct net *net, int state)
 			task_pid_nr(ipvs->backup_thread));
 
 		ipvs->sync_state &= ~IP_VS_STATE_BACKUP;
-		kthread_stop(ipvs->backup_thread);
+		retc = kthread_stop(ipvs->backup_thread);
 		ipvs->backup_thread = NULL;
-	} else {
-		return -EINVAL;
 	}
 
 	/* decrease the module use count */
 	ip_vs_use_count_dec();
 
-	return 0;
+	return retc;
 }
 
 /*
@@ -1670,8 +1679,15 @@ static int __net_init __ip_vs_sync_init(struct net *net)
 
 static void __ip_vs_sync_cleanup(struct net *net)
 {
-	stop_sync_thread(net, IP_VS_STATE_MASTER);
-	stop_sync_thread(net, IP_VS_STATE_BACKUP);
+	int retc;
+
+	retc = stop_sync_thread(net, IP_VS_STATE_MASTER);
+	if (retc && retc != -EINVAL)
+		pr_err("Failed to stop Master Daemon\n");
+
+	retc = stop_sync_thread(net, IP_VS_STATE_BACKUP);
+	if (retc && retc != -EINVAL)
+		pr_err("Failed to stop Backup Daemon\n");
 }
 
 static struct pernet_operations ipvs_sync_ops = {
@@ -1682,10 +1698,10 @@ static struct pernet_operations ipvs_sync_ops = {
 
 int __init ip_vs_sync_init(void)
 {
-	return register_pernet_subsys(&ipvs_sync_ops);
+	return register_pernet_device(&ipvs_sync_ops);
 }
 
 void ip_vs_sync_cleanup(void)
 {
-	unregister_pernet_subsys(&ipvs_sync_ops);
+	unregister_pernet_device(&ipvs_sync_ops);
 }
-- 
1.7.4


^ permalink raw reply related

* [v3 PATCH 6/6] IPVS: add debug functions
From: Hans Schillstrom @ 2011-04-29  6:25 UTC (permalink / raw)
  To: ja, horms, ebiederm, lvs-devel, netdev, netfilter-devel
  Cc: hans.schillstrom, Hans Schillstrom
In-Reply-To: <1304058334-17594-1-git-send-email-hans@schillstrom.com>

Optional patch, but it is nice to have.

Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
---
 net/netfilter/ipvs/ip_vs_ctl.c |   13 ++++++++++++-
 1 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 64df5bf..aaa5dd4 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -351,8 +351,11 @@ static int ip_vs_svc_unhash(struct ip_vs_service *svc)
 	svc->flags &= ~IP_VS_SVC_F_HASHED;
 	atomic_dec(&svc->refcnt);
 	/* No more services, no need for input */
-	if (atomic_read(&svc->refcnt) == 0)
+	if (atomic_read(&svc->refcnt) == 0) {
 		net_ipvs(svc->net)->enable = 0;
+		IP_VS_DBG(2, "Last service removed in net(%d) input disabled\n",
+			     net_ipvs(svc->net)->gen);
+	}
 	return 1;
 }
 
@@ -1222,6 +1225,10 @@ ip_vs_add_service(struct net *net, struct ip_vs_service_user_kern *u,
 	write_unlock_bh(&__ip_vs_svc_lock);
 
 	*svc_p = svc;
+	if (!ipvs->enable)
+		pr_info("netns(%d) enabled first service added\n",
+			 ipvs->gen);
+
 	/* Now there is a service - full throttle */
 	ipvs->enable = 1;
 	return 0;
@@ -3693,6 +3700,7 @@ int __net_init ip_vs_control_net_init(struct net *net)
 	int idx;
 	struct netns_ipvs *ipvs = net_ipvs(net);
 
+	EnterFunction(2);
 	ipvs->rs_lock = __RW_LOCK_UNLOCKED(ipvs->rs_lock);
 
 	/* Initialize rs_table */
@@ -3719,6 +3727,7 @@ int __net_init ip_vs_control_net_init(struct net *net)
 	if (ip_vs_control_net_init_sysctl(net))
 		goto err;
 
+	LeaveFunction(2);
 	return 0;
 
 err:
@@ -3730,6 +3739,7 @@ void __net_exit ip_vs_control_net_cleanup(struct net *net)
 {
 	struct netns_ipvs *ipvs = net_ipvs(net);
 
+	EnterFunction(2);
 	ip_vs_trash_cleanup(net);
 	ip_vs_stop_estimator(net, &ipvs->tot_stats);
 	ip_vs_control_net_cleanup_sysctl(net);
@@ -3737,6 +3747,7 @@ void __net_exit ip_vs_control_net_cleanup(struct net *net)
 	proc_net_remove(net, "ip_vs_stats");
 	proc_net_remove(net, "ip_vs");
 	free_percpu(ipvs->tot_stats.cpustats);
+	LeaveFunction(2);
 }
 
 int __init ip_vs_control_init(void)
-- 
1.7.4


^ permalink raw reply related

* [v3 PATCH 4/6] IPVS: rename of netns init and cleanup functions.
From: Hans Schillstrom @ 2011-04-29  6:25 UTC (permalink / raw)
  To: ja, horms, ebiederm, lvs-devel, netdev, netfilter-devel
  Cc: hans.schillstrom, Hans Schillstrom
In-Reply-To: <1304058334-17594-1-git-send-email-hans@schillstrom.com>

Make it more clear what the functions does,
on request by Julian.

Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
---
 include/net/ip_vs.h              |   26 +++++++++++++-------------
 net/netfilter/ipvs/ip_vs_app.c   |    4 ++--
 net/netfilter/ipvs/ip_vs_conn.c  |    4 ++--
 net/netfilter/ipvs/ip_vs_core.c  |   36 ++++++++++++++++++------------------
 net/netfilter/ipvs/ip_vs_ctl.c   |   20 ++++++++++----------
 net/netfilter/ipvs/ip_vs_est.c   |    4 ++--
 net/netfilter/ipvs/ip_vs_proto.c |    4 ++--
 net/netfilter/ipvs/ip_vs_sync.c  |    4 ++--
 8 files changed, 51 insertions(+), 51 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 86aefed..da2aea9 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -1093,19 +1093,19 @@ ip_vs_control_add(struct ip_vs_conn *cp, struct ip_vs_conn *ctl_cp)
 /*
  * IPVS netns init & cleanup functions
  */
-extern int __ip_vs_estimator_init(struct net *net);
-extern int __ip_vs_control_init(struct net *net);
-extern int __ip_vs_protocol_init(struct net *net);
-extern int __ip_vs_app_init(struct net *net);
-extern int __ip_vs_conn_init(struct net *net);
-extern int __ip_vs_sync_init(struct net *net);
-extern void __ip_vs_conn_cleanup(struct net *net);
-extern void __ip_vs_app_cleanup(struct net *net);
-extern void __ip_vs_protocol_cleanup(struct net *net);
-extern void __ip_vs_control_cleanup(struct net *net);
-extern void __ip_vs_estimator_cleanup(struct net *net);
-extern void __ip_vs_sync_cleanup(struct net *net);
-extern void __ip_vs_service_cleanup(struct net *net);
+extern int ip_vs_estimator_net_init(struct net *net);
+extern int ip_vs_control_net_init(struct net *net);
+extern int ip_vs_protocol_net_init(struct net *net);
+extern int ip_vs_app_net_init(struct net *net);
+extern int ip_vs_conn_net_init(struct net *net);
+extern int ip_vs_sync_net_init(struct net *net);
+extern void ip_vs_conn_net_cleanup(struct net *net);
+extern void ip_vs_app_net_cleanup(struct net *net);
+extern void ip_vs_protocol_net_cleanup(struct net *net);
+extern void ip_vs_control_net_cleanup(struct net *net);
+extern void ip_vs_estimator_net_cleanup(struct net *net);
+extern void ip_vs_sync_net_cleanup(struct net *net);
+extern void ip_vs_service_net_cleanup(struct net *net);
 
 /*
  *      IPVS application functions
diff --git a/net/netfilter/ipvs/ip_vs_app.c b/net/netfilter/ipvs/ip_vs_app.c
index 51f3af7..81b299d 100644
--- a/net/netfilter/ipvs/ip_vs_app.c
+++ b/net/netfilter/ipvs/ip_vs_app.c
@@ -576,7 +576,7 @@ static const struct file_operations ip_vs_app_fops = {
 };
 #endif
 
-int __net_init __ip_vs_app_init(struct net *net)
+int __net_init ip_vs_app_net_init(struct net *net)
 {
 	struct netns_ipvs *ipvs = net_ipvs(net);
 
@@ -585,7 +585,7 @@ int __net_init __ip_vs_app_init(struct net *net)
 	return 0;
 }
 
-void __net_exit __ip_vs_app_cleanup(struct net *net)
+void __net_exit ip_vs_app_net_cleanup(struct net *net)
 {
 	proc_net_remove(net, "ip_vs_app");
 }
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index d3fd91b..b1e7a0c 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -1247,7 +1247,7 @@ flush_again:
 /*
  * per netns init and exit
  */
-int __net_init __ip_vs_conn_init(struct net *net)
+int __net_init ip_vs_conn_net_init(struct net *net)
 {
 	struct netns_ipvs *ipvs = net_ipvs(net);
 
@@ -1258,7 +1258,7 @@ int __net_init __ip_vs_conn_init(struct net *net)
 	return 0;
 }
 
-void __net_exit __ip_vs_conn_cleanup(struct net *net)
+void __net_exit ip_vs_conn_net_cleanup(struct net *net)
 {
 	/* flush all the connection entries first */
 	ip_vs_conn_flush(net);
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index b9debc9..4582716 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -1911,22 +1911,22 @@ static int __net_init __ip_vs_init(struct net *net)
 	atomic_inc(&ipvs_netns_cnt);
 	net->ipvs = ipvs;
 
-	if (__ip_vs_estimator_init(net) < 0)
+	if (ip_vs_estimator_net_init(net) < 0)
 		goto estimator_fail;
 
-	if (__ip_vs_control_init(net) < 0)
+	if (ip_vs_control_net_init(net) < 0)
 		goto control_fail;
 
-	if (__ip_vs_protocol_init(net) < 0)
+	if (ip_vs_protocol_net_init(net) < 0)
 		goto protocol_fail;
 
-	if (__ip_vs_app_init(net) < 0)
+	if (ip_vs_app_net_init(net) < 0)
 		goto app_fail;
 
-	if (__ip_vs_conn_init(net) < 0)
+	if (ip_vs_conn_net_init(net) < 0)
 		goto conn_fail;
 
-	if (__ip_vs_sync_init(net) < 0)
+	if (ip_vs_sync_net_init(net) < 0)
 		goto sync_fail;
 
 	printk(KERN_INFO "IPVS: Creating netns size=%zu id=%d\n",
@@ -1937,27 +1937,27 @@ static int __net_init __ip_vs_init(struct net *net)
  */
 
 sync_fail:
-	__ip_vs_conn_cleanup(net);
+	ip_vs_conn_net_cleanup(net);
 conn_fail:
-	__ip_vs_app_cleanup(net);
+	ip_vs_app_net_cleanup(net);
 app_fail:
-	__ip_vs_protocol_cleanup(net);
+	ip_vs_protocol_net_cleanup(net);
 protocol_fail:
-	__ip_vs_control_cleanup(net);
+	ip_vs_control_net_cleanup(net);
 control_fail:
-	__ip_vs_estimator_cleanup(net);
+	ip_vs_estimator_net_cleanup(net);
 estimator_fail:
 	return -ENOMEM;
 }
 
 static void __net_exit __ip_vs_cleanup(struct net *net)
 {
-	__ip_vs_service_cleanup(net);	/* ip_vs_flush() with locks */
-	__ip_vs_conn_cleanup(net);
-	__ip_vs_app_cleanup(net);
-	__ip_vs_protocol_cleanup(net);
-	__ip_vs_control_cleanup(net);
-	__ip_vs_estimator_cleanup(net);
+	ip_vs_service_net_cleanup(net);	/* ip_vs_flush() with locks */
+	ip_vs_conn_net_cleanup(net);
+	ip_vs_app_net_cleanup(net);
+	ip_vs_protocol_net_cleanup(net);
+	ip_vs_control_net_cleanup(net);
+	ip_vs_estimator_net_cleanup(net);
 	IP_VS_DBG(2, "ipvs netns %d released\n", net_ipvs(net)->gen);
 }
 
@@ -1965,7 +1965,7 @@ static void __net_exit __ip_vs_dev_cleanup(struct net *net)
 {
 	EnterFunction(2);
 	net_ipvs(net)->enable = 0;	/* Disable packet reception */
-	__ip_vs_sync_cleanup(net);
+	ip_vs_sync_net_cleanup(net);
 	LeaveFunction(2);
 }
 
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index e63b395..64df5bf 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -1494,7 +1494,7 @@ static int ip_vs_flush(struct net *net)
  *	Delete service by {netns} in the service table.
  *	Called by __ip_vs_cleanup()
  */
-void __ip_vs_service_cleanup(struct net *net)
+void ip_vs_service_net_cleanup(struct net *net)
 {
 	EnterFunction(2);
 	/* Check for "full" addressed entries */
@@ -1673,7 +1673,7 @@ proc_do_sync_mode(ctl_table *table, int write,
 /*
  *	IPVS sysctl table (under the /proc/sys/net/ipv4/vs/)
  *	Do not change order or insert new entries without
- *	align with netns init in __ip_vs_control_init()
+ *	align with netns init in ip_vs_control_net_init()
  */
 
 static struct ctl_table vs_vars[] = {
@@ -3609,7 +3609,7 @@ static void ip_vs_genl_unregister(void)
  * per netns intit/exit func.
  */
 #ifdef CONFIG_SYSCTL
-int __net_init __ip_vs_control_init_sysctl(struct net *net)
+int __net_init ip_vs_control_net_init_sysctl(struct net *net)
 {
 	int idx;
 	struct netns_ipvs *ipvs = net_ipvs(net);
@@ -3668,7 +3668,7 @@ int __net_init __ip_vs_control_init_sysctl(struct net *net)
 	return 0;
 }
 
-void __net_init __ip_vs_control_cleanup_sysctl(struct net *net)
+void __net_init ip_vs_control_net_cleanup_sysctl(struct net *net)
 {
 	struct netns_ipvs *ipvs = net_ipvs(net);
 
@@ -3679,8 +3679,8 @@ void __net_init __ip_vs_control_cleanup_sysctl(struct net *net)
 
 #else
 
-int __net_init __ip_vs_control_init_sysctl(struct net *net) { return 0; }
-void __net_init __ip_vs_control_cleanup_sysctl(struct net *net) { }
+int __net_init ip_vs_control_net_init_sysctl(struct net *net) { return 0; }
+void __net_init ip_vs_control_net_cleanup_sysctl(struct net *net) { }
 
 #endif
 
@@ -3688,7 +3688,7 @@ static struct notifier_block ip_vs_dst_notifier = {
 	.notifier_call = ip_vs_dst_event,
 };
 
-int __net_init __ip_vs_control_init(struct net *net)
+int __net_init ip_vs_control_net_init(struct net *net)
 {
 	int idx;
 	struct netns_ipvs *ipvs = net_ipvs(net);
@@ -3716,7 +3716,7 @@ int __net_init __ip_vs_control_init(struct net *net)
 	proc_net_fops_create(net, "ip_vs_stats_percpu", 0,
 			     &ip_vs_stats_percpu_fops);
 
-	if (__ip_vs_control_init_sysctl(net))
+	if (ip_vs_control_net_init_sysctl(net))
 		goto err;
 
 	return 0;
@@ -3726,13 +3726,13 @@ err:
 	return -ENOMEM;
 }
 
-void __net_exit __ip_vs_control_cleanup(struct net *net)
+void __net_exit ip_vs_control_net_cleanup(struct net *net)
 {
 	struct netns_ipvs *ipvs = net_ipvs(net);
 
 	ip_vs_trash_cleanup(net);
 	ip_vs_stop_estimator(net, &ipvs->tot_stats);
-	__ip_vs_control_cleanup_sysctl(net);
+	ip_vs_control_net_cleanup_sysctl(net);
 	proc_net_remove(net, "ip_vs_stats_percpu");
 	proc_net_remove(net, "ip_vs_stats");
 	proc_net_remove(net, "ip_vs");
diff --git a/net/netfilter/ipvs/ip_vs_est.c b/net/netfilter/ipvs/ip_vs_est.c
index 508cce9..f5d2a01 100644
--- a/net/netfilter/ipvs/ip_vs_est.c
+++ b/net/netfilter/ipvs/ip_vs_est.c
@@ -192,7 +192,7 @@ void ip_vs_read_estimator(struct ip_vs_stats_user *dst,
 	dst->outbps = (e->outbps + 0xF) >> 5;
 }
 
-int __net_init __ip_vs_estimator_init(struct net *net)
+int __net_init ip_vs_estimator_net_init(struct net *net)
 {
 	struct netns_ipvs *ipvs = net_ipvs(net);
 
@@ -203,7 +203,7 @@ int __net_init __ip_vs_estimator_init(struct net *net)
 	return 0;
 }
 
-void __net_exit __ip_vs_estimator_cleanup(struct net *net)
+void __net_exit ip_vs_estimator_net_cleanup(struct net *net)
 {
 	del_timer_sync(&net_ipvs(net)->est_timer);
 }
diff --git a/net/netfilter/ipvs/ip_vs_proto.c b/net/netfilter/ipvs/ip_vs_proto.c
index eb86028..52d073c 100644
--- a/net/netfilter/ipvs/ip_vs_proto.c
+++ b/net/netfilter/ipvs/ip_vs_proto.c
@@ -316,7 +316,7 @@ ip_vs_tcpudp_debug_packet(int af, struct ip_vs_protocol *pp,
 /*
  * per network name-space init
  */
-int __net_init __ip_vs_protocol_init(struct net *net)
+int __net_init ip_vs_protocol_net_init(struct net *net)
 {
 #ifdef CONFIG_IP_VS_PROTO_TCP
 	register_ip_vs_proto_netns(net, &ip_vs_protocol_tcp);
@@ -336,7 +336,7 @@ int __net_init __ip_vs_protocol_init(struct net *net)
 	return 0;
 }
 
-void __net_exit __ip_vs_protocol_cleanup(struct net *net)
+void __net_exit ip_vs_protocol_net_cleanup(struct net *net)
 {
 	struct netns_ipvs *ipvs = net_ipvs(net);
 	struct ip_vs_proto_data *pd;
diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index f31ef18..1036b34 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -1663,7 +1663,7 @@ int stop_sync_thread(struct net *net, int state)
 /*
  * Initialize data struct for each netns
  */
-int __net_init __ip_vs_sync_init(struct net *net)
+int __net_init ip_vs_sync_net_init(struct net *net)
 {
 	struct netns_ipvs *ipvs = net_ipvs(net);
 
@@ -1677,7 +1677,7 @@ int __net_init __ip_vs_sync_init(struct net *net)
 	return 0;
 }
 
-void __ip_vs_sync_cleanup(struct net *net)
+void ip_vs_sync_net_cleanup(struct net *net)
 {
 	int retc;
 
-- 
1.7.4


^ permalink raw reply related

* [v3 PATCH 2/6] IPVS: labels at pos 0
From: Hans Schillstrom @ 2011-04-29  6:25 UTC (permalink / raw)
  To: ja, horms, ebiederm, lvs-devel, netdev, netfilter-devel
  Cc: hans.schillstrom, Hans Schillstrom
In-Reply-To: <1304058334-17594-1-git-send-email-hans@schillstrom.com>

Put goto labels at the beginig of row
acording to coding style example.

Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
---
 net/netfilter/ipvs/ip_vs_core.c |   10 +++++-----
 net/netfilter/ipvs/ip_vs_ctl.c  |    8 ++++----
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index a0791dc..d536b51 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -1388,7 +1388,7 @@ ip_vs_in_icmp(struct sk_buff *skb, int *related, unsigned int hooknum)
 		verdict = NF_DROP;
 	}
 
-  out:
+out:
 	__ip_vs_conn_put(cp);
 
 	return verdict;
@@ -1955,14 +1955,14 @@ static int __init ip_vs_init(void)
 
 cleanup_sync:
 	ip_vs_sync_cleanup();
-  cleanup_conn:
+cleanup_conn:
 	ip_vs_conn_cleanup();
-  cleanup_app:
+cleanup_app:
 	ip_vs_app_cleanup();
-  cleanup_protocol:
+cleanup_protocol:
 	ip_vs_protocol_cleanup();
 	ip_vs_control_cleanup();
-  cleanup_estimator:
+cleanup_estimator:
 	ip_vs_estimator_cleanup();
 	unregister_pernet_subsys(&ipvs_core_ops);	/* free ip_vs struct */
 	return ret;
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index ae47090..a31a70c 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -1327,9 +1327,9 @@ ip_vs_edit_service(struct ip_vs_service *svc, struct ip_vs_service_user_kern *u)
 		ip_vs_bind_pe(svc, pe);
 	}
 
-  out_unlock:
+out_unlock:
 	write_unlock_bh(&__ip_vs_svc_lock);
-  out:
+out:
 	ip_vs_scheduler_put(old_sched);
 	ip_vs_pe_put(old_pe);
 	return ret;
@@ -2387,7 +2387,7 @@ __ip_vs_get_service_entries(struct net *net,
 			count++;
 		}
 	}
-  out:
+out:
 	return ret;
 }
 
@@ -2625,7 +2625,7 @@ do_ip_vs_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 		ret = -EINVAL;
 	}
 
-  out:
+out:
 	mutex_unlock(&__ip_vs_mutex);
 	return ret;
 }
-- 
1.7.4


^ permalink raw reply related

* [v3 PATCH 5/6] IPVS: remove unused init and cleanup functions.
From: Hans Schillstrom @ 2011-04-29  6:25 UTC (permalink / raw)
  To: ja, horms, ebiederm, lvs-devel, netdev, netfilter-devel
  Cc: hans.schillstrom, Hans Schillstrom
In-Reply-To: <1304058334-17594-1-git-send-email-hans@schillstrom.com>

After restructuring, there is some unused or empty functions
left to be removed.

Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
---
 include/net/ip_vs.h             |    6 ------
 net/netfilter/ipvs/ip_vs_app.c  |   10 ----------
 net/netfilter/ipvs/ip_vs_core.c |   29 ++++-------------------------
 net/netfilter/ipvs/ip_vs_est.c  |    9 ---------
 net/netfilter/ipvs/ip_vs_sync.c |    9 ---------
 5 files changed, 4 insertions(+), 59 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index da2aea9..10417ab 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -1123,8 +1123,6 @@ extern void ip_vs_app_inc_put(struct ip_vs_app *inc);
 
 extern int ip_vs_app_pkt_out(struct ip_vs_conn *, struct sk_buff *skb);
 extern int ip_vs_app_pkt_in(struct ip_vs_conn *, struct sk_buff *skb);
-extern int ip_vs_app_init(void);
-extern void ip_vs_app_cleanup(void);
 
 void ip_vs_bind_pe(struct ip_vs_service *svc, struct ip_vs_pe *pe);
 void ip_vs_unbind_pe(struct ip_vs_service *svc);
@@ -1227,15 +1225,11 @@ extern int start_sync_thread(struct net *net, int state, char *mcast_ifn,
 			     __u8 syncid);
 extern int stop_sync_thread(struct net *net, int state);
 extern void ip_vs_sync_conn(struct net *net, struct ip_vs_conn *cp);
-extern int ip_vs_sync_init(void);
-extern void ip_vs_sync_cleanup(void);
 
 
 /*
  *      IPVS rate estimator prototypes (from ip_vs_est.c)
  */
-extern int ip_vs_estimator_init(void);
-extern void ip_vs_estimator_cleanup(void);
 extern void ip_vs_start_estimator(struct net *net, struct ip_vs_stats *stats);
 extern void ip_vs_stop_estimator(struct net *net, struct ip_vs_stats *stats);
 extern void ip_vs_zero_estimator(struct ip_vs_stats *stats);
diff --git a/net/netfilter/ipvs/ip_vs_app.c b/net/netfilter/ipvs/ip_vs_app.c
index 81b299d..5f34cf4 100644
--- a/net/netfilter/ipvs/ip_vs_app.c
+++ b/net/netfilter/ipvs/ip_vs_app.c
@@ -589,13 +589,3 @@ void __net_exit ip_vs_app_net_cleanup(struct net *net)
 {
 	proc_net_remove(net, "ip_vs_app");
 }
-
-int __init ip_vs_app_init(void)
-{
-	return 0;
-}
-
-
-void ip_vs_app_cleanup(void)
-{
-}
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 4582716..de92ca2 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -1987,36 +1987,23 @@ static int __init ip_vs_init(void)
 {
 	int ret;
 
-	ip_vs_estimator_init();
 	ret = ip_vs_control_init();
 	if (ret < 0) {
 		pr_err("can't setup control.\n");
-		goto cleanup_estimator;
+		goto exit;
 	}
 
 	ip_vs_protocol_init();
 
-	ret = ip_vs_app_init();
-	if (ret < 0) {
-		pr_err("can't setup application helper.\n");
-		goto cleanup_protocol;
-	}
-
 	ret = ip_vs_conn_init();
 	if (ret < 0) {
 		pr_err("can't setup connection table.\n");
-		goto cleanup_app;
-	}
-
-	ret = ip_vs_sync_init();
-	if (ret < 0) {
-		pr_err("can't setup sync data.\n");
-		goto cleanup_conn;
+		goto cleanup_protocol;
 	}
 
 	ret = register_pernet_subsys(&ipvs_core_ops);	/* Alloc ip_vs struct */
 	if (ret < 0)
-		goto cleanup_sync;
+		goto cleanup_conn;
 
 	ret = register_pernet_device(&ipvs_core_dev_ops);
 	if (ret < 0)
@@ -2036,17 +2023,12 @@ cleanup_dev:
 	unregister_pernet_device(&ipvs_core_dev_ops);
 cleanup_sub:
 	unregister_pernet_subsys(&ipvs_core_ops);
-cleanup_sync:
-	ip_vs_sync_cleanup();
 cleanup_conn:
 	ip_vs_conn_cleanup();
-cleanup_app:
-	ip_vs_app_cleanup();
 cleanup_protocol:
 	ip_vs_protocol_cleanup();
 	ip_vs_control_cleanup();
-cleanup_estimator:
-	ip_vs_estimator_cleanup();
+exit:
 	return ret;
 }
 
@@ -2055,12 +2037,9 @@ static void __exit ip_vs_cleanup(void)
 	nf_unregister_hooks(ip_vs_ops, ARRAY_SIZE(ip_vs_ops));
 	unregister_pernet_device(&ipvs_core_dev_ops);
 	unregister_pernet_subsys(&ipvs_core_ops);	/* free ip_vs struct */
-	ip_vs_sync_cleanup();
 	ip_vs_conn_cleanup();
-	ip_vs_app_cleanup();
 	ip_vs_protocol_cleanup();
 	ip_vs_control_cleanup();
-	ip_vs_estimator_cleanup();
 	pr_info("ipvs unloaded.\n");
 }
 
diff --git a/net/netfilter/ipvs/ip_vs_est.c b/net/netfilter/ipvs/ip_vs_est.c
index f5d2a01..0fac601 100644
--- a/net/netfilter/ipvs/ip_vs_est.c
+++ b/net/netfilter/ipvs/ip_vs_est.c
@@ -207,12 +207,3 @@ void __net_exit ip_vs_estimator_net_cleanup(struct net *net)
 {
 	del_timer_sync(&net_ipvs(net)->est_timer);
 }
-
-int __init ip_vs_estimator_init(void)
-{
-	return 0;
-}
-
-void ip_vs_estimator_cleanup(void)
-{
-}
diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index 1036b34..ef09afb 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -1689,12 +1689,3 @@ void ip_vs_sync_net_cleanup(struct net *net)
 	if (retc && retc != -EINVAL)
 		pr_err("Failed to stop Backup Daemon\n");
 }
-
-int __init ip_vs_sync_init(void)
-{
-	return 0;
-}

^ permalink raw reply related

* [v3 PATCH 3/6] IPVS: init and cleanup restructuring.
From: Hans Schillstrom @ 2011-04-29  6:25 UTC (permalink / raw)
  To: ja, horms, ebiederm, lvs-devel, netdev, netfilter-devel
  Cc: hans.schillstrom, Hans Schillstrom
In-Reply-To: <1304058334-17594-1-git-send-email-hans@schillstrom.com>

REVISION 3

DESCRIPTION
This patch tries to restore the initial init and cleanup
sequences that was before namspace patch.

The number of calls to register_pernet_device have been
reduced to one for the ip_vs.ko
Schedulers still have their own calls.

This patch adds a function __ip_vs_service_cleanup()
and a throttle or actually on/off switch for
the netfilter hooks.

The nf hooks will be enabled when the first service is loaded
and disabled when the last service is removed or when a
namespace exit starts.

CHANGES
 Rev 2
        receivening av device events, suggested by Julian.
        This change add 3 new functions ip_vs_dst_event()
        ip_vs_svc_reset() and __ip_vs_dev_reset().

 Rev 3
	Throttle renamed to enable.
	Comments from Julian implemented
	Check enable in ip_vs_in, ip_vs_out and ip_vs_forward_icmp*
        Remove in ip_vs_in_icmp*.
	ip_vs_svc_reset() moved into ip_vs_dst_event().
	ip_vs_service_cleanup() uses ip_vs_flush and mutex lock.
	ip_vs_unlink_service_nolock() added.

Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
---
 include/net/ip_vs.h              |   17 +++++
 net/netfilter/ipvs/ip_vs_app.c   |   15 +---
 net/netfilter/ipvs/ip_vs_conn.c  |   12 +---
 net/netfilter/ipvs/ip_vs_core.c  |  101 ++++++++++++++++++++++++---
 net/netfilter/ipvs/ip_vs_ctl.c   |  143 +++++++++++++++++++++++++++++++-------
 net/netfilter/ipvs/ip_vs_est.c   |   14 +---
 net/netfilter/ipvs/ip_vs_proto.c |   11 +---
 net/netfilter/ipvs/ip_vs_sync.c  |   13 +---
 8 files changed, 240 insertions(+), 86 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index d516f00..86aefed 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -791,6 +791,7 @@ struct ip_vs_app {
 /* IPVS in network namespace */
 struct netns_ipvs {
 	int			gen;		/* Generation */
+	int			enable;		/* enable like nf_hooks do */
 	/*
 	 *	Hash table: for real service lookups
 	 */
@@ -1089,6 +1090,22 @@ ip_vs_control_add(struct ip_vs_conn *cp, struct ip_vs_conn *ctl_cp)
 	atomic_inc(&ctl_cp->n_control);
 }
 
+/*
+ * IPVS netns init & cleanup functions
+ */
+extern int __ip_vs_estimator_init(struct net *net);
+extern int __ip_vs_control_init(struct net *net);
+extern int __ip_vs_protocol_init(struct net *net);
+extern int __ip_vs_app_init(struct net *net);
+extern int __ip_vs_conn_init(struct net *net);
+extern int __ip_vs_sync_init(struct net *net);
+extern void __ip_vs_conn_cleanup(struct net *net);
+extern void __ip_vs_app_cleanup(struct net *net);
+extern void __ip_vs_protocol_cleanup(struct net *net);
+extern void __ip_vs_control_cleanup(struct net *net);
+extern void __ip_vs_estimator_cleanup(struct net *net);
+extern void __ip_vs_sync_cleanup(struct net *net);
+extern void __ip_vs_service_cleanup(struct net *net);
 
 /*
  *      IPVS application functions
diff --git a/net/netfilter/ipvs/ip_vs_app.c b/net/netfilter/ipvs/ip_vs_app.c
index 2dc6de1..51f3af7 100644
--- a/net/netfilter/ipvs/ip_vs_app.c
+++ b/net/netfilter/ipvs/ip_vs_app.c
@@ -576,7 +576,7 @@ static const struct file_operations ip_vs_app_fops = {
 };
 #endif
 
-static int __net_init __ip_vs_app_init(struct net *net)
+int __net_init __ip_vs_app_init(struct net *net)
 {
 	struct netns_ipvs *ipvs = net_ipvs(net);
 
@@ -585,26 +585,17 @@ static int __net_init __ip_vs_app_init(struct net *net)
 	return 0;
 }
 
-static void __net_exit __ip_vs_app_cleanup(struct net *net)
+void __net_exit __ip_vs_app_cleanup(struct net *net)
 {
 	proc_net_remove(net, "ip_vs_app");
 }
 
-static struct pernet_operations ip_vs_app_ops = {
-	.init = __ip_vs_app_init,
-	.exit = __ip_vs_app_cleanup,
-};
-
 int __init ip_vs_app_init(void)
 {
-	int rv;
-
-	rv = register_pernet_subsys(&ip_vs_app_ops);
-	return rv;
+	return 0;
 }
 
 
 void ip_vs_app_cleanup(void)
 {
-	unregister_pernet_subsys(&ip_vs_app_ops);
 }
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index c97bd45..d3fd91b 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -1258,22 +1258,17 @@ int __net_init __ip_vs_conn_init(struct net *net)
 	return 0;
 }
 
-static void __net_exit __ip_vs_conn_cleanup(struct net *net)
+void __net_exit __ip_vs_conn_cleanup(struct net *net)
 {
 	/* flush all the connection entries first */
 	ip_vs_conn_flush(net);
 	proc_net_remove(net, "ip_vs_conn");
 	proc_net_remove(net, "ip_vs_conn_sync");
 }
-static struct pernet_operations ipvs_conn_ops = {
-	.init = __ip_vs_conn_init,
-	.exit = __ip_vs_conn_cleanup,
-};
 
 int __init ip_vs_conn_init(void)
 {
 	int idx;
-	int retc;
 
 	/* Compute size and mask */
 	ip_vs_conn_tab_size = 1 << ip_vs_conn_tab_bits;
@@ -1309,17 +1304,14 @@ int __init ip_vs_conn_init(void)
 		rwlock_init(&__ip_vs_conntbl_lock_array[idx].l);
 	}
 
-	retc = register_pernet_subsys(&ipvs_conn_ops);
-
 	/* calculate the random value for connection hash */
 	get_random_bytes(&ip_vs_conn_rnd, sizeof(ip_vs_conn_rnd));
 
-	return retc;
+	return 0;
 }
 
 void ip_vs_conn_cleanup(void)
 {
-	unregister_pernet_subsys(&ipvs_conn_ops);
 	/* Release the empty cache */
 	kmem_cache_destroy(ip_vs_conn_cachep);
 	vfree(ip_vs_conn_tab);
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index d536b51..b9debc9 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -1113,6 +1113,9 @@ ip_vs_out(unsigned int hooknum, struct sk_buff *skb, int af)
 		return NF_ACCEPT;
 
 	net = skb_net(skb);
+	if (!net_ipvs(net)->enable)
+		return NF_ACCEPT;
+
 	ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
 #ifdef CONFIG_IP_VS_IPV6
 	if (af == AF_INET6) {
@@ -1343,6 +1346,7 @@ ip_vs_in_icmp(struct sk_buff *skb, int *related, unsigned int hooknum)
 		return NF_ACCEPT; /* The packet looks wrong, ignore */
 
 	net = skb_net(skb);
+
 	pd = ip_vs_proto_data_get(net, cih->protocol);
 	if (!pd)
 		return NF_ACCEPT;
@@ -1529,6 +1533,11 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
 			      IP_VS_DBG_ADDR(af, &iph.daddr), hooknum);
 		return NF_ACCEPT;
 	}
+	/* ipvs enabled in this netns ? */
+	net = skb_net(skb);
+	if (!net_ipvs(net)->enable)
+		return NF_ACCEPT;
+
 	ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
 
 	/* Bad... Do not break raw sockets */
@@ -1562,7 +1571,6 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
 			ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
 		}
 
-	net = skb_net(skb);
 	/* Protocol supported? */
 	pd = ip_vs_proto_data_get(net, iph.protocol);
 	if (unlikely(!pd))
@@ -1588,7 +1596,6 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
 	}
 
 	IP_VS_DBG_PKT(11, af, pp, skb, 0, "Incoming packet");
-	net = skb_net(skb);
 	ipvs = net_ipvs(net);
 	/* Check the server status */
 	if (cp->dest && !(cp->dest->flags & IP_VS_DEST_F_AVAILABLE)) {
@@ -1743,10 +1750,16 @@ ip_vs_forward_icmp(unsigned int hooknum, struct sk_buff *skb,
 		   int (*okfn)(struct sk_buff *))
 {
 	int r;
+	struct net *net;
 
 	if (ip_hdr(skb)->protocol != IPPROTO_ICMP)
 		return NF_ACCEPT;
 
+	/* ipvs enabled in this netns ? */
+	net = skb_net(skb);
+	if (!net_ipvs(net)->enable)
+		return NF_ACCEPT;
+
 	return ip_vs_in_icmp(skb, &r, hooknum);
 }
 
@@ -1757,10 +1770,16 @@ ip_vs_forward_icmp_v6(unsigned int hooknum, struct sk_buff *skb,
 		      int (*okfn)(struct sk_buff *))
 {
 	int r;
+	struct net *net;
 
 	if (ipv6_hdr(skb)->nexthdr != IPPROTO_ICMPV6)
 		return NF_ACCEPT;
 
+	/* ipvs enabled in this netns ? */
+	net = skb_net(skb);
+	if (!net_ipvs(net)->enable)
+		return NF_ACCEPT;
+
 	return ip_vs_in_icmp_v6(skb, &r, hooknum);
 }
 #endif
@@ -1884,21 +1903,72 @@ static int __net_init __ip_vs_init(struct net *net)
 		pr_err("%s(): no memory.\n", __func__);
 		return -ENOMEM;
 	}
+	/* Hold the beast until a service is registerd */
+	ipvs->enable = 0;
 	ipvs->net = net;
 	/* Counters used for creating unique names */
 	ipvs->gen = atomic_read(&ipvs_netns_cnt);
 	atomic_inc(&ipvs_netns_cnt);
 	net->ipvs = ipvs;
+
+	if (__ip_vs_estimator_init(net) < 0)
+		goto estimator_fail;
+
+	if (__ip_vs_control_init(net) < 0)
+		goto control_fail;
+
+	if (__ip_vs_protocol_init(net) < 0)
+		goto protocol_fail;
+
+	if (__ip_vs_app_init(net) < 0)
+		goto app_fail;
+
+	if (__ip_vs_conn_init(net) < 0)
+		goto conn_fail;
+
+	if (__ip_vs_sync_init(net) < 0)
+		goto sync_fail;
+
 	printk(KERN_INFO "IPVS: Creating netns size=%zu id=%d\n",
 			 sizeof(struct netns_ipvs), ipvs->gen);
 	return 0;
+/*
+ * Error handling
+ */
+
+sync_fail:
+	__ip_vs_conn_cleanup(net);
+conn_fail:
+	__ip_vs_app_cleanup(net);
+app_fail:
+	__ip_vs_protocol_cleanup(net);
+protocol_fail:
+	__ip_vs_control_cleanup(net);
+control_fail:
+	__ip_vs_estimator_cleanup(net);
+estimator_fail:
+	return -ENOMEM;
 }
 
 static void __net_exit __ip_vs_cleanup(struct net *net)
 {
+	__ip_vs_service_cleanup(net);	/* ip_vs_flush() with locks */
+	__ip_vs_conn_cleanup(net);
+	__ip_vs_app_cleanup(net);
+	__ip_vs_protocol_cleanup(net);
+	__ip_vs_control_cleanup(net);
+	__ip_vs_estimator_cleanup(net);
 	IP_VS_DBG(2, "ipvs netns %d released\n", net_ipvs(net)->gen);
 }
 
+static void __net_exit __ip_vs_dev_cleanup(struct net *net)
+{
+	EnterFunction(2);
+	net_ipvs(net)->enable = 0;	/* Disable packet reception */
+	__ip_vs_sync_cleanup(net);
+	LeaveFunction(2);
+}
+
 static struct pernet_operations ipvs_core_ops = {
 	.init = __ip_vs_init,
 	.exit = __ip_vs_cleanup,
@@ -1906,6 +1976,10 @@ static struct pernet_operations ipvs_core_ops = {
 	.size = sizeof(struct netns_ipvs),
 };
 
+static struct pernet_operations ipvs_core_dev_ops = {
+	.exit = __ip_vs_dev_cleanup,
+};
+
 /*
  *	Initialize IP Virtual Server
  */
@@ -1913,10 +1987,6 @@ static int __init ip_vs_init(void)
 {
 	int ret;
 
-	ret = register_pernet_subsys(&ipvs_core_ops);	/* Alloc ip_vs struct */
-	if (ret < 0)
-		return ret;
-
 	ip_vs_estimator_init();
 	ret = ip_vs_control_init();
 	if (ret < 0) {
@@ -1944,15 +2014,28 @@ static int __init ip_vs_init(void)
 		goto cleanup_conn;
 	}
 
+	ret = register_pernet_subsys(&ipvs_core_ops);	/* Alloc ip_vs struct */
+	if (ret < 0)
+		goto cleanup_sync;
+
+	ret = register_pernet_device(&ipvs_core_dev_ops);
+	if (ret < 0)
+		goto cleanup_sub;
+
 	ret = nf_register_hooks(ip_vs_ops, ARRAY_SIZE(ip_vs_ops));
 	if (ret < 0) {
 		pr_err("can't register hooks.\n");
-		goto cleanup_sync;
+		goto cleanup_dev;
 	}
 
 	pr_info("ipvs loaded.\n");
+
 	return ret;
 
+cleanup_dev:
+	unregister_pernet_device(&ipvs_core_dev_ops);
+cleanup_sub:
+	unregister_pernet_subsys(&ipvs_core_ops);
 cleanup_sync:
 	ip_vs_sync_cleanup();
 cleanup_conn:
@@ -1964,20 +2047,20 @@ cleanup_protocol:
 	ip_vs_control_cleanup();
 cleanup_estimator:
 	ip_vs_estimator_cleanup();
-	unregister_pernet_subsys(&ipvs_core_ops);	/* free ip_vs struct */
 	return ret;
 }
 
 static void __exit ip_vs_cleanup(void)
 {
 	nf_unregister_hooks(ip_vs_ops, ARRAY_SIZE(ip_vs_ops));
+	unregister_pernet_device(&ipvs_core_dev_ops);
+	unregister_pernet_subsys(&ipvs_core_ops);	/* free ip_vs struct */
 	ip_vs_sync_cleanup();
 	ip_vs_conn_cleanup();
 	ip_vs_app_cleanup();
 	ip_vs_protocol_cleanup();
 	ip_vs_control_cleanup();
 	ip_vs_estimator_cleanup();
-	unregister_pernet_subsys(&ipvs_core_ops);	/* free ip_vs struct */
 	pr_info("ipvs unloaded.\n");
 }
 
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index a31a70c..e63b395 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -69,6 +69,11 @@ int ip_vs_get_debug_level(void)
 }
 #endif
 
+
+/*  Protos */
+static void __ip_vs_del_service(struct ip_vs_service *svc);
+
+
 #ifdef CONFIG_IP_VS_IPV6
 /* Taken from rt6_fill_node() in net/ipv6/route.c, is there a better way? */
 static int __ip_vs_addr_is_local_v6(struct net *net,
@@ -345,6 +350,9 @@ static int ip_vs_svc_unhash(struct ip_vs_service *svc)
 
 	svc->flags &= ~IP_VS_SVC_F_HASHED;
 	atomic_dec(&svc->refcnt);
+	/* No more services, no need for input */
+	if (atomic_read(&svc->refcnt) == 0)
+		net_ipvs(svc->net)->enable = 0;
 	return 1;
 }
 
@@ -1214,6 +1222,8 @@ ip_vs_add_service(struct net *net, struct ip_vs_service_user_kern *u,
 	write_unlock_bh(&__ip_vs_svc_lock);
 
 	*svc_p = svc;
+	/* Now there is a service - full throttle */
+	ipvs->enable = 1;
 	return 0;
 
 
@@ -1407,22 +1417,30 @@ static void __ip_vs_del_service(struct ip_vs_service *svc)
 /*
  * Unlink a service from list and try to delete it if its refcnt reached 0
  */
-static void ip_vs_unlink_service(struct ip_vs_service *svc)
+static void ip_vs_unlink_service_nolock(struct ip_vs_service *svc)
 {
 	/*
 	 * Unhash it from the service table
 	 */
-	write_lock_bh(&__ip_vs_svc_lock);
-
 	ip_vs_svc_unhash(svc);
-
 	/*
 	 * Wait until all the svc users go away.
 	 */
 	IP_VS_WAIT_WHILE(atomic_read(&svc->usecnt) > 0);
 
 	__ip_vs_del_service(svc);
+}
 
+/*
+ * Unlink a service from list and try to delete it if its refcnt reached 0
+ */
+static void ip_vs_unlink_service(struct ip_vs_service *svc)
+{
+	/*
+	 * Unhash it from the service table
+	 */
+	write_lock_bh(&__ip_vs_svc_lock);
+	ip_vs_unlink_service_nolock(svc);
 	write_unlock_bh(&__ip_vs_svc_lock);
 }
 
@@ -1454,7 +1472,7 @@ static int ip_vs_flush(struct net *net)
 		list_for_each_entry_safe(svc, nxt, &ip_vs_svc_table[idx],
 					 s_list) {
 			if (net_eq(svc->net, net))
-				ip_vs_unlink_service(svc);
+				ip_vs_unlink_service_nolock(svc);
 		}
 	}
 
@@ -1465,13 +1483,91 @@ static int ip_vs_flush(struct net *net)
 		list_for_each_entry_safe(svc, nxt,
 					 &ip_vs_svc_fwm_table[idx], f_list) {
 			if (net_eq(svc->net, net))
-				ip_vs_unlink_service(svc);
+				ip_vs_unlink_service_nolock(svc);
 		}
 	}
 
 	return 0;
 }
 
+/*
+ *	Delete service by {netns} in the service table.
+ *	Called by __ip_vs_cleanup()
+ */
+void __ip_vs_service_cleanup(struct net *net)
+{
+	EnterFunction(2);
+	/* Check for "full" addressed entries */
+	mutex_lock(&__ip_vs_mutex);
+	ip_vs_flush(net);
+	mutex_unlock(&__ip_vs_mutex);
+	LeaveFunction(2);
+}
+/*
+ * Release dst hold by dst_cache
+ */
+static inline void
+__ip_vs_dev_reset(struct ip_vs_dest *dest, struct net_device *dev)
+{
+	spin_lock_bh(&dest->dst_lock);
+	if (dest->dst_cache && dest->dst_cache->dev == dev) {
+		IP_VS_DBG_BUF(3, "Reset dev:%s dest %s:%u ,dest->refcnt=%d\n",
+			      dev->name,
+			      IP_VS_DBG_ADDR(dest->af, &dest->addr),
+			      ntohs(dest->port),
+			      atomic_read(&dest->refcnt));
+		ip_vs_dst_reset(dest);
+	}
+	spin_unlock_bh(&dest->dst_lock);
+
+}
+/*
+ * Netdev event receiver
+ * Currently only NETDEV_UNREGISTER is handled, i.e. if we hold a reference to
+ * a device that is "unregister" it must be released.
+ */
+static int ip_vs_dst_event(struct notifier_block *this, unsigned long event,
+			    void *ptr)
+{
+	struct net_device *dev = ptr;
+	struct net *net = dev_net(dev);
+	struct ip_vs_service *svc;
+	struct ip_vs_dest *dest;
+	unsigned int idx;
+
+	if (event != NETDEV_UNREGISTER)
+		return NOTIFY_DONE;
+	IP_VS_DBG(3, "%s() dev=%s\n", __func__, dev->name);
+	EnterFunction(2);
+	mutex_lock(&__ip_vs_mutex);
+	for (idx = 0; idx < IP_VS_SVC_TAB_SIZE; idx++) {
+		list_for_each_entry(svc, &ip_vs_svc_table[idx], s_list) {
+			if (net_eq(svc->net, net)) {
+				list_for_each_entry(dest, &svc->destinations,
+						    n_list) {
+					__ip_vs_dev_reset(dest, dev);
+				}
+			}
+		}
+
+		list_for_each_entry(svc, &ip_vs_svc_fwm_table[idx], f_list) {
+			if (net_eq(svc->net, net)) {
+				list_for_each_entry(dest, &svc->destinations,
+						    n_list) {
+					__ip_vs_dev_reset(dest, dev);
+				}
+			}
+
+		}
+	}
+
+	list_for_each_entry(dest, &net_ipvs(net)->dest_trash, n_list) {
+		__ip_vs_dev_reset(dest, dev);
+	}
+	mutex_unlock(&__ip_vs_mutex);
+	LeaveFunction(2);
+	return NOTIFY_DONE;
+}
 
 /*
  *	Zero counters in a service or all services
@@ -3588,6 +3684,10 @@ void __net_init __ip_vs_control_cleanup_sysctl(struct net *net) { }
 
 #endif
 
+static struct notifier_block ip_vs_dst_notifier = {
+	.notifier_call = ip_vs_dst_event,
+};
+
 int __net_init __ip_vs_control_init(struct net *net)
 {
 	int idx;
@@ -3626,7 +3726,7 @@ err:
 	return -ENOMEM;
 }
 
-static void __net_exit __ip_vs_control_cleanup(struct net *net)
+void __net_exit __ip_vs_control_cleanup(struct net *net)
 {
 	struct netns_ipvs *ipvs = net_ipvs(net);
 
@@ -3639,11 +3739,6 @@ static void __net_exit __ip_vs_control_cleanup(struct net *net)
 	free_percpu(ipvs->tot_stats.cpustats);
 }
 
-static struct pernet_operations ipvs_control_ops = {
-	.init = __ip_vs_control_init,
-	.exit = __ip_vs_control_cleanup,
-};
-
 int __init ip_vs_control_init(void)
 {
 	int idx;
@@ -3657,33 +3752,32 @@ int __init ip_vs_control_init(void)
 		INIT_LIST_HEAD(&ip_vs_svc_fwm_table[idx]);
 	}
 
-	ret = register_pernet_subsys(&ipvs_control_ops);
-	if (ret) {
-		pr_err("cannot register namespace.\n");
-		goto err;
-	}
-
 	smp_wmb();	/* Do we really need it now ? */
 
 	ret = nf_register_sockopt(&ip_vs_sockopts);
 	if (ret) {
 		pr_err("cannot register sockopt.\n");
-		goto err_net;
+		goto err_sock;
 	}
 
 	ret = ip_vs_genl_register();
 	if (ret) {
 		pr_err("cannot register Generic Netlink interface.\n");
-		nf_unregister_sockopt(&ip_vs_sockopts);
-		goto err_net;
+		goto err_genl;
 	}
 
+	ret = register_netdevice_notifier(&ip_vs_dst_notifier);
+	if (ret < 0)
+		goto err_notf;
+
 	LeaveFunction(2);
 	return 0;
 
-err_net:
-	unregister_pernet_subsys(&ipvs_control_ops);
-err:
+err_notf:
+	ip_vs_genl_unregister();
+err_genl:
+	nf_unregister_sockopt(&ip_vs_sockopts);
+err_sock:
 	return ret;
 }
 
@@ -3691,7 +3785,6 @@ err:
 void ip_vs_control_cleanup(void)
 {
 	EnterFunction(2);
-	unregister_pernet_subsys(&ipvs_control_ops);
 	ip_vs_genl_unregister();
 	nf_unregister_sockopt(&ip_vs_sockopts);
 	LeaveFunction(2);
diff --git a/net/netfilter/ipvs/ip_vs_est.c b/net/netfilter/ipvs/ip_vs_est.c
index 8c8766c..508cce9 100644
--- a/net/netfilter/ipvs/ip_vs_est.c
+++ b/net/netfilter/ipvs/ip_vs_est.c
@@ -192,7 +192,7 @@ void ip_vs_read_estimator(struct ip_vs_stats_user *dst,
 	dst->outbps = (e->outbps + 0xF) >> 5;
 }
 
-static int __net_init __ip_vs_estimator_init(struct net *net)
+int __net_init __ip_vs_estimator_init(struct net *net)
 {
 	struct netns_ipvs *ipvs = net_ipvs(net);
 
@@ -203,24 +203,16 @@ static int __net_init __ip_vs_estimator_init(struct net *net)
 	return 0;
 }
 
-static void __net_exit __ip_vs_estimator_exit(struct net *net)
+void __net_exit __ip_vs_estimator_cleanup(struct net *net)
 {
 	del_timer_sync(&net_ipvs(net)->est_timer);
 }
-static struct pernet_operations ip_vs_app_ops = {
-	.init = __ip_vs_estimator_init,
-	.exit = __ip_vs_estimator_exit,
-};
 
 int __init ip_vs_estimator_init(void)
 {
-	int rv;
-
-	rv = register_pernet_subsys(&ip_vs_app_ops);
-	return rv;
+	return 0;
 }
 
 void ip_vs_estimator_cleanup(void)
 {
-	unregister_pernet_subsys(&ip_vs_app_ops);
 }
diff --git a/net/netfilter/ipvs/ip_vs_proto.c b/net/netfilter/ipvs/ip_vs_proto.c
index 17484a4..eb86028 100644
--- a/net/netfilter/ipvs/ip_vs_proto.c
+++ b/net/netfilter/ipvs/ip_vs_proto.c
@@ -316,7 +316,7 @@ ip_vs_tcpudp_debug_packet(int af, struct ip_vs_protocol *pp,
 /*
  * per network name-space init
  */
-static int __net_init __ip_vs_protocol_init(struct net *net)
+int __net_init __ip_vs_protocol_init(struct net *net)
 {
 #ifdef CONFIG_IP_VS_PROTO_TCP
 	register_ip_vs_proto_netns(net, &ip_vs_protocol_tcp);
@@ -336,7 +336,7 @@ static int __net_init __ip_vs_protocol_init(struct net *net)
 	return 0;
 }
 
-static void __net_exit __ip_vs_protocol_cleanup(struct net *net)
+void __net_exit __ip_vs_protocol_cleanup(struct net *net)
 {
 	struct netns_ipvs *ipvs = net_ipvs(net);
 	struct ip_vs_proto_data *pd;
@@ -349,11 +349,6 @@ static void __net_exit __ip_vs_protocol_cleanup(struct net *net)
 	}
 }
 
-static struct pernet_operations ipvs_proto_ops = {
-	.init = __ip_vs_protocol_init,
-	.exit = __ip_vs_protocol_cleanup,
-};
-
 int __init ip_vs_protocol_init(void)
 {
 	char protocols[64];
@@ -382,7 +377,6 @@ int __init ip_vs_protocol_init(void)
 	REGISTER_PROTOCOL(&ip_vs_protocol_esp);
 #endif
 	pr_info("Registered protocols (%s)\n", &protocols[2]);
-	return register_pernet_subsys(&ipvs_proto_ops);
 
 	return 0;
 }
@@ -393,7 +387,6 @@ void ip_vs_protocol_cleanup(void)
 	struct ip_vs_protocol *pp;
 	int i;
 
-	unregister_pernet_subsys(&ipvs_proto_ops);
 	/* unregister all the ipvs protocols */
 	for (i = 0; i < IP_VS_PROTO_TAB_SIZE; i++) {
 		while ((pp = ip_vs_proto_table[i]) != NULL)
diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index c187823..f31ef18 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -1663,7 +1663,7 @@ int stop_sync_thread(struct net *net, int state)
 /*
  * Initialize data struct for each netns
  */
-static int __net_init __ip_vs_sync_init(struct net *net)
+int __net_init __ip_vs_sync_init(struct net *net)
 {
 	struct netns_ipvs *ipvs = net_ipvs(net);
 
@@ -1677,7 +1677,7 @@ static int __net_init __ip_vs_sync_init(struct net *net)
 	return 0;
 }
 
-static void __ip_vs_sync_cleanup(struct net *net)
+void __ip_vs_sync_cleanup(struct net *net)
 {
 	int retc;
 
@@ -1690,18 +1690,11 @@ static void __ip_vs_sync_cleanup(struct net *net)
 		pr_err("Failed to stop Backup Daemon\n");
 }
 
-static struct pernet_operations ipvs_sync_ops = {
-	.init = __ip_vs_sync_init,
-	.exit = __ip_vs_sync_cleanup,
-};
-

^ permalink raw reply related

* [PATCH 2/2 net-next] net: drivers: set TSO/UFO offload option explicitly
From: Shan Wei @ 2011-04-29  5:34 UTC (permalink / raw)
  To: David Miller, netdev, rusty, mst, Eric Dumazet, mirq-linux,
	mirqus

The device drivers should not use NETIF_F_ALL_TSO mask to set hw_features(or features),
but have to explicitly set offload option. Because, This would make drivers automatically
clain to support any new TSO feature an the moment of NETIF_F_ALL_TSO is expanded.

Some code style tuning. Just compile test. 

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
---
 drivers/net/loopback.c   |   18 ++++++++----------
 drivers/net/virtio_net.c |    9 ++++++---
 2 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/drivers/net/loopback.c b/drivers/net/loopback.c
index d70fb76..bfb6a4a 100644
--- a/drivers/net/loopback.c
+++ b/drivers/net/loopback.c
@@ -152,6 +152,9 @@ static const struct net_device_ops loopback_ops = {
 	.ndo_get_stats64 = loopback_get_stats64,
 };
 
+#define LOOPBACK_USER_FEATURES (NETIF_F_TSO | NETIF_F_TSO_ECN | \
+				NETIF_F_TSO6 | NETIF_F_UFO)
+
 /*
  * The loopback device is special. There is only one instance
  * per network namespace.
@@ -165,16 +168,11 @@ static void loopback_setup(struct net_device *dev)
 	dev->type		= ARPHRD_LOOPBACK;	/* 0x0001*/
 	dev->flags		= IFF_LOOPBACK;
 	dev->priv_flags	       &= ~IFF_XMIT_DST_RELEASE;
-	dev->hw_features	= NETIF_F_ALL_TSO | NETIF_F_UFO;
-	dev->features 		= NETIF_F_SG | NETIF_F_FRAGLIST
-		| NETIF_F_ALL_TSO
-		| NETIF_F_UFO
-		| NETIF_F_NO_CSUM
-		| NETIF_F_RXCSUM
-		| NETIF_F_HIGHDMA
-		| NETIF_F_LLTX
-		| NETIF_F_NETNS_LOCAL
-		| NETIF_F_VLAN_CHALLENGED;
+	dev->hw_features	= LOOPBACK_USER_FEATURES;
+	dev->features		= NETIF_F_SG | NETIF_F_FRAGLIST
+		| LOOPBACK_USER_FEATURES | NETIF_F_NO_CSUM | NETIF_F_RXCSUM
+		| NETIF_F_HIGHDMA | NETIF_F_LLTX
+		| NETIF_F_NETNS_LOCAL | NETIF_F_VLAN_CHALLENGED;
 	dev->ethtool_ops	= &loopback_ethtool_ops;
 	dev->header_ops		= &eth_header_ops;
 	dev->netdev_ops		= &loopback_ops;
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 0cb0b06..addde86 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -875,6 +875,9 @@ static void virtnet_config_changed(struct virtio_device *vdev)
 	virtnet_update_status(vi);
 }
 
+#define VIRTNET_USER_FEATURES (NETIF_F_TSO | NETIF_F_TSO_ECN | \
+			       NETIF_F_TSO6 | NETIF_F_UFO)
+
 static int virtnet_probe(struct virtio_device *vdev)
 {
 	int err;
@@ -904,8 +907,7 @@ static int virtnet_probe(struct virtio_device *vdev)
 			dev->features |= NETIF_F_HW_CSUM|NETIF_F_SG|NETIF_F_FRAGLIST;
 
 		if (virtio_has_feature(vdev, VIRTIO_NET_F_GSO)) {
-			dev->hw_features |= NETIF_F_TSO | NETIF_F_UFO
-				| NETIF_F_TSO_ECN | NETIF_F_TSO6;
+			dev->hw_features |= VIRTNET_USER_FEATURES;
 		}
 		/* Individual feature bits: what can host handle? */
 		if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_TSO4))
@@ -918,7 +920,8 @@ static int virtnet_probe(struct virtio_device *vdev)
 			dev->hw_features |= NETIF_F_UFO;
 
 		if (gso)
-			dev->features |= dev->hw_features & (NETIF_F_ALL_TSO|NETIF_F_UFO);
+			dev->features |= dev->hw_features &
+					 VIRTNET_USER_FEATURES;
 		/* (!csum && gso) case will be fixed by register_netdev() */
 	}
 
-- 
1.7.3.1

^ permalink raw reply related

* [PATCH 3/3] ipv4: Use caller's on-stack flowi as-is in output route lookups.
From: David Miller @ 2011-04-29  5:25 UTC (permalink / raw)
  To: netdev


Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/route.h |    2 +-
 net/ipv4/route.c    |  158 ++++++++++++++++++++++++++-------------------------
 2 files changed, 81 insertions(+), 79 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index fdbdb92..81b6cf2 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -115,7 +115,7 @@ extern void		ip_rt_redirect(__be32 old_gw, __be32 dst, __be32 new_gw,
 				       __be32 src, struct net_device *dev);
 extern void		rt_cache_flush(struct net *net, int how);
 extern void		rt_cache_flush_batch(struct net *net);
-extern struct rtable *__ip_route_output_key(struct net *, const struct flowi4 *flp);
+extern struct rtable *__ip_route_output_key(struct net *, struct flowi4 *flp);
 extern struct rtable *ip_route_output_flow(struct net *, struct flowi4 *flp,
 					   struct sock *sk);
 extern struct dst_entry *ipv4_blackhole_route(struct net *net, struct dst_entry *dst_orig);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index fb9211a..93f71be 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1767,7 +1767,7 @@ static unsigned int ipv4_default_mtu(const struct dst_entry *dst)
 	return mtu;
 }
 
-static void rt_init_metrics(struct rtable *rt, const struct flowi4 *oldflp4,
+static void rt_init_metrics(struct rtable *rt, const struct flowi4 *fl4,
 			    struct fib_info *fi)
 {
 	struct inet_peer *peer;
@@ -1776,7 +1776,7 @@ static void rt_init_metrics(struct rtable *rt, const struct flowi4 *oldflp4,
 	/* If a peer entry exists for this destination, we must hook
 	 * it up in order to get at cached metrics.
 	 */
-	if (oldflp4 && (oldflp4->flowi4_flags & FLOWI_FLAG_PRECOW_METRICS))
+	if (fl4 && (fl4->flowi4_flags & FLOWI_FLAG_PRECOW_METRICS))
 		create = 1;
 
 	rt->peer = peer = inet_getpeer_v4(rt->rt_dst, create);
@@ -1803,7 +1803,7 @@ static void rt_init_metrics(struct rtable *rt, const struct flowi4 *oldflp4,
 	}
 }
 
-static void rt_set_nexthop(struct rtable *rt, const struct flowi4 *oldflp4,
+static void rt_set_nexthop(struct rtable *rt, const struct flowi4 *fl4,
 			   const struct fib_result *res,
 			   struct fib_info *fi, u16 type, u32 itag)
 {
@@ -1813,7 +1813,7 @@ static void rt_set_nexthop(struct rtable *rt, const struct flowi4 *oldflp4,
 		if (FIB_RES_GW(*res) &&
 		    FIB_RES_NH(*res).nh_scope == RT_SCOPE_LINK)
 			rt->rt_gateway = FIB_RES_GW(*res);
-		rt_init_metrics(rt, oldflp4, fi);
+		rt_init_metrics(rt, fl4, fi);
 #ifdef CONFIG_IP_ROUTE_CLASSID
 		dst->tclassid = FIB_RES_NH(*res).nh_tclassid;
 #endif
@@ -2354,12 +2354,12 @@ EXPORT_SYMBOL(ip_route_input_common);
 /* called with rcu_read_lock() */
 static struct rtable *__mkroute_output(const struct fib_result *res,
 				       const struct flowi4 *fl4,
-				       const struct flowi4 *oldflp4,
-				       struct net_device *dev_out,
+				       __be32 orig_daddr, __be32 orig_saddr,
+				       int orig_oif, struct net_device *dev_out,
 				       unsigned int flags)
 {
 	struct fib_info *fi = res->fi;
-	u32 tos = RT_FL_TOS(oldflp4);
+	u32 tos = RT_FL_TOS(fl4);
 	struct in_device *in_dev;
 	u16 type = res->type;
 	struct rtable *rth;
@@ -2386,8 +2386,8 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 		fi = NULL;
 	} else if (type == RTN_MULTICAST) {
 		flags |= RTCF_MULTICAST | RTCF_LOCAL;
-		if (!ip_check_mc_rcu(in_dev, oldflp4->daddr, oldflp4->saddr,
-				     oldflp4->flowi4_proto))
+		if (!ip_check_mc_rcu(in_dev, fl4->daddr, fl4->saddr,
+				     fl4->flowi4_proto))
 			flags &= ~RTCF_LOCAL;
 		/* If multicast route do not exist use
 		 * default one, but do not gateway in this case.
@@ -2405,8 +2405,8 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 
 	rth->dst.output = ip_output;
 
-	rth->rt_key_dst	= oldflp4->daddr;
-	rth->rt_key_src	= oldflp4->saddr;
+	rth->rt_key_dst	= orig_daddr;
+	rth->rt_key_src	= orig_saddr;
 	rth->rt_genid = rt_genid(dev_net(dev_out));
 	rth->rt_flags	= flags;
 	rth->rt_type	= type;
@@ -2414,9 +2414,9 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 	rth->rt_dst	= fl4->daddr;
 	rth->rt_src	= fl4->saddr;
 	rth->rt_route_iif = 0;
-	rth->rt_iif	= oldflp4->flowi4_oif ? : dev_out->ifindex;
-	rth->rt_oif	= oldflp4->flowi4_oif;
-	rth->rt_mark    = oldflp4->flowi4_mark;
+	rth->rt_iif	= orig_oif ? : dev_out->ifindex;
+	rth->rt_oif	= orig_oif;
+	rth->rt_mark    = fl4->flowi4_mark;
 	rth->rt_gateway = fl4->daddr;
 	rth->rt_spec_dst= fl4->saddr;
 	rth->rt_peer_genid = 0;
@@ -2439,7 +2439,7 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 #ifdef CONFIG_IP_MROUTE
 		if (type == RTN_MULTICAST) {
 			if (IN_DEV_MFORWARD(in_dev) &&
-			    !ipv4_is_local_multicast(oldflp4->daddr)) {
+			    !ipv4_is_local_multicast(fl4->daddr)) {
 				rth->dst.input = ip_mr_input;
 				rth->dst.output = ip_mc_output;
 			}
@@ -2447,7 +2447,7 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 #endif
 	}
 
-	rt_set_nexthop(rth, oldflp4, res, fi, type, 0);
+	rt_set_nexthop(rth, fl4, res, fi, type, 0);
 
 	return rth;
 }
@@ -2457,36 +2457,37 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
  * called with rcu_read_lock();
  */
 
-static struct rtable *ip_route_output_slow(struct net *net,
-					   const struct flowi4 *oldflp4)
+static struct rtable *ip_route_output_slow(struct net *net, struct flowi4 *fl4)
 {
-	u32 tos	= RT_FL_TOS(oldflp4);
-	struct flowi4 fl4;
-	struct fib_result res;
-	unsigned int flags = 0;
 	struct net_device *dev_out = NULL;
+	u32 tos	= RT_FL_TOS(fl4);
+	unsigned int flags = 0;
+	struct fib_result res;
 	struct rtable *rth;
+	__be32 orig_daddr;
+	__be32 orig_saddr;
+	int orig_oif;
 
 	res.fi		= NULL;
 #ifdef CONFIG_IP_MULTIPLE_TABLES
 	res.r		= NULL;
 #endif
 
-	fl4.flowi4_oif = oldflp4->flowi4_oif;
-	fl4.flowi4_iif = net->loopback_dev->ifindex;
-	fl4.flowi4_mark = oldflp4->flowi4_mark;
-	fl4.daddr = oldflp4->daddr;
-	fl4.saddr = oldflp4->saddr;
-	fl4.flowi4_tos = tos & IPTOS_RT_MASK;
-	fl4.flowi4_scope = ((tos & RTO_ONLINK) ?
-			RT_SCOPE_LINK : RT_SCOPE_UNIVERSE);
+	orig_daddr = fl4->daddr;
+	orig_saddr = fl4->saddr;
+	orig_oif = fl4->flowi4_oif;
+
+	fl4->flowi4_iif = net->loopback_dev->ifindex;
+	fl4->flowi4_tos = tos & IPTOS_RT_MASK;
+	fl4->flowi4_scope = ((tos & RTO_ONLINK) ?
+			 RT_SCOPE_LINK : RT_SCOPE_UNIVERSE);
 
 	rcu_read_lock();
-	if (oldflp4->saddr) {
+	if (fl4->saddr) {
 		rth = ERR_PTR(-EINVAL);
-		if (ipv4_is_multicast(oldflp4->saddr) ||
-		    ipv4_is_lbcast(oldflp4->saddr) ||
-		    ipv4_is_zeronet(oldflp4->saddr))
+		if (ipv4_is_multicast(fl4->saddr) ||
+		    ipv4_is_lbcast(fl4->saddr) ||
+		    ipv4_is_zeronet(fl4->saddr))
 			goto out;
 
 		/* I removed check for oif == dev_out->oif here.
@@ -2497,11 +2498,11 @@ static struct rtable *ip_route_output_slow(struct net *net,
 		      of another iface. --ANK
 		 */
 
-		if (oldflp4->flowi4_oif == 0 &&
-		    (ipv4_is_multicast(oldflp4->daddr) ||
-		     ipv4_is_lbcast(oldflp4->daddr))) {
+		if (fl4->flowi4_oif == 0 &&
+		    (ipv4_is_multicast(fl4->daddr) ||
+		     ipv4_is_lbcast(fl4->daddr))) {
 			/* It is equivalent to inet_addr_type(saddr) == RTN_LOCAL */
-			dev_out = __ip_dev_find(net, oldflp4->saddr, false);
+			dev_out = __ip_dev_find(net, fl4->saddr, false);
 			if (dev_out == NULL)
 				goto out;
 
@@ -2520,20 +2521,20 @@ static struct rtable *ip_route_output_slow(struct net *net,
 			   Luckily, this hack is good workaround.
 			 */
 
-			fl4.flowi4_oif = dev_out->ifindex;
+			fl4->flowi4_oif = dev_out->ifindex;
 			goto make_route;
 		}
 
-		if (!(oldflp4->flowi4_flags & FLOWI_FLAG_ANYSRC)) {
+		if (!(fl4->flowi4_flags & FLOWI_FLAG_ANYSRC)) {
 			/* It is equivalent to inet_addr_type(saddr) == RTN_LOCAL */
-			if (!__ip_dev_find(net, oldflp4->saddr, false))
+			if (!__ip_dev_find(net, fl4->saddr, false))
 				goto out;
 		}
 	}
 
 
-	if (oldflp4->flowi4_oif) {
-		dev_out = dev_get_by_index_rcu(net, oldflp4->flowi4_oif);
+	if (fl4->flowi4_oif) {
+		dev_out = dev_get_by_index_rcu(net, fl4->flowi4_oif);
 		rth = ERR_PTR(-ENODEV);
 		if (dev_out == NULL)
 			goto out;
@@ -2543,37 +2544,37 @@ static struct rtable *ip_route_output_slow(struct net *net,
 			rth = ERR_PTR(-ENETUNREACH);
 			goto out;
 		}
-		if (ipv4_is_local_multicast(oldflp4->daddr) ||
-		    ipv4_is_lbcast(oldflp4->daddr)) {
-			if (!fl4.saddr)
-				fl4.saddr = inet_select_addr(dev_out, 0,
-							     RT_SCOPE_LINK);
+		if (ipv4_is_local_multicast(fl4->daddr) ||
+		    ipv4_is_lbcast(fl4->daddr)) {
+			if (!fl4->saddr)
+				fl4->saddr = inet_select_addr(dev_out, 0,
+							      RT_SCOPE_LINK);
 			goto make_route;
 		}
-		if (!fl4.saddr) {
-			if (ipv4_is_multicast(oldflp4->daddr))
-				fl4.saddr = inet_select_addr(dev_out, 0,
-							     fl4.flowi4_scope);
-			else if (!oldflp4->daddr)
-				fl4.saddr = inet_select_addr(dev_out, 0,
-							     RT_SCOPE_HOST);
+		if (fl4->saddr) {
+			if (ipv4_is_multicast(fl4->daddr))
+				fl4->saddr = inet_select_addr(dev_out, 0,
+							      fl4->flowi4_scope);
+			else if (!fl4->daddr)
+				fl4->saddr = inet_select_addr(dev_out, 0,
+							      RT_SCOPE_HOST);
 		}
 	}
 
-	if (!fl4.daddr) {
-		fl4.daddr = fl4.saddr;
-		if (!fl4.daddr)
-			fl4.daddr = fl4.saddr = htonl(INADDR_LOOPBACK);
+	if (!fl4->daddr) {
+		fl4->daddr = fl4->saddr;
+		if (!fl4->daddr)
+			fl4->daddr = fl4->saddr = htonl(INADDR_LOOPBACK);
 		dev_out = net->loopback_dev;
-		fl4.flowi4_oif = net->loopback_dev->ifindex;
+		fl4->flowi4_oif = net->loopback_dev->ifindex;
 		res.type = RTN_LOCAL;
 		flags |= RTCF_LOCAL;
 		goto make_route;
 	}
 
-	if (fib_lookup(net, &fl4, &res)) {
+	if (fib_lookup(net, fl4, &res)) {
 		res.fi = NULL;
-		if (oldflp4->flowi4_oif) {
+		if (fl4->flowi4_oif) {
 			/* Apparently, routing tables are wrong. Assume,
 			   that the destination is on link.
 
@@ -2592,9 +2593,9 @@ static struct rtable *ip_route_output_slow(struct net *net,
 			   likely IPv6, but we do not.
 			 */
 
-			if (fl4.saddr == 0)
-				fl4.saddr = inet_select_addr(dev_out, 0,
-							     RT_SCOPE_LINK);
+			if (fl4->saddr == 0)
+				fl4->saddr = inet_select_addr(dev_out, 0,
+							      RT_SCOPE_LINK);
 			res.type = RTN_UNICAST;
 			goto make_route;
 		}
@@ -2603,44 +2604,45 @@ static struct rtable *ip_route_output_slow(struct net *net,
 	}
 
 	if (res.type == RTN_LOCAL) {
-		if (!fl4.saddr) {
+		if (!fl4->saddr) {
 			if (res.fi->fib_prefsrc)
-				fl4.saddr = res.fi->fib_prefsrc;
+				fl4->saddr = res.fi->fib_prefsrc;
 			else
-				fl4.saddr = fl4.daddr;
+				fl4->saddr = fl4->daddr;
 		}
 		dev_out = net->loopback_dev;
-		fl4.flowi4_oif = dev_out->ifindex;
+		fl4->flowi4_oif = dev_out->ifindex;
 		res.fi = NULL;
 		flags |= RTCF_LOCAL;
 		goto make_route;
 	}
 
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
-	if (res.fi->fib_nhs > 1 && fl4.flowi4_oif == 0)
+	if (res.fi->fib_nhs > 1 && fl4->flowi4_oif == 0)
 		fib_select_multipath(&res);
 	else
 #endif
 	if (!res.prefixlen &&
 	    res.table->tb_num_default > 1 &&
-	    res.type == RTN_UNICAST && !fl4.flowi4_oif)
+	    res.type == RTN_UNICAST && !fl4->flowi4_oif)
 		fib_select_default(&res);
 
-	if (!fl4.saddr)
-		fl4.saddr = FIB_RES_PREFSRC(net, res);
+	if (!fl4->saddr)
+		fl4->saddr = FIB_RES_PREFSRC(net, res);
 
 	dev_out = FIB_RES_DEV(res);
-	fl4.flowi4_oif = dev_out->ifindex;
+	fl4->flowi4_oif = dev_out->ifindex;
 
 
 make_route:
-	rth = __mkroute_output(&res, &fl4, oldflp4, dev_out, flags);
+	rth = __mkroute_output(&res, fl4, orig_daddr, orig_saddr, orig_oif,
+			       dev_out, flags);
 	if (!IS_ERR(rth)) {
 		unsigned int hash;
 
-		hash = rt_hash(oldflp4->daddr, oldflp4->saddr, oldflp4->flowi4_oif,
+		hash = rt_hash(orig_daddr, orig_saddr, orig_oif,
 			       rt_genid(dev_net(dev_out)));
-		rth = rt_intern_hash(hash, rth, NULL, oldflp4->flowi4_oif);
+		rth = rt_intern_hash(hash, rth, NULL, orig_oif);
 	}
 
 out:
@@ -2648,7 +2650,7 @@ out:
 	return rth;
 }
 
-struct rtable *__ip_route_output_key(struct net *net, const struct flowi4 *flp4)
+struct rtable *__ip_route_output_key(struct net *net, struct flowi4 *flp4)
 {
 	struct rtable *rth;
 	unsigned int hash;
-- 
1.7.4.5


^ permalink raw reply related

* [PATCH 2/3] net: Use non-zero allocations in dst_alloc().
From: David Miller @ 2011-04-29  5:25 UTC (permalink / raw)
  To: netdev


Make dst_alloc() and it's users explicitly initialize the entire
entry.

The zero'ing done by kmem_cache_zalloc() was almost entirely
redundant.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/core/dst.c         |   20 ++++++++++--
 net/decnet/dn_route.c  |    2 +
 net/ipv4/route.c       |   78 +++++++++++++++++++++++++++++-------------------
 net/ipv6/route.c       |    8 ++++-
 net/xfrm/xfrm_policy.c |    1 +
 5 files changed, 74 insertions(+), 35 deletions(-)

diff --git a/net/core/dst.c b/net/core/dst.c
index 9505778..30f0093 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -175,22 +175,36 @@ void *dst_alloc(struct dst_ops *ops, struct net_device *dev,
 		if (ops->gc(ops))
 			return NULL;
 	}
-	dst = kmem_cache_zalloc(ops->kmem_cachep, GFP_ATOMIC);
+	dst = kmem_cache_alloc(ops->kmem_cachep, GFP_ATOMIC);
 	if (!dst)
 		return NULL;
-	dst->ops = ops;
+	dst->child = NULL;
 	dst->dev = dev;
 	if (dev)
 		dev_hold(dev);
+	dst->ops = ops;
 	dst_init_metrics(dst, dst_default_metrics, true);
+	dst->expires = 0UL;
 	dst->path = dst;
+	dst->neighbour = NULL;
+	dst->hh = NULL;
+#ifdef CONFIG_XFRM
+	dst->xfrm = NULL;
+#endif
 	dst->input = dst_discard;
 	dst->output = dst_discard;
-
+	dst->error = 0;
 	dst->obsolete = initial_obsolete;
+	dst->header_len = 0;
+	dst->trailer_len = 0;
+#ifdef CONFIG_IP_ROUTE_CLASSID
+	dst->tclassid = 0;
+#endif
 	atomic_set(&dst->__refcnt, initial_ref);
+	dst->__use = 0;
 	dst->lastuse = jiffies;
 	dst->flags = flags;
+	dst->next = NULL;
 #if RT_CACHE_DEBUG >= 2
 	atomic_inc(&dst_total);
 #endif
diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index f489b08..74544bc 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -1129,6 +1129,7 @@ make_route:
 	if (rt == NULL)
 		goto e_nobufs;
 
+	memset(&rt->fld, 0, sizeof(rt->fld));
 	rt->fld.saddr        = oldflp->saddr;
 	rt->fld.daddr        = oldflp->daddr;
 	rt->fld.flowidn_oif  = oldflp->flowidn_oif;
@@ -1398,6 +1399,7 @@ make_route:
 	if (rt == NULL)
 		goto e_nobufs;
 
+	memset(&rt->fld, 0, sizeof(rt->fld));
 	rt->rt_saddr      = fld.saddr;
 	rt->rt_daddr      = fld.daddr;
 	rt->rt_gateway    = fld.daddr;
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index b471d89..fb9211a 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1830,7 +1830,6 @@ static void rt_set_nexthop(struct rtable *rt, const struct flowi4 *oldflp4,
 #endif
 	set_class_tag(rt, itag);
 #endif
-	rt->rt_type = type;
 }
 
 static struct rtable *rt_dst_alloc(struct net_device *dev,
@@ -1877,25 +1876,28 @@ static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	if (!rth)
 		goto e_nobufs;
 
+#ifdef CONFIG_IP_ROUTE_CLASSID
+	rth->dst.tclassid = itag;
+#endif
 	rth->dst.output = ip_rt_bug;
 
 	rth->rt_key_dst	= daddr;
-	rth->rt_dst	= daddr;
-	rth->rt_tos	= tos;
-	rth->rt_mark    = skb->mark;
 	rth->rt_key_src	= saddr;
+	rth->rt_genid	= rt_genid(dev_net(dev));
+	rth->rt_flags	= RTCF_MULTICAST;
+	rth->rt_type	= RTN_MULTICAST;
+	rth->rt_tos	= tos;
+	rth->rt_dst	= daddr;
 	rth->rt_src	= saddr;
-#ifdef CONFIG_IP_ROUTE_CLASSID
-	rth->dst.tclassid = itag;
-#endif
 	rth->rt_route_iif = dev->ifindex;
 	rth->rt_iif	= dev->ifindex;
 	rth->rt_oif	= 0;
+	rth->rt_mark    = skb->mark;
 	rth->rt_gateway	= daddr;
 	rth->rt_spec_dst= spec_dst;
-	rth->rt_genid	= rt_genid(dev_net(dev));
-	rth->rt_flags	= RTCF_MULTICAST;
-	rth->rt_type	= RTN_MULTICAST;
+	rth->rt_peer_genid = 0;
+	rth->peer = NULL;
+	rth->fi = NULL;
 	if (our) {
 		rth->dst.input= ip_local_deliver;
 		rth->rt_flags |= RTCF_LOCAL;
@@ -2017,25 +2019,28 @@ static int __mkroute_input(struct sk_buff *skb,
 	}
 
 	rth->rt_key_dst	= daddr;
-	rth->rt_dst	= daddr;
-	rth->rt_tos	= tos;
-	rth->rt_mark    = skb->mark;
 	rth->rt_key_src	= saddr;
+	rth->rt_genid = rt_genid(dev_net(rth->dst.dev));
+	rth->rt_flags = flags;
+	rth->rt_type = res->type;
+	rth->rt_tos	= tos;
+	rth->rt_dst	= daddr;
 	rth->rt_src	= saddr;
-	rth->rt_gateway	= daddr;
 	rth->rt_route_iif = in_dev->dev->ifindex;
 	rth->rt_iif 	= in_dev->dev->ifindex;
 	rth->rt_oif 	= 0;
+	rth->rt_mark    = skb->mark;
+	rth->rt_gateway	= daddr;
 	rth->rt_spec_dst= spec_dst;
+	rth->rt_peer_genid = 0;
+	rth->peer = NULL;
+	rth->fi = NULL;
 
 	rth->dst.input = ip_forward;
 	rth->dst.output = ip_output;
-	rth->rt_genid = rt_genid(dev_net(rth->dst.dev));
 
 	rt_set_nexthop(rth, NULL, res, res->fi, res->type, itag);
 
-	rth->rt_flags = flags;
-
 	*result = rth;
 	err = 0;
  cleanup:
@@ -2187,30 +2192,37 @@ local_input:
 	if (!rth)
 		goto e_nobufs;
 
+	rth->dst.input= ip_local_deliver;
 	rth->dst.output= ip_rt_bug;
-	rth->rt_genid = rt_genid(net);
+#ifdef CONFIG_IP_ROUTE_CLASSID
+	rth->dst.tclassid = itag;
+#endif
 
 	rth->rt_key_dst	= daddr;
-	rth->rt_dst	= daddr;
-	rth->rt_tos	= tos;
-	rth->rt_mark    = skb->mark;
 	rth->rt_key_src	= saddr;
+	rth->rt_genid = rt_genid(net);
+	rth->rt_flags 	= flags|RTCF_LOCAL;
+	rth->rt_type	= res.type;
+	rth->rt_tos	= tos;
+	rth->rt_dst	= daddr;
 	rth->rt_src	= saddr;
 #ifdef CONFIG_IP_ROUTE_CLASSID
 	rth->dst.tclassid = itag;
 #endif
 	rth->rt_route_iif = dev->ifindex;
 	rth->rt_iif	= dev->ifindex;
+	rth->rt_oif	= 0;
+	rth->rt_mark    = skb->mark;
 	rth->rt_gateway	= daddr;
 	rth->rt_spec_dst= spec_dst;
-	rth->dst.input= ip_local_deliver;
-	rth->rt_flags 	= flags|RTCF_LOCAL;
+	rth->rt_peer_genid = 0;
+	rth->peer = NULL;
+	rth->fi = NULL;
 	if (res.type == RTN_UNREACHABLE) {
 		rth->dst.input= ip_error;
 		rth->dst.error= -err;
 		rth->rt_flags 	&= ~RTCF_LOCAL;
 	}
-	rth->rt_type	= res.type;
 	hash = rt_hash(daddr, saddr, fl4.flowi4_iif, rt_genid(net));
 	rth = rt_intern_hash(hash, rth, skb, fl4.flowi4_iif);
 	err = 0;
@@ -2391,20 +2403,25 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 	if (!rth)
 		return ERR_PTR(-ENOBUFS);
 
+	rth->dst.output = ip_output;
+
 	rth->rt_key_dst	= oldflp4->daddr;
-	rth->rt_tos	= tos;
 	rth->rt_key_src	= oldflp4->saddr;
-	rth->rt_oif	= oldflp4->flowi4_oif;
-	rth->rt_mark    = oldflp4->flowi4_mark;
+	rth->rt_genid = rt_genid(dev_net(dev_out));
+	rth->rt_flags	= flags;
+	rth->rt_type	= type;
+	rth->rt_tos	= tos;
 	rth->rt_dst	= fl4->daddr;
 	rth->rt_src	= fl4->saddr;
 	rth->rt_route_iif = 0;
 	rth->rt_iif	= oldflp4->flowi4_oif ? : dev_out->ifindex;
+	rth->rt_oif	= oldflp4->flowi4_oif;
+	rth->rt_mark    = oldflp4->flowi4_mark;
 	rth->rt_gateway = fl4->daddr;
 	rth->rt_spec_dst= fl4->saddr;
-
-	rth->dst.output=ip_output;
-	rth->rt_genid = rt_genid(dev_net(dev_out));
+	rth->rt_peer_genid = 0;
+	rth->peer = NULL;
+	rth->fi = NULL;
 
 	RT_CACHE_STAT_INC(out_slow_tot);
 
@@ -2432,7 +2449,6 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 
 	rt_set_nexthop(rth, oldflp4, res, fi, type, 0);
 
-	rth->rt_flags = flags;
 	return rth;
 }
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index e8b2bb9..f1be5c5 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -230,7 +230,11 @@ static struct rt6_info ip6_blk_hole_entry_template = {
 static inline struct rt6_info *ip6_dst_alloc(struct dst_ops *ops,
 					     struct net_device *dev)
 {
-	return (struct rt6_info *)dst_alloc(ops, dev, 0, 0, 0);
+	struct rt6_info *rt = dst_alloc(ops, dev, 0, 0, 0);
+
+	memset(&rt->rt6i_table, 0, sizeof(*rt) - sizeof(struct dst_entry));
+
+	return rt;
 }
 
 static void ip6_dst_destroy(struct dst_entry *dst)
@@ -887,6 +891,8 @@ struct dst_entry *ip6_blackhole_route(struct net *net, struct dst_entry *dst_ori
 
 	rt = dst_alloc(&ip6_dst_blackhole_ops, ort->dst.dev, 1, 0, 0);
 	if (rt) {
+		memset(&rt->rt6i_table, 0, sizeof(*rt) - sizeof(struct dst_entry));
+
 		new = &rt->dst;
 
 		new->__use = 1;
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 70552c4..00bcb88 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1349,6 +1349,7 @@ static inline struct xfrm_dst *xfrm_alloc_dst(struct net *net, int family)
 		BUG();
 	}
 	xdst = dst_alloc(dst_ops, NULL, 0, 0, 0);
+	memset(&xdst->u.rt6.rt6i_table, 0, sizeof(*xdst) - sizeof(struct dst_entry));
 	xfrm_policy_put_afinfo(afinfo);
 
 	if (likely(xdst))
-- 
1.7.4.5


^ permalink raw reply related

* [PATCH 1/3] net: Make dst_alloc() take more explicit initializations.
From: David Miller @ 2011-04-29  5:25 UTC (permalink / raw)
  To: netdev


Now the dst->dev, dev->obsolete, and dst->flags values can
be specified as well.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/dst.h      |    3 ++-
 net/core/dst.c         |   18 +++++++++++++-----
 net/decnet/dn_route.c  |   13 ++-----------
 net/ipv4/route.c       |   40 +++++++++++++++-------------------------
 net/ipv6/route.c       |   29 +++++++++++------------------
 net/xfrm/xfrm_policy.c |    2 +-
 6 files changed, 44 insertions(+), 61 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index d7bb740..2588a9a 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -350,7 +350,8 @@ static inline struct dst_entry *skb_dst_pop(struct sk_buff *skb)
 }
 
 extern int dst_discard(struct sk_buff *skb);
-extern void *dst_alloc(struct dst_ops * ops, int initial_ref);
+extern void *dst_alloc(struct dst_ops * ops, struct net_device *dev,
+		       int initial_ref, int initial_obsolete, int flags);
 extern void __dst_free(struct dst_entry * dst);
 extern struct dst_entry *dst_destroy(struct dst_entry * dst);
 
diff --git a/net/core/dst.c b/net/core/dst.c
index 91104d3..9505778 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -166,7 +166,8 @@ EXPORT_SYMBOL(dst_discard);
 
 const u32 dst_default_metrics[RTAX_MAX];
 
-void *dst_alloc(struct dst_ops *ops, int initial_ref)
+void *dst_alloc(struct dst_ops *ops, struct net_device *dev,
+		int initial_ref, int initial_obsolete, int flags)
 {
 	struct dst_entry *dst;
 
@@ -177,12 +178,19 @@ void *dst_alloc(struct dst_ops *ops, int initial_ref)
 	dst = kmem_cache_zalloc(ops->kmem_cachep, GFP_ATOMIC);
 	if (!dst)
 		return NULL;
-	atomic_set(&dst->__refcnt, initial_ref);
 	dst->ops = ops;
-	dst->lastuse = jiffies;
-	dst->path = dst;
-	dst->input = dst->output = dst_discard;
+	dst->dev = dev;
+	if (dev)
+		dev_hold(dev);
 	dst_init_metrics(dst, dst_default_metrics, true);
+	dst->path = dst;
+	dst->input = dst_discard;
+	dst->output = dst_discard;
+
+	dst->obsolete = initial_obsolete;
+	atomic_set(&dst->__refcnt, initial_ref);
+	dst->lastuse = jiffies;
+	dst->flags = flags;
 #if RT_CACHE_DEBUG >= 2
 	atomic_inc(&dst_total);
 #endif
diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index 9f09d4f..f489b08 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -1125,13 +1125,10 @@ make_route:
 	if (dev_out->flags & IFF_LOOPBACK)
 		flags |= RTCF_LOCAL;
 
-	rt = dst_alloc(&dn_dst_ops, 0);
+	rt = dst_alloc(&dn_dst_ops, dev_out, 1, 0, DST_HOST);
 	if (rt == NULL)
 		goto e_nobufs;
 
-	atomic_set(&rt->dst.__refcnt, 1);
-	rt->dst.flags   = DST_HOST;
-
 	rt->fld.saddr        = oldflp->saddr;
 	rt->fld.daddr        = oldflp->daddr;
 	rt->fld.flowidn_oif  = oldflp->flowidn_oif;
@@ -1146,8 +1143,6 @@ make_route:
 	rt->rt_dst_map    = fld.daddr;
 	rt->rt_src_map    = fld.saddr;
 
-	rt->dst.dev = dev_out;
-	dev_hold(dev_out);
 	rt->dst.neighbour = neigh;
 	neigh = NULL;
 
@@ -1399,7 +1394,7 @@ static int dn_route_input_slow(struct sk_buff *skb)
 	}
 
 make_route:
-	rt = dst_alloc(&dn_dst_ops, 0);
+	rt = dst_alloc(&dn_dst_ops, out_dev, 0, 0, DST_HOST);
 	if (rt == NULL)
 		goto e_nobufs;
 
@@ -1419,9 +1414,7 @@ make_route:
 	rt->fld.flowidn_iif  = in_dev->ifindex;
 	rt->fld.flowidn_mark = fld.flowidn_mark;
 
-	rt->dst.flags = DST_HOST;
 	rt->dst.neighbour = neigh;
-	rt->dst.dev = out_dev;
 	rt->dst.lastuse = jiffies;
 	rt->dst.output = dn_rt_bug;
 	switch(res.type) {
@@ -1440,8 +1433,6 @@ make_route:
 			rt->dst.input = dst_discard;
 	}
 	rt->rt_flags = flags;
-	if (rt->dst.dev)
-		dev_hold(rt->dst.dev);
 
 	err = dn_rt_set_next_hop(rt, &res);
 	if (err)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index d63f780..b471d89 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1833,17 +1833,13 @@ static void rt_set_nexthop(struct rtable *rt, const struct flowi4 *oldflp4,
 	rt->rt_type = type;
 }
 
-static struct rtable *rt_dst_alloc(bool nopolicy, bool noxfrm)
+static struct rtable *rt_dst_alloc(struct net_device *dev,
+				   bool nopolicy, bool noxfrm)
 {
-	struct rtable *rt = dst_alloc(&ipv4_dst_ops, 1);
-	if (rt) {
-		rt->dst.obsolete = -1;
-
-		rt->dst.flags = DST_HOST |
-			(nopolicy ? DST_NOPOLICY : 0) |
-			(noxfrm ? DST_NOXFRM : 0);
-	}
-	return rt;
+	return dst_alloc(&ipv4_dst_ops, dev, 1, -1,
+			 DST_HOST |
+			 (nopolicy ? DST_NOPOLICY : 0) |
+			 (noxfrm ? DST_NOXFRM : 0));
 }
 
 /* called in rcu_read_lock() section */
@@ -1876,7 +1872,8 @@ static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 		if (err < 0)
 			goto e_err;
 	}
-	rth = rt_dst_alloc(IN_DEV_CONF_GET(in_dev, NOPOLICY), false);
+	rth = rt_dst_alloc(init_net.loopback_dev,
+			   IN_DEV_CONF_GET(in_dev, NOPOLICY), false);
 	if (!rth)
 		goto e_nobufs;
 
@@ -1893,8 +1890,6 @@ static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 #endif
 	rth->rt_route_iif = dev->ifindex;
 	rth->rt_iif	= dev->ifindex;
-	rth->dst.dev	= init_net.loopback_dev;
-	dev_hold(rth->dst.dev);
 	rth->rt_oif	= 0;
 	rth->rt_gateway	= daddr;
 	rth->rt_spec_dst= spec_dst;
@@ -2013,7 +2008,8 @@ static int __mkroute_input(struct sk_buff *skb,
 		}
 	}
 
-	rth = rt_dst_alloc(IN_DEV_CONF_GET(in_dev, NOPOLICY),
+	rth = rt_dst_alloc(out_dev->dev,
+			   IN_DEV_CONF_GET(in_dev, NOPOLICY),
 			   IN_DEV_CONF_GET(out_dev, NOXFRM));
 	if (!rth) {
 		err = -ENOBUFS;
@@ -2029,8 +2025,6 @@ static int __mkroute_input(struct sk_buff *skb,
 	rth->rt_gateway	= daddr;
 	rth->rt_route_iif = in_dev->dev->ifindex;
 	rth->rt_iif 	= in_dev->dev->ifindex;
-	rth->dst.dev	= (out_dev)->dev;
-	dev_hold(rth->dst.dev);
 	rth->rt_oif 	= 0;
 	rth->rt_spec_dst= spec_dst;
 
@@ -2188,7 +2182,8 @@ brd_input:
 	RT_CACHE_STAT_INC(in_brd);
 
 local_input:
-	rth = rt_dst_alloc(IN_DEV_CONF_GET(in_dev, NOPOLICY), false);
+	rth = rt_dst_alloc(net->loopback_dev,
+			   IN_DEV_CONF_GET(in_dev, NOPOLICY), false);
 	if (!rth)
 		goto e_nobufs;
 
@@ -2206,8 +2201,6 @@ local_input:
 #endif
 	rth->rt_route_iif = dev->ifindex;
 	rth->rt_iif	= dev->ifindex;
-	rth->dst.dev	= net->loopback_dev;
-	dev_hold(rth->dst.dev);
 	rth->rt_gateway	= daddr;
 	rth->rt_spec_dst= spec_dst;
 	rth->dst.input= ip_local_deliver;
@@ -2392,7 +2385,8 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 			fi = NULL;
 	}
 
-	rth = rt_dst_alloc(IN_DEV_CONF_GET(in_dev, NOPOLICY),
+	rth = rt_dst_alloc(dev_out,
+			   IN_DEV_CONF_GET(in_dev, NOPOLICY),
 			   IN_DEV_CONF_GET(in_dev, NOXFRM));
 	if (!rth)
 		return ERR_PTR(-ENOBUFS);
@@ -2406,10 +2400,6 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 	rth->rt_src	= fl4->saddr;
 	rth->rt_route_iif = 0;
 	rth->rt_iif	= oldflp4->flowi4_oif ? : dev_out->ifindex;
-	/* get references to the devices that are to be hold by the routing
-	   cache entry */
-	rth->dst.dev	= dev_out;
-	dev_hold(dev_out);
 	rth->rt_gateway = fl4->daddr;
 	rth->rt_spec_dst= fl4->saddr;
 
@@ -2711,7 +2701,7 @@ static struct dst_ops ipv4_dst_blackhole_ops = {
 
 struct dst_entry *ipv4_blackhole_route(struct net *net, struct dst_entry *dst_orig)
 {
-	struct rtable *rt = dst_alloc(&ipv4_dst_blackhole_ops, 1);
+	struct rtable *rt = dst_alloc(&ipv4_dst_blackhole_ops, NULL, 1, 0, 0);
 	struct rtable *ort = (struct rtable *) dst_orig;
 
 	if (rt) {
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 19a77d0..e8b2bb9 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -227,9 +227,10 @@ static struct rt6_info ip6_blk_hole_entry_template = {
 #endif
 
 /* allocate dst with ip6_dst_ops */
-static inline struct rt6_info *ip6_dst_alloc(struct dst_ops *ops)
+static inline struct rt6_info *ip6_dst_alloc(struct dst_ops *ops,
+					     struct net_device *dev)
 {
-	return (struct rt6_info *)dst_alloc(ops, 0);
+	return (struct rt6_info *)dst_alloc(ops, dev, 0, 0, 0);
 }
 
 static void ip6_dst_destroy(struct dst_entry *dst)
@@ -881,10 +882,10 @@ EXPORT_SYMBOL(ip6_route_output);
 
 struct dst_entry *ip6_blackhole_route(struct net *net, struct dst_entry *dst_orig)
 {
-	struct rt6_info *rt = dst_alloc(&ip6_dst_blackhole_ops, 1);
-	struct rt6_info *ort = (struct rt6_info *) dst_orig;
+	struct rt6_info *rt, *ort = (struct rt6_info *) dst_orig;
 	struct dst_entry *new = NULL;
 
+	rt = dst_alloc(&ip6_dst_blackhole_ops, ort->dst.dev, 1, 0, 0);
 	if (rt) {
 		new = &rt->dst;
 
@@ -893,9 +894,6 @@ struct dst_entry *ip6_blackhole_route(struct net *net, struct dst_entry *dst_ori
 		new->output = dst_discard;
 
 		dst_copy_metrics(new, &ort->dst);
-		new->dev = ort->dst.dev;
-		if (new->dev)
-			dev_hold(new->dev);
 		rt->rt6i_idev = ort->rt6i_idev;
 		if (rt->rt6i_idev)
 			in6_dev_hold(rt->rt6i_idev);
@@ -1038,13 +1036,12 @@ struct dst_entry *icmp6_dst_alloc(struct net_device *dev,
 	if (unlikely(idev == NULL))
 		return NULL;
 
-	rt = ip6_dst_alloc(&net->ipv6.ip6_dst_ops);
+	rt = ip6_dst_alloc(&net->ipv6.ip6_dst_ops, dev);
 	if (unlikely(rt == NULL)) {
 		in6_dev_put(idev);
 		goto out;
 	}
 
-	dev_hold(dev);
 	if (neigh)
 		neigh_hold(neigh);
 	else {
@@ -1053,7 +1050,6 @@ struct dst_entry *icmp6_dst_alloc(struct net_device *dev,
 			neigh = NULL;
 	}
 
-	rt->rt6i_dev	  = dev;
 	rt->rt6i_idev     = idev;
 	rt->rt6i_nexthop  = neigh;
 	atomic_set(&rt->dst.__refcnt, 1);
@@ -1212,7 +1208,7 @@ int ip6_route_add(struct fib6_config *cfg)
 		goto out;
 	}
 
-	rt = ip6_dst_alloc(&net->ipv6.ip6_dst_ops);
+	rt = ip6_dst_alloc(&net->ipv6.ip6_dst_ops, NULL);
 
 	if (rt == NULL) {
 		err = -ENOMEM;
@@ -1731,7 +1727,8 @@ void rt6_pmtu_discovery(const struct in6_addr *daddr, const struct in6_addr *sad
 static struct rt6_info * ip6_rt_copy(struct rt6_info *ort)
 {
 	struct net *net = dev_net(ort->rt6i_dev);
-	struct rt6_info *rt = ip6_dst_alloc(&net->ipv6.ip6_dst_ops);
+	struct rt6_info *rt = ip6_dst_alloc(&net->ipv6.ip6_dst_ops,
+					    ort->dst.dev);
 
 	if (rt) {
 		rt->dst.input = ort->dst.input;
@@ -1739,9 +1736,6 @@ static struct rt6_info * ip6_rt_copy(struct rt6_info *ort)
 
 		dst_copy_metrics(&rt->dst, &ort->dst);
 		rt->dst.error = ort->dst.error;
-		rt->dst.dev = ort->dst.dev;
-		if (rt->dst.dev)
-			dev_hold(rt->dst.dev);
 		rt->rt6i_idev = ort->rt6i_idev;
 		if (rt->rt6i_idev)
 			in6_dev_hold(rt->rt6i_idev);
@@ -2011,7 +2005,8 @@ struct rt6_info *addrconf_dst_alloc(struct inet6_dev *idev,
 				    int anycast)
 {
 	struct net *net = dev_net(idev->dev);
-	struct rt6_info *rt = ip6_dst_alloc(&net->ipv6.ip6_dst_ops);
+	struct rt6_info *rt = ip6_dst_alloc(&net->ipv6.ip6_dst_ops,
+					    net->loopback_dev);
 	struct neighbour *neigh;
 
 	if (rt == NULL) {
@@ -2021,13 +2016,11 @@ struct rt6_info *addrconf_dst_alloc(struct inet6_dev *idev,
 		return ERR_PTR(-ENOMEM);
 	}
 
-	dev_hold(net->loopback_dev);
 	in6_dev_hold(idev);
 
 	rt->dst.flags = DST_HOST;
 	rt->dst.input = ip6_input;
 	rt->dst.output = ip6_output;
-	rt->rt6i_dev = net->loopback_dev;
 	rt->rt6i_idev = idev;
 	rt->dst.obsolete = -1;
 
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 15792d8..70552c4 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1348,7 +1348,7 @@ static inline struct xfrm_dst *xfrm_alloc_dst(struct net *net, int family)
 	default:
 		BUG();
 	}
-	xdst = dst_alloc(dst_ops, 0);
+	xdst = dst_alloc(dst_ops, NULL, 0, 0, 0);
 	xfrm_policy_put_afinfo(afinfo);
 
 	if (likely(xdst))
-- 
1.7.4.5


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox