Netdev List

* Re: [PATCH] IPv4: Enable use of 240/4 address space
From: David Miller @ 2008-01-21 11:19 UTC
  To: yoshfuji; +Cc: jengelh, netdev, linux-kernel, ak, vaf
In-Reply-To: <20080120.003019.119068925.yoshfuji@linux-ipv6.org>

From: YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org>
Date: Sun, 20 Jan 2008 00:30:19 +0900 (JST)

> In article <Pine.LNX.4.64.0801191443410.27831@fbirervta.pbzchgretzou.qr> (at Sat, 19 Jan 2008 14:44:13 +0100 (CET)), Jan Engelhardt <jengelh@computergmbh.de> says:
> 
> > From 84bccef295aa9754ee662191e32ba1d64edce2ba Mon Sep 17 00:00:00 2001
> > From: Jan Engelhardt <jengelh@computergmbh.de>
> > Date: Fri, 18 Jan 2008 02:10:44 +0100
> > Subject: [PATCH] IPv4: enable use of 240/4 address space
> > 
> > This short patch modifies the IPv4 networking to enable use of the
> > 240.0.0.0/4 (aka "class-E") address space as proposed in the internet
> > draft draft-fuller-240space-00.txt.
> > 
> > Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

I've applied this to net-2.6.25, thanks everyone.

I know I said we should deploy this as fast as possible,
but we are really coming down the wire as far as releasing
2.6.24 is concerned and I don't want to put anything into
my pushes to Linus that he might not like and thus cause
the entire set of bug fixes to be rejected.

Thanks again.

* Re: [PATCH 2/3][NET] gen_estimator: list_empty() check in est_timer() fixed
From: Jarek Poplawski @ 2008-01-21 11:28 UTC
  To: David Miller; +Cc: shemminger, slavon, kaber, hadi, netdev
In-Reply-To: <20080121.031553.224463409.davem@davemloft.net>

On Mon, Jan 21, 2008 at 03:15:53AM -0800, David Miller wrote:
...
> Life is difficult sometimes, but that is no excuse to further
> the pain :-)
 
YES! I've read somewhere about it too!

Jarek P.

* Re: : Emit event stream compat iw_point objects correctly.
From: Masakazu Mokuno @ 2008-01-21 11:23 UTC
  To: David Miller
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20080110.011602.74511551.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>

	Hi

Sorry for my intermittent posts.

On Thu, 10 Jan 2008 01:16:02 -0800 (PST)
David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org> wrote:

> From: Masakazu Mokuno <mokuno-DfbDroY8Xu1L9jVzuh4AOg@public.gmane.org>
> Date: Thu, 27 Dec 2007 18:24:40 +0900
> 
> > On ppc64 (PS3), IW_EV_LCP_LEN is 8, not 4.
> > 
> > include/linux/wireless.h:
> > 
> > #define IW_EV_LCP_LEN   (sizeof(struct iw_event) - sizeof(union iwreq_data))
> > 
> > where sizeof(struct iw_event) == 24, sizeof(union iwreq_data) == 16 on
> > PS3.
> 
> Here is a new version of the last patch (#12), it should handle
> all of these cases properly now.
> 
> Let me know if you spot any more errors.
> 
> Thanks!
> 
> [WEXT]: Emit event stream entries correctly when compat.
> 
> Four major portions to this change:
> 
> 1) Add IW_EV_COMPAT_LCP_LEN, IW_EV_COMPAT_POINT_OFF,
>    and IW_EV_COMPAT_POINT_LEN helper defines.
> 
> 2) Delete iw_stream_check_add_*(), they are unused.
> 
> 3) Add iw_request_info argument to iwe_stream_add_*(), and use it to
>    size the event and pointer lengths correctly depending upon whether
>    IW_REQUEST_FLAG_COMPAT is set or not.
> 
> 4) The mechanical transformations to the drivers and wireless stack
>    bits to get the iw_request_info passed down into the routines
>    modified in #3.
> 
> With help from Masakazu Mokuno
> 
> Signed-off-by: David S. Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
> ---
>  drivers/net/wireless/airo.c                |   39 +++++---
>  drivers/net/wireless/atmel.c               |   24 ++++-
>  drivers/net/wireless/hostap/hostap.h       |    3 +-
>  drivers/net/wireless/hostap/hostap_ap.c    |   32 +++---
>  drivers/net/wireless/hostap/hostap_ioctl.c |   54 ++++++-----
>  drivers/net/wireless/libertas/scan.c       |   35 ++++---
>  drivers/net/wireless/orinoco.c             |   30 ++++--
>  drivers/net/wireless/prism54/isl_ioctl.c   |   45 +++++----
>  drivers/net/wireless/wl3501_cs.c           |   10 +-
>  drivers/net/wireless/zd1201.c              |   21 +++--
>  include/linux/wireless.h                   |   16 +++
>  include/net/iw_handler.h                   |  150 ++++++++--------------------
>  net/ieee80211/ieee80211_wx.c               |   44 +++++----
>  net/mac80211/ieee80211_i.h                 |    5 +-
>  net/mac80211/ieee80211_ioctl.c             |    2 +-
>  net/mac80211/ieee80211_sta.c               |   59 ++++++-----
>  16 files changed, 293 insertions(+), 276 deletions(-)

<snip>

> diff --git a/drivers/net/wireless/prism54/isl_ioctl.c b/drivers/net/wireless/prism54/isl_ioctl.c
> index 6d80ca4..4dc0b5e 100644
> --- a/drivers/net/wireless/prism54/isl_ioctl.c
> +++ b/drivers/net/wireless/prism54/isl_ioctl.c
> @@ -572,8 +572,9 @@ prism54_set_scan(struct net_device *dev, struct iw_request_info *info,
>   */
>  
>  static char *
> -prism54_translate_bss(struct net_device *ndev, char *current_ev,
> -		      char *end_buf, struct obj_bss *bss, char noise)
> +prism54_translate_bss(struct net_device *ndev, struct iw_request_info *info,
> +		      char *current_ev, char *end_buf, struct obj_bss *bss,
> +		      char noise)
>  {
>  	struct iw_event iwe;	/* Temporary buffer */
>  	short cap;

<snip>

> @@ -2728,9 +2730,12 @@ prism2_ioctl_scan_req(struct net_device *ndev,
>  	rvalue |= mgt_get_request(priv, DOT11_OID_BSSLIST, 0, NULL, &r);
>  	bsslist = r.ptr;
>  
> +	info.cmd = PRISM54_HOSTAPD;
> +	info.flags = 0;
> +
>  	/* ok now, scan the list and translate its info */
>  	for (i = 0; i < min(IW_MAX_AP, (int) bsslist->nr); i++)
> -		current_ev = prism54_translate_bss(ndev, current_ev,
> +		current_ev = prism54_translate_bss(ndev, current_ev, &info,

The order of the arguments is wrong.

current_ev = prism54_translate_bss(ndev, &info, current_ev,

>  						   extra + IW_SCAN_MAX_DATA,
>  						   &(bsslist->bsslist[i]),
>  						   noise);

-- 
Masakazu MOKUNO

* Re: [PATCH] ICMP: ICMP_MIB_OUTMSGS increment duplicated
From: David Miller @ 2008-01-21 11:25 UTC
  To: wangchen; +Cc: dlstevens, netdev, herbert
In-Reply-To: <47947EBB.2020507@cn.fujitsu.com>

From: Wang Chen <wangchen@cn.fujitsu.com>
Date: Mon, 21 Jan 2008 19:15:07 +0800

> Dave, how about this one.
> It's like that one of IPV6.

I am confused: the changelog for this patch mentions changes made
only in the net-2.6.25 tree.

Yet you just told me the two other patches should be applied to
net-2.6.

What is the exact story here?

Thank you.

* Re: [PATCH] ICMP: ICMP_MIB_OUTMSGS increment duplicated
From: Wang Chen @ 2008-01-21 11:34 UTC
  To: David Miller; +Cc: dlstevens, netdev, herbert
In-Reply-To: <20080121.032554.96053463.davem@davemloft.net>

David Miller said the following on 2008-1-21 19:25:
> I am confused, this changelog for this patch mentions changes made
> only in the net-2.6.25 tree.
> 

Although I wrote that I found David S.'s patch in net-2.6.25 to be wrong,
the guilty patch is also in the net-2.6 tree. Here is the web link:
http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commit;h=96793b482540f3a26e2188eaf75cb56b7829d3e3

So, I think applying my patches to net-2.6 is OK.

* Re: : Emit event stream compat iw_point objects correctly.
From: David Miller @ 2008-01-21 11:37 UTC
  To: mokuno-DfbDroY8Xu1L9jVzuh4AOg
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20080121194942.613C.40F06B3A-DfbDroY8Xu1L9jVzuh4AOg@public.gmane.org>

From: Masakazu Mokuno <mokuno-DfbDroY8Xu1L9jVzuh4AOg@public.gmane.org>
Date: Mon, 21 Jan 2008 20:23:15 +0900

> Sorry for my intermittent posts.

No problem.

I am sorry for being too busy to get back to active work
on this patch set.

> > -prism54_translate_bss(struct net_device *ndev, char *current_ev,
> > -		      char *end_buf, struct obj_bss *bss, char noise)
> > +prism54_translate_bss(struct net_device *ndev, struct iw_request_info *info,
> > +		      char *current_ev, char *end_buf, struct obj_bss *bss,
> > +		      char noise)
> >  {
> >  	struct iw_event iwe;	/* Temporary buffer */
> >  	short cap;
> 
> <snip>
> 
> > @@ -2728,9 +2730,12 @@ prism2_ioctl_scan_req(struct net_device *ndev,
> >  	rvalue |= mgt_get_request(priv, DOT11_OID_BSSLIST, 0, NULL, &r);
> >  	bsslist = r.ptr;
> >  
> > +	info.cmd = PRISM54_HOSTAPD;
> > +	info.flags = 0;
> > +
> >  	/* ok now, scan the list and translate its info */
> >  	for (i = 0; i < min(IW_MAX_AP, (int) bsslist->nr); i++)
> > -		current_ev = prism54_translate_bss(ndev, current_ev,
> > +		current_ev = prism54_translate_bss(ndev, current_ev, &info,
> 
> The order of the arguments is wrong.
> 
> current_ev = prism54_translate_bss(ndev, &info, current_ev,

Indeed, I will fix this up in a future version.

I will also investigate why this escaped my build testing.
It is merely a PCI driver, so it should have been included
in the "make allmodconfig" test builds I do on sparc64.

Thank you.

* Re: [PATCH] ICMP: ICMP_MIB_OUTMSGS increment duplicated
From: David Miller @ 2008-01-21 11:40 UTC
  To: wangchen; +Cc: dlstevens, netdev, herbert
In-Reply-To: <47948360.9090709@cn.fujitsu.com>

From: Wang Chen <wangchen@cn.fujitsu.com>
Date: Mon, 21 Jan 2008 19:34:56 +0800

> David Miller said the following on 2008-1-21 19:25:
> > I am confused, this changelog for this patch mentions changes made
> > only in the net-2.6.25 tree.
> > 
> 
> Although I wrote that I found David S.'s patch in net-2.6.25 to be wrong,
> the guilty patch is also in the net-2.6 tree. Here is the web link:
> http://git.kernel.org/?p=linux/kernel/git/davem/net-2.6.git;a=commit;h=96793b482540f3a26e2188eaf75cb56b7829d3e3
> 
> So, I think applying my patches to net-2.6 is OK.

Thank you for explaining this.

Patch applied to net-2.6

* RE: [PATCH][v2] phylib: add module owner to the mii_bus structure
From: Nicu Ioan Petru @ 2008-01-21 12:13 UTC
  To: Fleming Andy; +Cc: netdev, shemminger
In-Reply-To: <CFE6E252-4CB9-4F91-ABE6-330798408E50@freescale.com>



> -----Original Message-----
> From: Fleming Andy 
> Sent: Monday, January 14, 2008 8:56 PM
> To: Nicu Ioan Petru
> Cc: netdev@vger.kernel.org; shemminger@linux-foundation.org
> Subject: Re: [PATCH][v2] phylib: add module owner to the mii_bus structure
>
> Any reason you didn't update the other drivers?
> 
>  > git grep mdiobus_register drivers/net/          // duplicates and mdio_bus.c edited out
> drivers/net/au1000_eth.c:       mdiobus_register(&aup->mii_bus);
> drivers/net/bfin_mac.c: mdiobus_register(&lp->mii_bus);
> drivers/net/cpmac.c:    res = mdiobus_register(&cpmac_mii);
> drivers/net/fec_mpc52xx_phy.c:  err = mdiobus_register(bus);
> drivers/net/fs_enet/mii-bitbang.c:      ret = mdiobus_register(new_bus);
> drivers/net/fs_enet/mii-fec.c:  ret = mdiobus_register(new_bus);
> drivers/net/gianfar_mii.c:      err = mdiobus_register(new_bus);
> drivers/net/macb.c:     if (mdiobus_register(&bp->mii_bus))
> drivers/net/sb1250-mac.c:       err = mdiobus_register(&sc->mii_bus);
> drivers/net/ucc_geth_mii.c:     err = mdiobus_register(new_bus);
> 
> I'm guessing this was only tested on the UEC, because unless 
> I misunderstand the code, any other driver would now crash 
> when you try to get the owner.
> 

That's not true. If you look closely at the implementation of
try_module_get(), you'll see that it returns 1 when the module is NULL,
so my change should make no difference for the other drivers.

If that's really a concern for you, I can update the other drivers as
well. How do you propose to do this? Resubmit this patch or send a new
one?

Ionut.

* Re: [Bugme-new] [Bug 9778] New: unregister_netdevice: waiting for [device] to become free
From: Evgeniy Polyakov @ 2008-01-21 12:14 UTC
  To: David Miller; +Cc: akpm, xemul, bugme-daemon, nigel, netdev
In-Reply-To: <20080120.023027.85710827.davem@davemloft.net>

On Sun, Jan 20, 2008 at 02:30:27AM -0800, David Miller (davem@davemloft.net) wrote:
> From: Andrew Morton <akpm@linux-foundation.org>
> Date: Sat, 19 Jan 2008 16:58:02 -0800
> 
> > ouch.
> 
> Yep, several people are hitting this it seems.
> 
> If Pavel doesn't provide a fix or direction soon I'll just revert.

It looks like the patch is still valid.
Here is the problem description as I understand it.

When a new device (let's talk about ethernet, since that is what I tested)
is being turned on, it gets a neigh_parms entry allocated for it via
inetdev_init(), which is called for the NETDEV_REGISTER inetdev event.
This entry is stored in the arp_tbl table and becomes in_dev->arp_parms.

When a new ARP entry is later created, the device is passed to
arp_constructor(), which clones (increments the reference counter of) the
device's in_dev->arp_parms and puts it into the provided neighbour entry.

When we later remove the device, its in_dev->arp_parms reference counter
is still high (it equals the number of ARP entries on the given
device plus one), so neigh_parms_destroy() is not called. Later, all
neighbour entries are flushed by the garbage collector, the reference
counter for that parms hits zero, and the device can be removed.

I will think about how to fix the problem nicely, or whether this patch
can still be simplified/dropped, but so far it looks valid. Maybe this
analysis will help someone fix the problem first.

Here is debug dmesg:
[   21.835595] inetdev_init: allocating parms.
[   21.839829] neigh_parms_alloc: parms: ffff81003d8e8df0, dev: eth0, refcnt: 1, dev_refcnt: 2.
...
[   30.251576] r8169: eth0: link up
[   31.067079] NET: Registered protocol family 10
[   31.072055] neigh_parms_alloc: parms: ffff81003efc72a8, dev: lo, refcnt: 1, dev_refcnt: 9.
[   31.080891] neigh_alloc: parms: ffffffff8812afe8, dev: <NULL>, refcnt: 2.
[   31.087816] neigh_parms_alloc: parms: ffff81003efc7210, dev: eth0, refcnt: 1, dev_refcnt: 9.
[   31.097335] neigh_alloc: parms: ffffffff804deb88, dev: <NULL>, refcnt: 2.
[   31.104172] arp_constructor: parms: ffff81003f8c3be8, dev: lo, refcnt: 2.
[   31.500348] neigh_alloc: parms: ffffffff8812afe8, dev: <NULL>, refcnt: 2.
[   32.499628] neigh_alloc: parms: ffffffff8812afe8, dev: <NULL>, refcnt: 2.
[  102.827796] neigh_destroy: parms: ffff81003efc7210, dev: eth0, refcnt: 3, dev_refcnt: 13.
[  106.828843] neigh_destroy: parms: ffff81003f8c3be8, dev: lo, refcnt: 2, dev_refcnt: 78.
[  109.810987] neigh_alloc: parms: ffffffff804deb88, dev: <NULL>, refcnt: 2.

The first ARP entry for the eth0 device bumps the counter:
[  109.817827] arp_constructor: parms: ffff81003d8e8df0, dev: eth0, refcnt: 2.

[  109.831811] neigh_alloc: parms: ffffffff804deb88, dev: <NULL>, refcnt: 2.
[  109.838661] arp_constructor: parms: ffff81003f8c3be8, dev: lo, refcnt: 2.
[  110.837894] neigh_destroy: parms: ffff81003efc7210, dev: eth0, refcnt: 2, dev_refcnt: 15.

Cannot release that neigh_parms:
[  113.638228] neigh_parms_release: parms: ffff81003d8e8df0, dev: eth0, refcnt: 2, dev_refcnt: 5.

Can release another one (for IPv6):
[  113.649380] neigh_parms_release: parms: ffff81003efc7210, dev: eth0, refcnt: 1, dev_refcnt: 5.
[  113.671806] neigh_parms_destroy: parms: ffff81003efc7210, dev: eth0, dev_refcnt: 3.

[  123.916250] unregister_netdevice: waiting for eth0 to become free. Usage count = 1

GC hits us:
[  124.839572] neigh_destroy: parms: ffff81003d8e8df0, dev: eth0, refcnt: 1, dev_refcnt: 11.
[  124.847813] neigh_parms_destroy: parms: ffff81003d8e8df0, dev: eth0, dev_refcnt: 1.
[  124.952026] ACPI: PCI interrupt for device 0000:02:0d.0 disabled

-- 
	Evgeniy Polyakov

* Re: [Bugme-new] [Bug 9778] New: unregister_netdevice: waiting for [device] to become free
From: David Miller @ 2008-01-21 12:36 UTC
  To: johnpol; +Cc: akpm, xemul, bugme-daemon, nigel, netdev
In-Reply-To: <20080121121445.GA29459@2ka.mipt.ru>

From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Date: Mon, 21 Jan 2008 15:14:45 +0300

> I will think about how to fix the problem nicely or if this patch still
> can be simplified/dropped, but so far it looks valid. Maybe this
> analysis will help someone to fix problem first.

I have reverted Pavel's patch for now.

Although I have no doubt that his patch was correct in some respects,
the regressions were worse than the disease
he initially intended to cure.

We can add the race fix back once we get to the
bottom of the reference count issue.

* Re: [PATCH 3/4] bonding: Fix work rearming
From: Jarek Poplawski @ 2008-01-21 13:33 UTC
  To: Makito SHIOKAWA; +Cc: netdev
In-Reply-To: <479419B6.9000000@miraclelinux.com>

On Mon, Jan 21, 2008 at 01:04:06PM +0900, Makito SHIOKAWA wrote:
> > (But new_value = 0 seems needed - just like from module_param()?)
> Do you mean to initialize new_value before sscanf()? (There is a check 'if (sscanf(buf, "%d", &new_value) != 1)', so is it necessary?)

No: you mentioned treating new_value == 0 like new_value < 0
with 'if (new_value <= 0)', and I didn't understand this idea...

>
> > - maybe to test if the value has changed at all,
> Tested.
>
> For now, the patch will be as below. Any comments, however small, will be helpful. Regards.

I think the cancelling of delayed work in bond_sysfs is OK now.

Alas, I don't understand the reason for this change in bond_main()...
Could you comment?

Thanks,
Jarek P.

>
>
> Signed-off-by: Makito SHIOKAWA <mshiokawa@miraclelinux.com>
> ---
>  drivers/net/bonding/bond_main.c  |   11 ++++-------
>  drivers/net/bonding/bond_sysfs.c |   19 +++++++++++++++++--
>  2 files changed, 21 insertions(+), 9 deletions(-)
>
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -2699,7 +2699,7 @@ void bond_loadbalance_arp_mon(struct wor
>
>  	read_lock(&bond->lock);
>
> -	delta_in_ticks = (bond->params.arp_interval * HZ) / 1000;
> +	delta_in_ticks = ((bond->params.arp_interval * HZ) / 1000) ? : 1;
>
>  	if (bond->kill_timers) {
>  		goto out;
> @@ -2801,8 +2801,7 @@ void bond_loadbalance_arp_mon(struct wor
>  	}
>
>  re_arm:
> -	if (bond->params.arp_interval)
> -		queue_delayed_work(bond->wq, &bond->lb_arp_work, delta_in_ticks);
> +	queue_delayed_work(bond->wq, &bond->lb_arp_work, delta_in_ticks);
>  out:
>  	read_unlock(&bond->lock);
>  }
> @@ -2832,7 +2831,7 @@ void bond_activebackup_arp_mon(struct wo
>
>  	read_lock(&bond->lock);
>
> -	delta_in_ticks = (bond->params.arp_interval * HZ) / 1000;
> +	delta_in_ticks = ((bond->params.arp_interval * HZ) / 1000) ? : 1;
>
>  	if (bond->kill_timers) {
>  		goto out;
> @@ -3058,9 +3057,7 @@ void bond_activebackup_arp_mon(struct wo
>  	}
>
>  re_arm:
> -	if (bond->params.arp_interval) {
> -		queue_delayed_work(bond->wq, &bond->ab_arp_work, delta_in_ticks);
> -	}
> +	queue_delayed_work(bond->wq, &bond->ab_arp_work, delta_in_ticks);
>  out:
>  	read_unlock(&bond->lock);
>  }
> --- a/drivers/net/bonding/bond_sysfs.c
> +++ b/drivers/net/bonding/bond_sysfs.c
> @@ -644,6 +644,15 @@ static ssize_t bonding_store_arp_interva
>  	       ": %s: Setting ARP monitoring interval to %d.\n",
>  	       bond->dev->name, new_value);
>  	bond->params.arp_interval = new_value;
> +	if (bond->params.arp_interval == 0 && (bond->dev->flags & IFF_UP)) {
> +		printk(KERN_INFO DRV_NAME
> +		       ": %s: Disabling ARP monitoring.\n",
> +		       bond->dev->name);
> +		if (bond->params.mode == BOND_MODE_ACTIVEBACKUP)
> +			cancel_delayed_work_sync(&bond->ab_arp_work);
> +		else
> +			cancel_delayed_work_sync(&bond->lb_arp_work);
> +	}
>  	if (bond->params.miimon) {
>  		printk(KERN_INFO DRV_NAME
>  		       ": %s: ARP monitoring cannot be used with MII monitoring. "
> @@ -658,7 +667,7 @@ static ssize_t bonding_store_arp_interva
>  		       "but no ARP targets have been specified.\n",
>  		       bond->dev->name);
>  	}
> -	if (bond->dev->flags & IFF_UP) {
> +	if (bond->params.arp_interval && (bond->dev->flags & IFF_UP)) {
>  		/* If the interface is up, we may need to fire off
>  		 * the ARP timer.  If the interface is down, the
>  		 * timer will get fired off when the open function
> @@ -997,6 +1006,12 @@ static ssize_t bonding_store_miimon(stru
>  		       ": %s: Setting MII monitoring interval to %d.\n",
>  		       bond->dev->name, new_value);
>  		bond->params.miimon = new_value;
> +		if (bond->params.miimon == 0 && (bond->dev->flags & IFF_UP)) {
> +			printk(KERN_INFO DRV_NAME
> +			       ": %s: Disabling MII monitoring...\n",
> +			       bond->dev->name);
> +			cancel_delayed_work_sync(&bond->mii_work);
> +		}
>  		if(bond->params.updelay)
>  			printk(KERN_INFO DRV_NAME
>  			      ": %s: Note: Updating updelay (to %d) "
> @@ -1026,7 +1041,7 @@ static ssize_t bonding_store_miimon(stru
>  				cancel_delayed_work_sync(&bond->lb_arp_work);
>  		}
>
> -		if (bond->dev->flags & IFF_UP) {
> +		if (bond->params.miimon && (bond->dev->flags & IFF_UP)) {
>  			/* If the interface is up, we may need to fire off
>  			 * the MII timer. If the interface is down, the
>  			 * timer will get fired off when the open function
>
>
> -- 
> Makito SHIOKAWA
> MIRACLE LINUX CORPORATION

* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
From: Robert Olsson @ 2008-01-21 13:27 UTC
  To: David Miller
  Cc: Robert.Olsson, elendil, jesse.brandeburg, slavon, netdev,
	linux-kernel
In-Reply-To: <20080118.041144.81957249.davem@davemloft.net>


David Miller writes:

 > Yes, this semaphore thing is highly problematic.  In the most crucial
 > areas where network driver consistency matters the most for ease of
 > understanding and debugging, the Intel drivers choose to be different
 > :-(
 > 
 > The way the napi_disable() logic breaks out from high packet load in
 > net_rx_action() is that it simply returns, even leaving interrupts
 > disabled, when a napi_disable() is pending.
 > 
 > This is what trips up the semaphore logic.
 > 
 > Robert, give this patch a try.


 Yes, it works. e1000 tested for ~3 hours under very high load with the
 interface going up/down every 5th second. Without the patch, the IRQs get
 disabled within a couple of seconds.

 A resolute way of handling the semaphores. :)
   
 Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
 
 Cheers
					--ro


 > In the long term this semaphore should be completely eliminated;
 > there is no justification for it.
 > 
 > Signed-off-by: David S. Miller <davem@davemloft.net>
 > 
 > diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
 > index 0c9a6f7..76c0fa6 100644
 > --- a/drivers/net/e1000/e1000_main.c
 > +++ b/drivers/net/e1000/e1000_main.c
 > @@ -632,6 +632,7 @@ e1000_down(struct e1000_adapter *adapter)
 >  
 >  #ifdef CONFIG_E1000_NAPI
 >  	napi_disable(&adapter->napi);
 > +	atomic_set(&adapter->irq_sem, 0);
 >  #endif
 >  	e1000_irq_disable(adapter);
 >  
 > diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
 > index 2ab3bfb..9cc5a6b 100644
 > --- a/drivers/net/e1000e/netdev.c
 > +++ b/drivers/net/e1000e/netdev.c
 > @@ -2183,6 +2183,7 @@ void e1000e_down(struct e1000_adapter *adapter)
 >  	msleep(10);
 >  
 >  	napi_disable(&adapter->napi);
 > +	atomic_set(&adapter->irq_sem, 0);
 >  	e1000_irq_disable(adapter);
 >  
 >  	del_timer_sync(&adapter->watchdog_timer);
 > diff --git a/drivers/net/ixgb/ixgb_main.c b/drivers/net/ixgb/ixgb_main.c
 > index d2fb88d..4f63839 100644
 > --- a/drivers/net/ixgb/ixgb_main.c
 > +++ b/drivers/net/ixgb/ixgb_main.c
 > @@ -296,6 +296,11 @@ ixgb_down(struct ixgb_adapter *adapter, boolean_t kill_watchdog)
 >  {
 >  	struct net_device *netdev = adapter->netdev;
 >  
 > +#ifdef CONFIG_IXGB_NAPI
 > +	napi_disable(&adapter->napi);
 > +	atomic_set(&adapter->irq_sem, 0);
 > +#endif
 > +
 >  	ixgb_irq_disable(adapter);
 >  	free_irq(adapter->pdev->irq, netdev);
 >  
 > @@ -304,9 +309,7 @@ ixgb_down(struct ixgb_adapter *adapter, boolean_t kill_watchdog)
 >  
 >  	if(kill_watchdog)
 >  		del_timer_sync(&adapter->watchdog_timer);
 > -#ifdef CONFIG_IXGB_NAPI
 > -	napi_disable(&adapter->napi);
 > -#endif
 > +
 >  	adapter->link_speed = 0;
 >  	adapter->link_duplex = 0;
 >  	netif_carrier_off(netdev);
 > diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
 > index de3f45e..a4265bc 100644
 > --- a/drivers/net/ixgbe/ixgbe_main.c
 > +++ b/drivers/net/ixgbe/ixgbe_main.c
 > @@ -1409,9 +1409,11 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
 >  	IXGBE_WRITE_FLUSH(&adapter->hw);
 >  	msleep(10);
 >  
 > +	napi_disable(&adapter->napi);
 > +	atomic_set(&adapter->irq_sem, 0);
 > +
 >  	ixgbe_irq_disable(adapter);
 >  
 > -	napi_disable(&adapter->napi);
 >  	del_timer_sync(&adapter->watchdog_timer);
 >  
 >  	netif_carrier_off(netdev);

* Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang
From: David Miller @ 2008-01-21 13:29 UTC
  To: Robert.Olsson; +Cc: elendil, jesse.brandeburg, slavon, netdev, linux-kernel
In-Reply-To: <18324.40369.984595.651675@robur.slu.se>

From: Robert Olsson <Robert.Olsson@data.slu.se>
Date: Mon, 21 Jan 2008 14:27:13 +0100

>  Yes, it works. e1000 tested for ~3 hours under very high load with the
>  interface going up/down every 5th second. Without the patch, the IRQs get
>  disabled within a couple of seconds.
> 
>  A resolute way of handling the semaphores. :)
>    
>  Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>


Thanks for testing, Robert.

I sent off that fix to Linus an hour or so ago; hopefully
he will pick it up sometime today.

* Re: [Bugme-new] [Bug 9778] New: unregister_netdevice: waiting for [device] to become free
From: Evgeniy Polyakov @ 2008-01-21 14:36 UTC
  To: David Miller; +Cc: akpm, xemul, netdev, bugme-daemon, nigel
In-Reply-To: <20080121121445.GA29459@2ka.mipt.ru>

On Mon, Jan 21, 2008 at 03:14:45PM +0300, Evgeniy Polyakov (johnpol@2ka.mipt.ru) wrote:
> It looks like the patch is still valid.
> Here is the problem description as I understand it.
> 
> When a new device (let's talk about ethernet, since that is what I tested)
> is being turned on, it gets a neigh_parms entry allocated for it via
> inetdev_init(), which is called for the NETDEV_REGISTER inetdev event.
> This entry is stored in the arp_tbl table and becomes in_dev->arp_parms.
> 
> When a new ARP entry is later created, the device is passed to
> arp_constructor(), which clones (increments the reference counter of) the
> device's in_dev->arp_parms and puts it into the provided neighbour entry.
> 
> When we later remove the device, its in_dev->arp_parms reference counter
> is still high (it equals the number of ARP entries on the given
> device plus one), so neigh_parms_destroy() is not called. Later, all
> neighbour entries are flushed by the garbage collector, the reference
> counter for that parms hits zero, and the device can be removed.
> 
> I will think about how to fix the problem nicely, or whether this patch
> can still be simplified/dropped, but so far it looks valid. Maybe this
> analysis will help someone fix the problem first.

Yes, the patch is valid, and there is a (very noticeable) race between
neighbour processing and parms release: parms can still be accessed after
the device has been fully freed (as with the old behaviour, when dev_put()
was called from neigh_parms_release()), although nothing accesses it yet.
Hence the simplest solution is to move dev_put() under the table lock,
allow parms->dev to be accessed only under the table lock, and always
check that it is non-NULL. So I propose the following patch as the
simplest solution for now.

Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>

diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index a4f2618..410b7e7 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -34,6 +34,11 @@ struct neighbour;
 
 struct neigh_parms
 {
+	/*
+	 * This device is only allowed to be accessed under table lock (bh turned off)
+	 * and while device is alive. After parm was released, it will be set to NULL
+	 * and has to be always checked before accessed.
+	 */
 	struct net_device *dev;
 	struct neigh_parms *next;
 	int	(*neigh_setup)(struct neighbour *);
 
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index cc8a2f1..5076acd 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1315,7 +1315,12 @@ void neigh_parms_release(struct neigh_table *tbl, struct neigh_parms *parms)
 		if (*p == parms) {
 			*p = parms->next;
 			parms->dead = 1;
+			if (parms->dev) {
+				dev_put(parms->dev);
+				parms->dev = NULL;
+			}
 			write_unlock_bh(&tbl->lock);
+
 			call_rcu(&parms->rcu_head, neigh_rcu_free_parms);
 			return;
 		}
@@ -1326,8 +1331,6 @@ void neigh_parms_release(struct neigh_table *tbl, struct neigh_parms *parms)
 
 void neigh_parms_destroy(struct neigh_parms *parms)
 {
-	if (parms->dev)
-		dev_put(parms->dev);
 	kfree(parms);
 }
 

-- 
	Evgeniy Polyakov

* Re: [PATCH 2/3][NET] gen_estimator: list_empty() check in est_timer() fixed
From: Jarek Poplawski @ 2008-01-21 14:43 UTC
  To: David Miller; +Cc: shemminger, slavon, kaber, hadi, netdev
In-Reply-To: <20080121.031553.224463409.davem@davemloft.net>

On Mon, Jan 21, 2008 at 03:15:53AM -0800, David Miller wrote:
> From: Jarek Poplawski <jarkao2@gmail.com>
> Date: Mon, 21 Jan 2008 12:19:40 +0100
> 
> > On Mon, Jan 21, 2008 at 02:36:32AM -0800, David Miller wrote:
> > ...
> > > FWIW I agree that double-negatives are confusing and we should
> > > avoid them.
> > 
> > Right! No more: CHECKSUM_NONE, SOCK_NOSPACE, IFF_NOARP or KERN_NOTICE!
> 
> Life is difficult sometimes, but that is no excuse to further
> the pain :-)

BTW, maybe somebody else will find this interesting (since you seem to
know this very well): in some languages, like Polish, "that is no
excuse" needs a double negative: "to nie jest zadne wytlumaczenie",
literally "that not is no excuse"...

Cheers,
Jarek P.

* [PATCH 0/6 net-2.6.25] Provide correct namespace on IPv4 packet input path.
From: Denis V. Lunev @ 2008-01-21 14:49 UTC
  To: David Miller; +Cc: netdev, devel, Linux Containers

This patchset sequentially adds a namespace parameter to fib_lookup and
inetdev_by_index. After that, it is possible to pass the network namespace
from the input packet to the routing engine.

Output path is much more intrusive and will be sent separately.

Signed-off-by: Denis V. Lunev <den@openvz.org>

* [PATCH 3/6 net-2.6.25] [NETNS] Pass correct namespace in fib_validate_source.
From: Denis V. Lunev @ 2008-01-21 14:50 UTC
  To: davem; +Cc: netdev, devel, containers, Denis V. Lunev
In-Reply-To: <4794B10E.7010703@sw.ru>

The correct network namespace is available inside fib_validate_source: it
can be obtained from the device passed in. The device is not NULL, as
in_device is obtained from it just above.

Signed-off-by: Denis V. Lunev <den@openvz.org>
---
 net/ipv4/fib_frontend.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index dcd3a28..39b8b35 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -243,6 +243,7 @@ int fib_validate_source(__be32 src, __be32 dst, u8 tos, int oif,
 	struct fib_result res;
 	int no_addr, rpf;
 	int ret;
+	struct net *net;
 
 	no_addr = rpf = 0;
 	rcu_read_lock();
@@ -256,7 +257,8 @@ int fib_validate_source(__be32 src, __be32 dst, u8 tos, int oif,
 	if (in_dev == NULL)
 		goto e_inval;
 
-	if (fib_lookup(&init_net, &fl, &res))
+	net = dev->nd_net;
+	if (fib_lookup(net, &fl, &res))
 		goto last_resort;
 	if (res.type != RTN_UNICAST)
 		goto e_inval_res;
@@ -280,7 +282,7 @@ int fib_validate_source(__be32 src, __be32 dst, u8 tos, int oif,
 	fl.oif = dev->ifindex;
 
 	ret = 0;
-	if (fib_lookup(&init_net, &fl, &res) == 0) {
+	if (fib_lookup(net, &fl, &res) == 0) {
 		if (res.type == RTN_UNICAST) {
 			*spec_dst = FIB_RES_PREFSRC(res);
 			ret = FIB_RES_NH(res).nh_scope >= RT_SCOPE_HOST;
-- 
1.5.3.rc5


^ permalink raw reply related

* [PATCH 1/6 net-2.6.25] [NETNS] Add netns parameter to fib_lookup.
From: Denis V. Lunev @ 2008-01-21 14:50 UTC (permalink / raw)
  To: davem; +Cc: netdev, devel, containers, Denis V. Lunev
In-Reply-To: <4794B10E.7010703@sw.ru>

Signed-off-by: Denis V. Lunev <den@openvz.org>
---
 include/net/ip_fib.h     |    9 +++++----
 net/ipv4/fib_frontend.c  |    4 ++--
 net/ipv4/fib_rules.c     |    4 ++--
 net/ipv4/fib_semantics.c |    2 +-
 net/ipv4/route.c         |    6 +++---
 5 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 08ebb1e..9daa60b 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -178,15 +178,16 @@ static inline struct fib_table *fib_new_table(struct net *net, u32 id)
 	return fib_get_table(net, id);
 }
 
-static inline int fib_lookup(const struct flowi *flp, struct fib_result *res)
+static inline int fib_lookup(struct net *net, const struct flowi *flp,
+			     struct fib_result *res)
 {
 	struct fib_table *table;
 
-	table = fib_get_table(&init_net, RT_TABLE_LOCAL);
+	table = fib_get_table(net, RT_TABLE_LOCAL);
 	if (!table->tb_lookup(table, flp, res))
 		return 0;
 
-	table = fib_get_table(&init_net, RT_TABLE_MAIN);
+	table = fib_get_table(net, RT_TABLE_MAIN);
 	if (!table->tb_lookup(table, flp, res))
 		return 0;
 	return -ENETUNREACH;
@@ -200,7 +201,7 @@ extern void __net_exit fib4_rules_exit(struct net *net);
 extern u32 fib_rules_tclass(struct fib_result *res);
 #endif
 
-extern int fib_lookup(struct flowi *flp, struct fib_result *res);
+extern int fib_lookup(struct net *n, struct flowi *flp, struct fib_result *res);
 
 extern struct fib_table *fib_new_table(struct net *net, u32 id);
 extern struct fib_table *fib_get_table(struct net *net, u32 id);
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 8c0081c..dcd3a28 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -256,7 +256,7 @@ int fib_validate_source(__be32 src, __be32 dst, u8 tos, int oif,
 	if (in_dev == NULL)
 		goto e_inval;
 
-	if (fib_lookup(&fl, &res))
+	if (fib_lookup(&init_net, &fl, &res))
 		goto last_resort;
 	if (res.type != RTN_UNICAST)
 		goto e_inval_res;
@@ -280,7 +280,7 @@ int fib_validate_source(__be32 src, __be32 dst, u8 tos, int oif,
 	fl.oif = dev->ifindex;
 
 	ret = 0;
-	if (fib_lookup(&fl, &res) == 0) {
+	if (fib_lookup(&init_net, &fl, &res) == 0) {
 		if (res.type == RTN_UNICAST) {
 			*spec_dst = FIB_RES_PREFSRC(res);
 			ret = FIB_RES_NH(res).nh_scope >= RT_SCOPE_HOST;
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index 2b43002..19274d0 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -54,14 +54,14 @@ u32 fib_rules_tclass(struct fib_result *res)
 }
 #endif
 
-int fib_lookup(struct flowi *flp, struct fib_result *res)
+int fib_lookup(struct net *net, struct flowi *flp, struct fib_result *res)
 {
 	struct fib_lookup_arg arg = {
 		.result = res,
 	};
 	int err;
 
-	err = fib_rules_lookup(init_net.ipv4.rules_ops, flp, 0, &arg);
+	err = fib_rules_lookup(net->ipv4.rules_ops, flp, 0, &arg);
 	res->r = arg.rule;
 
 	return err;
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 0e08df4..ecd91c6 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -559,7 +559,7 @@ static int fib_check_nh(struct fib_config *cfg, struct fib_info *fi,
 			/* It is not necessary, but requires a bit of thinking */
 			if (fl.fl4_scope < RT_SCOPE_LINK)
 				fl.fl4_scope = RT_SCOPE_LINK;
-			if ((err = fib_lookup(&fl, &res)) != 0)
+			if ((err = fib_lookup(&init_net, &fl, &res)) != 0)
 				return err;
 		}
 		err = -EINVAL;
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 162e738..c107bc3 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1559,7 +1559,7 @@ void ip_rt_get_source(u8 *addr, struct rtable *rt)
 
 	if (rt->fl.iif == 0)
 		src = rt->rt_src;
-	else if (fib_lookup(&rt->fl, &res) == 0) {
+	else if (fib_lookup(&init_net, &rt->fl, &res) == 0) {
 		src = FIB_RES_PREFSRC(res);
 		fib_res_put(&res);
 	} else
@@ -1911,7 +1911,7 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	/*
 	 *	Now we are ready to route packet.
 	 */
-	if ((err = fib_lookup(&fl, &res)) != 0) {
+	if ((err = fib_lookup(&init_net, &fl, &res)) != 0) {
 		if (!IN_DEV_FORWARD(in_dev))
 			goto e_hostunreach;
 		goto no_route;
@@ -2363,7 +2363,7 @@ static int ip_route_output_slow(struct rtable **rp, const struct flowi *oldflp)
 		goto make_route;
 	}
 
-	if (fib_lookup(&fl, &res)) {
+	if (fib_lookup(&init_net, &fl, &res)) {
 		res.fi = NULL;
 		if (oldflp->oif) {
 			/* Apparently, routing tables are wrong. Assume,
-- 
1.5.3.rc5


^ permalink raw reply related

* [PATCH 5/6 net-2.6.25] [NETNS] Pass correct namespace in ip_route_input_slow.
From: Denis V. Lunev @ 2008-01-21 14:50 UTC (permalink / raw)
  To: davem; +Cc: netdev, devel, containers, Denis V. Lunev
In-Reply-To: <4794B10E.7010703@sw.ru>

A packet on the input path always holds a reference to the network device
it arrived on. Extract the network namespace from that device.
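A minimal userspace model of the packet-to-namespace chain follows, assuming hypothetical simplified versions of sk_buff, net_device and struct net. It shows the consequence visible in the diff: locally delivered packets are attached to the loopback device of the namespace they arrived in, instead of one global loopback.

```c
#include <stddef.h>

/* Hypothetical stand-ins for sk_buff, net_device and struct net. */
struct net;
struct net_device {
	int ifindex;
	struct net *nd_net;	/* owning namespace */
};
struct net {
	struct net_device *loopback_dev;	/* one loopback per namespace */
};
struct sk_buff {
	struct net_device *dev;	/* device the packet arrived on */
};

/* packet -> input device -> namespace -> that namespace's loopback */
static struct net_device *toy_local_input_dev(const struct sk_buff *skb)
{
	struct net *net = skb->dev->nd_net;

	return net->loopback_dev;
}
```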

Signed-off-by: Denis V. Lunev <den@openvz.org>
---
 net/ipv4/route.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index c107bc3..b3c6122 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1881,6 +1881,7 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	__be32		spec_dst;
 	int		err = -EINVAL;
 	int		free_res = 0;
+	struct net    * net = dev->nd_net;
 
 	/* IP on this device is disabled. */
 
@@ -1911,7 +1912,7 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	/*
 	 *	Now we are ready to route packet.
 	 */
-	if ((err = fib_lookup(&init_net, &fl, &res)) != 0) {
+	if ((err = fib_lookup(net, &fl, &res)) != 0) {
 		if (!IN_DEV_FORWARD(in_dev))
 			goto e_hostunreach;
 		goto no_route;
@@ -1926,7 +1927,7 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	if (res.type == RTN_LOCAL) {
 		int result;
 		result = fib_validate_source(saddr, daddr, tos,
-					     init_net.loopback_dev->ifindex,
+					     net->loopback_dev->ifindex,
 					     dev, &spec_dst, &itag);
 		if (result < 0)
 			goto martian_source;
@@ -1988,7 +1989,7 @@ local_input:
 #endif
 	rth->rt_iif	=
 	rth->fl.iif	= dev->ifindex;
-	rth->u.dst.dev	= init_net.loopback_dev;
+	rth->u.dst.dev	= net->loopback_dev;
 	dev_hold(rth->u.dst.dev);
 	rth->idev	= in_dev_get(rth->u.dst.dev);
 	rth->rt_gateway	= daddr;
-- 
1.5.3.rc5


^ permalink raw reply related

* [PATCH 4/6 net-2.6.25] [NETNS] Pass correct namespace in context fib_check_nh.
From: Denis V. Lunev @ 2008-01-21 14:50 UTC (permalink / raw)
  To: davem; +Cc: netdev, devel, containers, Denis V. Lunev
In-Reply-To: <4794B10E.7010703@sw.ru>

The correct network namespace is already used in fib_check_nh. Rework its
usage for better readability and pass it to fib_lookup and inetdev_by_index.

Signed-off-by: Denis V. Lunev <den@openvz.org>
---
 net/ipv4/fib_semantics.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 8b47e11..c791286 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -519,7 +519,9 @@ static int fib_check_nh(struct fib_config *cfg, struct fib_info *fi,
 			struct fib_nh *nh)
 {
 	int err;
+	struct net *net;
 
+	net = cfg->fc_nlinfo.nl_net;
 	if (nh->nh_gw) {
 		struct fib_result res;
 
@@ -532,11 +534,9 @@ static int fib_check_nh(struct fib_config *cfg, struct fib_info *fi,
 
 			if (cfg->fc_scope >= RT_SCOPE_LINK)
 				return -EINVAL;
-			if (inet_addr_type(cfg->fc_nlinfo.nl_net,
-					   nh->nh_gw) != RTN_UNICAST)
+			if (inet_addr_type(net, nh->nh_gw) != RTN_UNICAST)
 				return -EINVAL;
-			if ((dev = __dev_get_by_index(cfg->fc_nlinfo.nl_net,
-						      nh->nh_oif)) == NULL)
+			if ((dev = __dev_get_by_index(net, nh->nh_oif)) == NULL)
 				return -ENODEV;
 			if (!(dev->flags&IFF_UP))
 				return -ENETDOWN;
@@ -559,7 +559,7 @@ static int fib_check_nh(struct fib_config *cfg, struct fib_info *fi,
 			/* It is not necessary, but requires a bit of thinking */
 			if (fl.fl4_scope < RT_SCOPE_LINK)
 				fl.fl4_scope = RT_SCOPE_LINK;
-			if ((err = fib_lookup(&init_net, &fl, &res)) != 0)
+			if ((err = fib_lookup(net, &fl, &res)) != 0)
 				return err;
 		}
 		err = -EINVAL;
@@ -583,7 +583,7 @@ out:
 		if (nh->nh_flags&(RTNH_F_PERVASIVE|RTNH_F_ONLINK))
 			return -EINVAL;
 
-		in_dev = inetdev_by_index(&init_net, nh->nh_oif);
+		in_dev = inetdev_by_index(net, nh->nh_oif);
 		if (in_dev == NULL)
 			return -ENODEV;
 		if (!(in_dev->dev->flags&IFF_UP)) {
-- 
1.5.3.rc5


^ permalink raw reply related

* [PATCH 6/6 net-2.6.25] [NETNS] Pass correct namespace in ip_rt_get_source.
From: Denis V. Lunev @ 2008-01-21 14:50 UTC (permalink / raw)
  To: davem; +Cc: netdev, devel, containers, Denis V. Lunev
In-Reply-To: <4794B10E.7010703@sw.ru>

ip_rt_get_source is the infamous place for which the dst_ifdown kludges
were implemented. This means that rt->u.dst.dev can be safely dereferenced
to obtain nd_net.
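The guarantee relied on here is, roughly, that when a device goes away, cached routes are re-pointed at a loopback device rather than left dangling. The sketch below is a simplified userspace model of that idea under stated assumptions: the structures and toy_* functions are hypothetical, reference counting and locking are omitted, and the exact loopback device chosen by the real dst_ifdown may differ.

```c
#include <stddef.h>

/* Hypothetical stand-ins; refcounting and locking are omitted. */
struct net;
struct net_device {
	struct net *nd_net;
};
struct net {
	struct net_device *loopback_dev;
};
struct dst_entry {
	struct net_device *dev;	/* kept valid, never left dangling */
};

/*
 * Model of the "dst_ifdown" idea: a route whose device is going away
 * is re-pointed at a loopback device that outlives it.
 */
static void toy_dst_ifdown(struct dst_entry *dst, struct net_device *dev)
{
	if (dst->dev == dev)
		dst->dev = dev->nd_net->loopback_dev;
}

/* Because of the above, this dereference is always safe. */
static struct net *toy_route_net(const struct dst_entry *dst)
{
	return dst->dev->nd_net;
}
```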

Signed-off-by: Denis V. Lunev <den@openvz.org>
---
 net/ipv4/route.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index b3c6122..ede0571 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1559,7 +1559,7 @@ void ip_rt_get_source(u8 *addr, struct rtable *rt)
 
 	if (rt->fl.iif == 0)
 		src = rt->rt_src;
-	else if (fib_lookup(&init_net, &rt->fl, &res) == 0) {
+	else if (fib_lookup(rt->u.dst.dev->nd_net, &rt->fl, &res) == 0) {
 		src = FIB_RES_PREFSRC(res);
 		fib_res_put(&res);
 	} else
-- 
1.5.3.rc5


^ permalink raw reply related

* [PATCH 2/6 net-2.6.25] [NETNS] Add netns parameter to inetdev_by_index.
From: Denis V. Lunev @ 2008-01-21 14:50 UTC (permalink / raw)
  To: davem; +Cc: netdev, devel, containers, Denis V. Lunev
In-Reply-To: <4794B10E.7010703@sw.ru>

Signed-off-by: Denis V. Lunev <den@openvz.org>
---
 include/linux/inetdevice.h |    2 +-
 net/ipv4/devinet.c         |    6 +++---
 net/ipv4/fib_semantics.c   |    2 +-
 net/ipv4/igmp.c            |    4 ++--
 net/ipv4/ip_gre.c          |    3 ++-
 5 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h
index 45f3731..e74a2ee 100644
--- a/include/linux/inetdevice.h
+++ b/include/linux/inetdevice.h
@@ -133,7 +133,7 @@ extern struct net_device 	*ip_dev_find(__be32 addr);
 extern int		inet_addr_onlink(struct in_device *in_dev, __be32 a, __be32 b);
 extern int		devinet_ioctl(unsigned int cmd, void __user *);
 extern void		devinet_init(void);
-extern struct in_device	*inetdev_by_index(int);
+extern struct in_device	*inetdev_by_index(struct net *, int);
 extern __be32		inet_select_addr(const struct net_device *dev, __be32 dst, int scope);
 extern __be32		inet_confirm_addr(struct in_device *in_dev, __be32 dst, __be32 local, int scope);
 extern struct in_ifaddr *inet_ifa_byprefix(struct in_device *in_dev, __be32 prefix, __be32 mask);
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index e381edb..21f71bf 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -409,12 +409,12 @@ static int inet_set_ifa(struct net_device *dev, struct in_ifaddr *ifa)
 	return inet_insert_ifa(ifa);
 }
 
-struct in_device *inetdev_by_index(int ifindex)
+struct in_device *inetdev_by_index(struct net *net, int ifindex)
 {
 	struct net_device *dev;
 	struct in_device *in_dev = NULL;
 	read_lock(&dev_base_lock);
-	dev = __dev_get_by_index(&init_net, ifindex);
+	dev = __dev_get_by_index(net, ifindex);
 	if (dev)
 		in_dev = in_dev_get(dev);
 	read_unlock(&dev_base_lock);
@@ -454,7 +454,7 @@ static int inet_rtm_deladdr(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg
 		goto errout;
 
 	ifm = nlmsg_data(nlh);
-	in_dev = inetdev_by_index(ifm->ifa_index);
+	in_dev = inetdev_by_index(net, ifm->ifa_index);
 	if (in_dev == NULL) {
 		err = -ENODEV;
 		goto errout;
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index ecd91c6..8b47e11 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -583,7 +583,7 @@ out:
 		if (nh->nh_flags&(RTNH_F_PERVASIVE|RTNH_F_ONLINK))
 			return -EINVAL;
 
-		in_dev = inetdev_by_index(nh->nh_oif);
+		in_dev = inetdev_by_index(&init_net, nh->nh_oif);
 		if (in_dev == NULL)
 			return -ENODEV;
 		if (!(in_dev->dev->flags&IFF_UP)) {
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 285d262..b4df39a 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -1389,7 +1389,7 @@ static struct in_device * ip_mc_find_dev(struct ip_mreqn *imr)
 	struct in_device *idev = NULL;
 
 	if (imr->imr_ifindex) {
-		idev = inetdev_by_index(imr->imr_ifindex);
+		idev = inetdev_by_index(&init_net, imr->imr_ifindex);
 		if (idev)
 			__in_dev_put(idev);
 		return idev;
@@ -2222,7 +2222,7 @@ void ip_mc_drop_socket(struct sock *sk)
 		struct in_device *in_dev;
 		inet->mc_list = iml->next;
 
-		in_dev = inetdev_by_index(iml->multi.imr_ifindex);
+		in_dev = inetdev_by_index(&init_net, iml->multi.imr_ifindex);
 		(void) ip_mc_leave_src(sk, iml, in_dev);
 		if (in_dev != NULL) {
 			ip_mc_dec_group(in_dev, iml->multi.imr_multiaddr.s_addr);
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 8b81deb..a74983d 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -1193,7 +1193,8 @@ static int ipgre_close(struct net_device *dev)
 {
 	struct ip_tunnel *t = netdev_priv(dev);
 	if (ipv4_is_multicast(t->parms.iph.daddr) && t->mlink) {
-		struct in_device *in_dev = inetdev_by_index(t->mlink);
+		struct in_device *in_dev;
+		in_dev = inetdev_by_index(dev->nd_net, t->mlink);
 		if (in_dev) {
 			ip_mc_dec_group(in_dev, t->parms.iph.daddr);
 			in_dev_put(in_dev);
-- 
1.5.3.rc5


^ permalink raw reply related

* [PATCH 1/5] netns netfilter: change xt_table_register() return value convention
From: Alexey Dobriyan @ 2008-01-21 14:52 UTC (permalink / raw)
  To: kaber; +Cc: netfilter-devel, netdev, devel

Switch from 0/-E to ptr/PTR_ERR convention.
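For readers unfamiliar with the ptr/PTR_ERR convention, here is a userspace re-creation of the helpers from <linux/err.h> (simplified: the kernel versions also use unlikely() and __must_check). The top MAX_ERRNO addresses are never valid pointers, so a negative errno can travel through a pointer return value. toy_register_table and struct table are hypothetical and only mimic the calling pattern used in the diff.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* Error pointers occupy the last MAX_ERRNO values of the address space. */
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical registration function using the new convention. */
struct table {
	const char *name;
	int registered;
};

static struct table *toy_register_table(struct table *t, int exists)
{
	if (exists)
		return ERR_PTR(-EEXIST);	/* was: return -EEXIST; */
	t->registered = 1;
	return t;				/* was: return 0; */
}
```

A caller then follows the same shape as arpt_register_table() above: store the result, test it with IS_ERR(), and return PTR_ERR() on failure.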

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
---

 include/linux/netfilter/x_tables.h |    6 +++---
 net/ipv4/netfilter/arp_tables.c    |    7 ++++---
 net/ipv4/netfilter/ip_tables.c     |    7 ++++---
 net/ipv6/netfilter/ip6_tables.c    |    7 ++++---
 net/netfilter/x_tables.c           |   15 ++++++++-------
 5 files changed, 23 insertions(+), 19 deletions(-)

--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -335,9 +335,9 @@ extern int xt_check_target(const struct xt_target *target, unsigned short family
 			   unsigned int size, const char *table, unsigned int hook,
 			   unsigned short proto, int inv_proto);
 
-extern int xt_register_table(struct xt_table *table,
-			     struct xt_table_info *bootstrap,
-			     struct xt_table_info *newinfo);
+extern struct xt_table *xt_register_table(struct xt_table *table,
+					  struct xt_table_info *bootstrap,
+					  struct xt_table_info *newinfo);
 extern void *xt_unregister_table(struct xt_table *table);
 
 extern struct xt_table_info *xt_replace_table(struct xt_table *table,
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -1727,6 +1727,7 @@ int arpt_register_table(struct arpt_table *table,
 	struct xt_table_info bootstrap
 		= { 0, 0, 0, { 0 }, { 0 }, { } };
 	void *loc_cpu_entry;
+	struct xt_table *new_table;
 
 	newinfo = xt_alloc_table_info(repl->size);
 	if (!newinfo) {
@@ -1750,10 +1751,10 @@ int arpt_register_table(struct arpt_table *table,
 		return ret;
 	}
 
-	ret = xt_register_table(table, &bootstrap, newinfo);
-	if (ret != 0) {
+	new_table = xt_register_table(table, &bootstrap, newinfo);
+	if (IS_ERR(new_table)) {
 		xt_free_table_info(newinfo);
-		return ret;
+		return PTR_ERR(new_table);
 	}
 
 	return 0;
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -2055,6 +2055,7 @@ int ipt_register_table(struct xt_table *table, const struct ipt_replace *repl)
 	struct xt_table_info bootstrap
 		= { 0, 0, 0, { 0 }, { 0 }, { } };
 	void *loc_cpu_entry;
+	struct xt_table *new_table;
 
 	newinfo = xt_alloc_table_info(repl->size);
 	if (!newinfo)
@@ -2074,10 +2075,10 @@ int ipt_register_table(struct xt_table *table, const struct ipt_replace *repl)
 		return ret;
 	}
 
-	ret = xt_register_table(table, &bootstrap, newinfo);
-	if (ret != 0) {
+	new_table = xt_register_table(table, &bootstrap, newinfo);
+	if (IS_ERR(new_table)) {
 		xt_free_table_info(newinfo);
-		return ret;
+		return PTR_ERR(new_table);
 	}
 
 	return 0;
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -2081,6 +2081,7 @@ int ip6t_register_table(struct xt_table *table, const struct ip6t_replace *repl)
 	struct xt_table_info bootstrap
 		= { 0, 0, 0, { 0 }, { 0 }, { } };
 	void *loc_cpu_entry;
+	struct xt_table *new_table;
 
 	newinfo = xt_alloc_table_info(repl->size);
 	if (!newinfo)
@@ -2100,10 +2101,10 @@ int ip6t_register_table(struct xt_table *table, const struct ip6t_replace *repl)
 		return ret;
 	}
 
-	ret = xt_register_table(table, &bootstrap, newinfo);
-	if (ret != 0) {
+	new_table = xt_register_table(table, &bootstrap, newinfo);
+	if (IS_ERR(new_table)) {
 		xt_free_table_info(newinfo);
-		return ret;
+		return PTR_ERR(new_table);
 	}
 
 	return 0;
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -660,9 +660,9 @@ xt_replace_table(struct xt_table *table,
 }
 EXPORT_SYMBOL_GPL(xt_replace_table);
 
-int xt_register_table(struct xt_table *table,
-		      struct xt_table_info *bootstrap,
-		      struct xt_table_info *newinfo)
+struct xt_table *xt_register_table(struct xt_table *table,
+				   struct xt_table_info *bootstrap,
+				   struct xt_table_info *newinfo)
 {
 	int ret;
 	struct xt_table_info *private;
@@ -670,7 +670,7 @@ int xt_register_table(struct xt_table *table,
 
 	ret = mutex_lock_interruptible(&xt[table->af].mutex);
 	if (ret != 0)
-		return ret;
+		goto out;
 
 	/* Don't autoload: we'd eat our tail... */
 	list_for_each_entry(t, &xt[table->af].tables, list) {
@@ -693,11 +693,13 @@ int xt_register_table(struct xt_table *table,
 	private->initial_entries = private->number;
 
 	list_add(&table->list, &xt[table->af].tables);
+	mutex_unlock(&xt[table->af].mutex);
+	return table;
 
-	ret = 0;
  unlock:
 	mutex_unlock(&xt[table->af].mutex);
-	return ret;
+out:
+	return ERR_PTR(ret);
 }
 EXPORT_SYMBOL_GPL(xt_register_table);
 


^ permalink raw reply

* [PATCH 2/5] netns netfilter: per-netns xt_tables
From: Alexey Dobriyan @ 2008-01-21 14:52 UTC (permalink / raw)
  To: kaber; +Cc: netfilter-devel, netdev, devel

In fact, all we want is a per-netns set of rules; however, doing that would
unnecessarily complicate routines such as ipt_hook()/ipt_do_table(), so
make the full xt_table array per-netns.

Every user is stubbed with init_net for now.
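The data-structure change can be sketched in userspace as follows. This is a simplified model, not the kernel code: TOY_NPROTO, the minimal list helpers and the toy_* functions are hypothetical, and locking, module references and the xt[af].mutex are left out. It shows one list head per address family inside each namespace, initialised when the namespace is set up, with table lookup scoped to the namespace passed in.

```c
#include <stddef.h>
#include <string.h>

#define TOY_NPROTO 4	/* hypothetical; the kernel uses NPROTO */

struct list_head { struct list_head *next, *prev; };

static void toy_list_init(struct list_head *h) { h->next = h->prev = h; }

static void toy_list_add(struct list_head *n, struct list_head *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

struct xt_table {
	struct list_head list;
	const char *name;
	int af;
};

/* One list of tables per address family, per namespace. */
struct netns_xt { struct list_head tables[TOY_NPROTO]; };
struct net { struct netns_xt xt; };

/* Counterpart of xt_net_init(): run once for every new namespace. */
static void toy_xt_net_init(struct net *net)
{
	int i;

	for (i = 0; i < TOY_NPROTO; i++)
		toy_list_init(&net->xt.tables[i]);
}

static void toy_register(struct net *net, struct xt_table *t)
{
	toy_list_add(&t->list, &net->xt.tables[t->af]);
}

/* Only tables registered in this namespace are visible. */
static struct xt_table *toy_find_table(struct net *net, int af,
				       const char *name)
{
	struct list_head *pos;

	for (pos = net->xt.tables[af].next; pos != &net->xt.tables[af];
	     pos = pos->next) {
		struct xt_table *t = (struct xt_table *)
			((char *)pos - offsetof(struct xt_table, list));
		if (strcmp(t->name, name) == 0)
			return t;
	}
	return NULL;
}
```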

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
---

 include/linux/netfilter/x_tables.h |    6 ++++--
 include/net/net_namespace.h        |    4 ++++
 include/net/netns/x_tables.h       |   10 ++++++++++
 net/ipv4/netfilter/arp_tables.c    |   12 ++++++------
 net/ipv4/netfilter/ip_tables.c     |   12 ++++++------
 net/ipv6/netfilter/ip6_tables.c    |   12 ++++++------
 net/netfilter/x_tables.c           |   35 ++++++++++++++++++++++++-----------
 7 files changed, 60 insertions(+), 31 deletions(-)

--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -335,7 +335,8 @@ extern int xt_check_target(const struct xt_target *target, unsigned short family
 			   unsigned int size, const char *table, unsigned int hook,
 			   unsigned short proto, int inv_proto);
 
-extern struct xt_table *xt_register_table(struct xt_table *table,
+extern struct xt_table *xt_register_table(struct net *net,
+					  struct xt_table *table,
 					  struct xt_table_info *bootstrap,
 					  struct xt_table_info *newinfo);
 extern void *xt_unregister_table(struct xt_table *table);
@@ -352,7 +353,8 @@ extern struct xt_target *xt_request_find_target(int af, const char *name,
 extern int xt_find_revision(int af, const char *name, u8 revision, int target,
 			    int *err);
 
-extern struct xt_table *xt_find_table_lock(int af, const char *name);
+extern struct xt_table *xt_find_table_lock(struct net *net, int af,
+					   const char *name);
 extern void xt_table_unlock(struct xt_table *t);
 
 extern int xt_proto_init(int af);
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -12,6 +12,7 @@
 #include <net/netns/packet.h>
 #include <net/netns/ipv4.h>
 #include <net/netns/ipv6.h>
+#include <net/netns/x_tables.h>
 
 struct proc_dir_entry;
 struct net_device;
@@ -56,6 +57,9 @@ struct net {
 #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
 	struct netns_ipv6	ipv6;
 #endif
+#ifdef CONFIG_NETFILTER
+	struct netns_xt		xt;
+#endif
 };
 
 #ifdef CONFIG_NET
--- /dev/null
+++ b/include/net/netns/x_tables.h
@@ -0,0 +1,10 @@
+#ifndef __NETNS_X_TABLES_H
+#define __NETNS_X_TABLES_H
+
+#include <linux/list.h>
+#include <linux/net.h>
+
+struct netns_xt {
+	struct list_head tables[NPROTO];
+};
+#endif
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -870,7 +870,7 @@ static int get_info(void __user *user, int *len, int compat)
 	if (compat)
 		xt_compat_lock(NF_ARP);
 #endif
-	t = try_then_request_module(xt_find_table_lock(NF_ARP, name),
+	t = try_then_request_module(xt_find_table_lock(&init_net, NF_ARP, name),
 				    "arptable_%s", name);
 	if (t && !IS_ERR(t)) {
 		struct arpt_getinfo info;
@@ -926,7 +926,7 @@ static int get_entries(struct arpt_get_entries __user *uptr, int *len)
 		return -EINVAL;
 	}
 
-	t = xt_find_table_lock(NF_ARP, get.name);
+	t = xt_find_table_lock(&init_net, NF_ARP, get.name);
 	if (t && !IS_ERR(t)) {
 		struct xt_table_info *private = t->private;
 		duprintf("t->private->number = %u\n",
@@ -966,7 +966,7 @@ static int __do_replace(const char *name, unsigned int valid_hooks,
 		goto out;
 	}
 
-	t = try_then_request_module(xt_find_table_lock(NF_ARP, name),
+	t = try_then_request_module(xt_find_table_lock(&init_net, NF_ARP, name),
 				    "arptable_%s", name);
 	if (!t || IS_ERR(t)) {
 		ret = t ? PTR_ERR(t) : -ENOENT;
@@ -1132,7 +1132,7 @@ static int do_add_counters(void __user *user, unsigned int len, int compat)
 		goto free;
 	}
 
-	t = xt_find_table_lock(NF_ARP, name);
+	t = xt_find_table_lock(&init_net, NF_ARP, name);
 	if (!t || IS_ERR(t)) {
 		ret = t ? PTR_ERR(t) : -ENOENT;
 		goto free;
@@ -1604,7 +1604,7 @@ static int compat_get_entries(struct compat_arpt_get_entries __user *uptr,
 	}
 
 	xt_compat_lock(NF_ARP);
-	t = xt_find_table_lock(NF_ARP, get.name);
+	t = xt_find_table_lock(&init_net, NF_ARP, get.name);
 	if (t && !IS_ERR(t)) {
 		struct xt_table_info *private = t->private;
 		struct xt_table_info info;
@@ -1751,7 +1751,7 @@ int arpt_register_table(struct arpt_table *table,
 		return ret;
 	}
 
-	new_table = xt_register_table(table, &bootstrap, newinfo);
+	new_table = xt_register_table(&init_net, table, &bootstrap, newinfo);
 	if (IS_ERR(new_table)) {
 		xt_free_table_info(newinfo);
 		return PTR_ERR(new_table);
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -1112,7 +1112,7 @@ static int get_info(void __user *user, int *len, int compat)
 	if (compat)
 		xt_compat_lock(AF_INET);
 #endif
-	t = try_then_request_module(xt_find_table_lock(AF_INET, name),
+	t = try_then_request_module(xt_find_table_lock(&init_net, AF_INET, name),
 				    "iptable_%s", name);
 	if (t && !IS_ERR(t)) {
 		struct ipt_getinfo info;
@@ -1170,7 +1170,7 @@ get_entries(struct ipt_get_entries __user *uptr, int *len)
 		return -EINVAL;
 	}
 
-	t = xt_find_table_lock(AF_INET, get.name);
+	t = xt_find_table_lock(&init_net, AF_INET, get.name);
 	if (t && !IS_ERR(t)) {
 		struct xt_table_info *private = t->private;
 		duprintf("t->private->number = %u\n", private->number);
@@ -1208,7 +1208,7 @@ __do_replace(const char *name, unsigned int valid_hooks,
 		goto out;
 	}
 
-	t = try_then_request_module(xt_find_table_lock(AF_INET, name),
+	t = try_then_request_module(xt_find_table_lock(&init_net, AF_INET, name),
 				    "iptable_%s", name);
 	if (!t || IS_ERR(t)) {
 		ret = t ? PTR_ERR(t) : -ENOENT;
@@ -1383,7 +1383,7 @@ do_add_counters(void __user *user, unsigned int len, int compat)
 		goto free;
 	}
 
-	t = xt_find_table_lock(AF_INET, name);
+	t = xt_find_table_lock(&init_net, AF_INET, name);
 	if (!t || IS_ERR(t)) {
 		ret = t ? PTR_ERR(t) : -ENOENT;
 		goto free;
@@ -1924,7 +1924,7 @@ compat_get_entries(struct compat_ipt_get_entries __user *uptr, int *len)
 	}
 
 	xt_compat_lock(AF_INET);
-	t = xt_find_table_lock(AF_INET, get.name);
+	t = xt_find_table_lock(&init_net, AF_INET, get.name);
 	if (t && !IS_ERR(t)) {
 		struct xt_table_info *private = t->private;
 		struct xt_table_info info;
@@ -2075,7 +2075,7 @@ int ipt_register_table(struct xt_table *table, const struct ipt_replace *repl)
 		return ret;
 	}
 
-	new_table = xt_register_table(table, &bootstrap, newinfo);
+	new_table = xt_register_table(&init_net, table, &bootstrap, newinfo);
 	if (IS_ERR(new_table)) {
 		xt_free_table_info(newinfo);
 		return PTR_ERR(new_table);
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -1138,7 +1138,7 @@ static int get_info(void __user *user, int *len, int compat)
 	if (compat)
 		xt_compat_lock(AF_INET6);
 #endif
-	t = try_then_request_module(xt_find_table_lock(AF_INET6, name),
+	t = try_then_request_module(xt_find_table_lock(&init_net, AF_INET6, name),
 				    "ip6table_%s", name);
 	if (t && !IS_ERR(t)) {
 		struct ip6t_getinfo info;
@@ -1196,7 +1196,7 @@ get_entries(struct ip6t_get_entries __user *uptr, int *len)
 		return -EINVAL;
 	}
 
-	t = xt_find_table_lock(AF_INET6, get.name);
+	t = xt_find_table_lock(&init_net, AF_INET6, get.name);
 	if (t && !IS_ERR(t)) {
 		struct xt_table_info *private = t->private;
 		duprintf("t->private->number = %u\n", private->number);
@@ -1235,7 +1235,7 @@ __do_replace(const char *name, unsigned int valid_hooks,
 		goto out;
 	}
 
-	t = try_then_request_module(xt_find_table_lock(AF_INET6, name),
+	t = try_then_request_module(xt_find_table_lock(&init_net, AF_INET6, name),
 				    "ip6table_%s", name);
 	if (!t || IS_ERR(t)) {
 		ret = t ? PTR_ERR(t) : -ENOENT;
@@ -1410,7 +1410,7 @@ do_add_counters(void __user *user, unsigned int len, int compat)
 		goto free;
 	}
 
-	t = xt_find_table_lock(AF_INET6, name);
+	t = xt_find_table_lock(&init_net, AF_INET6, name);
 	if (!t || IS_ERR(t)) {
 		ret = t ? PTR_ERR(t) : -ENOENT;
 		goto free;
@@ -1950,7 +1950,7 @@ compat_get_entries(struct compat_ip6t_get_entries __user *uptr, int *len)
 	}
 
 	xt_compat_lock(AF_INET6);
-	t = xt_find_table_lock(AF_INET6, get.name);
+	t = xt_find_table_lock(&init_net, AF_INET6, get.name);
 	if (t && !IS_ERR(t)) {
 		struct xt_table_info *private = t->private;
 		struct xt_table_info info;
@@ -2101,7 +2101,7 @@ int ip6t_register_table(struct xt_table *table, const struct ip6t_replace *repl)
 		return ret;
 	}
 
-	new_table = xt_register_table(table, &bootstrap, newinfo);
+	new_table = xt_register_table(&init_net, table, &bootstrap, newinfo);
 	if (IS_ERR(new_table)) {
 		xt_free_table_info(newinfo);
 		return PTR_ERR(new_table);
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -44,7 +44,6 @@ struct xt_af {
 	struct mutex mutex;
 	struct list_head match;
 	struct list_head target;
-	struct list_head tables;
 #ifdef CONFIG_COMPAT
 	struct mutex compat_mutex;
 	struct compat_delta *compat_offsets;
@@ -597,14 +596,14 @@ void xt_free_table_info(struct xt_table_info *info)
 EXPORT_SYMBOL(xt_free_table_info);
 
 /* Find table by name, grabs mutex & ref.  Returns ERR_PTR() on error. */
-struct xt_table *xt_find_table_lock(int af, const char *name)
+struct xt_table *xt_find_table_lock(struct net *net, int af, const char *name)
 {
 	struct xt_table *t;
 
 	if (mutex_lock_interruptible(&xt[af].mutex) != 0)
 		return ERR_PTR(-EINTR);
 
-	list_for_each_entry(t, &xt[af].tables, list)
+	list_for_each_entry(t, &net->xt.tables[af], list)
 		if (strcmp(t->name, name) == 0 && try_module_get(t->me))
 			return t;
 	mutex_unlock(&xt[af].mutex);
@@ -660,7 +659,7 @@ xt_replace_table(struct xt_table *table,
 }
 EXPORT_SYMBOL_GPL(xt_replace_table);
 
-struct xt_table *xt_register_table(struct xt_table *table,
+struct xt_table *xt_register_table(struct net *net, struct xt_table *table,
 				   struct xt_table_info *bootstrap,
 				   struct xt_table_info *newinfo)
 {
@@ -673,7 +672,7 @@ struct xt_table *xt_register_table(struct xt_table *table,
 		goto out;
 
 	/* Don't autoload: we'd eat our tail... */
-	list_for_each_entry(t, &xt[table->af].tables, list) {
+	list_for_each_entry(t, &net->xt.tables[table->af], list) {
 		if (strcmp(t->name, table->name) == 0) {
 			ret = -EEXIST;
 			goto unlock;
@@ -692,7 +691,7 @@ struct xt_table *xt_register_table(struct xt_table *table,
 	/* save number of initial entries */
 	private->initial_entries = private->number;
 
-	list_add(&table->list, &xt[table->af].tables);
+	list_add(&table->list, &net->xt.tables[table->af]);
 	mutex_unlock(&xt[table->af].mutex);
 	return table;
 
@@ -744,7 +743,7 @@ static struct list_head *type2list(u_int16_t af, u_int16_t type)
 		list = &xt[af].match;
 		break;
 	case TABLE:
-		list = &xt[af].tables;
+		list = &init_net.xt.tables[af];
 		break;
 	default:
 		list = NULL;
@@ -919,10 +918,22 @@ void xt_proto_fini(int af)
 }
 EXPORT_SYMBOL_GPL(xt_proto_fini);
 
+static int __net_init xt_net_init(struct net *net)
+{
+	int i;
+
+	for (i = 0; i < NPROTO; i++)
+		INIT_LIST_HEAD(&net->xt.tables[i]);
+	return 0;
+}
+
+static struct pernet_operations xt_net_ops = {
+	.init = xt_net_init,
+};
 
 static int __init xt_init(void)
 {
-	int i;
+	int i, rv;
 
 	xt = kmalloc(sizeof(struct xt_af) * NPROTO, GFP_KERNEL);
 	if (!xt)
@@ -936,13 +947,16 @@ static int __init xt_init(void)
 #endif
 		INIT_LIST_HEAD(&xt[i].target);
 		INIT_LIST_HEAD(&xt[i].match);
-		INIT_LIST_HEAD(&xt[i].tables);
 	}
-	return 0;
+	rv = register_pernet_subsys(&xt_net_ops);
+	if (rv < 0)
+		kfree(xt);
+	return rv;
 }
 
 static void __exit xt_fini(void)
 {
+	unregister_pernet_subsys(&xt_net_ops);
 	kfree(xt);
 }
 


^ permalink raw reply

* [PATCH 3/5] netns netfilter: return new table from {arp,ip,ip6}t_register_table()
From: Alexey Dobriyan @ 2008-01-21 14:53 UTC (permalink / raw)
  To: kaber; +Cc: netfilter-devel, netdev, devel

A typical table module registers an xt_table structure (e.g. packet_filter)
and links it into a list during registration. We can't use one template for
every namespace, because the corresponding list_head would become corrupted.
Nor can we unregister using the template, because it wasn't changed at all
and thus doesn't know which list it is on.

So, duplicate the template at the very first step of table registration.
Table modules will save the copy for use at unregistration time and for
actual filtering.
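The template-duplication scheme can be modelled in userspace like this. It is a sketch under simplifying assumptions: toy_table, toy_net and the toy_* functions are hypothetical, a singly linked list replaces list_head, and malloc/memcpy stand in for whatever allocation the kernel uses. Each registration links a private copy, so the read-only template's link field is never touched, and the returned pointer is what the caller keeps for unregistration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified table: 'next' links it into one list only. */
struct toy_table {
	const char *name;
	struct toy_table *next;
};

struct toy_net {
	struct toy_table *tables;	/* per-namespace list head */
};

/* Link a private copy of the template; the template stays untouched. */
static struct toy_table *toy_register_table(struct toy_net *net,
					    const struct toy_table *tmpl)
{
	struct toy_table *t = malloc(sizeof(*t));

	if (t == NULL)
		return NULL;
	memcpy(t, tmpl, sizeof(*t));
	t->next = net->tables;
	net->tables = t;
	return t;	/* caller saves this for filtering/unregistration */
}

/* The copy, unlike the template, knows which list it is on. */
static void toy_unregister_table(struct toy_net *net, struct toy_table *t)
{
	struct toy_table **pp;

	for (pp = &net->tables; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == t) {
			*pp = t->next;
			free(t);
			return;
		}
	}
}
```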

Do it all at once so as not to break bisection.

P.S.: renaming, e.g. packet_filter => __packet_filter, is temporary until
      the table modules are fully converted to netns.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
---

 include/linux/netfilter_arp/arp_tables.h  |    4 ++--
 include/linux/netfilter_ipv4/ip_tables.h  |    5 +++--
 include/linux/netfilter_ipv6/ip6_tables.h |    4 ++--
 net/ipv4/netfilter/arp_tables.c           |   22 ++++++++++++----------
 net/ipv4/netfilter/arptable_filter.c      |   15 ++++++++-------
 net/ipv4/netfilter/ip_tables.c            |   28 +++++++++++++++++-----------
 net/ipv4/netfilter/iptable_filter.c       |   18 ++++++++++--------
 net/ipv4/netfilter/iptable_mangle.c       |   18 ++++++++++--------
 net/ipv4/netfilter/iptable_raw.c          |   18 ++++++++++--------
 net/ipv4/netfilter/nf_nat_rule.c          |   16 +++++++++-------
 net/ipv6/netfilter/ip6_tables.c           |   24 ++++++++++++++----------
 net/ipv6/netfilter/ip6table_filter.c      |   17 +++++++++--------
 net/ipv6/netfilter/ip6table_mangle.c      |   17 +++++++++--------
 net/ipv6/netfilter/ip6table_raw.c         |   15 ++++++++-------
 net/netfilter/x_tables.c                  |   13 +++++++++++--
 15 files changed, 134 insertions(+), 100 deletions(-)

--- a/include/linux/netfilter_arp/arp_tables.h
+++ b/include/linux/netfilter_arp/arp_tables.h
@@ -271,8 +271,8 @@ struct arpt_error
  	xt_register_target(tgt); })
 #define arpt_unregister_target(tgt) xt_unregister_target(tgt)
 
-extern int arpt_register_table(struct arpt_table *table,
-			       const struct arpt_replace *repl);
+extern struct arpt_table *arpt_register_table(struct arpt_table *table,
+					      const struct arpt_replace *repl);
 extern void arpt_unregister_table(struct arpt_table *table);
 extern unsigned int arpt_do_table(struct sk_buff *skb,
 				  unsigned int hook,
--- a/include/linux/netfilter_ipv4/ip_tables.h
+++ b/include/linux/netfilter_ipv4/ip_tables.h
@@ -244,8 +244,9 @@ ipt_get_target(struct ipt_entry *e)
 #include <linux/init.h>
 extern void ipt_init(void) __init;
 
-extern int ipt_register_table(struct xt_table *table,
-			      const struct ipt_replace *repl);
+extern struct xt_table *ipt_register_table(struct net *net,
+					   struct xt_table *table,
+					   const struct ipt_replace *repl);
 extern void ipt_unregister_table(struct xt_table *table);
 
 /* Standard entry. */
--- a/include/linux/netfilter_ipv6/ip6_tables.h
+++ b/include/linux/netfilter_ipv6/ip6_tables.h
@@ -305,8 +305,8 @@ ip6t_get_target(struct ip6t_entry *e)
 #include <linux/init.h>
 extern void ip6t_init(void) __init;
 
-extern int ip6t_register_table(struct xt_table *table,
-			       const struct ip6t_replace *repl);
+extern struct xt_table *ip6t_register_table(struct xt_table *table,
+					    const struct ip6t_replace *repl);
 extern void ip6t_unregister_table(struct xt_table *table);
 extern unsigned int ip6t_do_table(struct sk_buff *skb,
 				  unsigned int hook,
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -1719,8 +1719,8 @@ static int do_arpt_get_ctl(struct sock *sk, int cmd, void __user *user, int *len
 	return ret;
 }
 
-int arpt_register_table(struct arpt_table *table,
-			const struct arpt_replace *repl)
+struct arpt_table *arpt_register_table(struct arpt_table *table,
+				       const struct arpt_replace *repl)
 {
 	int ret;
 	struct xt_table_info *newinfo;
@@ -1732,7 +1732,7 @@ int arpt_register_table(struct arpt_table *table,
 	newinfo = xt_alloc_table_info(repl->size);
 	if (!newinfo) {
 		ret = -ENOMEM;
-		return ret;
+		goto out;
 	}
 
 	/* choose the copy on our node/cpu */
@@ -1746,18 +1746,20 @@ int arpt_register_table(struct arpt_table *table,
 			      repl->underflow);
 
 	duprintf("arpt_register_table: translate table gives %d\n", ret);
-	if (ret != 0) {
-		xt_free_table_info(newinfo);
-		return ret;
-	}
+	if (ret != 0)
+		goto out_free;
 
 	new_table = xt_register_table(&init_net, table, &bootstrap, newinfo);
 	if (IS_ERR(new_table)) {
-		xt_free_table_info(newinfo);
-		return PTR_ERR(new_table);
+		ret = PTR_ERR(new_table);
+		goto out_free;
 	}
+	return new_table;
 
-	return 0;
+out_free:
+	xt_free_table_info(newinfo);
+out:
+	return ERR_PTR(ret);
 }
 
 void arpt_unregister_table(struct arpt_table *table)
--- a/net/ipv4/netfilter/arptable_filter.c
+++ b/net/ipv4/netfilter/arptable_filter.c
@@ -45,7 +45,7 @@ static struct
 	.term = ARPT_ERROR_INIT,
 };
 
-static struct arpt_table packet_filter = {
+static struct arpt_table __packet_filter = {
 	.name		= "filter",
 	.valid_hooks	= FILTER_VALID_HOOKS,
 	.lock		= RW_LOCK_UNLOCKED,
@@ -53,6 +53,7 @@ static struct arpt_table packet_filter = {
 	.me		= THIS_MODULE,
 	.af		= NF_ARP,
 };
+static struct arpt_table *packet_filter;
 
 /* The work comes in here from netfilter.c */
 static unsigned int arpt_hook(unsigned int hook,
@@ -61,7 +62,7 @@ static unsigned int arpt_hook(unsigned int hook,
 			      const struct net_device *out,
 			      int (*okfn)(struct sk_buff *))
 {
-	return arpt_do_table(skb, hook, in, out, &packet_filter);
+	return arpt_do_table(skb, hook, in, out, packet_filter);
 }
 
 static struct nf_hook_ops arpt_ops[] __read_mostly = {
@@ -90,9 +91,9 @@ static int __init arptable_filter_init(void)
 	int ret;
 
 	/* Register table */
-	ret = arpt_register_table(&packet_filter, &initial_table.repl);
-	if (ret < 0)
-		return ret;
+	packet_filter = arpt_register_table(&__packet_filter, &initial_table.repl);
+	if (IS_ERR(packet_filter))
+		return PTR_ERR(packet_filter);
 
 	ret = nf_register_hooks(arpt_ops, ARRAY_SIZE(arpt_ops));
 	if (ret < 0)
@@ -100,14 +101,14 @@ static int __init arptable_filter_init(void)
 	return ret;
 
 cleanup_table:
-	arpt_unregister_table(&packet_filter);
+	arpt_unregister_table(packet_filter);
 	return ret;
 }
 
 static void __exit arptable_filter_fini(void)
 {
 	nf_unregister_hooks(arpt_ops, ARRAY_SIZE(arpt_ops));
-	arpt_unregister_table(&packet_filter);
+	arpt_unregister_table(packet_filter);
 }
 
 module_init(arptable_filter_init);
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -2048,7 +2048,8 @@ do_ipt_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 	return ret;
 }
 
-int ipt_register_table(struct xt_table *table, const struct ipt_replace *repl)
+struct xt_table *ipt_register_table(struct net *net, struct xt_table *table,
+				    const struct ipt_replace *repl)
 {
 	int ret;
 	struct xt_table_info *newinfo;
@@ -2058,8 +2059,10 @@ int ipt_register_table(struct xt_table *table, const struct ipt_replace *repl)
 	struct xt_table *new_table;
 
 	newinfo = xt_alloc_table_info(repl->size);
-	if (!newinfo)
-		return -ENOMEM;
+	if (!newinfo) {
+		ret = -ENOMEM;
+		goto out;
+	}
 
 	/* choose the copy on our node/cpu, but dont care about preemption */
 	loc_cpu_entry = newinfo->entries[raw_smp_processor_id()];
@@ -2070,18 +2073,21 @@ int ipt_register_table(struct xt_table *table, const struct ipt_replace *repl)
 			      repl->num_entries,
 			      repl->hook_entry,
 			      repl->underflow);
-	if (ret != 0) {
-		xt_free_table_info(newinfo);
-		return ret;
-	}
+	if (ret != 0)
+		goto out_free;
 
-	new_table = xt_register_table(&init_net, table, &bootstrap, newinfo);
+	new_table = xt_register_table(net, table, &bootstrap, newinfo);
 	if (IS_ERR(new_table)) {
-		xt_free_table_info(newinfo);
-		return PTR_ERR(new_table);
+		ret = PTR_ERR(new_table);
+		goto out_free;
 	}
 
-	return 0;
+	return new_table;
+
+out_free:
+	xt_free_table_info(newinfo);
+out:
+	return ERR_PTR(ret);
 }
 
 void ipt_unregister_table(struct xt_table *table)
--- a/net/ipv4/netfilter/iptable_filter.c
+++ b/net/ipv4/netfilter/iptable_filter.c
@@ -53,13 +53,14 @@ static struct
 	.term = IPT_ERROR_INIT,			/* ERROR */
 };
 
-static struct xt_table packet_filter = {
+static struct xt_table __packet_filter = {
 	.name		= "filter",
 	.valid_hooks	= FILTER_VALID_HOOKS,
 	.lock		= RW_LOCK_UNLOCKED,
 	.me		= THIS_MODULE,
 	.af		= AF_INET,
 };
+static struct xt_table *packet_filter;
 
 /* The work comes in here from netfilter.c. */
 static unsigned int
@@ -69,7 +70,7 @@ ipt_hook(unsigned int hook,
 	 const struct net_device *out,
 	 int (*okfn)(struct sk_buff *))
 {
-	return ipt_do_table(skb, hook, in, out, &packet_filter);
+	return ipt_do_table(skb, hook, in, out, packet_filter);
 }
 
 static unsigned int
@@ -88,7 +89,7 @@ ipt_local_out_hook(unsigned int hook,
 		return NF_ACCEPT;
 	}
 
-	return ipt_do_table(skb, hook, in, out, &packet_filter);
+	return ipt_do_table(skb, hook, in, out, packet_filter);
 }
 
 static struct nf_hook_ops ipt_ops[] __read_mostly = {
@@ -132,9 +133,10 @@ static int __init iptable_filter_init(void)
 	initial_table.entries[1].target.verdict = -forward - 1;
 
 	/* Register table */
-	ret = ipt_register_table(&packet_filter, &initial_table.repl);
-	if (ret < 0)
-		return ret;
+	packet_filter = ipt_register_table(&init_net, &__packet_filter,
+					   &initial_table.repl);
+	if (IS_ERR(packet_filter))
+		return PTR_ERR(packet_filter);
 
 	/* Register hooks */
 	ret = nf_register_hooks(ipt_ops, ARRAY_SIZE(ipt_ops));
@@ -144,14 +146,14 @@ static int __init iptable_filter_init(void)
 	return ret;
 
  cleanup_table:
-	ipt_unregister_table(&packet_filter);
+	ipt_unregister_table(packet_filter);
 	return ret;
 }
 
 static void __exit iptable_filter_fini(void)
 {
 	nf_unregister_hooks(ipt_ops, ARRAY_SIZE(ipt_ops));
-	ipt_unregister_table(&packet_filter);
+	ipt_unregister_table(packet_filter);
 }
 
 module_init(iptable_filter_init);
--- a/net/ipv4/netfilter/iptable_mangle.c
+++ b/net/ipv4/netfilter/iptable_mangle.c
@@ -64,13 +64,14 @@ static struct
 	.term = IPT_ERROR_INIT,			/* ERROR */
 };
 
-static struct xt_table packet_mangler = {
+static struct xt_table __packet_mangler = {
 	.name		= "mangle",
 	.valid_hooks	= MANGLE_VALID_HOOKS,
 	.lock		= RW_LOCK_UNLOCKED,
 	.me		= THIS_MODULE,
 	.af		= AF_INET,
 };
+static struct xt_table *packet_mangler;
 
 /* The work comes in here from netfilter.c. */
 static unsigned int
@@ -80,7 +81,7 @@ ipt_route_hook(unsigned int hook,
 	 const struct net_device *out,
 	 int (*okfn)(struct sk_buff *))
 {
-	return ipt_do_table(skb, hook, in, out, &packet_mangler);
+	return ipt_do_table(skb, hook, in, out, packet_mangler);
 }
 
 static unsigned int
@@ -112,7 +113,7 @@ ipt_local_hook(unsigned int hook,
 	daddr = iph->daddr;
 	tos = iph->tos;
 
-	ret = ipt_do_table(skb, hook, in, out, &packet_mangler);
+	ret = ipt_do_table(skb, hook, in, out, packet_mangler);
 	/* Reroute for ANY change. */
 	if (ret != NF_DROP && ret != NF_STOLEN && ret != NF_QUEUE) {
 		iph = ip_hdr(skb);
@@ -171,9 +172,10 @@ static int __init iptable_mangle_init(void)
 	int ret;
 
 	/* Register table */
-	ret = ipt_register_table(&packet_mangler, &initial_table.repl);
-	if (ret < 0)
-		return ret;
+	packet_mangler = ipt_register_table(&init_net, &__packet_mangler,
+					    &initial_table.repl);
+	if (IS_ERR(packet_mangler))
+		return PTR_ERR(packet_mangler);
 
 	/* Register hooks */
 	ret = nf_register_hooks(ipt_ops, ARRAY_SIZE(ipt_ops));
@@ -183,14 +185,14 @@ static int __init iptable_mangle_init(void)
 	return ret;
 
  cleanup_table:
-	ipt_unregister_table(&packet_mangler);
+	ipt_unregister_table(packet_mangler);
 	return ret;
 }
 
 static void __exit iptable_mangle_fini(void)
 {
 	nf_unregister_hooks(ipt_ops, ARRAY_SIZE(ipt_ops));
-	ipt_unregister_table(&packet_mangler);
+	ipt_unregister_table(packet_mangler);
 }
 
 module_init(iptable_mangle_init);
--- a/net/ipv4/netfilter/iptable_raw.c
+++ b/net/ipv4/netfilter/iptable_raw.c
@@ -36,13 +36,14 @@ static struct
 	.term = IPT_ERROR_INIT,			/* ERROR */
 };
 
-static struct xt_table packet_raw = {
+static struct xt_table __packet_raw = {
 	.name = "raw",
 	.valid_hooks =  RAW_VALID_HOOKS,
 	.lock = RW_LOCK_UNLOCKED,
 	.me = THIS_MODULE,
 	.af = AF_INET,
 };
+static struct xt_table *packet_raw;
 
 /* The work comes in here from netfilter.c. */
 static unsigned int
@@ -52,7 +53,7 @@ ipt_hook(unsigned int hook,
 	 const struct net_device *out,
 	 int (*okfn)(struct sk_buff *))
 {
-	return ipt_do_table(skb, hook, in, out, &packet_raw);
+	return ipt_do_table(skb, hook, in, out, packet_raw);
 }
 
 static unsigned int
@@ -70,7 +71,7 @@ ipt_local_hook(unsigned int hook,
 			       "packet.\n");
 		return NF_ACCEPT;
 	}
-	return ipt_do_table(skb, hook, in, out, &packet_raw);
+	return ipt_do_table(skb, hook, in, out, packet_raw);
 }
 
 /* 'raw' is the very first table. */
@@ -96,9 +97,10 @@ static int __init iptable_raw_init(void)
 	int ret;
 
 	/* Register table */
-	ret = ipt_register_table(&packet_raw, &initial_table.repl);
-	if (ret < 0)
-		return ret;
+	packet_raw = ipt_register_table(&init_net, &__packet_raw,
+					&initial_table.repl);
+	if (IS_ERR(packet_raw))
+		return PTR_ERR(packet_raw);
 
 	/* Register hooks */
 	ret = nf_register_hooks(ipt_ops, ARRAY_SIZE(ipt_ops));
@@ -108,14 +110,14 @@ static int __init iptable_raw_init(void)
 	return ret;
 
  cleanup_table:
-	ipt_unregister_table(&packet_raw);
+	ipt_unregister_table(packet_raw);
 	return ret;
 }
 
 static void __exit iptable_raw_fini(void)
 {
 	nf_unregister_hooks(ipt_ops, ARRAY_SIZE(ipt_ops));
-	ipt_unregister_table(&packet_raw);
+	ipt_unregister_table(packet_raw);
 }
 
 module_init(iptable_raw_init);
--- a/net/ipv4/netfilter/nf_nat_rule.c
+++ b/net/ipv4/netfilter/nf_nat_rule.c
@@ -58,13 +58,14 @@ static struct
 	.term = IPT_ERROR_INIT,			/* ERROR */
 };
 
-static struct xt_table nat_table = {
+static struct xt_table __nat_table = {
 	.name		= "nat",
 	.valid_hooks	= NAT_VALID_HOOKS,
 	.lock		= RW_LOCK_UNLOCKED,
 	.me		= THIS_MODULE,
 	.af		= AF_INET,
 };
+static struct xt_table *nat_table;
 
 /* Source NAT */
 static unsigned int ipt_snat_target(struct sk_buff *skb,
@@ -214,7 +215,7 @@ int nf_nat_rule_find(struct sk_buff *skb,
 {
 	int ret;
 
-	ret = ipt_do_table(skb, hooknum, in, out, &nat_table);
+	ret = ipt_do_table(skb, hooknum, in, out, nat_table);
 
 	if (ret == NF_ACCEPT) {
 		if (!nf_nat_initialized(ct, HOOK2MANIP(hooknum)))
@@ -248,9 +249,10 @@ int __init nf_nat_rule_init(void)
 {
 	int ret;
 
-	ret = ipt_register_table(&nat_table, &nat_initial_table.repl);
-	if (ret != 0)
-		return ret;
+	nat_table = ipt_register_table(&init_net, &__nat_table,
+				       &nat_initial_table.repl);
+	if (IS_ERR(nat_table))
+		return PTR_ERR(nat_table);
 	ret = xt_register_target(&ipt_snat_reg);
 	if (ret != 0)
 		goto unregister_table;
@@ -264,7 +266,7 @@ int __init nf_nat_rule_init(void)
  unregister_snat:
 	xt_unregister_target(&ipt_snat_reg);
  unregister_table:
-	ipt_unregister_table(&nat_table);
+	ipt_unregister_table(nat_table);
 
 	return ret;
 }
@@ -273,5 +275,5 @@ void nf_nat_rule_cleanup(void)
 {
 	xt_unregister_target(&ipt_dnat_reg);
 	xt_unregister_target(&ipt_snat_reg);
-	ipt_unregister_table(&nat_table);
+	ipt_unregister_table(nat_table);
 }
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -2074,7 +2074,7 @@ do_ip6t_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
 	return ret;
 }
 
-int ip6t_register_table(struct xt_table *table, const struct ip6t_replace *repl)
+struct xt_table *ip6t_register_table(struct xt_table *table, const struct ip6t_replace *repl)
 {
 	int ret;
 	struct xt_table_info *newinfo;
@@ -2084,8 +2084,10 @@ int ip6t_register_table(struct xt_table *table, const struct ip6t_replace *repl)
 	struct xt_table *new_table;
 
 	newinfo = xt_alloc_table_info(repl->size);
-	if (!newinfo)
-		return -ENOMEM;
+	if (!newinfo) {
+		ret = -ENOMEM;
+		goto out;
+	}
 
 	/* choose the copy on our node/cpu, but dont care about preemption */
 	loc_cpu_entry = newinfo->entries[raw_smp_processor_id()];
@@ -2096,18 +2098,20 @@ int ip6t_register_table(struct xt_table *table, const struct ip6t_replace *repl)
 			      repl->num_entries,
 			      repl->hook_entry,
 			      repl->underflow);
-	if (ret != 0) {
-		xt_free_table_info(newinfo);
-		return ret;
-	}
+	if (ret != 0)
+		goto out_free;
 
 	new_table = xt_register_table(&init_net, table, &bootstrap, newinfo);
 	if (IS_ERR(new_table)) {
-		xt_free_table_info(newinfo);
-		return PTR_ERR(new_table);
+		ret = PTR_ERR(new_table);
+		goto out_free;
 	}
+	return new_table;
 
-	return 0;
+out_free:
+	xt_free_table_info(newinfo);
+out:
+	return ERR_PTR(ret);
 }
 
 void ip6t_unregister_table(struct xt_table *table)
--- a/net/ipv6/netfilter/ip6table_filter.c
+++ b/net/ipv6/netfilter/ip6table_filter.c
@@ -51,13 +51,14 @@ static struct
 	.term = IP6T_ERROR_INIT,		/* ERROR */
 };
 
-static struct xt_table packet_filter = {
+static struct xt_table __packet_filter = {
 	.name		= "filter",
 	.valid_hooks	= FILTER_VALID_HOOKS,
 	.lock		= RW_LOCK_UNLOCKED,
 	.me		= THIS_MODULE,
 	.af		= AF_INET6,
 };
+static struct xt_table *packet_filter;
 
 /* The work comes in here from netfilter.c. */
 static unsigned int
@@ -67,7 +68,7 @@ ip6t_hook(unsigned int hook,
 	 const struct net_device *out,
 	 int (*okfn)(struct sk_buff *))
 {
-	return ip6t_do_table(skb, hook, in, out, &packet_filter);
+	return ip6t_do_table(skb, hook, in, out, packet_filter);
 }
 
 static unsigned int
@@ -87,7 +88,7 @@ ip6t_local_out_hook(unsigned int hook,
 	}
 #endif
 
-	return ip6t_do_table(skb, hook, in, out, &packet_filter);
+	return ip6t_do_table(skb, hook, in, out, packet_filter);
 }
 
 static struct nf_hook_ops ip6t_ops[] __read_mostly = {
@@ -131,9 +132,9 @@ static int __init ip6table_filter_init(void)
 	initial_table.entries[1].target.verdict = -forward - 1;
 
 	/* Register table */
-	ret = ip6t_register_table(&packet_filter, &initial_table.repl);
-	if (ret < 0)
-		return ret;
+	packet_filter = ip6t_register_table(&__packet_filter, &initial_table.repl);
+	if (IS_ERR(packet_filter))
+		return PTR_ERR(packet_filter);
 
 	/* Register hooks */
 	ret = nf_register_hooks(ip6t_ops, ARRAY_SIZE(ip6t_ops));
@@ -143,14 +144,14 @@ static int __init ip6table_filter_init(void)
 	return ret;
 
  cleanup_table:
-	ip6t_unregister_table(&packet_filter);
+	ip6t_unregister_table(packet_filter);
 	return ret;
 }
 
 static void __exit ip6table_filter_fini(void)
 {
 	nf_unregister_hooks(ip6t_ops, ARRAY_SIZE(ip6t_ops));
-	ip6t_unregister_table(&packet_filter);
+	ip6t_unregister_table(packet_filter);
 }
 
 module_init(ip6table_filter_init);
--- a/net/ipv6/netfilter/ip6table_mangle.c
+++ b/net/ipv6/netfilter/ip6table_mangle.c
@@ -57,13 +57,14 @@ static struct
 	.term = IP6T_ERROR_INIT,		/* ERROR */
 };
 
-static struct xt_table packet_mangler = {
+static struct xt_table __packet_mangler = {
 	.name		= "mangle",
 	.valid_hooks	= MANGLE_VALID_HOOKS,
 	.lock		= RW_LOCK_UNLOCKED,
 	.me		= THIS_MODULE,
 	.af		= AF_INET6,
 };
+static struct xt_table *packet_mangler;
 
 /* The work comes in here from netfilter.c. */
 static unsigned int
@@ -73,7 +74,7 @@ ip6t_route_hook(unsigned int hook,
 	 const struct net_device *out,
 	 int (*okfn)(struct sk_buff *))
 {
-	return ip6t_do_table(skb, hook, in, out, &packet_mangler);
+	return ip6t_do_table(skb, hook, in, out, packet_mangler);
 }
 
 static unsigned int
@@ -108,7 +109,7 @@ ip6t_local_hook(unsigned int hook,
 	/* flowlabel and prio (includes version, which shouldn't change either */
 	flowlabel = *((u_int32_t *)ipv6_hdr(skb));
 
-	ret = ip6t_do_table(skb, hook, in, out, &packet_mangler);
+	ret = ip6t_do_table(skb, hook, in, out, packet_mangler);
 
 	if (ret != NF_DROP && ret != NF_STOLEN
 		&& (memcmp(&ipv6_hdr(skb)->saddr, &saddr, sizeof(saddr))
@@ -163,9 +164,9 @@ static int __init ip6table_mangle_init(void)
 	int ret;
 
 	/* Register table */
-	ret = ip6t_register_table(&packet_mangler, &initial_table.repl);
-	if (ret < 0)
-		return ret;
+	packet_mangler = ip6t_register_table(&__packet_mangler, &initial_table.repl);
+	if (IS_ERR(packet_mangler))
+		return PTR_ERR(packet_mangler);
 
 	/* Register hooks */
 	ret = nf_register_hooks(ip6t_ops, ARRAY_SIZE(ip6t_ops));
@@ -175,14 +176,14 @@ static int __init ip6table_mangle_init(void)
 	return ret;
 
  cleanup_table:
-	ip6t_unregister_table(&packet_mangler);
+	ip6t_unregister_table(packet_mangler);
 	return ret;
 }
 
 static void __exit ip6table_mangle_fini(void)
 {
 	nf_unregister_hooks(ip6t_ops, ARRAY_SIZE(ip6t_ops));
-	ip6t_unregister_table(&packet_mangler);
+	ip6t_unregister_table(packet_mangler);
 }
 
 module_init(ip6table_mangle_init);
--- a/net/ipv6/netfilter/ip6table_raw.c
+++ b/net/ipv6/netfilter/ip6table_raw.c
@@ -35,13 +35,14 @@ static struct
 	.term = IP6T_ERROR_INIT,		/* ERROR */
 };
 
-static struct xt_table packet_raw = {
+static struct xt_table __packet_raw = {
 	.name = "raw",
 	.valid_hooks = RAW_VALID_HOOKS,
 	.lock = RW_LOCK_UNLOCKED,
 	.me = THIS_MODULE,
 	.af = AF_INET6,
 };
+static struct xt_table *packet_raw;
 
 /* The work comes in here from netfilter.c. */
 static unsigned int
@@ -51,7 +52,7 @@ ip6t_hook(unsigned int hook,
 	 const struct net_device *out,
 	 int (*okfn)(struct sk_buff *))
 {
-	return ip6t_do_table(skb, hook, in, out, &packet_raw);
+	return ip6t_do_table(skb, hook, in, out, packet_raw);
 }
 
 static struct nf_hook_ops ip6t_ops[] __read_mostly = {
@@ -76,9 +77,9 @@ static int __init ip6table_raw_init(void)
 	int ret;
 
 	/* Register table */
-	ret = ip6t_register_table(&packet_raw, &initial_table.repl);
-	if (ret < 0)
-		return ret;
+	packet_raw = ip6t_register_table(&__packet_raw, &initial_table.repl);
+	if (IS_ERR(packet_raw))
+		return PTR_ERR(packet_raw);
 
 	/* Register hooks */
 	ret = nf_register_hooks(ip6t_ops, ARRAY_SIZE(ip6t_ops));
@@ -88,14 +89,14 @@ static int __init ip6table_raw_init(void)
 	return ret;
 
  cleanup_table:
-	ip6t_unregister_table(&packet_raw);
+	ip6t_unregister_table(packet_raw);
 	return ret;
 }
 
 static void __exit ip6table_raw_fini(void)
 {
 	nf_unregister_hooks(ip6t_ops, ARRAY_SIZE(ip6t_ops));
-	ip6t_unregister_table(&packet_raw);
+	ip6t_unregister_table(packet_raw);
 }
 
 module_init(ip6table_raw_init);
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -667,9 +667,16 @@ struct xt_table *xt_register_table(struct net *net, struct xt_table *table,
 	struct xt_table_info *private;
 	struct xt_table *t;
 
+	/* Don't add one object to multiple lists. */
+	table = kmemdup(table, sizeof(struct xt_table), GFP_KERNEL);
+	if (!table) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
 	ret = mutex_lock_interruptible(&xt[table->af].mutex);
 	if (ret != 0)
-		goto out;
+		goto out_free;
 
 	/* Don't autoload: we'd eat our tail... */
 	list_for_each_entry(t, &net->xt.tables[table->af], list) {
@@ -697,6 +704,8 @@ struct xt_table *xt_register_table(struct net *net, struct xt_table *table,
 
  unlock:
 	mutex_unlock(&xt[table->af].mutex);
+out_free:
+	kfree(table);
 out:
 	return ERR_PTR(ret);
 }
@@ -710,6 +719,7 @@ void *xt_unregister_table(struct xt_table *table)
 	private = table->private;
 	list_del(&table->list);
 	mutex_unlock(&xt[table->af].mutex);
+	kfree(table);
 
 	return private;
 }

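The patch changes {arp,ip,ip6}t_register_table() from returning an int to returning a table pointer, encoding failures with the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() helpers and unwinding through out_free/out labels. The user-space sketch below re-creates those helpers (modeled on include/linux/err.h, where the top MAX_ERRNO addresses are reserved for error codes) and a hypothetical register_table() with the same shape. struct table, struct table_info, and the fail_translate flag are illustrative assumptions, not kernel API.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* User-space re-creation of the kernel's error-pointer helpers: the top
 * MAX_ERRNO addresses are never valid pointers, so a negative errno can
 * travel through a pointer return value. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-ins for struct xt_table_info and struct xt_table. */
struct table_info { size_t size; };
struct table { const char *name; struct table_info *private; };

/* Same shape as ipt_register_table() after this patch: one exit label per
 * acquired resource, and failures return ERR_PTR(-errno) instead of an
 * int. @fail_translate simulates the ruleset translation step failing. */
static struct table *register_table(const struct table *tmpl, size_t size,
				    int fail_translate)
{
	struct table_info *info;
	struct table *t;
	int ret;

	info = malloc(sizeof(*info));
	if (!info) {
		ret = -ENOMEM;
		goto out;
	}
	info->size = size;

	if (fail_translate) {
		ret = -EINVAL;
		goto out_free;
	}

	t = malloc(sizeof(*t));	/* the duplicated template */
	if (!t) {
		ret = -ENOMEM;
		goto out_free;
	}
	*t = *tmpl;
	t->private = info;
	return t;

out_free:
	free(info);
out:
	return ERR_PTR(ret);
}
```

A caller then tests the result with IS_ERR() and propagates PTR_ERR() as its own return code, exactly as the converted *_init() functions in the patch do.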
