Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] ipv4: devconf: start IPV4_DEVCONF_* from 0
From: Lucian Adrian Grijincu @ 2011-01-13  7:50 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, tgraf, kuznet, pekkas, jmorris, yoshfuji, kaber, opurdila,
	ddvlad
In-Reply-To: <20110112.184734.38051229.davem@davemloft.net>

On Thu, Jan 13, 2011 at 4:47 AM, David Miller <davem@davemloft.net> wrote:
> From: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
> Date: Wed, 12 Jan 2011 12:19:10 +0200
>
>> The IPV4_DEVCONF_* enums are never exposed to the userspace and it
>> would make code simpler to remove all the useless (-1) adjustments.
>
> Starting values like this at "1" is usually done on purpose.
>
> It allows "0" to be illegal or mean "none", and thus easily trapping
> cases where the value fails to be initialized properly.  In this way
> the illegal sentinel "0" doesn't take up any space either.


Just that no one checks for zero as invalid anywhere.

We pass the enum names everywhere as parameters. And wherever we need
to use those values we must make sure to subtract 1 every time.

And some things work ok, but it's not entirely obvious why.

For example:
	struct ctl_table devinet_vars[__IPV4_DEVCONF_MAX] (...) = {
                DEVINET_SYSCTL_COMPLEX_ENTRY(FORWARDING, "forwarding",
                                             devinet_sysctl_forward),

This works ok because the DEVINET_SYSCTL_* macros subtract 1 from each
array index.

Because the size of the array is __IPV4_DEVCONF_MAX (without
subtracting 1), there's an extra element at the end and because this
is a global definition it gets initialized with zeros just as required
by register_net_sysctl_table: the last element's procname must be zero
to indicate end-of-array.

Yes it works, but there does not seem to be a good reason why to
complicate things like this (again the sentinel nature of zero is not
used in any place here).


-- 
 .
..: Lucian

^ permalink raw reply

* Re: [patch 2/2] [PATCH] qeth: l3 hw tx csum circumvent hw bug
From: David Miller @ 2011-01-13  7:47 UTC (permalink / raw)
  To: frank.blaschka; +Cc: netdev, linux-s390
In-Reply-To: <20110113064310.729751199@de.ibm.com>

From: frank.blaschka@de.ibm.com
Date: Thu, 13 Jan 2011 07:42:25 +0100

> --- a/drivers/s390/net/qeth_l3_main.c
> +++ b/drivers/s390/net/qeth_l3_main.c
> @@ -2998,7 +2998,9 @@ static inline void qeth_l3_hdr_csum(stru
>  	 */
>  	if (iph->protocol == IPPROTO_UDP)
>  		hdr->hdr.l3.ext_flags |= QETH_HDR_EXT_UDP;
> -	hdr->hdr.l3.ext_flags |= QETH_HDR_EXT_CSUM_TRANSP_REQ;
> +	hdr->hdr.l3.ext_flags |= QETH_HDR_EXT_CSUM_TRANSP_REQ |
> +		QETH_HDR_EXT_CSUM_HDR_REQ;
> +	iph->check = 0;
>  	if (card->options.performance_stats)
>  		card->perf_stats.tx_csum++;
>  }

You may not change the packet header contents blindly like this.
Otherwise unpredictable contents will be seen by tcpdump and any
other code path which has a clone of this packet.

Thus, you'll need to guard this change with something like:

		if (skb_header_cloned(skb) &&
		    pskb_expand_head(skb, 0, 0, GFP_ATOMIC)) {
			dev_kfree_skb(skb);
			goto tx_fail;
		}

Please fix this and resubmit your two patches.

^ permalink raw reply

* Re: [PATCH net-next-2.6] netdev: tilepro: Use is_multicast_ether_addr helper
From: David Miller @ 2011-01-13  7:42 UTC (permalink / raw)
  To: tklauser; +Cc: cmetcalf, netdev
In-Reply-To: <20110113065855.GU29757@distanz.ch>

From: Tobias Klauser <tklauser@distanz.ch>
Date: Thu, 13 Jan 2011 07:58:55 +0100

> On 2011-01-13 at 03:45:01 +0100, David Miller <davem@davemloft.net> wrote:
>> From: Chris Metcalf <cmetcalf@tilera.com>
>> Date: Wed, 12 Jan 2011 12:49:03 -0500
>> 
>> > On 1/12/2011 4:31 AM, Tobias Klauser wrote:
>> >> Use is_multicast_ether_addr from linux/etherdevice.h instead of a custom
>> >> macro. Also remove the broadcast address check, as it is considered a
>> >> multicast address too.
>> >>
>> >> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
>> >> ---
>> >>  drivers/net/tile/tilepro.c |   10 +---------
>> >>  1 files changed, 1 insertions(+), 9 deletions(-)
>> > 
>> > Thanks, I've taken this into the Tilera tree!
>> 
>> Don't, his transformation is buggy.
> 
> Why?
> 
> Doesn't the code want to make sure that only unicast addresses get
> filtered?
> 
>> You can't get rid of the broadcast check, it needs to be there.
>> Think about it.
> 
> If a unicast address is in buf, is_multicast_ether_addr returns false
> (and is_broadcast_addr would too) and thus it would get filtered.

Ok, this may be correct, but it makes the code hard to read.

I think the old code, whilst redundant, is easier to understand.

Why not add a function "is_unicast_ether_addr()" and use that
here instead?

That solves all of the issues I have with your change.

^ permalink raw reply

* Re: [PATCH net-next-2.6] netdev: tilepro: Use is_multicast_ether_addr helper
From: Tobias Klauser @ 2011-01-13  6:58 UTC (permalink / raw)
  To: David Miller; +Cc: cmetcalf, netdev
In-Reply-To: <20110112.184501.00455226.davem@davemloft.net>

On 2011-01-13 at 03:45:01 +0100, David Miller <davem@davemloft.net> wrote:
> From: Chris Metcalf <cmetcalf@tilera.com>
> Date: Wed, 12 Jan 2011 12:49:03 -0500
> 
> > On 1/12/2011 4:31 AM, Tobias Klauser wrote:
> >> Use is_multicast_ether_addr from linux/etherdevice.h instead of a custom
> >> macro. Also remove the broadcast address check, as it is considered a
> >> multicast address too.
> >>
> >> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
> >> ---
> >>  drivers/net/tile/tilepro.c |   10 +---------
> >>  1 files changed, 1 insertions(+), 9 deletions(-)
> > 
> > Thanks, I've taken this into the Tilera tree!
> 
> Don't, his transformation is buggy.

Why?

Doesn't the code want to make sure that only unicast addresses get
filtered?

> You can't get rid of the broadcast check, it needs to be there.
> Think about it.

If a unicast address is in buf, is_multicast_ether_addr returns false
(and is_broadcast_addr would too) and thus it would get filtered.

If a multicast address is in buf, is_multicast_ether_addr returns
true, so the it won't get filtered (no matter what is_broadcast_addr
would tell).

If a broadcast address is in buf, is_multicast_ether_addr returns true,
so it won't get filtered (no matter what is_broadcast_addr would tell).

Are there any special addresses I missed? Or did I didn't get the above
right?

^ permalink raw reply

* Re: Flow Control and Port Mirroring Revisited
From: Simon Horman @ 2011-01-13  6:47 UTC (permalink / raw)
  To: Jesse Gross
  Cc: Eric Dumazet, Rusty Russell, virtualization, dev, virtualization,
	netdev, kvm, Michael S. Tsirkin
In-Reply-To: <20110110093155.GB13420@verge.net.au>

On Mon, Jan 10, 2011 at 06:31:55PM +0900, Simon Horman wrote:
> On Fri, Jan 07, 2011 at 10:23:58AM +0900, Simon Horman wrote:
> > On Thu, Jan 06, 2011 at 05:38:01PM -0500, Jesse Gross wrote:
> > 
> > [ snip ]
> > > 
> > > I know that everyone likes a nice netperf result but I agree with
> > > Michael that this probably isn't the right question to be asking.  I
> > > don't think that socket buffers are a real solution to the flow
> > > control problem: they happen to provide that functionality but it's
> > > more of a side effect than anything.  It's just that the amount of
> > > memory consumed by packets in the queue(s) doesn't really have any
> > > implicit meaning for flow control (think multiple physical adapters,
> > > all with the same speed instead of a virtual device and a physical
> > > device with wildly different speeds).  The analog in the physical
> > > world that you're looking for would be Ethernet flow control.
> > > Obviously, if the question is limiting CPU or memory consumption then
> > > that's a different story.
> > 
> > Point taken. I will see if I can control CPU (and thus memory) consumption
> > using cgroups and/or tc.
> 
> I have found that I can successfully control the throughput using
> the following techniques
> 
> 1) Place a tc egress filter on dummy0
> 
> 2) Use ovs-ofctl to add a flow that sends skbs to dummy0 and then eth1,
>    this is effectively the same as one of my hacks to the datapath
>    that I mentioned in an earlier mail. The result is that eth1
>    "paces" the connection.

Further to this, I wonder if there is any interest in providing
a method to switch the action order - using ovs-ofctl is a hack imho -
and/or switching the default action order for mirroring.

> 3) 2) + place a tc egress filter on eth1
> 
> Which mostly makes sense to me although I am a little confused about
> why 1) needs a filter on dummy0 (a filter on eth1 has no effect)
> but 3) needs a filter on eth1 (a filter on dummy0 has no effect,
> even if the skb is sent to dummy0 last.
> 
> I also had some limited success using CPU cgroups, though obviously
> that targets CPU usage and thus the effect on throughput is fairly course.
> In short, its a useful technique but not one that bares further
> discussion here.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply

* [patch 0/2] s390: qeth bug fixes for 2.6.38 next rc
From: frank.blaschka @ 2011-01-13  6:42 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390

Hi Dave,

here are 2 qeth bug fixes for 2.6.38 next rc (net-2.6)

shortlog:

Ursula Braun (1)
qeth: postpone open till recovery is finished

Frank Blaschka (1)
qeth: l3 hw tx csum circumvent hw bug

Thanks,
        Frank

^ permalink raw reply

* [patch 2/2] [PATCH] qeth: l3 hw tx csum circumvent hw bug
From: frank.blaschka @ 2011-01-13  6:42 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390
In-Reply-To: <20110113064223.429450048@de.ibm.com>

[-- Attachment #1: 602-qeth-l3-tx-csum.diff --]
[-- Type: text/plain, Size: 781 bytes --]

From: Frank Blaschka <frank.blaschka@de.ibm.com>

Some OSA level have a bug in the hw tx csum logic. We can circumvent
this bug by turning on IP hw csum also.

Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
---

 drivers/s390/net/qeth_l3_main.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -2998,7 +2998,9 @@ static inline void qeth_l3_hdr_csum(stru
 	 */
 	if (iph->protocol == IPPROTO_UDP)
 		hdr->hdr.l3.ext_flags |= QETH_HDR_EXT_UDP;
-	hdr->hdr.l3.ext_flags |= QETH_HDR_EXT_CSUM_TRANSP_REQ;
+	hdr->hdr.l3.ext_flags |= QETH_HDR_EXT_CSUM_TRANSP_REQ |
+		QETH_HDR_EXT_CSUM_HDR_REQ;
+	iph->check = 0;
 	if (card->options.performance_stats)
 		card->perf_stats.tx_csum++;
 }


^ permalink raw reply

* [patch 1/2] [PATCH] qeth: postpone open till recovery is finished
From: frank.blaschka @ 2011-01-13  6:42 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, Ursula Braun
In-Reply-To: <20110113064223.429450048@de.ibm.com>

[-- Attachment #1: 601-qeth-postpone-open.diff --]
[-- Type: text/plain, Size: 3138 bytes --]

From: Ursula Braun <ursula.braun@de.ibm.com>

The open function of qeth is not executed if the qeth device is in
state DOWN or HARDSETUP. A recovery switches from state SOFTSETUP to 
HARDSETUP to DOWN to HARDSETUP and back to SOFTSETUP. If open and
recover are running concurrently, open fails if it hits the states
HARDSETUP or DOWN. This patch inserts waiting for recovery finish
in the qeth open functions to enable successful qeth device opening
in spite of a running recovery.

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
---

 drivers/s390/net/qeth_l2_main.c |   18 ++++++++++++++++--
 drivers/s390/net/qeth_l3_main.c |   18 ++++++++++++++++--
 2 files changed, 32 insertions(+), 4 deletions(-)

--- a/drivers/s390/net/qeth_l2_main.c
+++ b/drivers/s390/net/qeth_l2_main.c
@@ -831,12 +831,14 @@ tx_drop:
 	return NETDEV_TX_OK;
 }
 
-static int qeth_l2_open(struct net_device *dev)
+static int __qeth_l2_open(struct net_device *dev)
 {
 	struct qeth_card *card = dev->ml_priv;
 	int rc = 0;
 
 	QETH_CARD_TEXT(card, 4, "qethopen");
+	if (card->state == CARD_STATE_UP)
+		return rc;
 	if (card->state != CARD_STATE_SOFTSETUP)
 		return -ENODEV;
 
@@ -857,6 +859,18 @@ static int qeth_l2_open(struct net_devic
 	return rc;
 }
 
+static int qeth_l2_open(struct net_device *dev)
+{
+	struct qeth_card *card = dev->ml_priv;
+
+	QETH_CARD_TEXT(card, 5, "qethope_");
+	if (qeth_wait_for_threads(card, QETH_RECOVER_THREAD)) {
+		QETH_CARD_TEXT(card, 3, "openREC");
+		return -ERESTARTSYS;
+	}
+	return __qeth_l2_open(dev);
+}
+
 static int qeth_l2_stop(struct net_device *dev)
 {
 	struct qeth_card *card = dev->ml_priv;
@@ -1046,7 +1060,7 @@ contin:
 	if (recover_flag == CARD_STATE_RECOVER) {
 		if (recovery_mode &&
 		    card->info.type != QETH_CARD_TYPE_OSN) {
-			qeth_l2_open(card->dev);
+			__qeth_l2_open(card->dev);
 		} else {
 			rtnl_lock();
 			dev_open(card->dev);
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -3240,12 +3240,14 @@ tx_drop:
 	return NETDEV_TX_OK;
 }
 
-static int qeth_l3_open(struct net_device *dev)
+static int __qeth_l3_open(struct net_device *dev)
 {
 	struct qeth_card *card = dev->ml_priv;
 	int rc = 0;
 
 	QETH_CARD_TEXT(card, 4, "qethopen");
+	if (card->state == CARD_STATE_UP)
+		return rc;
 	if (card->state != CARD_STATE_SOFTSETUP)
 		return -ENODEV;
 	card->data.state = CH_STATE_UP;
@@ -3260,6 +3262,18 @@ static int qeth_l3_open(struct net_devic
 	return rc;
 }
 
+static int qeth_l3_open(struct net_device *dev)
+{
+	struct qeth_card *card = dev->ml_priv;
+
+	QETH_CARD_TEXT(card, 5, "qethope_");
+	if (qeth_wait_for_threads(card, QETH_RECOVER_THREAD)) {
+		QETH_CARD_TEXT(card, 3, "openREC");
+		return -ERESTARTSYS;
+	}
+	return __qeth_l3_open(dev);
+}
+
 static int qeth_l3_stop(struct net_device *dev)
 {
 	struct qeth_card *card = dev->ml_priv;
@@ -3564,7 +3578,7 @@ contin:
 		netif_carrier_off(card->dev);
 	if (recover_flag == CARD_STATE_RECOVER) {
 		if (recovery_mode)
-			qeth_l3_open(card->dev);
+			__qeth_l3_open(card->dev);
 		else {
 			rtnl_lock();
 			dev_open(card->dev);


^ permalink raw reply

* Re: TSO/GRO/LRO/somethingO breaks LVS on 2.6.36
From: Simon Horman @ 2011-01-13  6:34 UTC (permalink / raw)
  To: Simon Kirby; +Cc: Eric Dumazet, netdev
In-Reply-To: <20101222233948.GC17581@hostway.ca>

On Wed, Dec 22, 2010 at 03:39:48PM -0800, Simon Kirby wrote:
> On Thu, Dec 09, 2010 at 03:51:59AM +0100, Eric Dumazet wrote:
> 
> > Le mercredi 08 d??cembre 2010 ?? 18:35 -0800, Simon Kirby a ??crit :
> > 
> > > Right.  So, on 2.6.35, ethtool shows:
> > > 
> > > # ethtool -k eth1
> > > Offload parameters for eth1:
> > > rx-checksumming: on
> > > tx-checksumming: on
> > > scatter-gather: on
> > > tcp segmentation offload: on
> > > udp fragmentation offload: off
> > > generic segmentation offload: on
> > > large receive offload: off
> > > 
> > > # ethtool -k eth1.39
> > > Offload parameters for eth1.39:
> > > rx-checksumming: on
> > > tx-checksumming: off
> > > scatter-gather: off
> > > tcp segmentation offload: off
> > > udp fragmentation offload: off
> > > generic segmentation offload: off
> > > large receive offload: off
> > > 
> > > On 2.6.36, ethtool shows:
> > > 
> > > # ethtool -k eth1
> > > Offload parameters for eth1:
> > > rx-checksumming: on
> > > tx-checksumming: on
> > > scatter-gather: on
> > > tcp segmentation offload: on
> > > udp fragmentation offload: off
> > > generic segmentation offload: on
> > > large receive offload: off
> > > 
> > > # ethtool -k eth1.39
> > > Offload parameters for eth1.39:
> > > rx-checksumming: on
> > > tx-checksumming: on
> > > scatter-gather: on
> > > tcp segmentation offload: on
> > > udp fragmentation offload: off
> > > generic segmentation offload: on
> > > large receive offload: off
> > > 
> > > And if I set it with ethtool -K eth1.39 gso off; ethtool -K eth1 gso off,
> > > it says "off", but I still see merged frames, as verified with an nc <
> > > /dev/zero and tcpdump -i any -n 'length > 1500'.
> > 
> > Yep, but you also have "tso on" by default on 2.6.36 and eth1.39 (vlan)
> 
> and turning it off makes no difference, either.  Even if I down/up the
> interface after.  I still see merged packets on receive, even if I try to
> use ethtool to turn off everything that is "on".  It's as if the new code
> to enable it on vlans works, but the disable from ethtool does not.
> 
> In other words, I cannot find a way to disable it (and thus make LVS
> work) on 2.6.36 or 2.6.37-rc6.

Hi Simon,

thanks for prodding me to respond to this post offline and sorry for not
responding earlier.

Firstly, I think that this is a receive-side problem so I don't believe
that GSO (generic segmentation offload) or other transmit-side options are
likely to have any affect.

My understanding is that on the receive-side there are two options which
when enabled can result in the behaviour that you describe.

* LRO (large receive offload)

  You have this disabled, and assuming it really is disabled it
  shouldn't be causing a problem.

* GRO (generic receive offload)

  This does not seem to be in the output of your ethtool commands at all.
  So I wonder if your ethtool is too old to support this option?

  In any case, I was able to reproduce the problem that you describe (or at
  least something very similar) using 2.6.36 with GRO enabled on eth1.1 and
  the problem did not manifest when I disabled GRO on eth1.1.

N.B: I'm not sure that I have any hardware whose driver supports LRO
     so my tests were made without LRO support.

> One interesting note is that if I destroy and recreate the VLAN interface,
> it comes back with gso enabled:

I guess that relate to how the default is set.
I don't imagine that it is the cause of the problem that you are seeing.

^ permalink raw reply

* RE: [E1000-devel] [e100] Page allocation failure warning(?) in 2.6.36.3
From: Eric Dumazet @ 2011-01-13  4:55 UTC (permalink / raw)
  To: Chris Rankin
  Cc: JesseBrandeburg, David Miller, e1000-devel@lists.sourceforge.net,
	Tushar NDave, netdev@vger.kernel.org, Jeffrey TKirsher
In-Reply-To: <572050.51146.qm@web121706.mail.ne1.yahoo.com>

Le mercredi 12 janvier 2011 à 15:27 -0800, Chris Rankin a écrit :
> Thanks, the problem has not reoccurred since I've rebooted the box with the new e100 module.
> 
> However, the problem *did* reoccur when I tried just stopping networking, unloading the old module, loading the new module and restarting networking... (I think there were some residual network packets still taking up memory in the system. Maybe.)
> 
> Jan 12 22:27:04 wellhouse kernel: ifconfig: page allocation failure. order:6, mode:0x8020
> Jan 12 22:27:04 wellhouse kernel: Pid: 14968, comm: ifconfig Not tainted 2.6.36.3 #1
> Jan 12 22:27:04 wellhouse kernel: Call Trace:
> Jan 12 22:27:04 wellhouse kernel: [<c104b2a9>] ? __alloc_pages_nodemask+0x477/0x4a6
> Jan 12 22:27:04 wellhouse kernel: [<c106177d>] ? __slab_alloc+0x1eb/0x396
> Jan 12 22:27:04 wellhouse kernel: [<c1004ca6>] ? dma_generic_alloc_coherent+0x4e/0xac
> Jan 12 22:27:04 wellhouse kernel: [<c105fb5c>] ? dma_pool_alloc+0xe5/0x1d9
> Jan 12 22:27:04 wellhouse kernel: [<c1004c58>] ? dma_generic_alloc_coherent+0x0/0xac
> Jan 12 22:27:04 wellhouse kernel: [<c66f67ee>] ? e100_rx_alloc_skb+0x82/0x11d [e100]
> Jan 12 22:27:07 wellhouse kernel: [<c66f687e>] ? e100_rx_alloc_skb+0x112/0x11d [e100]
> Jan 12 22:27:07 wellhouse kernel: [<c66f68d7>] ? e100_alloc_cbs+0x4e/0xfa [e100]
> Jan 12 22:27:07 wellhouse kernel: [<c66f836e>] ? e100_up+0x1b/0xf1 [e100]
> Jan 12 22:27:07 wellhouse kernel: [<c66f845b>] ? e100_open+0x17/0x3b [e100]
> Jan 12 22:27:07 wellhouse kernel: [<c1121630>] ? __dev_open+0x7c/0xa0
> Jan 12 22:27:07 wellhouse kernel: [<c11217ed>] ? __dev_change_flags+0x8b/0x100
> Jan 12 22:27:07 wellhouse kernel: [<c11218c3>] ? dev_change_flags+0x10/0x3b
> Jan 12 22:27:07 wellhouse kernel: [<c1159880>] ? devinet_ioctl+0x25a/0x532
> Jan 12 22:27:07 wellhouse kernel: [<c11146d2>] ? sock_ioctl+0x1a8/0x1ca
> Jan 12 22:27:07 wellhouse kernel: [<c111452a>] ? sock_ioctl+0x0/0x1ca
> Jan 12 22:27:07 wellhouse kernel: [<c106e061>] ? do_vfs_ioctl+0x464/0x4a2
> Jan 12 22:27:07 wellhouse kernel: [<c1014ce0>] ? do_page_fault+0x2d2/0x2ea
> Jan 12 22:27:07 wellhouse kernel: [<c1014cc8>] ? do_page_fault+0x2ba/0x2ea
> Jan 12 22:27:07 wellhouse kernel: [<c10636f6>] ? sys_faccessat+0x144/0x151
> Jan 12 22:27:07 wellhouse kernel: [<c106e0cc>] ? sys_ioctl+0x2d/0x49
> Jan 12 22:27:07 wellhouse kernel: [<c1177dd5>] ? syscall_call+0x7/0xb
> Jan 12 22:27:07 wellhouse kernel: Mem-Info:
> Jan 12 22:27:07 wellhouse kernel: DMA per-cpu:
> Jan 12 22:27:07 wellhouse kernel: CPU    0: hi:    0, btch:   1 usd:   0
> Jan 12 22:27:07 wellhouse kernel: Normal per-cpu:
> Jan 12 22:27:07 wellhouse kernel: CPU    0: hi:    6, btch:   1 usd:   1
> Jan 12 22:27:07 wellhouse kernel: active_anon:2698 inactive_anon:3626 isolated_anon:0
> Jan 12 22:27:07 wellhouse kernel: active_file:2305 inactive_file:3574 isolated_file:0
> Jan 12 22:27:07 wellhouse kernel: unevictable:0 dirty:17 writeback:0 unstable:0
> Jan 12 22:27:07 wellhouse kernel: free:558 slab_reclaimable:484 slab_unreclaimable:1309
> Jan 12 22:27:07 wellhouse kernel: mapped:1209 shmem:650 pagetables:129 bounce:0
> Jan 12 22:27:07 wellhouse kernel: DMA free:1044kB min:248kB low:308kB high:372kB active_anon:3516kB inactive_anon:4004kB active_file:1784kB inactive_file:4216kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15864kB mlocked:0kB dirty:4kB writeback:0kB mapped:988kB shmem:8kB slab_reclaimable:288kB slab_unreclaimable:188kB kernel_stack:56kB pagetables:76kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> Jan 12 22:27:07 wellhouse kernel: lowmem_reserve[]: 0 47 47
> Jan 12 22:27:07 wellhouse kernel: Normal free:1188kB min:764kB low:952kB high:1144kB active_anon:7276kB inactive_anon:10500kB active_file:7436kB inactive_file:10080kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:48768kB mlocked:0kB dirty:64kB writeback:0kB mapped:3848kB shmem:2592kB slab_reclaimable:1648kB slab_unreclaimable:5048kB kernel_stack:240kB pagetables:440kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> Jan 12 22:27:07 wellhouse kernel: lowmem_reserve[]: 0 0 0
> Jan 12 22:27:07 wellhouse kernel: DMA: 87*4kB 9*8kB 5*16kB 3*32kB 1*64kB 3*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1044kB
> Jan 12 22:27:07 wellhouse kernel: Normal: 123*4kB 11*8kB 0*16kB 1*32kB 1*64kB 2*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1188kB
> Jan 12 22:27:07 wellhouse kernel: 6766 total pagecache pages
> Jan 12 22:27:07 wellhouse kernel: 237 pages in swap cache
> Jan 12 22:27:07 wellhouse kernel: Swap cache stats: add 1830, delete 1593, find 1018/1086
> Jan 12 22:27:07 wellhouse kernel: Free swap  = 2176336kB
> Jan 12 22:27:07 wellhouse kernel: Total swap = 2179596kB
> Jan 12 22:27:07 wellhouse kernel: 16383 pages RAM
> Jan 12 22:27:07 wellhouse kernel: 826 pages reserved
> Jan 12 22:27:07 wellhouse kernel: 5098 pages shared
> Jan 12 22:27:07 wellhouse kernel: 12619 pages non-shared
> 
> I also noticed that the e100 is still using GFP_ATOMIC in one place.
> Does this mean that the problem is ultimately only truly fixable by
> freeing up some memory?


Are you referring to dma_pool_alloc(), using at line 321 :

page = pool_alloc_page(pool, GFP_ATOMIC);

I am not sure it can be changed (we own &pool->lock spinlock), thus
cannot sleep.

We could try :

diff --git a/mm/dmapool.c b/mm/dmapool.c
index 4df2de7..faf254a 100644
--- a/mm/dmapool.c
+++ b/mm/dmapool.c
@@ -319,7 +319,7 @@ void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
 		if (page->offset < pool->allocation)
 			goto ready;
 	}
-	page = pool_alloc_page(pool, GFP_ATOMIC);
+	page = pool_alloc_page(pool, GFP_ATOMIC | __GFP_NOWARN);
 	if (!page) {
 		if (mem_flags & __GFP_WAIT) {
 			DECLARE_WAITQUEUE(wait, current);


Problem is e100 allocates an order-6 page in DMA zone
(a 256 KB contigous area of ram)

This contigous area of ram is not available but just after booting...

If you change :

struct param_range cbs  = { .min = 64, .max = 256, .count = 128 };

to

struct param_range cbs  = { .min = 64, .max = 64, .count = 64 };

the need will be reduced to 64 KB of contigous area, it should be OK
then ;)

On such small router, I doubt you need more than 64 slots in TX ring
buffer.




^ permalink raw reply related

* Re: [PATCH V8 08/13] posix clocks: cleanup the CLOCK_DISPTACH macro
From: Richard Cochran @ 2011-01-13  4:30 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Alan Cox, Arnd Bergmann, Christoph Lameter, David Miller,
	John Stultz, Krzysztof Halasa, Peter Zijlstra, Rodolfo Giometti
In-Reply-To: <alpine.LFD.2.00.1101111224280.12146-bi+AKbBUZKagILUCTcTcHdKyNwTtLsGr@public.gmane.org>

On Tue, Jan 11, 2011 at 01:57:23PM +0100, Thomas Gleixner wrote:
> 
> static clockid_t clock_get_array_id(const clockid_t id)
> {
> 	if (id >= 0)
> 	       return id < MAX_CLOCKS ? id : POSIX_INV_CLOCK_ID;
> 
>       	if (clock_is_posix_cpu(id))
> 		return POSIX_CPU_CLOCK_ID;
> 
> 	return POSIX_INV_CLOCK_ID;
> }
> 
> static inline int dispatch_clock_getres(const clockid_t id, struct timespec *ts)
> {
> 	struct k_clock *clk = &posix_clocks[clock_get_array_id(id)];
> 
> 	return clk->clock_getres ? clk->clock_getres(id, ts) : -EINVAL;
> }

I would like to take this idea one step further, like so:

static struct k_clock *clockid_to_kclock(const clockid_t id)
{
	if (id >= 0)
		return id < MAX_CLOCKS ?
			&posix_clocks[id] : &posix_clocks[POSIX_INV_CLOCK_ID];
	...
}

SYSCALL( ... , const clockid_t id, struct timespec *ts)
{
	struct k_clock *clk = clockid_to_kclock(id);

	return clk->clock_getres ? clk->clock_getres(id, ts) : -EINVAL;
}

What do you think?

Thanks,

Richard

^ permalink raw reply

* [PATCH net-next-2.6] netdev: bfin_mac: Remove is_multicast_ether_addr use in netdev_for_each_mc_addr
From: Joe Perches @ 2011-01-13  4:08 UTC (permalink / raw)
  To: David Miller
  Cc: vapier.adi, tklauser, michael.hennerich, uclinux-dist-devel,
	netdev
In-Reply-To: <20110112.190143.184432859.davem@davemloft.net>

Remove code that has no effect.

Signed-off-by: Joe Perches <joe@perches.com>

---

Uncompiled, untested...

> > Does a netdev_for_each_mc_addr loop entry really
> > need to verify that the address is multicast?
> > Couldn't this just be:
> It could, and I'd be happy to apply a follow-on patch that does
> this.

 drivers/net/bfin_mac.c |    9 +--------
 1 files changed, 1 insertions(+), 8 deletions(-)

diff --git a/drivers/net/bfin_mac.c b/drivers/net/bfin_mac.c
index fe75e7a..22abfb3 100644
--- a/drivers/net/bfin_mac.c
+++ b/drivers/net/bfin_mac.c
@@ -1284,19 +1284,12 @@ static void bfin_mac_multicast_hash(struct net_device *dev)
 {
 	u32 emac_hashhi, emac_hashlo;
 	struct netdev_hw_addr *ha;
-	char *addrs;
 	u32 crc;
 
 	emac_hashhi = emac_hashlo = 0;
 
 	netdev_for_each_mc_addr(ha, dev) {
-		addrs = ha->addr;
-
-		/* skip non-multicast addresses */
-		if (!is_multicast_ether_addr(addrs))
-			continue;
-
-		crc = ether_crc(ETH_ALEN, addrs);
+		crc = ether_crc(ETH_ALEN, ha->addr);
 		crc >>= 26;
 
 		if (crc & 0x20)





^ permalink raw reply related

* Re: [uclinux-dist-devel] [PATCH net-next-2.6] netdev: bfin_mac: Use is_multicast_ether_addr helper
From: David Miller @ 2011-01-13  3:01 UTC (permalink / raw)
  To: joe; +Cc: vapier.adi, tklauser, michael.hennerich, uclinux-dist-devel,
	netdev
In-Reply-To: <1294852681.4114.6.camel@Joe-Laptop>

From: Joe Perches <joe@perches.com>
Date: Wed, 12 Jan 2011 09:18:01 -0800

> On Wed, 2011-01-12 at 11:38 -0500, Mike Frysinger wrote:
>> On Wed, Jan 12, 2011 at 04:30, Tobias Klauser wrote:
>> > --- a/drivers/net/bfin_mac.c
>> > +++ b/drivers/net/bfin_mac.c
>> > @@ -1293,7 +1293,7 @@ static void bfin_mac_multicast_hash(struct net_device *dev)
>> >                addrs = ha->addr;
>> >
>> >                /* skip non-multicast addresses */
>> > -               if (!(*addrs & 1))
>> > +               if (!is_multicast_ether_addr(addrs))
>> >                        continue;
>> 
>> looks good to me ...
>> Acked-by: Mike Frysinger <vapier@gentoo.org>
> 
> Does a netdev_for_each_mc_addr loop entry really
> need to verify that the address is multicast?
> 
> Couldn't this just be:

It could, and I'd be happy to apply a follow-on patch that does
this.

^ permalink raw reply

* Re: [PATCH 2/3] bna: Remove unnecessary memset(,0,)
From: David Miller @ 2011-01-13  3:01 UTC (permalink / raw)
  To: rmody; +Cc: joe, ddutt, netdev, linux-kernel
In-Reply-To: <E5313AF6F2BFD14293E5FD0F94750F86A5CFF6C4A8@HQ1-EXCH01.corp.brocade.com>

From: Rasesh Mody <rmody@brocade.com>
Date: Wed, 12 Jan 2011 14:55:25 -0800

> 
>>From: Joe Perches [mailto:joe@perches.com]
>>Sent: Wednesday, January 12, 2011 1:21 PM
>>
>>kzalloc'd memory doesn't need a memset to 0.
>>
>>Signed-off-by: Joe Perches <joe@perches.com>
> 
> Acked-by: Rasesh Mody <rmody@brocade.com>

Applied.

^ permalink raw reply

* Re: [PATCH] eth: fix new kernel-doc warning
From: David Miller @ 2011-01-13  3:00 UTC (permalink / raw)
  To: randy.dunlap; +Cc: netdev
In-Reply-To: <20110112165051.153246cb.randy.dunlap@oracle.com>

From: Randy Dunlap <randy.dunlap@oracle.com>
Date: Wed, 12 Jan 2011 16:50:51 -0800

> From: Randy Dunlap <randy.dunlap@oracle.com>
> 
> Fix new kernel-doc warning (copy-paste typo):
> 
> Warning(net/ethernet/eth.c:366): No description found for parameter 'rxqs'
> 
> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>

Applied.

^ permalink raw reply

* Re: [PATCH] sched: remove unused backlog in RED stats
From: David Miller @ 2011-01-13  3:00 UTC (permalink / raw)
  To: shemminger; +Cc: netdev
In-Reply-To: <20110112174232.05d9429d@s6510>

From: Stephen Hemminger <shemminger@vyatta.com>
Date: Wed, 12 Jan 2011 17:42:32 -0800

> The RED statistics structure includes backlog field which is not
> set or used by any code.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Applied.

^ permalink raw reply

* Re: pull request: wireless-2.6 2011-01-12
From: David Miller @ 2011-01-13  2:52 UTC (permalink / raw)
  To: linville; +Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <20110112183754.GC8674@tuxdriver.com>

From: "John W. Linville" <linville@tuxdriver.com>
Date: Wed, 12 Jan 2011 13:37:54 -0500

> This is an initial batch of wireless fixes intended for 2.6.38-rc1.
> Highlights include a number of small ath9k fixes, some documentation
> fixes from Johannes, and an ssb fix that stops loading b44 inadvertently
> when certain b43 devices are present.
> 
> Please let me know if there are problems!

Pulled, thanks a lot John.

^ permalink raw reply

* Re: [PATCH] inet6: prevent network storms caused by linux IPv6 routers
From: David Miller @ 2011-01-13  2:52 UTC (permalink / raw)
  To: kuznet; +Cc: netdev
In-Reply-To: <20110112183408.GA8417@ms2.inr.ac.ru>

From: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Date: Wed, 12 Jan 2011 21:34:08 +0300

> Linux IPv6 forwards unicast packets, which are link layer multicasts...
> The hole was present since day one. I was 100% this check is there, but it is not.
> 
> The problem shows itself, f.e. when Microsoft Network Load Balancer runs on a network.
> This software resolves IPv6 unicast addresses to multicast MAC addresses.
> 
> Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>

Applied, thanks Alexey.

^ permalink raw reply

* Re: [PATCH] ipv4: devconf: start IPV4_DEVCONF_* from 0
From: David Miller @ 2011-01-13  2:47 UTC (permalink / raw)
  To: lucian.grijincu
  Cc: netdev, tgraf, kuznet, pekkas, jmorris, yoshfuji, kaber, opurdila,
	ddvlad
In-Reply-To: <AANLkTin1LCgEpBNay_o4dzF1ONKanR1HRqcOu3AqqyHu@mail.gmail.com>

From: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
Date: Wed, 12 Jan 2011 12:19:10 +0200

> The IPV4_DEVCONF_* enums are never exposed to the userspace and it
> would make code simpler to remove all the useless (-1) adjustments.

Starting values like this at "1" is usually done on purpose.

It allows "0" to be illegal or mean "none", and thus easily trapping
cases where the value fails to be initialized properly.  In this way
the illegal sentinel "0" doesn't take up any space either.

^ permalink raw reply

* Re: [PATCH net-next-2.6] netdev: ucc_geth: Use is_multicast_ether_addr helper
From: David Miller @ 2011-01-13  2:46 UTC (permalink / raw)
  To: tklauser; +Cc: leoli, netdev, linuxppc-dev
In-Reply-To: <1294824661-10564-1-git-send-email-tklauser@distanz.ch>

From: Tobias Klauser <tklauser@distanz.ch>
Date: Wed, 12 Jan 2011 10:31:01 +0100

> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>

Applied.

^ permalink raw reply

* Re: [PATCH net-next-2.6] netdev: bfin_mac: Use is_multicast_ether_addr helper
From: David Miller @ 2011-01-13  2:46 UTC (permalink / raw)
  To: tklauser; +Cc: michael.hennerich, uclinux-dist-devel, netdev
In-Reply-To: <1294824611-10483-1-git-send-email-tklauser@distanz.ch>

From: Tobias Klauser <tklauser@distanz.ch>
Date: Wed, 12 Jan 2011 10:30:11 +0100

> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>

Applied.

^ permalink raw reply

* Re: [PATCH net-next-2.6] netdev: tilepro: Use is_multicast_ether_addr helper
From: David Miller @ 2011-01-13  2:45 UTC (permalink / raw)
  To: cmetcalf; +Cc: tklauser, netdev
In-Reply-To: <4D2DE98F.1090705@tilera.com>

From: Chris Metcalf <cmetcalf@tilera.com>
Date: Wed, 12 Jan 2011 12:49:03 -0500

> On 1/12/2011 4:31 AM, Tobias Klauser wrote:
>> Use is_multicast_ether_addr from linux/etherdevice.h instead of a custom
>> macro. Also remove the broadcast address check, as it is considered a
>> multicast address too.
>>
>> Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
>> ---
>>  drivers/net/tile/tilepro.c |   10 +---------
>>  1 files changed, 1 insertions(+), 9 deletions(-)
> 
> Thanks, I've taken this into the Tilera tree!

Don't, his transformation is buggy.

You can't get rid of the broadcast check, it needs to be there.
Think about it.

^ permalink raw reply

* Re: [PATCH] ip: reuse ip_summed of first fragment for all subsequent fragments
From: David Miller @ 2011-01-13  2:42 UTC (permalink / raw)
  To: timo.lindfors; +Cc: netdev
In-Reply-To: <84mxn7czz4.fsf@sauna.l.org>

From: Timo Juhani Lindfors <timo.lindfors@iki.fi>
Date: Tue, 11 Jan 2011 15:49:19 +0200

> Anyways, socket option SO_NO_CHECK sets sk->sk_no_check. Could this be
> checked before calculating checksums of each fragment? Currently
> udp_push_pending_frames checks this but checksums have already been
> calculated at that point (and the only job left is to sum the
> checksums together). Here's a patch that works for me (=according to
> perf time is no longer spent calculating checksums) but probably
> should be reviewed carefully:

You're now not handling the code block above this one, guarded
by the "if (len <= 0)" check.

You seem to just be peppering checks all over the place rather
than coming up with a coherent, complete, fix for this problem.

^ permalink raw reply

* Re: [GIT PULL nf-next-2.6] ipvs namespaces
From: Simon Horman @ 2011-01-13  1:57 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel, lvs-devel, netdev, Patrick McHardy,
	Julian Anastasov, Hans Schillstrom
In-Reply-To: <4D2E0F32.8020602@netfilter.org>

On Wed, Jan 12, 2011 at 09:29:38PM +0100, Pablo Neira Ayuso wrote:
> Hi Simon,
> 
> On 06/01/11 07:15, Simon Horman wrote:
> > Hi Patrick,
> > 
> > please consider pulling
> > git://git.kernel.org/pub/scm/linux/kernel/git/horms/lvs-test-2.6.git master
> > in order to get the netns changes to IPVS from Hans Schillstrom.
> 
> I'm hitting some conflicts when pulling from your tree.
> 
> >From git://git.kernel.org/pub/scm/linux/kernel/git/horms/lvs-test-2.6
>  * branch            master     -> FETCH_HEAD
> Auto-merging include/linux/netfilter.h
> Auto-merging include/linux/skbuff.h
> Auto-merging include/net/netfilter/nf_conntrack.h
> Auto-merging net/core/skbuff.c
> Auto-merging net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c
> Auto-merging net/ipv6/netfilter/nf_conntrack_reasm.c
> Auto-merging net/netfilter/core.c
> Auto-merging net/netfilter/ipvs/ip_vs_ctl.c
> CONFLICT (content): Merge conflict in net/netfilter/ipvs/ip_vs_ctl.c

Hi Pablo,

I've sent a fresh series "[PATCH 00/22] ipvs namespaces v3.3" that should
resolve the conficts (I think).  More details in the cover email for that
series.


^ permalink raw reply

* [PATCH 16/22] IPVS: netns, connection hash got net as param.
From: Simon Horman @ 2011-01-13  1:53 UTC (permalink / raw)
  To: netfilter-devel, lvs-devel, netdev
  Cc: Patrick McHardy, Pablo Neira Ayuso, Julian Anastasov,
	Hans Schillstrom, Simon Horman
In-Reply-To: <1294883588-5683-1-git-send-email-horms@verge.net.au>

From: Hans Schillstrom <hans.schillstrom@ericsson.com>

Connection hash table is now name space aware.
i.e. net ptr >> 8 is xor:ed to the hash,
and this is the first param to be compared.
The net struct is 0xa40 in size ( a little bit smaller for 32 bit arch:s)
and cache-line aligned, so a ptr >> 5 might be a more clever solution ?

All lookups where net is compared uses net_eq() which returns 1 when netns
is disabled, and the compiler seems to do something clever in that case.

ip_vs_conn_fill_param() have *net as first param now.

Three new inlines added to keep conn struct smaller
when names space is disabled.
- ip_vs_conn_net()
- ip_vs_conn_net_set()
- ip_vs_conn_net_eq()

*v3
  moved net compare to the end in "fast path"

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
---
 include/net/ip_vs.h                     |   53 ++++++++++++---
 include/net/netns/ip_vs.h               |    2 +
 net/netfilter/ipvs/ip_vs_conn.c         |  112 +++++++++++++++++++------------
 net/netfilter/ipvs/ip_vs_core.c         |   15 +++--
 net/netfilter/ipvs/ip_vs_ftp.c          |   14 ++--
 net/netfilter/ipvs/ip_vs_nfct.c         |    6 +-
 net/netfilter/ipvs/ip_vs_proto_ah_esp.c |   15 +++--
 net/netfilter/ipvs/ip_vs_proto_sctp.c   |    2 +-
 net/netfilter/ipvs/ip_vs_proto_tcp.c    |    2 +-
 net/netfilter/ipvs/ip_vs_proto_udp.c    |    2 +-
 net/netfilter/ipvs/ip_vs_sync.c         |   13 ++--
 11 files changed, 153 insertions(+), 83 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 605d5db..f82c0ff 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -477,6 +477,7 @@ extern struct ip_vs_proto_data *ip_vs_proto_data_get(struct net *net,
 						     unsigned short proto);
 
 struct ip_vs_conn_param {
+	struct net			*net;
 	const union nf_inet_addr	*caddr;
 	const union nf_inet_addr	*vaddr;
 	__be16				cport;
@@ -494,17 +495,19 @@ struct ip_vs_conn_param {
  */
 struct ip_vs_conn {
 	struct list_head        c_list;         /* hashed list heads */
-
+#ifdef CONFIG_NET_NS
+	struct net              *net;           /* Name space */
+#endif
 	/* Protocol, addresses and port numbers */
-	u16                      af;		/* address family */
-	union nf_inet_addr       caddr;          /* client address */
-	union nf_inet_addr       vaddr;          /* virtual address */
-	union nf_inet_addr       daddr;          /* destination address */
-	volatile __u32           flags;          /* status flags */
-	__u32                    fwmark;         /* Fire wall mark from skb */
-	__be16                   cport;
-	__be16                   vport;
-	__be16                   dport;
+	u16                     af;             /* address family */
+	__be16                  cport;
+	__be16                  vport;
+	__be16                  dport;
+	__u32                   fwmark;         /* Fire wall mark from skb */
+	union nf_inet_addr      caddr;          /* client address */
+	union nf_inet_addr      vaddr;          /* virtual address */
+	union nf_inet_addr      daddr;          /* destination address */
+	volatile __u32          flags;          /* status flags */
 	__u16                   protocol;       /* Which protocol (TCP/UDP) */
 
 	/* counter and timer */
@@ -547,6 +550,33 @@ struct ip_vs_conn {
 	__u8			pe_data_len;
 };
 
+/*
+ *  To save some memory in conn table when name space is disabled.
+ */
+static inline struct net *ip_vs_conn_net(const struct ip_vs_conn *cp)
+{
+#ifdef CONFIG_NET_NS
+	return cp->net;
+#else
+	return &init_net;
+#endif
+}
+static inline void ip_vs_conn_net_set(struct ip_vs_conn *cp, struct net *net)
+{
+#ifdef CONFIG_NET_NS
+	cp->net = net;
+#endif
+}
+
+static inline int ip_vs_conn_net_eq(const struct ip_vs_conn *cp,
+				    struct net *net)
+{
+#ifdef CONFIG_NET_NS
+	return cp->net == net;
+#else
+	return 1;
+#endif
+}
 
 /*
  *	Extended internal versions of struct ip_vs_service_user and
@@ -796,13 +826,14 @@ enum {
 	IP_VS_DIR_LAST,
 };
 
-static inline void ip_vs_conn_fill_param(int af, int protocol,
+static inline void ip_vs_conn_fill_param(struct net *net, int af, int protocol,
 					 const union nf_inet_addr *caddr,
 					 __be16 cport,
 					 const union nf_inet_addr *vaddr,
 					 __be16 vport,
 					 struct ip_vs_conn_param *p)
 {
+	p->net = net;
 	p->af = af;
 	p->protocol = protocol;
 	p->caddr = caddr;
diff --git a/include/net/netns/ip_vs.h b/include/net/netns/ip_vs.h
index bd1dad8..1acfb33 100644
--- a/include/net/netns/ip_vs.h
+++ b/include/net/netns/ip_vs.h
@@ -66,6 +66,8 @@ struct netns_ipvs {
 	struct ip_vs_cpu_stats __percpu *cpustats;   /* Stats per cpu */
 	seqcount_t			*ustats_seq; /* u64 read retry */
 
+	/* ip_vs_conn */
+	atomic_t		conn_count;         /*  connection counter */
 	/* ip_vs_lblc */
 	int			sysctl_lblc_expiration;
 	struct ctl_table_header	*lblc_ctl_header;
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index b2024c9..0d5e4fe 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -64,9 +64,6 @@ static struct list_head *ip_vs_conn_tab __read_mostly;
 /*  SLAB cache for IPVS connections */
 static struct kmem_cache *ip_vs_conn_cachep __read_mostly;
 
-/*  counter for current IPVS connections */
-static atomic_t ip_vs_conn_count = ATOMIC_INIT(0);
-
 /*  counter for no client port connections */
 static atomic_t ip_vs_conn_no_cport_cnt = ATOMIC_INIT(0);
 
@@ -76,7 +73,7 @@ static unsigned int ip_vs_conn_rnd __read_mostly;
 /*
  *  Fine locking granularity for big connection hash table
  */
-#define CT_LOCKARRAY_BITS  4
+#define CT_LOCKARRAY_BITS  5
 #define CT_LOCKARRAY_SIZE  (1<<CT_LOCKARRAY_BITS)
 #define CT_LOCKARRAY_MASK  (CT_LOCKARRAY_SIZE-1)
 
@@ -133,19 +130,19 @@ static inline void ct_write_unlock_bh(unsigned key)
 /*
  *	Returns hash value for IPVS connection entry
  */
-static unsigned int ip_vs_conn_hashkey(int af, unsigned proto,
+static unsigned int ip_vs_conn_hashkey(struct net *net, int af, unsigned proto,
 				       const union nf_inet_addr *addr,
 				       __be16 port)
 {
 #ifdef CONFIG_IP_VS_IPV6
 	if (af == AF_INET6)
-		return jhash_3words(jhash(addr, 16, ip_vs_conn_rnd),
-				    (__force u32)port, proto, ip_vs_conn_rnd)
-			& ip_vs_conn_tab_mask;
+		return (jhash_3words(jhash(addr, 16, ip_vs_conn_rnd),
+				    (__force u32)port, proto, ip_vs_conn_rnd) ^
+			((size_t)net>>8)) & ip_vs_conn_tab_mask;
 #endif
-	return jhash_3words((__force u32)addr->ip, (__force u32)port, proto,
-			    ip_vs_conn_rnd)
-		& ip_vs_conn_tab_mask;
+	return (jhash_3words((__force u32)addr->ip, (__force u32)port, proto,
+			    ip_vs_conn_rnd) ^
+		((size_t)net>>8)) & ip_vs_conn_tab_mask;
 }
 
 static unsigned int ip_vs_conn_hashkey_param(const struct ip_vs_conn_param *p,
@@ -166,15 +163,15 @@ static unsigned int ip_vs_conn_hashkey_param(const struct ip_vs_conn_param *p,
 		port = p->vport;
 	}
 
-	return ip_vs_conn_hashkey(p->af, p->protocol, addr, port);
+	return ip_vs_conn_hashkey(p->net, p->af, p->protocol, addr, port);
 }
 
 static unsigned int ip_vs_conn_hashkey_conn(const struct ip_vs_conn *cp)
 {
 	struct ip_vs_conn_param p;
 
-	ip_vs_conn_fill_param(cp->af, cp->protocol, &cp->caddr, cp->cport,
-			      NULL, 0, &p);
+	ip_vs_conn_fill_param(ip_vs_conn_net(cp), cp->af, cp->protocol,
+			      &cp->caddr, cp->cport, NULL, 0, &p);
 
 	if (cp->pe) {
 		p.pe = cp->pe;
@@ -186,7 +183,7 @@ static unsigned int ip_vs_conn_hashkey_conn(const struct ip_vs_conn *cp)
 }
 
 /*
- *	Hashes ip_vs_conn in ip_vs_conn_tab by proto,addr,port.
+ *	Hashes ip_vs_conn in ip_vs_conn_tab by netns,proto,addr,port.
  *	returns bool success.
  */
 static inline int ip_vs_conn_hash(struct ip_vs_conn *cp)
@@ -269,11 +266,12 @@ __ip_vs_conn_in_get(const struct ip_vs_conn_param *p)
 
 	list_for_each_entry(cp, &ip_vs_conn_tab[hash], c_list) {
 		if (cp->af == p->af &&
+		    p->cport == cp->cport && p->vport == cp->vport &&
 		    ip_vs_addr_equal(p->af, p->caddr, &cp->caddr) &&
 		    ip_vs_addr_equal(p->af, p->vaddr, &cp->vaddr) &&
-		    p->cport == cp->cport && p->vport == cp->vport &&
 		    ((!p->cport) ^ (!(cp->flags & IP_VS_CONN_F_NO_CPORT))) &&
-		    p->protocol == cp->protocol) {
+		    p->protocol == cp->protocol &&
+		    ip_vs_conn_net_eq(cp, p->net)) {
 			/* HIT */
 			atomic_inc(&cp->refcnt);
 			ct_read_unlock(hash);
@@ -313,17 +311,18 @@ ip_vs_conn_fill_param_proto(int af, const struct sk_buff *skb,
 			    struct ip_vs_conn_param *p)
 {
 	__be16 _ports[2], *pptr;
+	struct net *net = skb_net(skb);
 
 	pptr = skb_header_pointer(skb, proto_off, sizeof(_ports), _ports);
 	if (pptr == NULL)
 		return 1;
 
 	if (likely(!inverse))
-		ip_vs_conn_fill_param(af, iph->protocol, &iph->saddr, pptr[0],
-				      &iph->daddr, pptr[1], p);
+		ip_vs_conn_fill_param(net, af, iph->protocol, &iph->saddr,
+				      pptr[0], &iph->daddr, pptr[1], p);
 	else
-		ip_vs_conn_fill_param(af, iph->protocol, &iph->daddr, pptr[1],
-				      &iph->saddr, pptr[0], p);
+		ip_vs_conn_fill_param(net, af, iph->protocol, &iph->daddr,
+				      pptr[1], &iph->saddr, pptr[0], p);
 	return 0;
 }
 
@@ -352,6 +351,8 @@ struct ip_vs_conn *ip_vs_ct_in_get(const struct ip_vs_conn_param *p)
 	ct_read_lock(hash);
 
 	list_for_each_entry(cp, &ip_vs_conn_tab[hash], c_list) {
+		if (!ip_vs_conn_net_eq(cp, p->net))
+			continue;
 		if (p->pe_data && p->pe->ct_match) {
 			if (p->pe == cp->pe && p->pe->ct_match(p, cp))
 				goto out;
@@ -403,10 +404,11 @@ struct ip_vs_conn *ip_vs_conn_out_get(const struct ip_vs_conn_param *p)
 
 	list_for_each_entry(cp, &ip_vs_conn_tab[hash], c_list) {
 		if (cp->af == p->af &&
+		    p->vport == cp->cport && p->cport == cp->dport &&
 		    ip_vs_addr_equal(p->af, p->vaddr, &cp->caddr) &&
 		    ip_vs_addr_equal(p->af, p->caddr, &cp->daddr) &&
-		    p->vport == cp->cport && p->cport == cp->dport &&
-		    p->protocol == cp->protocol) {
+		    p->protocol == cp->protocol &&
+		    ip_vs_conn_net_eq(cp, p->net)) {
 			/* HIT */
 			atomic_inc(&cp->refcnt);
 			ret = cp;
@@ -609,8 +611,8 @@ struct ip_vs_dest *ip_vs_try_bind_dest(struct ip_vs_conn *cp)
 	struct ip_vs_dest *dest;
 
 	if ((cp) && (!cp->dest)) {
-		dest = ip_vs_find_dest(&init_net, cp->af, &cp->daddr, cp->dport,
-				       &cp->vaddr, cp->vport,
+		dest = ip_vs_find_dest(ip_vs_conn_net(cp), cp->af, &cp->daddr,
+				       cp->dport, &cp->vaddr, cp->vport,
 				       cp->protocol, cp->fwmark);
 		ip_vs_bind_dest(cp, dest);
 		return dest;
@@ -728,6 +730,7 @@ int ip_vs_check_template(struct ip_vs_conn *ct)
 static void ip_vs_conn_expire(unsigned long data)
 {
 	struct ip_vs_conn *cp = (struct ip_vs_conn *)data;
+	struct netns_ipvs *ipvs = net_ipvs(ip_vs_conn_net(cp));
 
 	cp->timeout = 60*HZ;
 
@@ -770,7 +773,7 @@ static void ip_vs_conn_expire(unsigned long data)
 		ip_vs_unbind_dest(cp);
 		if (cp->flags & IP_VS_CONN_F_NO_CPORT)
 			atomic_dec(&ip_vs_conn_no_cport_cnt);
-		atomic_dec(&ip_vs_conn_count);
+		atomic_dec(&ipvs->conn_count);
 
 		kmem_cache_free(ip_vs_conn_cachep, cp);
 		return;
@@ -804,7 +807,9 @@ ip_vs_conn_new(const struct ip_vs_conn_param *p,
 	       struct ip_vs_dest *dest, __u32 fwmark)
 {
 	struct ip_vs_conn *cp;
-	struct ip_vs_proto_data *pd = ip_vs_proto_data_get(&init_net, p->protocol);
+	struct netns_ipvs *ipvs = net_ipvs(p->net);
+	struct ip_vs_proto_data *pd = ip_vs_proto_data_get(p->net,
+							   p->protocol);
 
 	cp = kmem_cache_zalloc(ip_vs_conn_cachep, GFP_ATOMIC);
 	if (cp == NULL) {
@@ -814,6 +819,7 @@ ip_vs_conn_new(const struct ip_vs_conn_param *p,
 
 	INIT_LIST_HEAD(&cp->c_list);
 	setup_timer(&cp->timer, ip_vs_conn_expire, (unsigned long)cp);
+	ip_vs_conn_net_set(cp, p->net);
 	cp->af		   = p->af;
 	cp->protocol	   = p->protocol;
 	ip_vs_addr_copy(p->af, &cp->caddr, p->caddr);
@@ -844,7 +850,7 @@ ip_vs_conn_new(const struct ip_vs_conn_param *p,
 	atomic_set(&cp->n_control, 0);
 	atomic_set(&cp->in_pkts, 0);
 
-	atomic_inc(&ip_vs_conn_count);
+	atomic_inc(&ipvs->conn_count);
 	if (flags & IP_VS_CONN_F_NO_CPORT)
 		atomic_inc(&ip_vs_conn_no_cport_cnt);
 
@@ -886,17 +892,22 @@ ip_vs_conn_new(const struct ip_vs_conn_param *p,
  *	/proc/net/ip_vs_conn entries
  */
 #ifdef CONFIG_PROC_FS
+struct ip_vs_iter_state {
+	struct seq_net_private p;
+	struct list_head *l;
+};
 
 static void *ip_vs_conn_array(struct seq_file *seq, loff_t pos)
 {
 	int idx;
 	struct ip_vs_conn *cp;
+	struct ip_vs_iter_state *iter = seq->private;
 
 	for (idx = 0; idx < ip_vs_conn_tab_size; idx++) {
 		ct_read_lock_bh(idx);
 		list_for_each_entry(cp, &ip_vs_conn_tab[idx], c_list) {
 			if (pos-- == 0) {
-				seq->private = &ip_vs_conn_tab[idx];
+				iter->l = &ip_vs_conn_tab[idx];
 			return cp;
 			}
 		}
@@ -908,14 +919,17 @@ static void *ip_vs_conn_array(struct seq_file *seq, loff_t pos)
 
 static void *ip_vs_conn_seq_start(struct seq_file *seq, loff_t *pos)
 {
-	seq->private = NULL;
+	struct ip_vs_iter_state *iter = seq->private;
+
+	iter->l = NULL;
 	return *pos ? ip_vs_conn_array(seq, *pos - 1) :SEQ_START_TOKEN;
 }
 
 static void *ip_vs_conn_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
 	struct ip_vs_conn *cp = v;
-	struct list_head *e, *l = seq->private;
+	struct ip_vs_iter_state *iter = seq->private;
+	struct list_head *e, *l = iter->l;
 	int idx;
 
 	++*pos;
@@ -932,18 +946,19 @@ static void *ip_vs_conn_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 	while (++idx < ip_vs_conn_tab_size) {
 		ct_read_lock_bh(idx);
 		list_for_each_entry(cp, &ip_vs_conn_tab[idx], c_list) {
-			seq->private = &ip_vs_conn_tab[idx];
+			iter->l = &ip_vs_conn_tab[idx];
 			return cp;
 		}
 		ct_read_unlock_bh(idx);
 	}
-	seq->private = NULL;
+	iter->l = NULL;
 	return NULL;
 }
 
 static void ip_vs_conn_seq_stop(struct seq_file *seq, void *v)
 {
-	struct list_head *l = seq->private;
+	struct ip_vs_iter_state *iter = seq->private;
+	struct list_head *l = iter->l;
 
 	if (l)
 		ct_read_unlock_bh(l - ip_vs_conn_tab);
@@ -957,9 +972,12 @@ static int ip_vs_conn_seq_show(struct seq_file *seq, void *v)
    "Pro FromIP   FPrt ToIP     TPrt DestIP   DPrt State       Expires PEName PEData\n");
 	else {
 		const struct ip_vs_conn *cp = v;
+		struct net *net = seq_file_net(seq);
 		char pe_data[IP_VS_PENAME_MAXLEN + IP_VS_PEDATA_MAXLEN + 3];
 		size_t len = 0;
 
+		if (!ip_vs_conn_net_eq(cp, net))
+			return 0;
 		if (cp->pe_data) {
 			pe_data[0] = ' ';
 			len = strlen(cp->pe->name);
@@ -1004,7 +1022,8 @@ static const struct seq_operations ip_vs_conn_seq_ops = {
 
 static int ip_vs_conn_open(struct inode *inode, struct file *file)
 {
-	return seq_open(file, &ip_vs_conn_seq_ops);
+	return seq_open_net(inode, file, &ip_vs_conn_seq_ops,
+			    sizeof(struct ip_vs_iter_state));
 }
 
 static const struct file_operations ip_vs_conn_fops = {
@@ -1031,6 +1050,10 @@ static int ip_vs_conn_sync_seq_show(struct seq_file *seq, void *v)
    "Pro FromIP   FPrt ToIP     TPrt DestIP   DPrt State       Origin Expires\n");
 	else {
 		const struct ip_vs_conn *cp = v;
+		struct net *net = seq_file_net(seq);
+
+		if (!ip_vs_conn_net_eq(cp, net))
+			return 0;
 
 #ifdef CONFIG_IP_VS_IPV6
 		if (cp->af == AF_INET6)
@@ -1067,7 +1090,8 @@ static const struct seq_operations ip_vs_conn_sync_seq_ops = {
 
 static int ip_vs_conn_sync_open(struct inode *inode, struct file *file)
 {
-	return seq_open(file, &ip_vs_conn_sync_seq_ops);
+	return seq_open_net(inode, file, &ip_vs_conn_sync_seq_ops,
+			    sizeof(struct ip_vs_iter_state));
 }
 
 static const struct file_operations ip_vs_conn_sync_fops = {
@@ -1168,10 +1192,11 @@ void ip_vs_random_dropentry(void)
 /*
  *      Flush all the connection entries in the ip_vs_conn_tab
  */
-static void ip_vs_conn_flush(void)
+static void ip_vs_conn_flush(struct net *net)
 {
 	int idx;
 	struct ip_vs_conn *cp;
+	struct netns_ipvs *ipvs = net_ipvs(net);
 
   flush_again:
 	for (idx = 0; idx < ip_vs_conn_tab_size; idx++) {
@@ -1181,7 +1206,8 @@ static void ip_vs_conn_flush(void)
 		ct_write_lock_bh(idx);
 
 		list_for_each_entry(cp, &ip_vs_conn_tab[idx], c_list) {
-
+			if (!ip_vs_conn_net_eq(cp, net))
+				continue;
 			IP_VS_DBG(4, "del connection\n");
 			ip_vs_conn_expire_now(cp);
 			if (cp->control) {
@@ -1194,7 +1220,7 @@ static void ip_vs_conn_flush(void)
 
 	/* the counter may be not NULL, because maybe some conn entries
 	   are run by slow timer handler or unhashed but still referred */
-	if (atomic_read(&ip_vs_conn_count) != 0) {
+	if (atomic_read(&ipvs->conn_count) != 0) {
 		schedule();
 		goto flush_again;
 	}
@@ -1204,8 +1230,11 @@ static void ip_vs_conn_flush(void)
  */
 int __net_init __ip_vs_conn_init(struct net *net)
 {
+	struct netns_ipvs *ipvs = net_ipvs(net);
+
 	if (!net_eq(net, &init_net))	/* netns not enabled yet */
 		return -EPERM;
+	atomic_set(&ipvs->conn_count, 0);
 
 	proc_net_fops_create(net, "ip_vs_conn", 0, &ip_vs_conn_fops);
 	proc_net_fops_create(net, "ip_vs_conn_sync", 0, &ip_vs_conn_sync_fops);
@@ -1217,6 +1246,8 @@ static void __net_exit __ip_vs_conn_cleanup(struct net *net)
 	if (!net_eq(net, &init_net))	/* netns not enabled yet */
 		return;
 
+	/* flush all the connection entries first */
+	ip_vs_conn_flush(net);
 	proc_net_remove(net, "ip_vs_conn");
 	proc_net_remove(net, "ip_vs_conn_sync");
 }
@@ -1277,9 +1308,6 @@ int __init ip_vs_conn_init(void)
 void ip_vs_conn_cleanup(void)
 {
 	unregister_pernet_subsys(&ipvs_conn_ops);
-	/* flush all the connection entries first */
-	ip_vs_conn_flush();
-
 	/* Release the empty cache */
 	kmem_cache_destroy(ip_vs_conn_cachep);
 	vfree(ip_vs_conn_tab);
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 7e6a2a0..7205b49 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -205,7 +205,8 @@ ip_vs_conn_fill_param_persist(const struct ip_vs_service *svc,
 			      const union nf_inet_addr *vaddr, __be16 vport,
 			      struct ip_vs_conn_param *p)
 {
-	ip_vs_conn_fill_param(svc->af, protocol, caddr, cport, vaddr, vport, p);
+	ip_vs_conn_fill_param(svc->net, svc->af, protocol, caddr, cport, vaddr,
+			      vport, p);
 	p->pe = svc->pe;
 	if (p->pe && p->pe->fill_param)
 		return p->pe->fill_param(p, skb);
@@ -348,8 +349,8 @@ ip_vs_sched_persist(struct ip_vs_service *svc,
 	/*
 	 *    Create a new connection according to the template
 	 */
-	ip_vs_conn_fill_param(svc->af, iph.protocol, &iph.saddr, src_port,
-			      &iph.daddr, dst_port, &param);
+	ip_vs_conn_fill_param(svc->net, svc->af, iph.protocol, &iph.saddr,
+			      src_port, &iph.daddr, dst_port, &param);
 
 	cp = ip_vs_conn_new(&param, &dest->addr, dport, flags, dest, skb->mark);
 	if (cp == NULL) {
@@ -464,8 +465,10 @@ ip_vs_schedule(struct ip_vs_service *svc, struct sk_buff *skb,
 	 */
 	{
 		struct ip_vs_conn_param p;
-		ip_vs_conn_fill_param(svc->af, iph.protocol, &iph.saddr,
-				      pptr[0], &iph.daddr, pptr[1], &p);
+
+		ip_vs_conn_fill_param(svc->net, svc->af, iph.protocol,
+				      &iph.saddr, pptr[0], &iph.daddr, pptr[1],
+				      &p);
 		cp = ip_vs_conn_new(&p, &dest->addr,
 				    dest->port ? dest->port : pptr[1],
 				    flags, dest, skb->mark);
@@ -532,7 +535,7 @@ int ip_vs_leave(struct ip_vs_service *svc, struct sk_buff *skb,
 		IP_VS_DBG(6, "%s(): create a cache_bypass entry\n", __func__);
 		{
 			struct ip_vs_conn_param p;
-			ip_vs_conn_fill_param(svc->af, iph.protocol,
+			ip_vs_conn_fill_param(svc->net, svc->af, iph.protocol,
 					      &iph.saddr, pptr[0],
 					      &iph.daddr, pptr[1], &p);
 			cp = ip_vs_conn_new(&p, &daddr, 0,
diff --git a/net/netfilter/ipvs/ip_vs_ftp.c b/net/netfilter/ipvs/ip_vs_ftp.c
index 77b0036..6a04f9a 100644
--- a/net/netfilter/ipvs/ip_vs_ftp.c
+++ b/net/netfilter/ipvs/ip_vs_ftp.c
@@ -198,13 +198,15 @@ static int ip_vs_ftp_out(struct ip_vs_app *app, struct ip_vs_conn *cp,
 		 */
 		{
 			struct ip_vs_conn_param p;
-			ip_vs_conn_fill_param(AF_INET, iph->protocol,
-					      &from, port, &cp->caddr, 0, &p);
+			ip_vs_conn_fill_param(ip_vs_conn_net(cp), AF_INET,
+					      iph->protocol, &from, port,
+					      &cp->caddr, 0, &p);
 			n_cp = ip_vs_conn_out_get(&p);
 		}
 		if (!n_cp) {
 			struct ip_vs_conn_param p;
-			ip_vs_conn_fill_param(AF_INET, IPPROTO_TCP, &cp->caddr,
+			ip_vs_conn_fill_param(ip_vs_conn_net(cp),
+					      AF_INET, IPPROTO_TCP, &cp->caddr,
 					      0, &cp->vaddr, port, &p);
 			n_cp = ip_vs_conn_new(&p, &from, port,
 					      IP_VS_CONN_F_NO_CPORT |
@@ -361,9 +363,9 @@ static int ip_vs_ftp_in(struct ip_vs_app *app, struct ip_vs_conn *cp,
 
 	{
 		struct ip_vs_conn_param p;
-		ip_vs_conn_fill_param(AF_INET, iph->protocol, &to, port,
-				      &cp->vaddr, htons(ntohs(cp->vport)-1),
-				      &p);
+		ip_vs_conn_fill_param(ip_vs_conn_net(cp), AF_INET,
+				      iph->protocol, &to, port, &cp->vaddr,
+				      htons(ntohs(cp->vport)-1), &p);
 		n_cp = ip_vs_conn_in_get(&p);
 		if (!n_cp) {
 			n_cp = ip_vs_conn_new(&p, &cp->daddr,
diff --git a/net/netfilter/ipvs/ip_vs_nfct.c b/net/netfilter/ipvs/ip_vs_nfct.c
index 4680647..f454c80 100644
--- a/net/netfilter/ipvs/ip_vs_nfct.c
+++ b/net/netfilter/ipvs/ip_vs_nfct.c
@@ -141,6 +141,7 @@ static void ip_vs_nfct_expect_callback(struct nf_conn *ct,
 	struct nf_conntrack_tuple *orig, new_reply;
 	struct ip_vs_conn *cp;
 	struct ip_vs_conn_param p;
+	struct net *net = nf_ct_net(ct);
 
 	if (exp->tuple.src.l3num != PF_INET)
 		return;
@@ -155,7 +156,7 @@ static void ip_vs_nfct_expect_callback(struct nf_conn *ct,
 
 	/* RS->CLIENT */
 	orig = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
-	ip_vs_conn_fill_param(exp->tuple.src.l3num, orig->dst.protonum,
+	ip_vs_conn_fill_param(net, exp->tuple.src.l3num, orig->dst.protonum,
 			      &orig->src.u3, orig->src.u.tcp.port,
 			      &orig->dst.u3, orig->dst.u.tcp.port, &p);
 	cp = ip_vs_conn_out_get(&p);
@@ -268,7 +269,8 @@ void ip_vs_conn_drop_conntrack(struct ip_vs_conn *cp)
 		" for conn " FMT_CONN "\n",
 		__func__, ARG_TUPLE(&tuple), ARG_CONN(cp));
 
-	h = nf_conntrack_find_get(&init_net, NF_CT_DEFAULT_ZONE, &tuple);
+	h = nf_conntrack_find_get(ip_vs_conn_net(cp), NF_CT_DEFAULT_ZONE,
+				  &tuple);
 	if (h) {
 		ct = nf_ct_tuplehash_to_ctrack(h);
 		/* Show what happens instead of calling nf_ct_kill() */
diff --git a/net/netfilter/ipvs/ip_vs_proto_ah_esp.c b/net/netfilter/ipvs/ip_vs_proto_ah_esp.c
index 28039cb..5b8eb8b 100644
--- a/net/netfilter/ipvs/ip_vs_proto_ah_esp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_ah_esp.c
@@ -41,15 +41,16 @@ struct isakmp_hdr {
 #define PORT_ISAKMP	500
 
 static void
-ah_esp_conn_fill_param_proto(int af, const struct ip_vs_iphdr *iph,
-			     int inverse, struct ip_vs_conn_param *p)
+ah_esp_conn_fill_param_proto(struct net *net, int af,
+			     const struct ip_vs_iphdr *iph, int inverse,
+			     struct ip_vs_conn_param *p)
 {
 	if (likely(!inverse))
-		ip_vs_conn_fill_param(af, IPPROTO_UDP,
+		ip_vs_conn_fill_param(net, af, IPPROTO_UDP,
 				      &iph->saddr, htons(PORT_ISAKMP),
 				      &iph->daddr, htons(PORT_ISAKMP), p);
 	else
-		ip_vs_conn_fill_param(af, IPPROTO_UDP,
+		ip_vs_conn_fill_param(net, af, IPPROTO_UDP,
 				      &iph->daddr, htons(PORT_ISAKMP),
 				      &iph->saddr, htons(PORT_ISAKMP), p);
 }
@@ -61,8 +62,9 @@ ah_esp_conn_in_get(int af, const struct sk_buff *skb,
 {
 	struct ip_vs_conn *cp;
 	struct ip_vs_conn_param p;
+	struct net *net = skb_net(skb);
 
-	ah_esp_conn_fill_param_proto(af, iph, inverse, &p);
+	ah_esp_conn_fill_param_proto(net, af, iph, inverse, &p);
 	cp = ip_vs_conn_in_get(&p);
 	if (!cp) {
 		/*
@@ -89,8 +91,9 @@ ah_esp_conn_out_get(int af, const struct sk_buff *skb,
 {
 	struct ip_vs_conn *cp;
 	struct ip_vs_conn_param p;
+	struct net *net = skb_net(skb);
 
-	ah_esp_conn_fill_param_proto(af, iph, inverse, &p);
+	ah_esp_conn_fill_param_proto(net, af, iph, inverse, &p);
 	cp = ip_vs_conn_out_get(&p);
 	if (!cp) {
 		IP_VS_DBG_BUF(12, "Unknown ISAKMP entry for inout packet "
diff --git a/net/netfilter/ipvs/ip_vs_proto_sctp.c b/net/netfilter/ipvs/ip_vs_proto_sctp.c
index 569e77b..550365a 100644
--- a/net/netfilter/ipvs/ip_vs_proto_sctp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_sctp.c
@@ -1055,7 +1055,7 @@ static void sctp_unregister_app(struct net *net, struct ip_vs_app *inc)
 
 static int sctp_app_conn_bind(struct ip_vs_conn *cp)
 {
-	struct netns_ipvs *ipvs = net_ipvs(&init_net);
+	struct netns_ipvs *ipvs = net_ipvs(ip_vs_conn_net(cp));
 	int hash;
 	struct ip_vs_app *inc;
 	int result = 0;
diff --git a/net/netfilter/ipvs/ip_vs_proto_tcp.c b/net/netfilter/ipvs/ip_vs_proto_tcp.c
index 757aaaf..d8b3f9f 100644
--- a/net/netfilter/ipvs/ip_vs_proto_tcp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_tcp.c
@@ -620,7 +620,7 @@ tcp_unregister_app(struct net *net, struct ip_vs_app *inc)
 static int
 tcp_app_conn_bind(struct ip_vs_conn *cp)
 {
-	struct netns_ipvs *ipvs = net_ipvs(&init_net);
+	struct netns_ipvs *ipvs = net_ipvs(ip_vs_conn_net(cp));
 	int hash;
 	struct ip_vs_app *inc;
 	int result = 0;
diff --git a/net/netfilter/ipvs/ip_vs_proto_udp.c b/net/netfilter/ipvs/ip_vs_proto_udp.c
index 1dc3941..581157b 100644
--- a/net/netfilter/ipvs/ip_vs_proto_udp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_udp.c
@@ -396,7 +396,7 @@ udp_unregister_app(struct net *net, struct ip_vs_app *inc)
 
 static int udp_app_conn_bind(struct ip_vs_conn *cp)
 {
-	struct netns_ipvs *ipvs = net_ipvs(&init_net);
+	struct netns_ipvs *ipvs = net_ipvs(ip_vs_conn_net(cp));
 	int hash;
 	struct ip_vs_app *inc;
 	int result = 0;
diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index c29e73d..f85e47d 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -660,21 +660,21 @@ control:
  *  fill_param used by version 1
  */
 static inline int
-ip_vs_conn_fill_param_sync(int af, union ip_vs_sync_conn *sc,
+ip_vs_conn_fill_param_sync(struct net *net, int af, union ip_vs_sync_conn *sc,
 			   struct ip_vs_conn_param *p,
 			   __u8 *pe_data, unsigned int pe_data_len,
 			   __u8 *pe_name, unsigned int pe_name_len)
 {
 #ifdef CONFIG_IP_VS_IPV6
 	if (af == AF_INET6)
-		ip_vs_conn_fill_param(af, sc->v6.protocol,
+		ip_vs_conn_fill_param(net, af, sc->v6.protocol,
 				      (const union nf_inet_addr *)&sc->v6.caddr,
 				      sc->v6.cport,
 				      (const union nf_inet_addr *)&sc->v6.vaddr,
 				      sc->v6.vport, p);
 	else
 #endif
-		ip_vs_conn_fill_param(af, sc->v4.protocol,
+		ip_vs_conn_fill_param(net, af, sc->v4.protocol,
 				      (const union nf_inet_addr *)&sc->v4.caddr,
 				      sc->v4.cport,
 				      (const union nf_inet_addr *)&sc->v4.vaddr,
@@ -881,7 +881,7 @@ static void ip_vs_process_message_v0(struct net *net, const char *buffer,
 			}
 		}
 
-		ip_vs_conn_fill_param(AF_INET, s->protocol,
+		ip_vs_conn_fill_param(net, AF_INET, s->protocol,
 				      (const union nf_inet_addr *)&s->caddr,
 				      s->cport,
 				      (const union nf_inet_addr *)&s->vaddr,
@@ -1043,9 +1043,8 @@ static inline int ip_vs_proc_sync_conn(struct net *net, __u8 *p, __u8 *msg_end)
 			state = 0;
 		}
 	}
-	if (ip_vs_conn_fill_param_sync(af, s, &param,
-					pe_data, pe_data_len,
-					pe_name, pe_name_len)) {
+	if (ip_vs_conn_fill_param_sync(net, af, s, &param, pe_data,
+				       pe_data_len, pe_name, pe_name_len)) {
 		retc = 50;
 		goto out;
 	}
-- 
1.7.2.3


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox