Netdev List
 help / color / mirror / Atom feed
* Re: Linux 2.6.35/TIPC 2.0 ABI breaking changes
From: Neil Horman @ 2010-10-19 11:04 UTC (permalink / raw)
  To: David Miller
  Cc: jon.maloy, netdev, linux-kernel, luca, paul.gortmaker,
	tipc-discussion
In-Reply-To: <20101019.011649.71113115.davem@davemloft.net>

On Tue, Oct 19, 2010 at 01:16:49AM -0700, David Miller wrote:
> From: Leandro Lucarella <luca@llucax.com.ar>
> Date: Mon, 18 Oct 2010 23:16:57 -0300
> 
> > 
> > The problem is not between the tipc stacks in different hosts, is
> > between the tipc stack and the applications using it (well, maybe
> > there is a problem somewhere else too).
> > 
> > This was a deliberate API change, not a subtle bug...
> 
> Neil et al., if these packets live only between the kernel stack
> and the userspace API layer, we should not be byte-swapping this
> stuff and we need to fix this fast.
> 
Copy that Dave.  I think I see the problem.  The subscription code handles
messages both off the wire and from local user space.  The off the wire case
should work because the subscription code assumes that all the incomming data is
in network byte order, but user space is an exception to that rule as its in
local byte order.  I'll have a patch together for Leandro to test soon.
Neil

> Thanks.
> 

------------------------------------------------------------------------------
Download new Adobe(R) Flash(R) Builder(TM) 4
The new Adobe(R) Flex(R) 4 and Flash(R) Builder(TM) 4 (formerly 
Flex(R) Builder(TM)) enable the development of rich applications that run
across multiple browsers and platforms. Download your free trials today!
http://p.sf.net/sfu/adobe-dev2dev

^ permalink raw reply

* Re: [patch 1/1] phy/marvell: fix 88e1121 support
From: Cyril Chemparathy @ 2010-10-19 10:52 UTC (permalink / raw)
  To: Arnaud Patard; +Cc: netdev@vger.kernel.org, David S. Miller
In-Reply-To: <20101018223056.716357140@rtp-net.org>

Hi Arnaud,

On 10/18/2010 06:29 PM, Arnaud Patard wrote:
> Commit c477d0447db08068a497e7beb892b2b2a7bff64b added support for RGMII
> rx/tx delays except that it ends up clearing rx/tx delays bit for modes
> differents that RGMII*ID. Due to this, ethernet is not working anymore
> on my guruplug server +. This patch is fixing that.
> 
> Signed-off-by: Arnaud Patard <arnaud.patard@rtp-net.org>
> Index: linux-2.6/drivers/net/phy/marvell.c
> ===================================================================
> --- linux-2.6.orig/drivers/net/phy/marvell.c	2010-10-18 22:46:09.000000000 +0200
> +++ linux-2.6/drivers/net/phy/marvell.c	2010-10-19 00:20:22.000000000 +0200
> @@ -196,20 +196,27 @@
>  			MII_88E1121_PHY_MSCR_PAGE);
>  	if (err < 0)
>  		return err;
> -	mscr = phy_read(phydev, MII_88E1121_PHY_MSCR_REG) &
> -		MII_88E1121_PHY_MSCR_DELAY_MASK;
>  
> -	if (phydev->interface == PHY_INTERFACE_MODE_RGMII_ID)
> -		mscr |= (MII_88E1121_PHY_MSCR_RX_DELAY |
> -			 MII_88E1121_PHY_MSCR_TX_DELAY);
> -	else if (phydev->interface == PHY_INTERFACE_MODE_RGMII_RXID)
> -		mscr |= MII_88E1121_PHY_MSCR_RX_DELAY;
> -	else if (phydev->interface == PHY_INTERFACE_MODE_RGMII_TXID)
> -		mscr |= MII_88E1121_PHY_MSCR_TX_DELAY;
> +	if ((phydev->interface == PHY_INTERFACE_MODE_RGMII) ||
> +	    (phydev->interface == PHY_INTERFACE_MODE_RGMII_ID) ||
> +	    (phydev->interface == PHY_INTERFACE_MODE_RGMII_RXID) ||
> +	    (phydev->interface == PHY_INTERFACE_MODE_RGMII_TXID)) {
>  
> -	err = phy_write(phydev, MII_88E1121_PHY_MSCR_REG, mscr);
> -	if (err < 0)
> -		return err;
> +		mscr = phy_read(phydev, MII_88E1121_PHY_MSCR_REG) &
> +			MII_88E1121_PHY_MSCR_DELAY_MASK;
> +
> +		if (phydev->interface == PHY_INTERFACE_MODE_RGMII_ID)
> +			mscr |= (MII_88E1121_PHY_MSCR_RX_DELAY |
> +				 MII_88E1121_PHY_MSCR_TX_DELAY);
> +		else if (phydev->interface == PHY_INTERFACE_MODE_RGMII_RXID)
> +			mscr |= MII_88E1121_PHY_MSCR_RX_DELAY;
> +		else if (phydev->interface == PHY_INTERFACE_MODE_RGMII_TXID)
> +			mscr |= MII_88E1121_PHY_MSCR_TX_DELAY;
> +
> +		err = phy_write(phydev, MII_88E1121_PHY_MSCR_REG, mscr);
> +		if (err < 0)
> +			return err;
> +	}
>  
>  	phy_write(phydev, MII_88E1121_PHY_PAGE, oldpage);

That looks more correct.  Just out of curiosity, what is the interface
mode on your platform?

Thanks
Cyril.

^ permalink raw reply

* Re: [PATCH net-next 2/2] inet: RCU changes in inetdev_by_index()
From: David Miller @ 2010-10-19 10:51 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1287484766.2676.74.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 19 Oct 2010 12:39:26 +0200

> Convert inetdev_by_index() to not increment in_dev refcount.
> 
> Callers hold RCU or RTNL, and should not decrement in_dev refcount.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next 1/2] net: avoid a dev refcount in ip_mc_find_dev()
From: David Miller @ 2010-10-19 10:51 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1287484758.2676.73.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 19 Oct 2010 12:39:18 +0200

> We hold RTNL in ip_mc_find_dev(), no need to touch device refcount.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Applied.

^ permalink raw reply

* [PATCH net-next 2/2] inet: RCU changes in inetdev_by_index()
From: Eric Dumazet @ 2010-10-19 10:39 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Convert inetdev_by_index() to not increment in_dev refcount.

Callers hold RCU or RTNL, and should not decrement in_dev refcount.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/devinet.c       |    7 ++++---
 net/ipv4/fib_semantics.c |   25 +++++++++++--------------
 net/ipv4/igmp.c          |    2 --
 net/ipv4/ip_gre.c        |    4 +---
 4 files changed, 16 insertions(+), 22 deletions(-)

diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index c2ff48f..dc94b03 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -403,6 +403,9 @@ static int inet_set_ifa(struct net_device *dev, struct in_ifaddr *ifa)
 	return inet_insert_ifa(ifa);
 }
 
+/* Caller must hold RCU or RTNL :
+ * We dont take a reference on found in_device
+ */
 struct in_device *inetdev_by_index(struct net *net, int ifindex)
 {
 	struct net_device *dev;
@@ -411,7 +414,7 @@ struct in_device *inetdev_by_index(struct net *net, int ifindex)
 	rcu_read_lock();
 	dev = dev_get_by_index_rcu(net, ifindex);
 	if (dev)
-		in_dev = in_dev_get(dev);
+		in_dev = rcu_dereference_rtnl(dev->ip_ptr);
 	rcu_read_unlock();
 	return in_dev;
 }
@@ -453,8 +456,6 @@ static int inet_rtm_deladdr(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg
 		goto errout;
 	}
 
-	__in_dev_put(in_dev);
-
 	for (ifap = &in_dev->ifa_list; (ifa = *ifap) != NULL;
 	     ifap = &ifa->ifa_next) {
 		if (tb[IFA_LOCAL] &&
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 0f80dfc..6734c9c 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -590,32 +590,29 @@ static int fib_check_nh(struct fib_config *cfg, struct fib_info *fi,
 		if (!dev)
 			goto out;
 		dev_hold(dev);
-		err = -ENETDOWN;
-		if (!(dev->flags & IFF_UP))
-			goto out;
-		err = 0;
-out:
-		rcu_read_unlock();
-		return err;
+		err = (dev->flags & IFF_UP) ? 0 : -ENETDOWN;
 	} else {
 		struct in_device *in_dev;
 
 		if (nh->nh_flags & (RTNH_F_PERVASIVE | RTNH_F_ONLINK))
 			return -EINVAL;
 
+		rcu_read_lock();
+		err = -ENODEV;
 		in_dev = inetdev_by_index(net, nh->nh_oif);
 		if (in_dev == NULL)
-			return -ENODEV;
-		if (!(in_dev->dev->flags & IFF_UP)) {
-			in_dev_put(in_dev);
-			return -ENETDOWN;
-		}
+			goto out;
+		err = -ENETDOWN;
+		if (!(in_dev->dev->flags & IFF_UP))
+			goto out;
 		nh->nh_dev = in_dev->dev;
 		dev_hold(nh->nh_dev);
 		nh->nh_scope = RT_SCOPE_HOST;
-		in_dev_put(in_dev);
+		err = 0;
 	}
-	return 0;
+out:
+	rcu_read_unlock();
+	return err;
 }
 
 static inline unsigned int fib_laddr_hashfn(__be32 val)
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index a525328..c8877c6 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -1429,8 +1429,6 @@ static struct in_device *ip_mc_find_dev(struct net *net, struct ip_mreqn *imr)
 
 	if (imr->imr_ifindex) {
 		idev = inetdev_by_index(net, imr->imr_ifindex);
-		if (idev)
-			__in_dev_put(idev);
 		return idev;
 	}
 	if (imr->imr_address.s_addr) {
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 9d421f4..d0ffcbe 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -1245,10 +1245,8 @@ static int ipgre_close(struct net_device *dev)
 	if (ipv4_is_multicast(t->parms.iph.daddr) && t->mlink) {
 		struct in_device *in_dev;
 		in_dev = inetdev_by_index(dev_net(dev), t->mlink);
-		if (in_dev) {
+		if (in_dev)
 			ip_mc_dec_group(in_dev, t->parms.iph.daddr);
-			in_dev_put(in_dev);
-		}
 	}
 	return 0;
 }



^ permalink raw reply related

* [PATCH net-next 1/2] net: avoid a dev refcount in ip_mc_find_dev()
From: Eric Dumazet @ 2010-10-19 10:39 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

We hold RTNL in ip_mc_find_dev(), no need to touch device refcount.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/fib_frontend.c |    2 +-
 net/ipv4/igmp.c         |    4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 3df057e..36e27c2 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -153,7 +153,7 @@ static void fib_flush(struct net *net)
  * @addr: the source address
  * @devref: if true, take a reference on the found device
  *
- * If a caller uses devref=false, it should be protected by RCU
+ * If a caller uses devref=false, it should be protected by RCU, or RTNL
  */
 struct net_device *__ip_dev_find(struct net *net, __be32 addr, bool devref)
 {
diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 25f3396..a525328 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -1418,6 +1418,7 @@ void ip_mc_destroy_dev(struct in_device *in_dev)
 	write_unlock_bh(&in_dev->mc_list_lock);
 }
 
+/* RTNL is locked */
 static struct in_device *ip_mc_find_dev(struct net *net, struct ip_mreqn *imr)
 {
 	struct flowi fl = { .nl_u = { .ip4_u =
@@ -1433,10 +1434,9 @@ static struct in_device *ip_mc_find_dev(struct net *net, struct ip_mreqn *imr)
 		return idev;
 	}
 	if (imr->imr_address.s_addr) {
-		dev = ip_dev_find(net, imr->imr_address.s_addr);
+		dev = __ip_dev_find(net, imr->imr_address.s_addr, false);
 		if (!dev)
 			return NULL;
-		dev_put(dev);
 	}
 
 	if (!dev && !ip_route_output_key(net, &rt, &fl)) {



^ permalink raw reply related

* Re: [PATCH V3 net-next] can-raw: add msg_flags to distinguish local traffic
From: Alexander Stein @ 2010-10-19 10:28 UTC (permalink / raw)
  To: Kurt Van Dijck
  Cc: socketcan-core-0fE9KPoRgkgATYTw5x5z8w, netdev, Oliver Hartkopp
In-Reply-To: <20101019095703.GA436-MxZ6Iy/zr/UdbCeoMzGj59i2O/JbrIOy@public.gmane.org>

On Tuesday 19 October 2010, 11:57:03 Kurt Van Dijck wrote:
> On Tue, Oct 19, 2010 at 11:12:56AM +0200, Alexander Stein wrote:
> > You mixed tabs and spaces at MSG_CONFIRM entry. Dunno if that was
> > intended.
> 
> That was not intended.
> Is that a problem (I never studied the code style wrt. Documentation)?

Dunno if this is a problem, I just stumbled over it.

Regards,
Alexander

^ permalink raw reply

* Re: openvswitch/flow WAS ( Re: [rfc] Merging the Open vSwitch datapath
From: jamal @ 2010-10-19 10:22 UTC (permalink / raw)
  To: Simon Horman; +Cc: Jesse Gross, Ben Pfaff, netdev, ovs-team
In-Reply-To: <20101018152010.GE319@verge.net.au>

On Mon, 2010-10-18 at 17:20 +0200, Simon Horman wrote:

> As I understand things, the packet goes from the kernel to userspace
> and then (typically) comes back again.

Injection back is trivial.

> I guess that it would be possible to send a copy of the headers
> to user-sapce while the packet is quarantined in the kernel pending
> a response from user-space. I say only the headers, as typically
> that is all user-space needs to make a decision, though I guess it
> may need the body to make some types of decisions. I have no idea
> if such a scheme would be desirable in any circumstances.

quarantine the packet in the kernel would be trickier than sending the
whole thing up - for a sample of how it is done i believe the netfilter
approach (ipq?) as well as ipsec would be good samples to look at.

cheers,
jamal


^ permalink raw reply

* [PATCH] gianfar: Fix crashes on RX path (Was Re: [Bugme-new] [Bug 19692] New: linux-2.6.36-rc5 crash with gianfar ethernet at full line rate traffic)
From: Jarek Poplawski @ 2010-10-19 10:06 UTC (permalink / raw)
  To: David Miller
  Cc: emin ak, Andrew Morton, netdev, bugzilla-daemon, bugme-daemon,
	Anton Vorontsov, Andy Fleming
In-Reply-To: <AANLkTi=08Myz1=XOHkGJWTdyC8vK+y+zf0UQhjxUi=qC@mail.gmail.com>

On Tue, Oct 19, 2010 at 09:44:33AM +0300, emin ak wrote:
> Hi Jarek;
> After 5 days and more then 20 billion packets passed without crash, it
> seems that this patch is working for me, at least for crash type 2.
> (For type 1, it only occured once and I can never reproduce this
> again, but still trying. I think with this patch is also lowers the
> risk for type 1.

It would be interesting to have a look if it's exactly type 1, because
skb_over_panic can happen for different reasons, e.g. like here:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=63b88b9041ceef8217f34de71a2e96f0c3f0fd3b

> For adding a new bug entry for skb_over_panic, before that I think I
> must find a reliable way to make this type of crash reproducable,
> otherwise I don't know how to test it if it solved or not.

Maybe for now let's try to get and see this type 1 again? Since the
recycle path is suspicious a bit to me, probably limiting memory or
slowing tx (maybe different mtus on eth0 and 1) under heavy multi cpu
load might help.

> Lastly, thanks a lot for your valuable help to overcome this problem
> and also is there anything that I can do  for testing / commiting this
> patch to mainline?

Here it is for David to handle the rest.

Thanks a lot for such an intense testing,
Jarek P.
--------------------------->

The rx_recycle queue is global per device but can be accesed by many
napi handlers at the same time, so it needs full skb_queue primitives
(with locking). Otherwise, various crashes caused by broken skbs are
possible.

This patch resolves, at least partly, bugzilla bug 19692. (Because of
some doubts that there could be still something around which is hard
to reproduce my proposal is to leave this bug opened for a month.)

Fixes commit: 0fd56bb5be6455d0d42241e65aed057244665e5e

Reported-by: emin ak <eminak71@gmail.com>
Tested-by: emin ak <eminak71@gmail.com>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
CC: Andy Fleming <afleming@freescale.com>
---
diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index 4f7c3f3..db47b55 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -2515,7 +2515,7 @@ static int gfar_clean_tx_ring(struct gfar_priv_tx_q *tx_queue)
 				skb_recycle_check(skb, priv->rx_buffer_size +
 					RXBUF_ALIGNMENT)) {
 			gfar_align_skb(skb);
-			__skb_queue_head(&priv->rx_recycle, skb);
+			skb_queue_head(&priv->rx_recycle, skb);
 		} else
 			dev_kfree_skb_any(skb);
 
@@ -2598,7 +2598,7 @@ struct sk_buff * gfar_new_skb(struct net_device *dev)
 	struct gfar_private *priv = netdev_priv(dev);
 	struct sk_buff *skb = NULL;
 
-	skb = __skb_dequeue(&priv->rx_recycle);
+	skb = skb_dequeue(&priv->rx_recycle);
 	if (!skb)
 		skb = gfar_alloc_skb(dev);
 
@@ -2754,7 +2754,7 @@ int gfar_clean_rx_ring(struct gfar_priv_rx_q *rx_queue, int rx_work_limit)
 			if (unlikely(!newskb))
 				newskb = skb;
 			else if (skb)
-				__skb_queue_head(&priv->rx_recycle, skb);
+				skb_queue_head(&priv->rx_recycle, skb);
 		} else {
 			/* Increment the number of packets */
 			rx_queue->stats.rx_packets++;

^ permalink raw reply related

* Re: [patch v5 04/12] IPVS: Add struct ip_vs_conn_param, IPv6 forgotten No
From: Simon Horman @ 2010-10-19 10:03 UTC (permalink / raw)
  To: Hans Schillstrom
  Cc: lvs-devel@vger.kernel.org, netdev@vger.kernel.org,
	netfilter@vger.kernel.org, netfilter-devel@vger.kernel.org,
	Jan Engelhardt, Stephen Hemminger, Wensong Zhang,
	Julian Anastasov, Patrick McHardy
In-Reply-To: <201010191156.44757.hans.schillstrom@ericsson.com>

On Tue, Oct 19, 2010 at 11:56:43AM +0200, Hans Schillstrom wrote:
> Hi Simon
> I'm sorry it's not a bug it works perfectly OK.
> (I didn't see that it was a pointer, It was another bug in my test code...)
> so forget about the last mail

No problem, thanks for looking over things :-)


^ permalink raw reply

* Re: [PATCH V3 net-next] can-raw: add msg_flags to distinguish local traffic
From: Kurt Van Dijck @ 2010-10-19  9:57 UTC (permalink / raw)
  To: Alexander Stein
  Cc: socketcan-core-0fE9KPoRgkgATYTw5x5z8w, netdev, Oliver Hartkopp
In-Reply-To: <201010191113.46999.alexander.stein-93q1YBGzJSMe9JSWTWOYM3xStJ4P+DSV@public.gmane.org>

On Tue, Oct 19, 2010 at 11:12:56AM +0200, Alexander Stein wrote:
> You mixed tabs and spaces at MSG_CONFIRM entry. Dunno if that was intended.
That was not intended.
Is that a problem (I never studied the code style wrt. Documentation)?
> 
> Regards,
> Alexander

Thanks,
Kurt

^ permalink raw reply

* Re: [patch v5 04/12] IPVS: Add struct ip_vs_conn_param, IPv6 forgotten No
From: Hans Schillstrom @ 2010-10-19  9:56 UTC (permalink / raw)
  To: Simon Horman
  Cc: lvs-devel@vger.kernel.org, netdev@vger.kernel.org,
	netfilter@vger.kernel.org, netfilter-devel@vger.kernel.org,
	Jan Engelhardt, Stephen Hemminger, Wensong Zhang,
	Julian Anastasov, Patrick McHardy
In-Reply-To: <201010191141.22496.hans.schillstrom@ericsson.com>

Hi Simon
I'm sorry it's not a bug it works perfectly OK.
(I didn't see that it was a pointer, It was another bug in my test code...)
so forget about the last mail

On Tuesday 19 October 2010 11:41:21 Hans Schillstrom wrote:
> Hi Simon,
> There is a show stopper for IPv6 in this patch.
> ip_vs_conn_fill_param() doesn't handle IPv6.
>
> On Monday 04 October 2010 15:46:28 Simon Horman wrote:
> > Signed-off-by: Simon Horman <horms@verge.net.au>
> > Acked-by: Julian Anastasov <ja@ssi.bg>
> >
> > ---
> >
> > The motivation for this is to allow persistence engine modules to
> > fill in the parameters.


--
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>

^ permalink raw reply

* Re: [RFC PATCH 0/7] Move vlan acceleration into networking core.
From: David Miller @ 2010-10-19  9:50 UTC (permalink / raw)
  To: jesse; +Cc: netdev
In-Reply-To: <1287000177-7126-1-git-send-email-jesse@nicira.com>

From: Jesse Gross <jesse@nicira.com>
Date: Wed, 13 Oct 2010 13:02:50 -0700

> Hardware vlan acceleration behaves fairly differently from other
> types of offloading, which limits its usefulness.  This patch series
> aims to bring it more in line with other common forms of
> acceleration, such as checksum offloading and TSO.  In doing this it
> eliminates common driver bugs, increases flexibility, and improves
> performance, while reducing the number of lines of code.
> 
> The first four patches can be applied immediately, while the last
> three need to wait until all drivers that support vlan acceleration
> are updated.  If people agree that this patch set makes sense I will
> go ahead and switch over the dozen or so drivers that would need to
> change.

These changes look great (and everyone I've asked tends to agree)
so please submit the updated series with feedback integrated
and I'll add at least the first 4 patches to net-next-2.6

Thanks!

^ permalink raw reply

* Re: [patch v5 04/12] IPVS: Add struct ip_vs_conn_param, IPv6 forgotten ?
From: Hans Schillstrom @ 2010-10-19  9:41 UTC (permalink / raw)
  To: Simon Horman
  Cc: lvs-devel@vger.kernel.org, netdev@vger.kernel.org,
	netfilter@vger.kernel.org, netfilter-devel@vger.kernel.org,
	Jan Engelhardt, Stephen Hemminger, Wensong Zhang,
	Julian Anastasov, Patrick McHardy
In-Reply-To: <20101004134825.963640441@akiko.akashicho.tokyo.vergenet.net>

Hi Simon,
There is a show stopper for IPv6 in this patch.
ip_vs_conn_fill_param() doesn't handle IPv6.

On Monday 04 October 2010 15:46:28 Simon Horman wrote:
> Signed-off-by: Simon Horman <horms@verge.net.au>
> Acked-by: Julian Anastasov <ja@ssi.bg>
>
> ---
>
> The motivation for this is to allow persistence engine modules to
> fill in the parameters.
>
> v0.3
> * Add missing changes to ip_vs_ftp.c
>
> v2
> * make "union nf_inet_addr fwmark" const
> * Update for the recent addition of ip_vs_nfct.c
>
> v3
> * As suggested by Julian Anastasov
>   - Correct logic for inverse case in ip_vs_conn_fill_param_proto()
>     and ah_esp_conn_fill_param_proto()
>   - Update ip_vs_conn_out_get()'s comments as its parameters have changed
>   - Add missing call to ip_vs_conn_fill_param() before the second
>     invocation of ip_vs_conn_new() in ip_vs_sched_persist()
> * Trivial re-diff
>
> v4
> * As suggested by Julian Anastasov
>   - Fix logic in ip_vs_conn_out_get_proto()
>     + An error occurs if ip_vs_conn_fill_param_proto is non-zero,
>       not if it is zero
>
  v5+
   * IPv6 addr copy in  ip_vs_conn_fill_param()

> Index: lvs-test-2.6/include/net/ip_vs.h
> ===================================================================
> --- lvs-test-2.6.orig/include/net/ip_vs.h       2010-10-04 17:00:28.000000000 +0900
> +++ lvs-test-2.6/include/net/ip_vs.h    2010-10-04 17:49:20.000000000 +0900
> @@ -357,6 +357,15 @@ struct ip_vs_protocol {
>
>  extern struct ip_vs_protocol * ip_vs_proto_get(unsigned short proto);
>
> +struct ip_vs_conn_param {
> +       const union nf_inet_addr        *caddr;
> +       const union nf_inet_addr        *vaddr;
> +       __be16                          cport;
> +       __be16                          vport;
> +       __u16                           protocol;
> +       u16                             af;
> +};
> +
>  /*
>   *     IP_VS structure allocated for each dynamically scheduled connection
>   */
> @@ -626,13 +635,23 @@ enum {
>         IP_VS_DIR_LAST,
>  };
>
> -extern struct ip_vs_conn *ip_vs_conn_in_get
> -(int af, int protocol, const union nf_inet_addr *s_addr, __be16 s_port,
> - const union nf_inet_addr *d_addr, __be16 d_port);
> -
> -extern struct ip_vs_conn *ip_vs_ct_in_get
> -(int af, int protocol, const union nf_inet_addr *s_addr, __be16 s_port,
> - const union nf_inet_addr *d_addr, __be16 d_port);
> +static inline void ip_vs_conn_fill_param(int af, int protocol,
> +                                        const union nf_inet_addr *caddr,
> +                                        __be16 cport,
> +                                        const union nf_inet_addr *vaddr,
> +                                        __be16 vport,
> +                                        struct ip_vs_conn_param *p)
> +{
> +       p->af = af;
> +       p->protocol = protocol;

#ifdef CONFIG_IP_VS_IPV6
	if (af == AF_INET6) {
		ipv6_addr_copy(&p->caddr, caddr);
		ipv6_addr_copy(&p->caddr, caddr);
	}
	else
#else
	}
		p->caddr = caddr;
		p->vaddr = vaddr;
	}
+#endif

> +       p->caddr = caddr;
remove line above
> +       p->cport = cport;

> +       p->vaddr = vaddr;
> +       p->vport = vport;
remove line above
> +}
> +
> +struct ip_vs_conn *ip_vs_conn_in_get(const struct ip_vs_conn_param *p);
> +struct ip_vs_conn *ip_vs_ct_in_get(const struct ip_vs_conn_param *p);
>
>  struct ip_vs_conn * ip_vs_conn_in_get_proto(int af, const struct sk_buff *skb,
>                                             struct ip_vs_protocol *pp,
> @@ -640,9 +659,7 @@ struct ip_vs_conn * ip_vs_conn_in_get_pr
>                                             unsigned int proto_off,
>                                             int inverse);
>
> -extern struct ip_vs_conn *ip_vs_conn_out_get
> -(int af, int protocol, const union nf_inet_addr *s_addr, __be16 s_port,
> - const union nf_inet_addr *d_addr, __be16 d_port);
> +struct ip_vs_conn *ip_vs_conn_out_get(const struct ip_vs_conn_param *p);
>
>  struct ip_vs_conn * ip_vs_conn_out_get_proto(int af, const struct sk_buff *skb,
>                                              struct ip_vs_protocol *pp,
> @@ -658,11 +675,10 @@ static inline void __ip_vs_conn_put(stru
>  extern void ip_vs_conn_put(struct ip_vs_conn *cp);
>  extern void ip_vs_conn_fill_cport(struct ip_vs_conn *cp, __be16 cport);
>
> -extern struct ip_vs_conn *
> -ip_vs_conn_new(int af, int proto, const union nf_inet_addr *caddr, __be16 cport,
> -              const union nf_inet_addr *vaddr, __be16 vport,
> -              const union nf_inet_addr *daddr, __be16 dport, unsigned flags,
> -              struct ip_vs_dest *dest);
> +struct ip_vs_conn *ip_vs_conn_new(const struct ip_vs_conn_param *p,
> +                                 const union nf_inet_addr *daddr,
> +                                 __be16 dport, unsigned flags,
> +                                 struct ip_vs_dest *dest);
>  extern void ip_vs_conn_expire_now(struct ip_vs_conn *cp);
>
>  extern const char * ip_vs_state_name(__u16 proto, int state);
> Index: lvs-test-2.6/net/netfilter/ipvs/ip_vs_conn.c
> ===================================================================
> --- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_conn.c   2010-10-04 17:00:28.000000000 +0900
> +++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_conn.c        2010-10-04 17:49:20.000000000 +0900
> @@ -218,27 +218,26 @@ static inline int ip_vs_conn_unhash(stru
>  /*
>   *  Gets ip_vs_conn associated with supplied parameters in the ip_vs_conn_tab.
>   *  Called for pkts coming from OUTside-to-INside.
> - *     s_addr, s_port: pkt source address (foreign host)
> - *     d_addr, d_port: pkt dest address (load balancer)
> + *     p->caddr, p->cport: pkt source address (foreign host)
> + *     p->vaddr, p->vport: pkt dest address (load balancer)
>   */
> -static inline struct ip_vs_conn *__ip_vs_conn_in_get
> -(int af, int protocol, const union nf_inet_addr *s_addr, __be16 s_port,
> - const union nf_inet_addr *d_addr, __be16 d_port)
> +static inline struct ip_vs_conn *
> +__ip_vs_conn_in_get(const struct ip_vs_conn_param *p)
>  {
>         unsigned hash;
>         struct ip_vs_conn *cp;
>
> -       hash = ip_vs_conn_hashkey(af, protocol, s_addr, s_port);
> +       hash = ip_vs_conn_hashkey(p->af, p->protocol, p->caddr, p->cport);
>
>         ct_read_lock(hash);
>
>         list_for_each_entry(cp, &ip_vs_conn_tab[hash], c_list) {
> -               if (cp->af == af &&
> -                   ip_vs_addr_equal(af, s_addr, &cp->caddr) &&
> -                   ip_vs_addr_equal(af, d_addr, &cp->vaddr) &&
> -                   s_port == cp->cport && d_port == cp->vport &&
> -                   ((!s_port) ^ (!(cp->flags & IP_VS_CONN_F_NO_CPORT))) &&
> -                   protocol == cp->protocol) {
> +               if (cp->af == p->af &&
> +                   ip_vs_addr_equal(p->af, p->caddr, &cp->caddr) &&
> +                   ip_vs_addr_equal(p->af, p->vaddr, &cp->vaddr) &&
> +                   p->cport == cp->cport && p->vport == cp->vport &&
> +                   ((!p->cport) ^ (!(cp->flags & IP_VS_CONN_F_NO_CPORT))) &&
> +                   p->protocol == cp->protocol) {
>                         /* HIT */
>                         atomic_inc(&cp->refcnt);
>                         ct_read_unlock(hash);
> @@ -251,71 +250,82 @@ static inline struct ip_vs_conn *__ip_vs
>         return NULL;
>  }
>
> -struct ip_vs_conn *ip_vs_conn_in_get
> -(int af, int protocol, const union nf_inet_addr *s_addr, __be16 s_port,
> - const union nf_inet_addr *d_addr, __be16 d_port)
> +struct ip_vs_conn *ip_vs_conn_in_get(const struct ip_vs_conn_param *p)
>  {
>         struct ip_vs_conn *cp;
>
> -       cp = __ip_vs_conn_in_get(af, protocol, s_addr, s_port, d_addr, d_port);
> -       if (!cp && atomic_read(&ip_vs_conn_no_cport_cnt))
> -               cp = __ip_vs_conn_in_get(af, protocol, s_addr, 0, d_addr,
> -                                        d_port);
> +       cp = __ip_vs_conn_in_get(p);
> +       if (!cp && atomic_read(&ip_vs_conn_no_cport_cnt)) {
> +               struct ip_vs_conn_param cport_zero_p = *p;
> +               cport_zero_p.cport = 0;
> +               cp = __ip_vs_conn_in_get(&cport_zero_p);
> +       }
>
>         IP_VS_DBG_BUF(9, "lookup/in %s %s:%d->%s:%d %s\n",
> -                     ip_vs_proto_name(protocol),
> -                     IP_VS_DBG_ADDR(af, s_addr), ntohs(s_port),
> -                     IP_VS_DBG_ADDR(af, d_addr), ntohs(d_port),
> +                     ip_vs_proto_name(p->protocol),
> +                     IP_VS_DBG_ADDR(p->af, p->caddr), ntohs(p->cport),
> +                     IP_VS_DBG_ADDR(p->af, p->vaddr), ntohs(p->vport),
>                       cp ? "hit" : "not hit");
>
>         return cp;
>  }
>
> +static int
> +ip_vs_conn_fill_param_proto(int af, const struct sk_buff *skb,
> +                           const struct ip_vs_iphdr *iph,
> +                           unsigned int proto_off, int inverse,
> +                           struct ip_vs_conn_param *p)
> +{
> +       __be16 _ports[2], *pptr;
> +
> +       pptr = skb_header_pointer(skb, proto_off, sizeof(_ports), _ports);
> +       if (pptr == NULL)
> +               return 1;
> +
> +       if (likely(!inverse))
> +               ip_vs_conn_fill_param(af, iph->protocol, &iph->saddr, pptr[0],
> +                                     &iph->daddr, pptr[1], p);
> +       else
> +               ip_vs_conn_fill_param(af, iph->protocol, &iph->daddr, pptr[1],
> +                                     &iph->saddr, pptr[0], p);
> +       return 0;
> +}
> +
>  struct ip_vs_conn *
>  ip_vs_conn_in_get_proto(int af, const struct sk_buff *skb,
>                         struct ip_vs_protocol *pp,
>                         const struct ip_vs_iphdr *iph,
>                         unsigned int proto_off, int inverse)
>  {
> -       __be16 _ports[2], *pptr;
> +       struct ip_vs_conn_param p;
>
> -       pptr = skb_header_pointer(skb, proto_off, sizeof(_ports), _ports);
> -       if (pptr == NULL)
> +       if (ip_vs_conn_fill_param_proto(af, skb, iph, proto_off, inverse, &p))
>                 return NULL;
>
> -       if (likely(!inverse))
> -               return ip_vs_conn_in_get(af, iph->protocol,
> -                                        &iph->saddr, pptr[0],
> -                                        &iph->daddr, pptr[1]);
> -       else
> -               return ip_vs_conn_in_get(af, iph->protocol,
> -                                        &iph->daddr, pptr[1],
> -                                        &iph->saddr, pptr[0]);
> +       return ip_vs_conn_in_get(&p);
>  }
>  EXPORT_SYMBOL_GPL(ip_vs_conn_in_get_proto);
>
>  /* Get reference to connection template */
> -struct ip_vs_conn *ip_vs_ct_in_get
> -(int af, int protocol, const union nf_inet_addr *s_addr, __be16 s_port,
> - const union nf_inet_addr *d_addr, __be16 d_port)
> +struct ip_vs_conn *ip_vs_ct_in_get(const struct ip_vs_conn_param *p)
>  {
>         unsigned hash;
>         struct ip_vs_conn *cp;
>
> -       hash = ip_vs_conn_hashkey(af, protocol, s_addr, s_port);
> +       hash = ip_vs_conn_hashkey(p->af, p->protocol, p->caddr, p->cport);
>
>         ct_read_lock(hash);
>
>         list_for_each_entry(cp, &ip_vs_conn_tab[hash], c_list) {
> -               if (cp->af == af &&
> -                   ip_vs_addr_equal(af, s_addr, &cp->caddr) &&
> +               if (cp->af == p->af &&
> +                   ip_vs_addr_equal(p->af, p->caddr, &cp->caddr) &&
>                     /* protocol should only be IPPROTO_IP if
> -                    * d_addr is a fwmark */
> -                   ip_vs_addr_equal(protocol == IPPROTO_IP ? AF_UNSPEC : af,
> -                                    d_addr, &cp->vaddr) &&
> -                   s_port == cp->cport && d_port == cp->vport &&
> +                    * p->vaddr is a fwmark */
> +                   ip_vs_addr_equal(p->protocol == IPPROTO_IP ? AF_UNSPEC :
> +                                    p->af, p->vaddr, &cp->vaddr) &&
> +                   p->cport == cp->cport && p->vport == cp->vport &&
>                     cp->flags & IP_VS_CONN_F_TEMPLATE &&
> -                   protocol == cp->protocol) {
> +                   p->protocol == cp->protocol) {
>                         /* HIT */
>                         atomic_inc(&cp->refcnt);
>                         goto out;
> @@ -327,23 +337,19 @@ struct ip_vs_conn *ip_vs_ct_in_get
>         ct_read_unlock(hash);
>
>         IP_VS_DBG_BUF(9, "template lookup/in %s %s:%d->%s:%d %s\n",
> -                     ip_vs_proto_name(protocol),
> -                     IP_VS_DBG_ADDR(af, s_addr), ntohs(s_port),
> -                     IP_VS_DBG_ADDR(af, d_addr), ntohs(d_port),
> +                     ip_vs_proto_name(p->protocol),
> +                     IP_VS_DBG_ADDR(p->af, p->caddr), ntohs(p->cport),
> +                     IP_VS_DBG_ADDR(p->af, p->vaddr), ntohs(p->vport),
>                       cp ? "hit" : "not hit");
>
>         return cp;
>  }
>
> -/*
> - *  Gets ip_vs_conn associated with supplied parameters in the ip_vs_conn_tab.
> - *  Called for pkts coming from inside-to-OUTside.
> - *     s_addr, s_port: pkt source address (inside host)
> - *     d_addr, d_port: pkt dest address (foreign host)
> - */
> -struct ip_vs_conn *ip_vs_conn_out_get
> -(int af, int protocol, const union nf_inet_addr *s_addr, __be16 s_port,
> - const union nf_inet_addr *d_addr, __be16 d_port)
> +/* Gets ip_vs_conn associated with supplied parameters in the ip_vs_conn_tab.
> + * Called for pkts coming from inside-to-OUTside.
> + *     p->caddr, p->cport: pkt source address (inside host)
> + *     p->vaddr, p->vport: pkt dest address (foreign host) */
> +struct ip_vs_conn *ip_vs_conn_out_get(const struct ip_vs_conn_param *p)
>  {
>         unsigned hash;
>         struct ip_vs_conn *cp, *ret=NULL;
> @@ -351,16 +357,16 @@ struct ip_vs_conn *ip_vs_conn_out_get
>         /*
>          *      Check for "full" addressed entries
>          */
> -       hash = ip_vs_conn_hashkey(af, protocol, d_addr, d_port);
> +       hash = ip_vs_conn_hashkey(p->af, p->protocol, p->vaddr, p->vport);
>
>         ct_read_lock(hash);
>
>         list_for_each_entry(cp, &ip_vs_conn_tab[hash], c_list) {
> -               if (cp->af == af &&
> -                   ip_vs_addr_equal(af, d_addr, &cp->caddr) &&
> -                   ip_vs_addr_equal(af, s_addr, &cp->daddr) &&
> -                   d_port == cp->cport && s_port == cp->dport &&
> -                   protocol == cp->protocol) {
> +               if (cp->af == p->af &&
> +                   ip_vs_addr_equal(p->af, p->vaddr, &cp->caddr) &&
> +                   ip_vs_addr_equal(p->af, p->caddr, &cp->daddr) &&
> +                   p->vport == cp->cport && p->cport == cp->dport &&
> +                   p->protocol == cp->protocol) {
>                         /* HIT */
>                         atomic_inc(&cp->refcnt);
>                         ret = cp;
> @@ -371,9 +377,9 @@ struct ip_vs_conn *ip_vs_conn_out_get
>         ct_read_unlock(hash);
>
>         IP_VS_DBG_BUF(9, "lookup/out %s %s:%d->%s:%d %s\n",
> -                     ip_vs_proto_name(protocol),
> -                     IP_VS_DBG_ADDR(af, s_addr), ntohs(s_port),
> -                     IP_VS_DBG_ADDR(af, d_addr), ntohs(d_port),
> +                     ip_vs_proto_name(p->protocol),
> +                     IP_VS_DBG_ADDR(p->af, p->caddr), ntohs(p->cport),
> +                     IP_VS_DBG_ADDR(p->af, p->vaddr), ntohs(p->vport),
>                       ret ? "hit" : "not hit");
>
>         return ret;
> @@ -385,20 +391,12 @@ ip_vs_conn_out_get_proto(int af, const s
>                          const struct ip_vs_iphdr *iph,
>                          unsigned int proto_off, int inverse)
>  {
> -       __be16 _ports[2], *pptr;
> +       struct ip_vs_conn_param p;
>
> -       pptr = skb_header_pointer(skb, proto_off, sizeof(_ports), _ports);
> -       if (pptr == NULL)
> +       if (ip_vs_conn_fill_param_proto(af, skb, iph, proto_off, inverse, &p))
>                 return NULL;
>
> -       if (likely(!inverse))
> -               return ip_vs_conn_out_get(af, iph->protocol,
> -                                         &iph->saddr, pptr[0],
> -                                         &iph->daddr, pptr[1]);
> -       else
> -               return ip_vs_conn_out_get(af, iph->protocol,
> -                                         &iph->daddr, pptr[1],
> -                                         &iph->saddr, pptr[0]);
> +       return ip_vs_conn_out_get(&p);
>  }
>  EXPORT_SYMBOL_GPL(ip_vs_conn_out_get_proto);
>
> @@ -758,13 +756,12 @@ void ip_vs_conn_expire_now(struct ip_vs_
>   *     Create a new connection entry and hash it into the ip_vs_conn_tab
>   */
>  struct ip_vs_conn *
> -ip_vs_conn_new(int af, int proto, const union nf_inet_addr *caddr, __be16 cport,
> -              const union nf_inet_addr *vaddr, __be16 vport,
> +ip_vs_conn_new(const struct ip_vs_conn_param *p,
>                const union nf_inet_addr *daddr, __be16 dport, unsigned flags,
>                struct ip_vs_dest *dest)
>  {
>         struct ip_vs_conn *cp;
> -       struct ip_vs_protocol *pp = ip_vs_proto_get(proto);
> +       struct ip_vs_protocol *pp = ip_vs_proto_get(p->protocol);
>
>         cp = kmem_cache_zalloc(ip_vs_conn_cachep, GFP_ATOMIC);
>         if (cp == NULL) {
> @@ -774,14 +771,14 @@ ip_vs_conn_new(int af, int proto, const
>
>         INIT_LIST_HEAD(&cp->c_list);
>         setup_timer(&cp->timer, ip_vs_conn_expire, (unsigned long)cp);
> -       cp->af             = af;
> -       cp->protocol       = proto;
> -       ip_vs_addr_copy(af, &cp->caddr, caddr);
> -       cp->cport          = cport;
> -       ip_vs_addr_copy(af, &cp->vaddr, vaddr);
> -       cp->vport          = vport;
> +       cp->af             = p->af;
> +       cp->protocol       = p->protocol;
> +       ip_vs_addr_copy(p->af, &cp->caddr, p->caddr);
> +       cp->cport          = p->cport;
> +       ip_vs_addr_copy(p->af, &cp->vaddr, p->vaddr);
> +       cp->vport          = p->vport;
>         /* proto should only be IPPROTO_IP if d_addr is a fwmark */
> -       ip_vs_addr_copy(proto == IPPROTO_IP ? AF_UNSPEC : af,
> +       ip_vs_addr_copy(p->protocol == IPPROTO_IP ? AF_UNSPEC : p->af,
>                         &cp->daddr, daddr);
>         cp->dport          = dport;
>         cp->flags          = flags;
> @@ -810,7 +807,7 @@ ip_vs_conn_new(int af, int proto, const
>
>         /* Bind its packet transmitter */
>  #ifdef CONFIG_IP_VS_IPV6
> -       if (af == AF_INET6)
> +       if (p->af == AF_INET6)
>                 ip_vs_bind_xmit_v6(cp);
>         else
>  #endif
> Index: lvs-test-2.6/net/netfilter/ipvs/ip_vs_core.c
> ===================================================================
> --- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_core.c   2010-10-04 17:00:45.000000000 +0900
> +++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_core.c        2010-10-04 17:49:20.000000000 +0900
> @@ -193,14 +193,11 @@ ip_vs_sched_persist(struct ip_vs_service
>         struct ip_vs_iphdr iph;
>         struct ip_vs_dest *dest;
>         struct ip_vs_conn *ct;
> -       int protocol = iph.protocol;
>         __be16 dport = 0;               /* destination port to forward */
> -       __be16 vport = 0;               /* virtual service port */
>         unsigned int flags;
> +       struct ip_vs_conn_param param;
>         union nf_inet_addr snet;        /* source network of the client,
>                                            after masking */
> -       const union nf_inet_addr fwmark = { .ip = htonl(svc->fwmark) };
> -       const union nf_inet_addr *vaddr = &iph.daddr;
>
>         ip_vs_fill_iphdr(svc->af, skb_network_header(skb), &iph);
>
> @@ -232,6 +229,11 @@ ip_vs_sched_persist(struct ip_vs_service
>          * is created for other persistent services.
>          */
>         {
> +               int protocol = iph.protocol;
> +               const union nf_inet_addr *vaddr = &iph.daddr;
> +               const union nf_inet_addr fwmark = { .ip = htonl(svc->fwmark) };
> +               __be16 vport = 0;
> +
>                 if (ports[1] == svc->port) {
>                         /* non-FTP template:
>                          * <protocol, caddr, 0, vaddr, vport, daddr, dport>
> @@ -253,11 +255,12 @@ ip_vs_sched_persist(struct ip_vs_service
>                                 vaddr = &fwmark;
>                         }
>                 }
> +               ip_vs_conn_fill_param(svc->af, protocol, &snet, 0,
> +                                     vaddr, vport, &param);
>         }
>
>         /* Check if a template already exists */
> -       ct = ip_vs_ct_in_get(svc->af, protocol, &snet, 0, vaddr, vport);
> -
> +       ct = ip_vs_ct_in_get(&param);
>         if (!ct || !ip_vs_check_template(ct)) {
>                 /* No template found or the dest of the connection
>                  * template is not available.
> @@ -272,8 +275,7 @@ ip_vs_sched_persist(struct ip_vs_service
>                         dport = dest->port;
>
>                 /* Create a template */
> -               ct = ip_vs_conn_new(svc->af, protocol, &snet, 0,vaddr, vport,
> -                                   &dest->addr, dport,
> +               ct = ip_vs_conn_new(&param, &dest->addr, dport,
>                                     IP_VS_CONN_F_TEMPLATE, dest);
>                 if (ct == NULL)
>                         return NULL;
> @@ -294,12 +296,9 @@ ip_vs_sched_persist(struct ip_vs_service
>         /*
>          *    Create a new connection according to the template
>          */
> -       cp = ip_vs_conn_new(svc->af, iph.protocol,
> -                           &iph.saddr, ports[0],
> -                           &iph.daddr, ports[1],
> -                           &dest->addr, dport,
> -                           flags,
> -                           dest);
> +       ip_vs_conn_fill_param(svc->af, iph.protocol, &iph.saddr, ports[0],
> +                             &iph.daddr, ports[1], &param);
> +       cp = ip_vs_conn_new(&param, &dest->addr, dport, flags, dest);
>         if (cp == NULL) {
>                 ip_vs_conn_put(ct);
>                 return NULL;
> @@ -366,14 +365,16 @@ ip_vs_schedule(struct ip_vs_service *svc
>         /*
>          *    Create a connection entry.
>          */
> -       cp = ip_vs_conn_new(svc->af, iph.protocol,
> -                           &iph.saddr, pptr[0],
> -                           &iph.daddr, pptr[1],
> -                           &dest->addr, dest->port ? dest->port : pptr[1],
> -                           flags,
> -                           dest);
> -       if (cp == NULL)
> -               return NULL;
> +       {
> +               struct ip_vs_conn_param p;
> +               ip_vs_conn_fill_param(svc->af, iph.protocol, &iph.saddr,
> +                                     pptr[0], &iph.daddr, pptr[1], &p);
> +               cp = ip_vs_conn_new(&p, &dest->addr,
> +                                   dest->port ? dest->port : pptr[1],
> +                                   flags, dest);
> +               if (!cp)
> +                       return NULL;
> +       }
>
>         IP_VS_DBG_BUF(6, "Schedule fwd:%c c:%s:%u v:%s:%u "
>                       "d:%s:%u conn->flags:%X conn->refcnt:%d\n",
> @@ -429,14 +430,17 @@ int ip_vs_leave(struct ip_vs_service *sv
>
>                 /* create a new connection entry */
>                 IP_VS_DBG(6, "%s(): create a cache_bypass entry\n", __func__);
> -               cp = ip_vs_conn_new(svc->af, iph.protocol,
> -                                   &iph.saddr, pptr[0],
> -                                   &iph.daddr, pptr[1],
> -                                   &daddr, 0,
> -                                   IP_VS_CONN_F_BYPASS | flags,
> -                                   NULL);
> -               if (cp == NULL)
> -                       return NF_DROP;
> +               {
> +                       struct ip_vs_conn_param p;
> +                       ip_vs_conn_fill_param(svc->af, iph.protocol,
> +                                             &iph.saddr, pptr[0],
> +                                             &iph.daddr, pptr[1], &p);
> +                       cp = ip_vs_conn_new(&p, &daddr, 0,
> +                                           IP_VS_CONN_F_BYPASS | flags,
> +                                           NULL);
> +                       if (!cp)
> +                               return NF_DROP;
> +               }
>
>                 /* statistics */
>                 ip_vs_in_stats(cp, skb);
> Index: lvs-test-2.6/net/netfilter/ipvs/ip_vs_ftp.c
> ===================================================================
> --- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_ftp.c    2010-10-04 16:58:31.000000000 +0900
> +++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_ftp.c 2010-10-04 17:00:45.000000000 +0900
> @@ -195,13 +195,17 @@ static int ip_vs_ftp_out(struct ip_vs_ap
>                 /*
>                  * Now update or create an connection entry for it
>                  */
> -               n_cp = ip_vs_conn_out_get(AF_INET, iph->protocol, &from, port,
> -                                         &cp->caddr, 0);
> +               {
> +                       struct ip_vs_conn_param p;
> +                       ip_vs_conn_fill_param(AF_INET, iph->protocol,
> +                                             &from, port, &cp->caddr, 0, &p);
> +                       n_cp = ip_vs_conn_out_get(&p);
> +               }
>                 if (!n_cp) {
> -                       n_cp = ip_vs_conn_new(AF_INET, IPPROTO_TCP,
> -                                             &cp->caddr, 0,
> -                                             &cp->vaddr, port,
> -                                             &from, port,
> +                       struct ip_vs_conn_param p;
> +                       ip_vs_conn_fill_param(AF_INET, IPPROTO_TCP, &cp->caddr,
> +                                             0, &cp->vaddr, port, &p);
> +                       n_cp = ip_vs_conn_new(&p, &from, port,
>                                               IP_VS_CONN_F_NO_CPORT |
>                                               IP_VS_CONN_F_NFCT,
>                                               cp->dest);
> @@ -347,21 +351,22 @@ static int ip_vs_ftp_in(struct ip_vs_app
>                   ip_vs_proto_name(iph->protocol),
>                   &to.ip, ntohs(port), &cp->vaddr.ip, 0);
>
> -       n_cp = ip_vs_conn_in_get(AF_INET, iph->protocol,
> -                                &to, port,
> -                                &cp->vaddr, htons(ntohs(cp->vport)-1));
> -       if (!n_cp) {
> -               n_cp = ip_vs_conn_new(AF_INET, IPPROTO_TCP,
> -                                     &to, port,
> +       {
> +               struct ip_vs_conn_param p;
> +               ip_vs_conn_fill_param(AF_INET, iph->protocol, &to, port,
>                                       &cp->vaddr, htons(ntohs(cp->vport)-1),
> -                                     &cp->daddr, htons(ntohs(cp->dport)-1),
> -                                     IP_VS_CONN_F_NFCT,
> -                                     cp->dest);
> -               if (!n_cp)
> -                       return 0;
> +                                     &p);
> +               n_cp = ip_vs_conn_in_get(&p);
> +               if (!n_cp) {
> +                       n_cp = ip_vs_conn_new(&p, &cp->daddr,
> +                                             htons(ntohs(cp->dport)-1),
> +                                             IP_VS_CONN_F_NFCT, cp->dest);
> +                       if (!n_cp)
> +                               return 0;
>
> -               /* add its controller */
> -               ip_vs_control_add(n_cp, cp);
> +                       /* add its controller */
> +                       ip_vs_control_add(n_cp, cp);
> +               }
>         }
>
>         /*
> Index: lvs-test-2.6/net/netfilter/ipvs/ip_vs_nfct.c
> ===================================================================
> --- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_nfct.c   2010-10-04 16:58:31.000000000 +0900
> +++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_nfct.c        2010-10-04 17:00:45.000000000 +0900
> @@ -140,6 +140,7 @@ static void ip_vs_nfct_expect_callback(s
>  {
>         struct nf_conntrack_tuple *orig, new_reply;
>         struct ip_vs_conn *cp;
> +       struct ip_vs_conn_param p;
>
>         if (exp->tuple.src.l3num != PF_INET)
>                 return;
> @@ -154,9 +155,10 @@ static void ip_vs_nfct_expect_callback(s
>
>         /* RS->CLIENT */
>         orig = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
> -       cp = ip_vs_conn_out_get(exp->tuple.src.l3num, orig->dst.protonum,
> -                               &orig->src.u3, orig->src.u.tcp.port,
> -                               &orig->dst.u3, orig->dst.u.tcp.port);
> +       ip_vs_conn_fill_param(exp->tuple.src.l3num, orig->dst.protonum,
> +                             &orig->src.u3, orig->src.u.tcp.port,
> +                             &orig->dst.u3, orig->dst.u.tcp.port, &p);
> +       cp = ip_vs_conn_out_get(&p);
>         if (cp) {
>                 /* Change reply CLIENT->RS to CLIENT->VS */
>                 new_reply = ct->tuplehash[IP_CT_DIR_REPLY].tuple;
> @@ -176,9 +178,7 @@ static void ip_vs_nfct_expect_callback(s
>         }
>
>         /* CLIENT->VS */
> -       cp = ip_vs_conn_in_get(exp->tuple.src.l3num, orig->dst.protonum,
> -                              &orig->src.u3, orig->src.u.tcp.port,
> -                              &orig->dst.u3, orig->dst.u.tcp.port);
> +       cp = ip_vs_conn_in_get(&p);
>         if (cp) {
>                 /* Change reply VS->CLIENT to RS->CLIENT */
>                 new_reply = ct->tuplehash[IP_CT_DIR_REPLY].tuple;
> Index: lvs-test-2.6/net/netfilter/ipvs/ip_vs_proto_ah_esp.c
> ===================================================================
> --- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_proto_ah_esp.c   2010-10-04 16:58:31.000000000 +0900
> +++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_proto_ah_esp.c        2010-10-04 17:00:45.000000000 +0900
> @@ -40,6 +40,19 @@ struct isakmp_hdr {
>
>  #define PORT_ISAKMP    500
>
> +static void
> +ah_esp_conn_fill_param_proto(int af, const struct ip_vs_iphdr *iph,
> +                            int inverse, struct ip_vs_conn_param *p)
> +{
> +       if (likely(!inverse))
> +               ip_vs_conn_fill_param(af, IPPROTO_UDP,
> +                                     &iph->saddr, htons(PORT_ISAKMP),
> +                                     &iph->daddr, htons(PORT_ISAKMP), p);
> +       else
> +               ip_vs_conn_fill_param(af, IPPROTO_UDP,
> +                                     &iph->daddr, htons(PORT_ISAKMP),
> +                                     &iph->saddr, htons(PORT_ISAKMP), p);
> +}
>
>  static struct ip_vs_conn *
>  ah_esp_conn_in_get(int af, const struct sk_buff *skb, struct ip_vs_protocol *pp,
> @@ -47,21 +60,10 @@ ah_esp_conn_in_get(int af, const struct
>                    int inverse)
>  {
>         struct ip_vs_conn *cp;
> +       struct ip_vs_conn_param p;
>
> -       if (likely(!inverse)) {
> -               cp = ip_vs_conn_in_get(af, IPPROTO_UDP,
> -                                      &iph->saddr,
> -                                      htons(PORT_ISAKMP),
> -                                      &iph->daddr,
> -                                      htons(PORT_ISAKMP));
> -       } else {
> -               cp = ip_vs_conn_in_get(af, IPPROTO_UDP,
> -                                      &iph->daddr,
> -                                      htons(PORT_ISAKMP),
> -                                      &iph->saddr,
> -                                      htons(PORT_ISAKMP));
> -       }
> -
> +       ah_esp_conn_fill_param_proto(af, iph, inverse, &p);
> +       cp = ip_vs_conn_in_get(&p);
>         if (!cp) {
>                 /*
>                  * We are not sure if the packet is from our
> @@ -87,21 +89,10 @@ ah_esp_conn_out_get(int af, const struct
>                     int inverse)
>  {
>         struct ip_vs_conn *cp;
> +       struct ip_vs_conn_param p;
>
> -       if (likely(!inverse)) {
> -               cp = ip_vs_conn_out_get(af, IPPROTO_UDP,
> -                                       &iph->saddr,
> -                                       htons(PORT_ISAKMP),
> -                                       &iph->daddr,
> -                                       htons(PORT_ISAKMP));
> -       } else {
> -               cp = ip_vs_conn_out_get(af, IPPROTO_UDP,
> -                                       &iph->daddr,
> -                                       htons(PORT_ISAKMP),
> -                                       &iph->saddr,
> -                                       htons(PORT_ISAKMP));
> -       }
> -
> +       ah_esp_conn_fill_param_proto(af, iph, inverse, &p);
> +       cp = ip_vs_conn_out_get(&p);
>         if (!cp) {
>                 IP_VS_DBG_BUF(12, "Unknown ISAKMP entry for inout packet "
>                               "%s%s %s->%s\n",
> Index: lvs-test-2.6/net/netfilter/ipvs/ip_vs_sync.c
> ===================================================================
> --- lvs-test-2.6.orig/net/netfilter/ipvs/ip_vs_sync.c   2010-10-04 16:58:31.000000000 +0900
> +++ lvs-test-2.6/net/netfilter/ipvs/ip_vs_sync.c        2010-10-04 17:49:20.000000000 +0900
> @@ -301,6 +301,7 @@ static void ip_vs_process_message(const
>         struct ip_vs_conn *cp;
>         struct ip_vs_protocol *pp;
>         struct ip_vs_dest *dest;
> +       struct ip_vs_conn_param param;
>         char *p;
>         int i;
>
> @@ -370,18 +371,17 @@ static void ip_vs_process_message(const
>                         }
>                 }
>
> -               if (!(flags & IP_VS_CONN_F_TEMPLATE))
> -                       cp = ip_vs_conn_in_get(AF_INET, s->protocol,
> -                                              (union nf_inet_addr *)&s->caddr,
> -                                              s->cport,
> -                                              (union nf_inet_addr *)&s->vaddr,
> -                                              s->vport);
> -               else
> -                       cp = ip_vs_ct_in_get(AF_INET, s->protocol,
> -                                            (union nf_inet_addr *)&s->caddr,
> -                                            s->cport,
> -                                            (union nf_inet_addr *)&s->vaddr,
> -                                            s->vport);
> +               {
> +                       ip_vs_conn_fill_param(AF_INET, s->protocol,
> +                                             (union nf_inet_addr *)&s->caddr,
> +                                             s->cport,
> +                                             (union nf_inet_addr *)&s->vaddr,
> +                                             s->vport, &param);
> +                       if (!(flags & IP_VS_CONN_F_TEMPLATE))
> +                               cp = ip_vs_conn_in_get(&param);
> +                       else
> +                               cp = ip_vs_ct_in_get(&param);
> +               }
>                 if (!cp) {
>                         /*
>                          * Find the appropriate destination for the connection.
> @@ -406,14 +406,9 @@ static void ip_vs_process_message(const
>                                 else
>                                         flags &= ~IP_VS_CONN_F_INACTIVE;
>                         }
> -                       cp = ip_vs_conn_new(AF_INET, s->protocol,
> -                                           (union nf_inet_addr *)&s->caddr,
> -                                           s->cport,
> -                                           (union nf_inet_addr *)&s->vaddr,
> -                                           s->vport,
> +                       cp = ip_vs_conn_new(&param,
>                                             (union nf_inet_addr *)&s->daddr,
> -                                           s->dport,
> -                                           flags, dest);
> +                                           s->dport, flags, dest);
>                         if (dest)
>                                 atomic_dec(&dest->refcnt);
>                         if (!cp) {
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

Patch applied for patch :-)

iff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 52fbe23..7646ff7 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -670,9 +670,19 @@ static inline void ip_vs_conn_fill_param(int af, int protocol,
 {
 	p->af = af;
 	p->protocol = protocol;
-	p->caddr = caddr;
 	p->cport = cport;
-	p->vaddr = vaddr;
+#ifdef CONFIG_IP_VS_IPV6
+	if (af == AF_INET6) {
+		ipv6_addr_copy(&p->caddr, caddr);
+		ipv6_addr_copy(&p->caddr, caddr);
+	}
+	else
+#else
+	}
+		p->caddr = caddr;
+		p->vaddr = vaddr;
+	}
+#endif
 	p->vport = vport;
 	p->pe = NULL;
 	p->pe_data = NULL;


--
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>

^ permalink raw reply related

* Re: [PATCH net-next] 3c52x: remove IRQF_SAMPLE_RANDOM from legacy MCA drivers.
From: David Miller @ 2010-10-19  9:32 UTC (permalink / raw)
  To: paul.gortmaker; +Cc: netdev
In-Reply-To: <1287102111-5482-1-git-send-email-paul.gortmaker@windriver.com>

From: Paul Gortmaker <paul.gortmaker@windriver.com>
Date: Thu, 14 Oct 2010 20:21:51 -0400

> If you are genuinely using one of these legacy MCA drivers
> then you are tragically on hardware where you really don't
> have the extra CPU cycles to be wasting on this.
> 
> In addition, it makes two less cases for people to inadvertently
> blindly copy flags from without explicitly thinking whether it
> makes sense -- see the addition to feature-removal.txt as per
> commit 9d9b8fb0e5ebf4b0398e579f6061d4451fea3242.
> 
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

Applied, thank you.

^ permalink raw reply

* [PATCH] Fixed race condition at ip_vs.ko module init.
From: Eduardo Blanco @ 2010-10-19  9:26 UTC (permalink / raw)
  To: Simon Horman; +Cc: lvs-devel, netdev, Wensong Zhang, Julian Anastasov

Lists were initialized after the module was registered.  Multiple ipvsadm
processes at module load triggered a race condition that resulted in a null
pointer dereference in do_ip_vs_get_ctl(). As a result, __ip_vs_mutex
was left locked preventing all further ipvsadm commands.

Signed-off-by: Eduardo J. Blanco <ejblanco@google.com>
---
 net/netfilter/ipvs/ip_vs_ctl.c |   19 ++++++++++---------
 1 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 0f0c079..68624dc 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -3385,6 +3385,16 @@ int __init ip_vs_control_init(void)

 	EnterFunction(2);

+	/* Initialize ip_vs_svc_table, ip_vs_svc_fwm_table, ip_vs_rtable */
+	for(idx = 0; idx < IP_VS_SVC_TAB_SIZE; idx++)  {
+		INIT_LIST_HEAD(&ip_vs_svc_table[idx]);
+		INIT_LIST_HEAD(&ip_vs_svc_fwm_table[idx]);
+	}
+	for(idx = 0; idx < IP_VS_RTAB_SIZE; idx++)  {
+		INIT_LIST_HEAD(&ip_vs_rtable[idx]);
+	}
+	smp_wmb();
+
 	ret = nf_register_sockopt(&ip_vs_sockopts);
 	if (ret) {
 		pr_err("cannot register sockopt.\n");
@@ -3403,15 +3413,6 @@ int __init ip_vs_control_init(void)

 	sysctl_header = register_sysctl_paths(net_vs_ctl_path, vs_vars);

-	/* Initialize ip_vs_svc_table, ip_vs_svc_fwm_table, ip_vs_rtable */
-	for(idx = 0; idx < IP_VS_SVC_TAB_SIZE; idx++)  {
-		INIT_LIST_HEAD(&ip_vs_svc_table[idx]);
-		INIT_LIST_HEAD(&ip_vs_svc_fwm_table[idx]);
-	}
-	for(idx = 0; idx < IP_VS_RTAB_SIZE; idx++)  {
-		INIT_LIST_HEAD(&ip_vs_rtable[idx]);
-	}

^ permalink raw reply related

* Re: [PATCH V3 net-next] can-raw: add msg_flags to distinguish local traffic
From: Alexander Stein @ 2010-10-19  9:12 UTC (permalink / raw)
  To: Kurt Van Dijck
  Cc: socketcan-core-0fE9KPoRgkgATYTw5x5z8w, netdev, Oliver Hartkopp
In-Reply-To: <20101019090247.GB334-MxZ6Iy/zr/UdbCeoMzGj59i2O/JbrIOy@public.gmane.org>

Hello Kurt,

On Tuesday 19 October 2010, 11:02:47 Kurt Van Dijck wrote:
> diff --git a/Documentation/networking/can.txt
> b/Documentation/networking/can.txt index cd79735..bc015d0 100644
> --- a/Documentation/networking/can.txt
> +++ b/Documentation/networking/can.txt
> @@ -22,6 +22,7 @@ This file contains
>        4.1.2 RAW socket option CAN_RAW_ERR_FILTER
>        4.1.3 RAW socket option CAN_RAW_LOOPBACK
>        4.1.4 RAW socket option CAN_RAW_RECV_OWN_MSGS
> +      4.1.5 RAW socket returned flags
>      4.2 Broadcast Manager protocol sockets (SOCK_DGRAM)
>      4.3 connected transport protocols (SOCK_SEQPACKET)
>      4.4 unconnected transport protocols (SOCK_DGRAM)
> @@ -471,6 +472,18 @@ solution for a couple of reasons:
>      setsockopt(s, SOL_CAN_RAW, CAN_RAW_RECV_OWN_MSGS,
>                 &recv_own_msgs, sizeof(recv_own_msgs));
> 
> +  4.1.5 RAW socket returned flags
> +
> +  When using recvmsg() call, the msg->msg_flags may contain following
> flags: +
> +    MSG_DONTROUTE: set when the received frame was created on the local
> host. +
> +    MSG_CONFIRM: set when the frame was sent via the socket it is received
> on. +        This flag can be interpreted as a 'transmission confirmation'
> +	when the CAN driver supports the echo of CAN frames on driver level,
> +	see 3.2 and 6.2.
> +	In order to receive such messages, CAN_RAW_RECV_OWN_MSGS must be set.
> +

You mixed tabs and spaces at MSG_CONFIRM entry. Dunno if that was intended.

Regards,
Alexander

^ permalink raw reply

* [PATCH V3 net-next] can-raw: add msg_flags to distinguish local traffic
From: Kurt Van Dijck @ 2010-10-19  9:02 UTC (permalink / raw)
  To: netdev, socketcan-core-0fE9KPoRgkgATYTw5x5z8w; +Cc: Oliver Hartkopp

The difference in V3 is proper documentation.
Both code & documentation look very complete now.

CAN has no addressing scheme. It is currently impossible
for userspace to tell is a received CAN frame comes from
another process on the local host, or from a remote CAN
device.
This patch add support for userspace applications to distinguish
between 'own', 'local' and 'remote' CAN traffic.
Distinction is made by returning flags in msg->msg_flags
in the call to recvmsg.
The Documentation/...  explains the flags.

Signed-off-by: Kurt Van Dijck <kurt.van.dijck-/BeEPy95v10@public.gmane.org>
---
 Documentation/networking/can.txt |   13 +++++++++++++
 net/can/raw.c                    |   33 ++++++++++++++++++++++++++++++---
 2 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/Documentation/networking/can.txt b/Documentation/networking/can.txt
index cd79735..bc015d0 100644
--- a/Documentation/networking/can.txt
+++ b/Documentation/networking/can.txt
@@ -22,6 +22,7 @@ This file contains
       4.1.2 RAW socket option CAN_RAW_ERR_FILTER
       4.1.3 RAW socket option CAN_RAW_LOOPBACK
       4.1.4 RAW socket option CAN_RAW_RECV_OWN_MSGS
+      4.1.5 RAW socket returned flags
     4.2 Broadcast Manager protocol sockets (SOCK_DGRAM)
     4.3 connected transport protocols (SOCK_SEQPACKET)
     4.4 unconnected transport protocols (SOCK_DGRAM)
@@ -471,6 +472,18 @@ solution for a couple of reasons:
     setsockopt(s, SOL_CAN_RAW, CAN_RAW_RECV_OWN_MSGS,
                &recv_own_msgs, sizeof(recv_own_msgs));
 
+  4.1.5 RAW socket returned flags
+
+  When using recvmsg() call, the msg->msg_flags may contain following flags:
+
+    MSG_DONTROUTE: set when the received frame was created on the local host.
+
+    MSG_CONFIRM: set when the frame was sent via the socket it is received on.
+        This flag can be interpreted as a 'transmission confirmation'
+	when the CAN driver supports the echo of CAN frames on driver level,
+	see 3.2 and 6.2.
+	In order to receive such messages, CAN_RAW_RECV_OWN_MSGS must be set.
+
   4.2 Broadcast Manager protocol sockets (SOCK_DGRAM)
   4.3 connected transport protocols (SOCK_SEQPACKET)
   4.4 unconnected transport protocols (SOCK_DGRAM)
diff --git a/net/can/raw.c b/net/can/raw.c
index 7d77e67..a9edd69 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -90,23 +90,39 @@ struct raw_sock {
 	can_err_mask_t err_mask;
 };
 
+/*
+ * Return pointer to store the extra msg flags for raw_recvmsg().
+ * We use the space of one unsigned int beyond the 'struct sockaddr_can'
+ * in skb->cb.
+ */
+static inline unsigned int *raw_flags(struct sk_buff *skb)
+{
+	BUILD_BUG_ON(sizeof(skb->cb) <= (sizeof(struct sockaddr_can)
+				+ sizeof(unsigned int)));
+
+	/* return pointer after struct sockaddr_can */
+	return (unsigned int *)(&((struct sockaddr_can *)skb->cb)[1]);
+}
+
 static inline struct raw_sock *raw_sk(const struct sock *sk)
 {
 	return (struct raw_sock *)sk;
 }
 
-static void raw_rcv(struct sk_buff *skb, void *data)
+static void raw_rcv(struct sk_buff *oskb, void *data)
 {
 	struct sock *sk = (struct sock *)data;
 	struct raw_sock *ro = raw_sk(sk);
 	struct sockaddr_can *addr;
+	struct sk_buff *skb;
+	unsigned int *pflags;
 
 	/* check the received tx sock reference */
-	if (!ro->recv_own_msgs && skb->sk == sk)
+	if (!ro->recv_own_msgs && oskb->sk == sk)
 		return;
 
 	/* clone the given skb to be able to enqueue it into the rcv queue */
-	skb = skb_clone(skb, GFP_ATOMIC);
+	skb = skb_clone(oskb, GFP_ATOMIC);
 	if (!skb)
 		return;
 
@@ -123,6 +139,14 @@ static void raw_rcv(struct sk_buff *skb, void *data)
 	addr->can_family  = AF_CAN;
 	addr->can_ifindex = skb->dev->ifindex;
 
+	/* prepare the flags for raw_recvmsg() */
+	pflags = raw_flags(skb);
+	*pflags = 0;
+	if (oskb->sk)
+		*pflags |= MSG_DONTROUTE;
+	if (oskb->sk == sk)
+		*pflags |= MSG_CONFIRM;
+
 	if (sock_queue_rcv_skb(sk, skb) < 0)
 		kfree_skb(skb);
 }
@@ -707,6 +731,9 @@ static int raw_recvmsg(struct kiocb *iocb, struct socket *sock,
 		memcpy(msg->msg_name, skb->cb, msg->msg_namelen);
 	}
 
+	/* assign the flags that have been recorded in raw_rcv() */
+	msg->msg_flags |= *(raw_flags(skb));
+
 	skb_free_datagram(sk, skb);
 
 	return size;

^ permalink raw reply related

* Re: Nested GRE locking bug
From: Eric Dumazet @ 2010-10-19  9:02 UTC (permalink / raw)
  To: David Miller; +Cc: ben, netdev, beatrice.barbe, 599816
In-Reply-To: <20101019.015303.104063240.davem@davemloft.net>

Le mardi 19 octobre 2010 à 01:53 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 14 Oct 2010 06:11:59 +0200
> 
> > net-next-2.6 contains a fix for this, adding the perc_cpu
> > xmit_recursion limit. We might push it to net-2.6
> 
> We need to think a bit more about this.
> 
> We are essentially now saying that one can only configure
> tunnels 3 levels deep, and no more.
> 
> I can guarentee you someone out there uses at least 4,
> perhaps more.
> 
> And those people will be broken by the new limit.
> 
> So putting this into net-2.6 with such a low limit will
> be quite dangerous.

Well limit is actually 4, but I get your point ;)




^ permalink raw reply

* Re: Nested GRE locking bug
From: David Miller @ 2010-10-19  8:53 UTC (permalink / raw)
  To: eric.dumazet; +Cc: ben, netdev, beatrice.barbe, 599816
In-Reply-To: <1287029519.2649.108.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 14 Oct 2010 06:11:59 +0200

> net-next-2.6 contains a fix for this, adding the perc_cpu
> xmit_recursion limit. We might push it to net-2.6

We need to think a bit more about this.

We are essentially now saying that one can only configure
tunnels 3 levels deep, and no more.

I can guarentee you someone out there uses at least 4,
perhaps more.

And those people will be broken by the new limit.

So putting this into net-2.6 with such a low limit will
be quite dangerous.

^ permalink raw reply

* Re: [patch v4.1] ipvs: IPv6 tunnel mode
From: Patrick McHardy @ 2010-10-19  8:40 UTC (permalink / raw)
  To: Simon Horman
  Cc: lvs-devel, netdev, netfilter-devel, Julian Anastasov, Julius Volz,
	Wensong Zhang, Hans Schillström
In-Reply-To: <20101016075837.GB2027@verge.net.au>

Am 16.10.2010 09:58, schrieb Simon Horman:
> From: Hans Schillstrom <hans.schillstrom@ericsson.com>
> 
> ipvs: IPv6 tunnel mode
> 
> IPv6 encapsulation uses a bad source address for the tunnel.
> i.e. VIP will be used as local-addr and encap. dst addr.
> Decapsulation will not accept this.
> 
> Example
> LVS (eth1 2003::2:0:1/96, VIP 2003::2:0:100)
>    (eth0 2003::1:0:1/96)
> RS  (ethX 2003::1:0:5/96)
> 
> tcpdump
> 2003::2:0:100 > 2003::1:0:5:
> IP6 (hlim 63, next-header TCP (6) payload length: 40)
>  2003::3:0:10.50991 > 2003::2:0:100.http: Flags [S], cksum 0x7312
> (correct), seq 3006460279, win 5760, options [mss 1440,sackOK,TS val
> 1904932 ecr 0,nop,wscale 3], length 0
> 
> In Linux IPv6 impl. you can't have a tunnel with an any cast address
> receiving packets (I have not tried to interpret RFC 2473)
> To have receive capabilities the tunnel must have:
>  - Local address set as multicast addr or an unicast addr
>  - Remote address set as an unicast addr.
>  - Loop back addres or Link local address are not allowed.
> 
> This causes us to setup a tunnel in the Real Server with the
> LVS as the remote address, here you can't use the VIP address since it's
> used inside the tunnel.
> 
> Solution
> Use outgoing interface IPv6 address (match against the destination).
> i.e. use ip6_route_output() to look up the route cache and
> then use ipv6_dev_get_saddr(...) to set the source address of the
> encapsulated packet.
> 
> Additionally, cache the results in new destination
> fields: dst_cookie and dst_saddr and properly check the
> returned dst from ip6_route_output. We now add xfrm_lookup
> call only for the tunneling method where the source address
> is a local one.
> 
> Signed-off-by:Hans Schillstrom <hans.schillstrom@ericsson.com>
> 
> ---
> 
> Original patch by Hans Schillstrom.
> Check dst state and cache results for IPv6 by Julian Anastasov.
> Subsequent revisions made by Hans Schillstrom:
> 
> * v1
>     
>   This is Julian's patch with a slightly edited version of the description
>   from Hans's original patch.
> 
> * v2
>     
>   Updated changelog as per commends from Julian
> 
> * v3
>     
>   Flowi dest address used as destination instead of rt6_info in
> +ip_vs_tunnel_xmit_v6()
>   rt6_info somtimes contains a netw address insted of a tunnel
> 
> * v4
>     
>   Update destination as recommended from Julian
>   i.e. use &cp->daddr.in6
> 
> * v4.1 Simon Horman
>   Fix patch corruption
> 
> Patrick, please consider this for nf-next-2.6

Applied, thanks.


^ permalink raw reply

* Re: Linux 2.6.35/TIPC 2.0 ABI breaking changes
From: David Miller @ 2010-10-19  8:16 UTC (permalink / raw)
  To: luca
  Cc: nhorman, paul.gortmaker, jon.maloy, netdev, linux-kernel,
	tipc-discussion
In-Reply-To: <20101019021657.GZ8781@llucax.com.ar>

From: Leandro Lucarella <luca@llucax.com.ar>
Date: Mon, 18 Oct 2010 23:16:57 -0300

> 
> The problem is not between the tipc stacks in different hosts, is
> between the tipc stack and the applications using it (well, maybe
> there is a problem somewhere else too).
> 
> This was a deliberate API change, not a subtle bug...

Neil et al., if these packets live only between the kernel stack
and the userspace API layer, we should not be byte-swapping this
stuff and we need to fix this fast.

Thanks.

^ permalink raw reply

* Re: [PATCH net-next] bnx2: Increase max rx ring size from 1K to 2K
From: David Miller @ 2010-10-19  8:14 UTC (permalink / raw)
  To: mchan; +Cc: andy, jfeeney, netdev
In-Reply-To: <1287448254-14173-1-git-send-email-mchan@broadcom.com>

From: "Michael Chan" <mchan@broadcom.com>
Date: Mon, 18 Oct 2010 17:30:54 -0700

> A number of customers are reporting packet loss under certain workloads
> (e.g. heavy bursts of small packets) with flow control disabled.  A larger
> rx ring helps to prevent these losses.
> 
> No change in default rx ring size and memory consumption.
> 
> Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
> Acked-by: John Feeney <jfeeney@redhat.com>
> Signed-off-by: Michael Chan <mchan@broadcom.com>

I don't see how it's any better to queue things more deeply in
hardware, compared to simply using hardware flow control since that's
what it's for and makes the queuing happen in the networking stack of
the sender, in software, which in the end performs better and gives
better feedback to the source of the data.

These huge RX queue sizes are absolutely rediculious, and I've
complained about this before.

And instead of seeing less of this, I keep seeing more of this stuff.
Please exert some pushback on these folks who are doing such insane
things.

Thanks.

^ permalink raw reply

* Re: [PATCH 3/3 v2] net: allocate tx queues in register_netdevice
From: Eric Dumazet @ 2010-10-19  8:13 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev, bhutchings
In-Reply-To: <alpine.DEB.1.00.1010182100180.6676@pokey.mtv.corp.google.com>

Le lundi 18 octobre 2010 à 21:04 -0700, Tom Herbert a écrit :
> This patch introduces netif_alloc_netdev_queues which is called from
> register_device instead of alloc_netdev_mq.  This makes TX queue
> allocation symmetric with RX allocation.  Also, queue locks allocation
> is done in netdev_init_one_queue.  Change set_real_num_tx_queues to
> fail if requested number < 1 or greater than number of allocated
> queues.
> 
> Signed-off-by: Tom Herbert <therbert@google.com>
> ---
>  include/linux/netdevice.h |    4 +-
>  net/core/dev.c            |  106 ++++++++++++++++++++++----------------------
>  2 files changed, 55 insertions(+), 55 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 14fbb04..880d565 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1696,8 +1696,8 @@ static inline int netif_is_multiqueue(const struct net_device *dev)
>  	return dev->num_tx_queues > 1;
>  }
>  
> -extern void netif_set_real_num_tx_queues(struct net_device *dev,
> -					 unsigned int txq);
> +extern int netif_set_real_num_tx_queues(struct net_device *dev,
> +					unsigned int txq);
>  
>  #ifdef CONFIG_RPS
>  extern int netif_set_real_num_rx_queues(struct net_device *dev,
> diff --git a/net/core/dev.c b/net/core/dev.c
> index d33adec..7ae5c7e 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1553,18 +1553,20 @@ static void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev)
>   * Routine to help set real_num_tx_queues. To avoid skbs mapped to queues
>   * greater then real_num_tx_queues stale skbs on the qdisc must be flushed.
>   */
> -void netif_set_real_num_tx_queues(struct net_device *dev, unsigned int txq)
> +int netif_set_real_num_tx_queues(struct net_device *dev, unsigned int txq)
>  {
> -	unsigned int real_num = dev->real_num_tx_queues;
> +	if (txq < 1 || txq > dev->num_tx_queues)
> +		return -EINVAL;
>  
> -	if (unlikely(txq > dev->num_tx_queues))
> -		;
> -	else if (txq > real_num)
> -		dev->real_num_tx_queues = txq;
> -	else if (txq < real_num) {
> -		dev->real_num_tx_queues = txq;
> -		qdisc_reset_all_tx_gt(dev, txq);
> +	if (dev->reg_state == NETREG_REGISTERED) {
> +		ASSERT_RTNL();
> +
> +		if (txq < dev->real_num_tx_queues)
> +			qdisc_reset_all_tx_gt(dev, txq);
>  	}
> +
> +	dev->real_num_tx_queues = txq;
> +	return 0;
>  }
>  EXPORT_SYMBOL(netif_set_real_num_tx_queues);
>  
> @@ -4928,20 +4930,6 @@ static void rollback_registered(struct net_device *dev)
>  	rollback_registered_many(&single);
>  }
>  
> -static void __netdev_init_queue_locks_one(struct net_device *dev,
> -					  struct netdev_queue *dev_queue,
> -					  void *_unused)
> -{
> -	spin_lock_init(&dev_queue->_xmit_lock);
> -	netdev_set_xmit_lockdep_class(&dev_queue->_xmit_lock, dev->type);
> -	dev_queue->xmit_lock_owner = -1;
> -}
> -
> -static void netdev_init_queue_locks(struct net_device *dev)
> -{
> -	netdev_for_each_tx_queue(dev, __netdev_init_queue_locks_one, NULL);
> -}
> -
>  unsigned long netdev_fix_features(unsigned long features, const char *name)
>  {
>  	/* Fix illegal SG+CSUM combinations. */
> @@ -5034,6 +5022,41 @@ static int netif_alloc_rx_queues(struct net_device *dev)
>  	return 0;
>  }
>  
> +static int netif_alloc_netdev_queues(struct net_device *dev)
> +{
> +	unsigned int count = dev->num_tx_queues;
> +	struct netdev_queue *tx;
> +
> +	BUG_ON(count < 1);
> +
> +	tx = kcalloc(count, sizeof(struct netdev_queue), GFP_KERNEL);
> +	if (!tx) {
> +		pr_err("netdev: Unable to allocate %u tx queues.\n",
> +		       count);
> +			return -ENOMEM;

One extra tab before the return

Other than that, patch seems fine to me, thanks !

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>




^ permalink raw reply

* Re: [PATCH 2/3 v2] net: cleanups in RX queue allocation
From: Eric Dumazet @ 2010-10-19  8:13 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem, netdev, bhutchings
In-Reply-To: <alpine.DEB.1.00.1010182056010.6676@pokey.mtv.corp.google.com>

Le lundi 18 octobre 2010 à 21:00 -0700, Tom Herbert a écrit :
> Clean up in RX queue allocation.  In netif_set_real_num_rx_queues
> return error on attempt to set zero queues, or requested number is
> greater than number of allocated queues.  In netif_alloc_rx_queues,
> do BUG_ON if queue_count is zero.
> 
> Signed-off-by: Tom Herbert <therbert@google.com>
> ---

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>



^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox