Netdev List
* Re: RCU'ed dst_get_neighbour()
From: Roland Dreier @ 2011-11-29 20:47 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Marc Aurele La France, David Miller,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1322599437.2596.10.camel@edumazet-laptop>

Thanks Eric, I'll send this to Linus shortly.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: RCU'ed dst_get_neighbour()
From: Eric Dumazet @ 2011-11-29 20:53 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Marc Aurele La France, David Miller,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <CAG4TOxNJfr3X1p358LBWdNQKdkw8KOSekcKpNu6K5_phPEiR4A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Tuesday, 29 November 2011 at 12:47 -0800, Roland Dreier wrote:
> Thanks Eric, I'll send this to Linus shortly.

Oh well, I forgot one rcu_read_unlock(), I'll send a V2...




^ permalink raw reply

* Re: RCU'ed dst_get_neighbour()
From: Roland Dreier @ 2011-11-29 20:56 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Marc Aurele La France, David Miller,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1322599991.2596.11.camel@edumazet-laptop>

OK, haven't sent it on yet :)

On Tue, Nov 29, 2011 at 12:53 PM, Eric Dumazet <eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Tuesday, 29 November 2011 at 12:47 -0800, Roland Dreier wrote:
>> Thanks Eric, I'll send this to Linus shortly.
>
> Oh well, I forgot one rcu_read_unlock(), I'll send a V2...
>
>
>
>

^ permalink raw reply

* Re: is non-inheritance of congestion control algorithm from the listen socket a bug or a feature?
From: Eric Dumazet @ 2011-11-29 20:56 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev
In-Reply-To: <4ED53AA1.9090800@hp.com>

On Tuesday, 29 November 2011 at 12:03 -0800, Rick Jones wrote:

> Is the non-inheritance of the congestion control algorithm a bug or a 
> feature?

IMHO it's a bug.

Can you provide a fix ?

Thanks !

^ permalink raw reply

* Re: [PATCH v4 06/10] e1000e: Support for byte queue limits
From: Jesse Brandeburg @ 2011-11-29 21:01 UTC (permalink / raw)
  To: Tom Herbert; +Cc: davem@davemloft.net, netdev@vger.kernel.org
In-Reply-To: <alpine.DEB.2.00.1111281819350.24413@pokey.mtv.corp.google.com>

On Mon, 28 Nov 2011 18:33:16 -0800
Tom Herbert <therbert@google.com> wrote:

> Changes to e1000e to use byte queue limits.

First:  thanks Tom for looking into e1000e with this work.

> @@ -1096,6 +1097,10 @@ static bool e1000_clean_tx_irq(struct e1000_adapter *adapter)
>  			if (cleaned) {
>  				total_tx_packets += buffer_info->segs;
>  				total_tx_bytes += buffer_info->bytecount;
> +				if (buffer_info->skb) {
> +					bytes_compl += buffer_info->skb->len;

What's wrong with using total_tx_bytes or buffer_info->bytecount?  It
contains the "bytes on the wire" value, which will be slightly larger
than skb->len, but it avoids warming the skb->len cacheline unnecessarily.

The rest of the patch to e1000e looks okay.

^ permalink raw reply
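
To make the size difference between the two counters concrete, here is a
small userspace sketch. It assumes that "bytes on the wire" means skb->len
plus one extra copy of the headers for every additional segment the NIC
emits; wire_bytes() and its parameters are hypothetical names for
illustration, not e1000e code.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not driver code): estimate the bytes that reach
 * the wire for a frame the hardware splits into `segs` segments, each
 * repeating `hdr_len` bytes of headers.  Assumption: the first header
 * copy is already included in skb_len.
 */
size_t wire_bytes(size_t skb_len, size_t hdr_len, unsigned int segs)
{
	if (segs == 0)
		segs = 1;	/* a non-segmented frame counts as one segment */
	return skb_len + (size_t)(segs - 1) * hdr_len;
}
```

Under that assumption the two values are identical for ordinary frames and
differ only for segmented ones, which matches the "slightly larger" remark
above.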

* Re: RCU'ed dst_get_neighbour()
From: Marc Aurele La France @ 2011-11-29 21:00 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Roland Dreier, David Miller, netdev, linux-rdma
In-Reply-To: <1322599991.2596.11.camel@edumazet-laptop>


On Tue, 29 Nov 2011, Eric Dumazet wrote:

> On Tuesday, 29 November 2011 at 12:47 -0800, Roland Dreier wrote:
>> Thanks Eric, I'll send this to Linus shortly.

> Oh well, I forgot one rcu_read_unlock(), I'll send a V2...

This also doesn't address the other dst_get_neighbour() instances 
introduced by 
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=69cce1d1404968f78b177a0314f5822d5afdbbfb

Marc.

+----------------------------------+----------------------------------+
|  Marc Aurele La France           |  work:   1-780-492-9310          |
|  Academic Information and        |  fax:    1-780-492-1729          |
|    Communications Technologies   |  email:  tsi@ualberta.ca         |
|  352 General Services Building   +----------------------------------+
|  University of Alberta           |                                  |
|  Edmonton, Alberta               |    Standard disclaimers apply    |
|  T6G 2H1                         |                                  |
|  CANADA                          |                                  |
+----------------------------------+----------------------------------+

^ permalink raw reply

* [PATCH] bnx2x: Use kcalloc instead of kzalloc to allocate array
From: Thomas Meyer @ 2011-11-29 21:08 UTC (permalink / raw)
  To: netdev, linux-kernel

The advantage of kcalloc is that it prevents integer overflows that could
result from multiplying the number of elements by the element size, and it
is also a bit nicer to read.

The semantic patch that makes this change is available
in https://lkml.org/lkml/2011/11/25/107

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
---

diff -u -p a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c 2011-11-13 11:07:33.983607086 +0100
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c 2011-11-28 19:52:50.979887113 +0100
@@ -3278,14 +3278,14 @@ int __devinit bnx2x_alloc_mem_bp(struct
 	msix_table_size = bp->igu_sb_cnt + 1;
 
 	/* fp array: RSS plus CNIC related L2 queues */
-	fp = kzalloc((BNX2X_MAX_RSS_COUNT(bp) + NON_ETH_CONTEXT_USE) *
+	fp = kcalloc(BNX2X_MAX_RSS_COUNT(bp) + NON_ETH_CONTEXT_USE,
 		     sizeof(*fp), GFP_KERNEL);
 	if (!fp)
 		goto alloc_err;
 	bp->fp = fp;
 
 	/* msix table */
-	tbl = kzalloc(msix_table_size * sizeof(*tbl), GFP_KERNEL);
+	tbl = kcalloc(msix_table_size, sizeof(*tbl), GFP_KERNEL);
 	if (!tbl)
 		goto alloc_err;
 	bp->msix_table = tbl;
diff -u -p a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c 2011-11-28 19:36:47.716773832 +0100
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_sp.c 2011-11-28 19:52:53.063259517 +0100
@@ -3342,7 +3342,7 @@ static inline int bnx2x_mcast_refresh_re
 		if (!list_empty(&o->registry.exact_match.macs))
 			return 0;
 
-		elem = kzalloc(sizeof(*elem)*len, GFP_ATOMIC);
+		elem = kcalloc(len, sizeof(*elem), GFP_ATOMIC);
 		if (!elem) {
 			BNX2X_ERR("Failed to allocate registry memory\n");
 			return -ENOMEM;

^ permalink raw reply
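
The overflow that the kcalloc conversions guard against can be sketched in
userspace. This is a stand-in written for illustration, not the kernel
implementation; checked_zalloc_array() is a hypothetical name, and it uses
malloc()/memset() where the kernel would use its own allocator.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Userspace stand-in for an array allocator that checks the
 * multiplication before allocating.  A plain zalloc(n * size) would
 * silently wrap and return a buffer that is too small.
 */
void *checked_zalloc_array(size_t n, size_t size)
{
	void *p;

	/* n * size would exceed SIZE_MAX: fail instead of under-allocating */
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;

	p = malloc(n * size);
	if (p)
		memset(p, 0, n * size);
	return p;
}
```

With the open-coded multiplication, a request such as
(SIZE_MAX / 2 + 1) elements of 4 bytes wraps to a tiny size and succeeds;
with the check it fails cleanly and the caller's existing NULL handling
takes over.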

* [PATCH] enic: Use kcalloc instead of kzalloc to allocate array
From: Thomas Meyer @ 2011-11-29 21:08 UTC (permalink / raw)
  To: netdev, linux-kernel

The advantage of kcalloc is that it prevents integer overflows that could
result from multiplying the number of elements by the element size, and it
is also a bit nicer to read.

The semantic patch that makes this change is available
in https://lkml.org/lkml/2011/11/25/107

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
---

diff -u -p a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
--- a/drivers/net/ethernet/cisco/enic/enic_main.c 2011-11-13 11:07:34.306945516 +0100
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c 2011-11-28 19:53:11.983614560 +0100
@@ -2379,7 +2379,7 @@ static int __devinit enic_probe(struct p
 
 #endif
 	/* Allocate structure for port profiles */
-	enic->pp = kzalloc(num_pps * sizeof(*enic->pp), GFP_KERNEL);
+	enic->pp = kcalloc(num_pps, sizeof(*enic->pp), GFP_KERNEL);
 	if (!enic->pp) {
 		pr_err("port profile alloc failed, aborting\n");
 		err = -ENOMEM;

^ permalink raw reply

* [PATCH] rt2x00: Use kcalloc instead of kzalloc to allocate array
From: Thomas Meyer @ 2011-11-29 21:08 UTC (permalink / raw)
  To: IvDoorn, gwingerde, helmut.schaa, linville, linux-wireless, users,
	netdev, linux-kernel

The advantage of kcalloc is that it prevents integer overflows that could
result from multiplying the number of elements by the element size, and it
is also a bit nicer to read.

The semantic patch that makes this change is available
in https://lkml.org/lkml/2011/11/25/107

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
---

diff -u -p a/drivers/net/wireless/rt2x00/rt2x00dev.c b/drivers/net/wireless/rt2x00/rt2x00dev.c
--- a/drivers/net/wireless/rt2x00/rt2x00dev.c 2011-11-28 19:36:47.770108588 +0100
+++ b/drivers/net/wireless/rt2x00/rt2x00dev.c 2011-11-28 19:54:53.495525543 +0100
@@ -831,11 +831,11 @@ static int rt2x00lib_probe_hw_modes(stru
 	if (spec->supported_rates & SUPPORT_RATE_OFDM)
 		num_rates += 8;
 
-	channels = kzalloc(sizeof(*channels) * spec->num_channels, GFP_KERNEL);
+	channels = kcalloc(spec->num_channels, sizeof(*channels), GFP_KERNEL);
 	if (!channels)
 		return -ENOMEM;
 
-	rates = kzalloc(sizeof(*rates) * num_rates, GFP_KERNEL);
+	rates = kcalloc(num_rates, sizeof(*rates), GFP_KERNEL);
 	if (!rates)
 		goto exit_free_channels;
 

^ permalink raw reply

* Re: is non-inheritance of congestion control algorithm from the listen socket a bug or a feature?
From: Rick Jones @ 2011-11-29 21:17 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1322600212.2596.13.camel@edumazet-laptop>

On 11/29/2011 12:56 PM, Eric Dumazet wrote:
> On Tuesday, 29 November 2011 at 12:03 -0800, Rick Jones wrote:
>
>> Is the non-inheritance of the congestion control algorithm a bug or a
>> feature?
>
> IMHO it's a bug.
>
> Can you provide a fix ?

I can try once I find my way through the maze.

rick

^ permalink raw reply

* Re: RCU'ed dst_get_neighbour()
From: Eric Dumazet @ 2011-11-29 21:17 UTC (permalink / raw)
  To: Marc Aurele La France
  Cc: Roland Dreier, David Miller, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <alpine.WNT.2.00.1111291359290.1036@TSI>

On Tuesday, 29 November 2011 at 14:00 -0700, Marc Aurele La France wrote:
> On Tue, 29 Nov 2011, Eric Dumazet wrote:
> 
> > On Tuesday, 29 November 2011 at 12:47 -0800, Roland Dreier wrote:
> >> Thanks Eric, I'll send this to Linus shortly.
> 
> > Oh well, I forgot one rcu_read_unlock(), I'll send a V2...
> 
> This also doesn't address the other dst_get_neighbour() instances 
> introduced by 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=69cce1d1404968f78b177a0314f5822d5afdbbfb
> 

Oh well, a complete audit is needed, and I have no choice but to do it.

Thanks !



^ permalink raw reply

* Re: is non-inheritance of congestion control algorithm from the listen socket a bug or a feature?
From: Yuchung Cheng @ 2011-11-29 21:20 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Rick Jones, netdev
In-Reply-To: <1322600212.2596.13.camel@edumazet-laptop>

On Tue, Nov 29, 2011 at 12:56 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tuesday, 29 November 2011 at 12:03 -0800, Rick Jones wrote:
>
>> Is the non-inheritance of the congestion control algorithm a bug or a
>> feature?
>
> IMHO it's a bug.
>
> Can you provide a fix ?
>
> Thanks !
I actually think it's a feature :)

I find it awkward to set CC on a listening socket. And the current
documentation defines the sysctl well:

tcp_congestion_control - STRING
        Set the congestion control algorithm to be used for new
        connections. The algorithm "reno" is always available, but
        additional choices may be available based on kernel configuration.
        Default is set as part of kernel configuration.

^ permalink raw reply

* Re: RCU'ed dst_get_neighbour()
From: Eric Dumazet @ 2011-11-29 21:31 UTC (permalink / raw)
  To: Marc Aurele La France
  Cc: Roland Dreier, David Miller, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1322601444.2596.21.camel@edumazet-laptop>

On Tuesday, 29 November 2011 at 22:17 +0100, Eric Dumazet wrote:
> On Tuesday, 29 November 2011 at 14:00 -0700, Marc Aurele La France wrote:
> > On Tue, 29 Nov 2011, Eric Dumazet wrote:
> > 
> > > On Tuesday, 29 November 2011 at 12:47 -0800, Roland Dreier wrote:
> > >> Thanks Eric, I'll send this to Linus shortly.
> > 
> > > Oh well, I forgot one rcu_read_unlock(), I'll send a V2...
> > 
> > This also doesn't address the other dst_get_neighbour() instances 
> > introduced by 
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=69cce1d1404968f78b177a0314f5822d5afdbbfb
> > 
> 
> Oh well, a complete audit is needed, and I have no choice but to do it.
> 
> Thanks !
> 

Here is the result of this audit. Please double-check and test it; I have
only compile-tested it.

Thanks !

[PATCH V2] drivers/infiniband: fix lockdep splats

commit f2c31e32b37 (net: fix NULL dereferences in check_peer_redir())
forgot to take care of infiniband uses of dst neighbours.

Many thanks to Marc Aurele who provided a nice bug report and feedback.

Reported-by: Marc Aurele La France <tsi-yfeSBMgouQgsA/PxXw9srA@public.gmane.org>
Signed-off-by: Eric Dumazet <eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
CC: David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
CC: Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
 drivers/infiniband/core/addr.c                 |    9 +++++--
 drivers/infiniband/hw/cxgb3/iwch_cm.c          |    4 +++
 drivers/infiniband/hw/cxgb4/cm.c               |    6 +++++
 drivers/infiniband/hw/nes/nes_cm.c             |    6 +++--
 drivers/infiniband/ulp/ipoib/ipoib_main.c      |   18 +++++++++------
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |    6 +++--
 6 files changed, 35 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index 691276b..e9cf51b 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -216,7 +216,9 @@ static int addr4_resolve(struct sockaddr_in *src_in,
 
 	neigh = neigh_lookup(&arp_tbl, &rt->rt_gateway, rt->dst.dev);
 	if (!neigh || !(neigh->nud_state & NUD_VALID)) {
+		rcu_read_lock();
 		neigh_event_send(dst_get_neighbour(&rt->dst), NULL);
+		rcu_read_unlock();
 		ret = -ENODATA;
 		if (neigh)
 			goto release;
@@ -274,15 +276,16 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
 		goto put;
 	}
 
+	rcu_read_lock();
 	neigh = dst_get_neighbour(dst);
 	if (!neigh || !(neigh->nud_state & NUD_VALID)) {
 		if (neigh)
 			neigh_event_send(neigh, NULL);
 		ret = -ENODATA;
-		goto put;
+	} else {
+		ret = rdma_copy_addr(addr, dst->dev, neigh->ha);
 	}
-
-	ret = rdma_copy_addr(addr, dst->dev, neigh->ha);
+	rcu_read_unlock();
 put:
 	dst_release(dst);
 	return ret;
diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c
index de6d077..c88b12b 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_cm.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c
@@ -1375,8 +1375,10 @@ static int pass_accept_req(struct t3cdev *tdev, struct sk_buff *skb, void *ctx)
 		goto reject;
 	}
 	dst = &rt->dst;
+	rcu_read_lock();
 	neigh = dst_get_neighbour(dst);
 	l2t = t3_l2t_get(tdev, neigh, neigh->dev);
+	rcu_read_unlock();
 	if (!l2t) {
 		printk(KERN_ERR MOD "%s - failed to allocate l2t entry!\n",
 		       __func__);
@@ -1946,10 +1948,12 @@ int iwch_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param)
 	}
 	ep->dst = &rt->dst;
 
+	rcu_read_lock();
 	neigh = dst_get_neighbour(ep->dst);
 
 	/* get a l2t entry */
 	ep->l2t = t3_l2t_get(ep->com.tdev, neigh, neigh->dev);
+	rcu_read_unlock();
 	if (!ep->l2t) {
 		printk(KERN_ERR MOD "%s - cannot alloc l2e.\n", __func__);
 		err = -ENOMEM;
diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index b36cdac..75b57be 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -1594,6 +1594,7 @@ static int pass_accept_req(struct c4iw_dev *dev, struct sk_buff *skb)
 		goto reject;
 	}
 	dst = &rt->dst;
+	rcu_read_lock();
 	neigh = dst_get_neighbour(dst);
 	if (neigh->dev->flags & IFF_LOOPBACK) {
 		pdev = ip_dev_find(&init_net, peer_ip);
@@ -1620,6 +1621,7 @@ static int pass_accept_req(struct c4iw_dev *dev, struct sk_buff *skb)
 		rss_qid = dev->rdev.lldi.rxq_ids[
 			  cxgb4_port_idx(neigh->dev) * step];
 	}
+	rcu_read_unlock();
 	if (!l2t) {
 		printk(KERN_ERR MOD "%s - failed to allocate l2t entry!\n",
 		       __func__);
@@ -1820,6 +1822,7 @@ static int c4iw_reconnect(struct c4iw_ep *ep)
 	}
 	ep->dst = &rt->dst;
 
+	rcu_read_lock();
 	neigh = dst_get_neighbour(ep->dst);
 
 	/* get a l2t entry */
@@ -1856,6 +1859,7 @@ static int c4iw_reconnect(struct c4iw_ep *ep)
 		ep->rss_qid = ep->com.dev->rdev.lldi.rxq_ids[
 			cxgb4_port_idx(neigh->dev) * step];
 	}
+	rcu_read_unlock();
 	if (!ep->l2t) {
 		printk(KERN_ERR MOD "%s - cannot alloc l2e.\n", __func__);
 		err = -ENOMEM;
@@ -2301,6 +2305,7 @@ int c4iw_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param)
 	}
 	ep->dst = &rt->dst;
 
+	rcu_read_lock();
 	neigh = dst_get_neighbour(ep->dst);
 
 	/* get a l2t entry */
@@ -2339,6 +2344,7 @@ int c4iw_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param)
 		ep->retry_with_mpa_v1 = 0;
 		ep->tried_with_mpa_v1 = 0;
 	}
+	rcu_read_unlock();
 	if (!ep->l2t) {
 		printk(KERN_ERR MOD "%s - cannot alloc l2e.\n", __func__);
 		err = -ENOMEM;
diff --git a/drivers/infiniband/hw/nes/nes_cm.c b/drivers/infiniband/hw/nes/nes_cm.c
index dfce9ea..0a52d72 100644
--- a/drivers/infiniband/hw/nes/nes_cm.c
+++ b/drivers/infiniband/hw/nes/nes_cm.c
@@ -1377,9 +1377,11 @@ static int nes_addr_resolve_neigh(struct nes_vnic *nesvnic, u32 dst_ip, int arpi
 		neigh_release(neigh);
 	}
 
-	if ((neigh == NULL) || (!(neigh->nud_state & NUD_VALID)))
+	if ((neigh == NULL) || (!(neigh->nud_state & NUD_VALID))) {
+		rcu_read_lock();
 		neigh_event_send(dst_get_neighbour(&rt->dst), NULL);
-
+		rcu_read_unlock();
+	}
 	ip_rt_put(rt);
 	return rc;
 }
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 7567b60..ef38848 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -555,6 +555,7 @@ static int path_rec_start(struct net_device *dev,
 	return 0;
 }
 
+/* called with rcu_read_lock */
 static void neigh_add_path(struct sk_buff *skb, struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(dev);
@@ -636,6 +637,7 @@ err_drop:
 	spin_unlock_irqrestore(&priv->lock, flags);
 }
 
+/* called with rcu_read_lock */
 static void ipoib_path_lookup(struct sk_buff *skb, struct net_device *dev)
 {
 	struct ipoib_dev_priv *priv = netdev_priv(skb->dev);
@@ -720,13 +722,14 @@ static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct neighbour *n = NULL;
 	unsigned long flags;
 
+	rcu_read_lock();
 	if (likely(skb_dst(skb)))
 		n = dst_get_neighbour(skb_dst(skb));
 
 	if (likely(n)) {
 		if (unlikely(!*to_ipoib_neigh(n))) {
 			ipoib_path_lookup(skb, dev);
-			return NETDEV_TX_OK;
+			goto unlock;
 		}
 
 		neigh = *to_ipoib_neigh(n);
@@ -749,17 +752,17 @@ static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
 			ipoib_neigh_free(dev, neigh);
 			spin_unlock_irqrestore(&priv->lock, flags);
 			ipoib_path_lookup(skb, dev);
-			return NETDEV_TX_OK;
+			goto unlock;
 		}
 
 		if (ipoib_cm_get(neigh)) {
 			if (ipoib_cm_up(neigh)) {
 				ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
-				return NETDEV_TX_OK;
+				goto unlock;
 			}
 		} else if (neigh->ah) {
 			ipoib_send(dev, skb, neigh->ah, IPOIB_QPN(n->ha));
-			return NETDEV_TX_OK;
+			goto unlock;
 		}
 
 		if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) {
@@ -793,13 +796,14 @@ static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
 					   phdr->hwaddr + 4);
 				dev_kfree_skb_any(skb);
 				++dev->stats.tx_dropped;
-				return NETDEV_TX_OK;
+				goto unlock;
 			}
 
 			unicast_arp_send(skb, dev, phdr);
 		}
 	}
-
+unlock:
+	rcu_read_unlock();
 	return NETDEV_TX_OK;
 }
 
@@ -837,7 +841,7 @@ static int ipoib_hard_header(struct sk_buff *skb,
 	dst = skb_dst(skb);
 	n = NULL;
 	if (dst)
-		n = dst_get_neighbour(dst);
+		n = dst_get_neighbour_raw(dst);
 	if ((!dst || !n) && daddr) {
 		struct ipoib_pseudoheader *phdr =
 			(struct ipoib_pseudoheader *) skb_push(skb, sizeof *phdr);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index 1b7a976..cad1894 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -266,7 +266,7 @@ static int ipoib_mcast_join_finish(struct ipoib_mcast *mcast,
 
 		skb->dev = dev;
 		if (dst)
-			n = dst_get_neighbour(dst);
+			n = dst_get_neighbour_raw(dst);
 		if (!dst || !n) {
 			/* put pseudoheader back on for next time */
 			skb_push(skb, sizeof (struct ipoib_pseudoheader));
@@ -722,6 +722,8 @@ out:
 	if (mcast && mcast->ah) {
 		struct dst_entry *dst = skb_dst(skb);
 		struct neighbour *n = NULL;
+
+		rcu_read_lock();
 		if (dst)
 			n = dst_get_neighbour(dst);
 		if (n && !*to_ipoib_neigh(n)) {
@@ -734,7 +736,7 @@ out:
 				list_add_tail(&neigh->list, &mcast->neigh_list);
 			}
 		}
-
+		rcu_read_unlock();
 		spin_unlock_irqrestore(&priv->lock, flags);
 		ipoib_send(dev, skb, mcast->ah, IB_MULTICAST_QPN);
 		return;



^ permalink raw reply related
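
The ipoib_start_xmit() hunk above replaces each early return with a jump to
a single unlock label, which is one way to avoid the kind of forgotten
rcu_read_unlock() mentioned earlier in the thread. The shape can be sketched
in userspace as follows. A plain counter stands in for the RCU read-side
nesting depth; fake_read_lock(), fake_read_unlock() and xmit_sketch() are
hypothetical names, and nothing here is real RCU.

```c
#include <stdbool.h>

/* Stand-in for read-side nesting depth; real code uses rcu_read_lock(). */
int read_lock_depth;

static void fake_read_lock(void)   { read_lock_depth++; }
static void fake_read_unlock(void) { read_lock_depth--; }

/*
 * Shape of a transmit path after the conversion: every early exit
 * jumps to the single unlock label instead of returning directly,
 * so lock and unlock stay balanced on all paths.
 */
int xmit_sketch(bool have_neigh, bool path_known)
{
	int ret = 0;	/* models NETDEV_TX_OK */

	fake_read_lock();
	if (!have_neigh)
		goto unlock;
	if (!path_known) {
		/* a path lookup would be started here */
		goto unlock;
	}
	/* the packet would be sent here */
unlock:
	fake_read_unlock();
	return ret;
}
```

Whatever combination of arguments is passed, read_lock_depth is back to
zero when the function returns, which is the property lockdep checks for in
the real code.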

* Re: RCU'ed dst_get_neighbour()
From: Roland Dreier @ 2011-11-29 21:35 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Marc Aurele La France, David Miller, netdev, linux-rdma
In-Reply-To: <1322602283.2596.25.camel@edumazet-laptop>

On Tue, Nov 29, 2011 at 1:31 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Here is the result of this audit, please double check and test it, I
> only compiled this.

Thanks Eric... I'll queue this up and send it on once we get a good
report from Marc.

Thanks!
  Roland

^ permalink raw reply

* Re: [RFC PATCH 00/18] netfilter: IPv6 NAT
From: Krzysztof Olędzki @ 2011-11-29 21:38 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Ulrich Weber, Amos Jeffries, sclark46@earthlink.net,
	kaber@trash.net, netfilter-devel@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <alpine.LNX.2.01.1111291255200.20965@frira.zrqbmnf.qr>

On 2011-11-29 13:23, Jan Engelhardt wrote:
>
> On Tuesday 2011-11-29 10:19, Ulrich Weber wrote:
>> On 28.11.2011 23:03, Amos Jeffries wrote:
>>> I'm going to dare to call FUD on those statements...
>>>    * Load Balancing - what is preventing your routing rules or packet
>>>   marking using the same criteria as the NAT changer? nothing. Load
>>>   balancing works perfectly fine without NAT.
>
> Source address selection, having to occur on the source, would
> require that the source has to know all the parameters that a {what
> would have been your NAT GW} would need to know, which means you have
> to (a) collect and/or (b) distribute this information. Given two
> uplinks that only allow a certain source network address (different
> for each uplink), combined with the desire to balance on utilization,
> (a) a client is not in the position to easily obtain this data unless
> it is the router for all participants itself, (b) the clients needs
> to cooperate, and one cannot always trust client devices, or hope for
> their technical cooperation (firewalled themselves off).
>
> Yes, NAT is evil, but if you actually think about it, policies are
> best applied where [the policy] originates from. After all, we also
> don't do LSRR, instead, routers do the routing, because they just
> know much better.
>
>> I fully agree. NAT can not replace your firewall rules.
>>
>> However with NAT you could get some kind of anonymity.
>
> Same network prefix, some cookies, or a login form. Blam, identified,
> or at least (Almost-)Uniquely Identified Visitor tagging.

But without NAT you have a pretty big chance of having the same IPv6
*suffix* everywhere, based on your MAC address: at home, at work,
in a cafe or in a hotel during your vacation in Portugal. So yes, NAT
is not a perfect solution, but it really helps your privacy.

Best regards,

			Krzysztof Olędzki

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply
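
The "same suffix everywhere" point refers to interface identifiers derived
from the MAC address. A minimal sketch of that derivation, assuming
stateless autoconfiguration with modified EUI-64 identifiers (RFC 4291,
Appendix A) and no privacy extensions; mac_to_eui64() is a hypothetical
helper name.

```c
#include <stdint.h>
#include <string.h>

/*
 * Build the 64-bit modified EUI-64 interface identifier from a 48-bit
 * MAC address: insert ff:fe between the two halves and invert the
 * universal/local bit of the first byte.
 */
void mac_to_eui64(const uint8_t mac[6], uint8_t iid[8])
{
	memcpy(iid, mac, 3);		/* OUI half */
	iid[3] = 0xff;
	iid[4] = 0xfe;
	memcpy(iid + 5, mac + 3, 3);	/* device half */
	iid[0] ^= 0x02;			/* flip the universal/local bit */
}
```

For the MAC 00:11:22:33:44:55 this yields 02:11:22:ff:fe:33:44:55. The
result depends only on the hardware address, so it stays constant while the
network prefix changes from site to site.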

* Re: is non-inheritance of congestion control algorithm from the listen socket a bug or a feature?
From: Eric Dumazet @ 2011-11-29 21:46 UTC (permalink / raw)
  To: Yuchung Cheng; +Cc: Rick Jones, netdev
In-Reply-To: <CAK6E8=djWDn=ZvD0XLxex3vOfsDjCT64k+DwDSNp6+tLbPen2A@mail.gmail.com>

On Tuesday, 29 November 2011 at 13:20 -0800, Yuchung Cheng wrote:
> I actually think it's a feature :)
> 
> I find it awkward to set CC on a listening socket. And the current
> documentation defines the sysctl well:
> 
> tcp_congestion_control - STRING
>         Set the congestion control algorithm to be used for new
>         connections. The algorithm "reno" is always available, but
>         additional choices may be available based on kernel configuration.
>         Default is set as part of kernel configuration.

This might be a feature, but it contradicts how most socket options behave:
they are set on the listener and inherited by the child socket on accept().

tcp_congestion_control should be the system-wide default.

Anyway, "man 7 tcp" doesn't document TCP_CONGESTION, so we are not
supposed to play with it ;)

Oh well...

^ permalink raw reply

* Re: [PATCH v2] netlabel: Fix build problems when IPv6 is not enabled
From: David Miller @ 2011-11-29 21:49 UTC (permalink / raw)
  To: rdunlap; +Cc: pmoore, netdev, linux-next, linux-kernel
In-Reply-To: <4ED555E4.4000006@xenotime.net>

From: Randy Dunlap <rdunlap@xenotime.net>
Date: Tue, 29 Nov 2011 14:00:04 -0800

> On 11/29/2011 12:10 PM, Paul Moore wrote:
>> A recent fix to the NetLabel code caused build problems with
>> configurations that did not have IPv6 enabled; see below:
>> 
>>  netlabel_kapi.c: In function 'netlbl_cfg_unlbl_map_add':
>>  netlabel_kapi.c:165:4:
>>   error: implicit declaration of function 'netlbl_af6list_add'
>> 
>> This patch fixes this problem by making the IPv6 specific code conditional
>> on the IPv6 configuration flags as is done in the rest of NetLabel and the
>> network stack as a whole.  We have to move some variable declarations
>> around as a result so things may not be quite as pretty, but at least it
>> builds cleanly now.
>> 
>> Some additional IPv6 conditionals were added to the NetLabel code as well
>> for the sake of consistency.
>> 
>> Reported-by: Randy Dunlap <rdunlap@xenotime.net>
>> Signed-off-by: Paul Moore <pmoore@redhat.com>
> 
> Acked-by: Randy Dunlap <rdunlap@xenotime.net>

Applied, thanks everyone.

^ permalink raw reply
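
The approach described in the quoted changelog (family-specific code under a
configuration conditional, with variables declared inside the case they
belong to) can be sketched in userspace like this. HAVE_IPV6 is a
hypothetical macro standing in for the kernel's IPv6 configuration test, and
addr_is_unspecified() is an illustrative function, not NetLabel code.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Return 1 if addr is the all-zero address for the given family,
 * 0 if it is not, and -1 if the family is not compiled in.
 */
int addr_is_unspecified(int family, const void *addr)
{
	switch (family) {
	case AF_INET: {
		/* block scope keeps the typed pointer next to its use */
		const struct in_addr *addr4 = addr;

		return addr4->s_addr == 0;
	}
#ifdef HAVE_IPV6
	case AF_INET6: {
		const struct in6_addr *addr6 = addr;
		int i;

		for (i = 0; i < 16; i++)
			if (addr6->s6_addr[i])
				return 0;
		return 1;
	}
#endif
	default:
		return -1;
	}
}
```

Because the IPv6 variables live inside the conditional block, a build
without HAVE_IPV6 sees neither unused variables nor calls to functions that
do not exist in that configuration.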

* Re: is non-inheritance of congestion control algorithm from the listen socket a bug or a feature?
From: David Miller @ 2011-11-29 21:52 UTC (permalink / raw)
  To: eric.dumazet; +Cc: ycheng, rick.jones2, netdev
In-Reply-To: <1322603175.2596.31.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 29 Nov 2011 22:46:15 +0100

> On Tuesday, 29 November 2011 at 13:20 -0800, Yuchung Cheng wrote:
>> I actually think it's a feature :)
>> 
>> I find it awkward to set CC on a listening socket. And the current
>> documentation defines the sysctl well:
>> 
>> tcp_congestion_control - STRING
>>         Set the congestion control algorithm to be used for new
>>         connections. The algorithm "reno" is always available, but
>>         additional choices may be available based on kernel configuration.
>>         Default is set as part of kernel configuration.
> 
> This might be a feature, but it contradicts how most socket options behave:
> they are set on the listener and inherited by the child socket on accept().

There is really no reason to keep the current behavior.

If an application sets the congestion control algorithm on a listening
socket to a non-default value, what effect could possibly be intended?

Congestion control doesn't even come into play at all on a listening
socket, therefore the only logical expectation is that the setting is
inherited by the child.

The only other logical behavior would be to forbid this operation on a
listening socket, since it has no effect, but that doesn't make any
sense now does it? :-)

^ permalink raw reply

* [PATCH net-next] bnx2: Support for byte queue limits
From: Eric Dumazet @ 2011-11-29 21:53 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Tom Herbert

Changes to bnx2 to use byte queue limits.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Tom Herbert <therbert@google.com>
---
 drivers/net/ethernet/broadcom/bnx2.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c
index d573169..787e175 100644
--- a/drivers/net/ethernet/broadcom/bnx2.c
+++ b/drivers/net/ethernet/broadcom/bnx2.c
@@ -2810,6 +2810,7 @@ bnx2_tx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
 	struct bnx2_tx_ring_info *txr = &bnapi->tx_ring;
 	u16 hw_cons, sw_cons, sw_ring_cons;
 	int tx_pkt = 0, index;
+	unsigned int tx_bytes = 0;
 	struct netdev_queue *txq;
 
 	index = (bnapi - bp->bnx2_napi);
@@ -2864,6 +2865,7 @@ bnx2_tx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
 
 		sw_cons = NEXT_TX_BD(sw_cons);
 
+		tx_bytes += skb->len;
 		dev_kfree_skb(skb);
 		tx_pkt++;
 		if (tx_pkt == budget)
@@ -2873,6 +2875,7 @@ bnx2_tx_int(struct bnx2 *bp, struct bnx2_napi *bnapi, int budget)
 			hw_cons = bnx2_get_hw_tx_cons(bnapi);
 	}
 
+	netdev_tx_completed_queue(txq, tx_pkt, tx_bytes);
 	txr->hw_tx_cons = hw_cons;
 	txr->tx_cons = sw_cons;
 
@@ -5393,6 +5396,7 @@ bnx2_free_tx_skbs(struct bnx2 *bp)
 			}
 			dev_kfree_skb(skb);
 		}
+		netdev_tx_reset_queue(netdev_get_tx_queue(bp->dev, i));
 	}
 }
 
@@ -6546,6 +6550,8 @@ bnx2_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	}
 	txbd->tx_bd_vlan_tag_flags |= TX_BD_FLAGS_END;
 
+	netdev_tx_sent_queue(txq, skb->len);
+
 	prod = NEXT_TX_BD(prod);
 	txr->tx_prod_bseq += skb->len;
 

^ permalink raw reply related
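
The three hooks added by the bnx2 patch (sent, completed, reset) can be
pictured with a toy userspace model of per-queue byte accounting. This is a
deliberately simplified sketch: the limit is fixed, whereas the kernel
adjusts it dynamically, and struct toy_txq and the toy_* functions are
hypothetical names, not the kernel API.

```c
#include <stdbool.h>

/* Toy model of one transmit queue; not the kernel's limit algorithm. */
struct toy_txq {
	unsigned int inflight;	/* bytes handed to hardware, not yet completed */
	unsigned int limit;	/* fixed in this sketch */
	bool stopped;		/* true when the stack should stop queueing */
};

/* Transmit path: corresponds to where netdev_tx_sent_queue() is called. */
void toy_sent(struct toy_txq *q, unsigned int bytes)
{
	q->inflight += bytes;
	if (q->inflight >= q->limit)
		q->stopped = true;
}

/* Completion path: corresponds to netdev_tx_completed_queue(). */
void toy_completed(struct toy_txq *q, unsigned int bytes)
{
	q->inflight -= bytes;
	if (q->inflight < q->limit)
		q->stopped = false;
}

/* Ring teardown: corresponds to netdev_tx_reset_queue(). */
void toy_reset(struct toy_txq *q)
{
	q->inflight = 0;
	q->stopped = false;
}
```

The model also shows why both paths have to report the same byte count for
a given packet: if completion reported a different value than transmit did,
inflight would drift away from what is actually queued.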

* Re: is non-inheritance of congestion control algorithm from the listen socket a bug or a feature?
From: Eric Dumazet @ 2011-11-29 21:56 UTC (permalink / raw)
  To: David Miller; +Cc: ycheng, rick.jones2, netdev
In-Reply-To: <20111129.165205.91103035999089185.davem@davemloft.net>

On Tuesday, 29 November 2011 at 16:52 -0500, David Miller wrote:

> There is really no reason to keep the current behavior.
> 
> If an application sets the congestion control algorithm on a listening
> socket to a non-default value, what effect could possibly be intended?
> 
> Congestion control doesn't even come into play at all on a listening
> socket, therefore the only logical expectation is that the setting is
> inherited by the child.
> 
> The only other logical behavior would be to forbid this operation on a
> listening socket, since it has no effect, but that doesn't make any
> sense now does it? :-)

Moreover, an application can use setsockopt(TCP_CONGESTION) before
calling listen() (while the socket is still in the CLOSE state).

^ permalink raw reply
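
For readers who have not used the option under discussion, here is a minimal
Linux-only sketch of setting the congestion control algorithm on a socket
that has not yet called listen(), then reading it back. set_and_get_cc() is
a hypothetical helper name; it only demonstrates the setsockopt() and
getsockopt() calls and does not test inheritance across accept().

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Set the congestion control algorithm on a fresh TCP socket and read
 * back what the kernel reports.  Returns 0 on success, -1 on failure.
 * TCP_CONGESTION is Linux-specific.
 */
int set_and_get_cc(const char *name, char *out, socklen_t outlen)
{
	int ret = -1;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, name,
		       (socklen_t)strlen(name)) < 0)
		goto out;
	memset(out, 0, outlen);
	if (getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, out, &outlen) < 0)
		goto out;
	ret = 0;
out:
	close(fd);
	return ret;
}
```

Calling it with "reno", which the sysctl text quoted earlier describes as
always available, and a buffer of 32 bytes should report "reno" back. To
observe the behaviour debated in this thread, one would repeat the
getsockopt() call on a socket returned by accept().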

* Re: [PATCH v2] netlabel: Fix build problems when IPv6 is not enabled
From: Randy Dunlap @ 2011-11-29 22:00 UTC (permalink / raw)
  To: Paul Moore; +Cc: netdev, linux-next, linux-kernel
In-Reply-To: <20111129201054.20141.86401.stgit@sifl>

On 11/29/2011 12:10 PM, Paul Moore wrote:
> A recent fix to the NetLabel code caused build problems with
> configurations that did not have IPv6 enabled; see below:
> 
>  netlabel_kapi.c: In function 'netlbl_cfg_unlbl_map_add':
>  netlabel_kapi.c:165:4:
>   error: implicit declaration of function 'netlbl_af6list_add'
> 
> This patch fixes this problem by making the IPv6 specific code conditional
> on the IPv6 configuration flags as is done in the rest of NetLabel and the
> network stack as a whole.  We have to move some variable declarations
> around as a result so things may not be quite as pretty, but at least it
> builds cleanly now.
> 
> Some additional IPv6 conditionals were added to the NetLabel code as well
> for the sake of consistency.
> 
> Reported-by: Randy Dunlap <rdunlap@xenotime.net>
> Signed-off-by: Paul Moore <pmoore@redhat.com>

Acked-by: Randy Dunlap <rdunlap@xenotime.net>

Thanks.

> ---
>  net/netlabel/netlabel_kapi.c |   22 ++++++++++++++--------
>  1 files changed, 14 insertions(+), 8 deletions(-)
> 
> diff --git a/net/netlabel/netlabel_kapi.c b/net/netlabel/netlabel_kapi.c
> index 3735297..5952237 100644
> --- a/net/netlabel/netlabel_kapi.c
> +++ b/net/netlabel/netlabel_kapi.c
> @@ -111,8 +111,6 @@ int netlbl_cfg_unlbl_map_add(const char *domain,
>  	struct netlbl_domaddr_map *addrmap = NULL;
>  	struct netlbl_domaddr4_map *map4 = NULL;
>  	struct netlbl_domaddr6_map *map6 = NULL;
> -	const struct in_addr *addr4, *mask4;
> -	const struct in6_addr *addr6, *mask6;
>  
>  	entry = kzalloc(sizeof(*entry), GFP_ATOMIC);
>  	if (entry == NULL)
> @@ -133,9 +131,9 @@ int netlbl_cfg_unlbl_map_add(const char *domain,
>  		INIT_LIST_HEAD(&addrmap->list6);
>  
>  		switch (family) {
> -		case AF_INET:
> -			addr4 = addr;
> -			mask4 = mask;
> +		case AF_INET: {
> +			const struct in_addr *addr4 = addr;
> +			const struct in_addr *mask4 = mask;
>  			map4 = kzalloc(sizeof(*map4), GFP_ATOMIC);
>  			if (map4 == NULL)
>  				goto cfg_unlbl_map_add_failure;
> @@ -148,9 +146,11 @@ int netlbl_cfg_unlbl_map_add(const char *domain,
>  			if (ret_val != 0)
>  				goto cfg_unlbl_map_add_failure;
>  			break;
> -		case AF_INET6:
> -			addr6 = addr;
> -			mask6 = mask;
> +			}
> +#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
> +		case AF_INET6: {
> +			const struct in6_addr *addr6 = addr;
> +			const struct in6_addr *mask6 = mask;
>  			map6 = kzalloc(sizeof(*map6), GFP_ATOMIC);
>  			if (map6 == NULL)
>  				goto cfg_unlbl_map_add_failure;
> @@ -167,6 +167,8 @@ int netlbl_cfg_unlbl_map_add(const char *domain,
>  			if (ret_val != 0)
>  				goto cfg_unlbl_map_add_failure;
>  			break;
> +			}
> +#endif /* IPv6 */
>  		default:
>  			goto cfg_unlbl_map_add_failure;
>  			break;
> @@ -225,9 +227,11 @@ int netlbl_cfg_unlbl_static_add(struct net *net,
>  	case AF_INET:
>  		addr_len = sizeof(struct in_addr);
>  		break;
> +#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
>  	case AF_INET6:
>  		addr_len = sizeof(struct in6_addr);
>  		break;
> +#endif /* IPv6 */
>  	default:
>  		return -EPFNOSUPPORT;
>  	}
> @@ -266,9 +270,11 @@ int netlbl_cfg_unlbl_static_del(struct net *net,
>  	case AF_INET:
>  		addr_len = sizeof(struct in_addr);
>  		break;
> +#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
>  	case AF_INET6:
>  		addr_len = sizeof(struct in6_addr);
>  		break;
> +#endif /* IPv6 */
>  	default:
>  		return -EPFNOSUPPORT;
>  	}
> 
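
The pattern in the patch above (per-case block scoping for the typed
pointers, plus a preprocessor guard around the IPv6 case) can be
sketched in userspace roughly as follows. DEMO_HAVE_IPV6 and
demo_addr_len() are hypothetical stand-ins, not kernel symbols.

```c
#include <stddef.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical stand-in for the kernel's CONFIG_IPV6 /
 * CONFIG_IPV6_MODULE test; set to 0 to compile the IPv6 case out. */
#define DEMO_HAVE_IPV6 1

/* Return the address length for a family, or -1 if unsupported. */
int demo_addr_len(int family, const void *addr)
{
	switch (family) {
	case AF_INET: {
		/* Declared inside the case block, so no IPv4-only
		 * variables live at function scope. */
		const struct in_addr *addr4 = addr;

		return addr4 ? (int)sizeof(*addr4) : -1;
	}
#if DEMO_HAVE_IPV6
	case AF_INET6: {
		/* Only compiled when IPv6 support is enabled. */
		const struct in6_addr *addr6 = addr;

		return addr6 ? (int)sizeof(*addr6) : -1;
	}
#endif /* IPv6 */
	default:
		return -1;
	}
}
```

With the guard disabled, AF_INET6 falls through to the default case,
which mirrors how the patched switch statements reject the family.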


-- 
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

^ permalink raw reply

* Re: is non-inheritance of congestion control algorithm from the listen socket a bug or a feature?
From: Stephen Hemminger @ 2011-11-29 22:01 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, ycheng, rick.jones2, netdev
In-Reply-To: <1322603786.2596.36.camel@edumazet-laptop>

On Tue, 29 Nov 2011 22:56:26 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Tuesday, 29 November 2011 at 16:52 -0500, David Miller wrote:
> 
> > There is really no reason to keep the current behavior.
> > 
> > If an application sets the congestion control algorithm on a listening
> > socket to a non-default value, what effect could possibly be intended?
> > 
> > Congestion control doesn't even come into play at all on a listening
> > socket, therefore the only logical expectation is that it inherits to
> > the child.
> > 
> > The only other logical behavior would be to forbid this operation on a
> > listening socket, since it has no effect, but that doesn't make any
> > sense now does it? :-)
> 
> Moreover, an application can use setsockopt(TCP_CONGESTION) before
> calling listen() (while the socket is still in CLOSE state).

Agreed, it was just an oversight of the initial design.
The setsockopt() on the listening socket is ignored.

^ permalink raw reply

* [PATCH 1/1] IPVS: Modify the SH scheduler to use weights
From: Michael Maxim @ 2011-11-29 22:02 UTC (permalink / raw)
  To: Wensong Zhang, Simon Horman, Julian Anastasov, Pablo Neira Ayuso,
	Patrick McHardy, David S. Miller, netdev, lvs-devel
  Cc: Mike Maxim

Modify the algorithm to build the source hashing hash table to add
extra slots for destinations with higher weight. This has the effect
of allowing an IPVS SH user to give more connections to hosts that
have been configured to have a higher weight.

Signed-off-by: Michael Maxim <mike@okcupid.com>
---
 net/netfilter/ipvs/Kconfig    |   15 +++++++++++++++
 net/netfilter/ipvs/ip_vs_sh.c |   20 ++++++++++++++++++--
 2 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/ipvs/Kconfig b/net/netfilter/ipvs/Kconfig
index 70bd1d0..af4c0b8 100644
--- a/net/netfilter/ipvs/Kconfig
+++ b/net/netfilter/ipvs/Kconfig
@@ -232,6 +232,21 @@ config	IP_VS_NQ
 	  If you want to compile it in kernel, say Y. To compile it as a
 	  module, choose M here. If unsure, say N.
 
+comment 'IPVS SH scheduler'
+
+config IP_VS_SH_TAB_BITS
+	int "IPVS source hashing table size (the Nth power of 2)"
+	range 4 20
+	default 8
+	---help---
+	  The source hashing scheduler maps source IPs to destinations
+	  stored in a hash table. This table is tiled by each destination
+	  until all slots in the table are filled. When using weights to
+	  allow destinations to receive more connections, the table is
+	  tiled an amount proportional to the weights specified. The table
+	  needs to be large enough to effectively fit all the destinations
+	  multiplied by their respective weights.
+
 comment 'IPVS application helper'
 
 config	IP_VS_FTP
diff --git a/net/netfilter/ipvs/ip_vs_sh.c b/net/netfilter/ipvs/ip_vs_sh.c
index 33815f4..e0ca520 100644
--- a/net/netfilter/ipvs/ip_vs_sh.c
+++ b/net/netfilter/ipvs/ip_vs_sh.c
@@ -30,6 +30,11 @@
  * server is dead or overloaded, the load balancer can bypass the cache
  * server and send requests to the original server directly.
  *
+ * The weight destination attribute can be used to control the
+ * distribution of connections to the destinations in servernode. The
+ * greater the weight, the more connections the destination
+ * will receive.
+ *
  */
 
 #define KMSG_COMPONENT "IPVS"
@@ -99,9 +104,11 @@ ip_vs_sh_assign(struct ip_vs_sh_bucket *tbl, struct ip_vs_service *svc)
 	struct ip_vs_sh_bucket *b;
 	struct list_head *p;
 	struct ip_vs_dest *dest;
+	int d_count;
 
 	b = tbl;
 	p = &svc->destinations;
+	d_count = 0;
 	for (i=0; i<IP_VS_SH_TAB_SIZE; i++) {
 		if (list_empty(p)) {
 			b->dest = NULL;
@@ -113,14 +120,23 @@ ip_vs_sh_assign(struct ip_vs_sh_bucket *tbl, struct ip_vs_service *svc)
 			atomic_inc(&dest->refcnt);
 			b->dest = dest;
 
-			p = p->next;
+			IP_VS_DBG_BUF(6, "assigned i: %d dest: %s weight: %d\n",
+				      i, IP_VS_DBG_ADDR(svc->af, &dest->addr),
+				      atomic_read(&dest->weight));
+
+			/* Don't move to next dest until filling weight */
+			if (++d_count >= atomic_read(&dest->weight)) {
+				p = p->next;
+				d_count = 0;
+			}
+
 		}
 		b++;
 	}
+
 	return 0;
 }
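
For readers who want to see the effect of the weighted fill without
building a kernel, here is a small userspace sketch of the same loop.
It assumes a plain array of destinations instead of the kernel's
circular list, and the names demo_dest, demo_sh_assign() and
demo_sh_count() are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct ip_vs_dest. */
struct demo_dest {
	int id;
	int weight;
};

/* Tile the table with destinations; each destination keeps being
 * assigned until it has filled `weight` consecutive slots. */
void demo_sh_assign(int *table, size_t table_size,
		    const struct demo_dest *dests, size_t n_dests)
{
	size_t cur = 0;
	int d_count = 0;

	for (size_t i = 0; i < table_size; i++) {
		if (n_dests == 0) {
			table[i] = -1;	/* no destination available */
			continue;
		}
		table[i] = dests[cur].id;

		/* Don't move to the next dest until filling its weight. */
		if (++d_count >= dests[cur].weight) {
			cur = (cur + 1) % n_dests;
			d_count = 0;
		}
	}
}

/* Count how many slots point at a given destination id. */
size_t demo_sh_count(const int *table, size_t table_size, int id)
{
	size_t n = 0;

	for (size_t i = 0; i < table_size; i++)
		if (table[i] == id)
			n++;
	return n;
}
```

With two destinations A (weight 1) and B (weight 3) and an 8-slot
table, the slots come out as A B B B A B B B. Note that with this
comparison a destination of weight 0 still receives one slot per pass,
the same as weight 1.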
 

^ permalink raw reply related

* Re: [RFC PATCH 00/18] netfilter: IPv6 NAT
From: Eric Dumazet @ 2011-11-29 22:15 UTC (permalink / raw)
  To: Krzysztof Olędzki
  Cc: Jan Engelhardt, Ulrich Weber, Amos Jeffries,
	sclark46@earthlink.net, kaber@trash.net,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org
In-Reply-To: <4ED550E7.1090609@ans.pl>

On Tuesday, 29 November 2011 at 22:38 +0100, Krzysztof Olędzki wrote:

> But without NAT you have a pretty big chance of having the same IPv6
> *suffix* everywhere, based on your MAC address. At home, at work,
> in a cafe or in a hotel during your vacation in Portugal. So yes, NAT
> is not a perfect solution but it really helps your privacy.
> 

Good point, but we can change the MAC address (use a random one) on most
NICs, can't we?

^ permalink raw reply

* Re: [RFC PATCH 00/18] netfilter: IPv6 NAT
From: Jan Engelhardt @ 2011-11-29 22:21 UTC (permalink / raw)
  To: Krzysztof Olędzki
  Cc: Ulrich Weber, Amos Jeffries, sclark46@earthlink.net,
	kaber@trash.net, netfilter-devel@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <4ED550E7.1090609@ans.pl>


On Tuesday 2011-11-29 22:38, Krzysztof Olędzki wrote:
>>
>> Same network prefix, some cookies, or a login form. Blam, identified,
>> or at least (Almost-)Uniquely Identified Visitor tagging.
>
> But without NAT you have a pretty big chance of having the same IPv6
> *suffix* everywhere, based on your MAC address.

Everywhere? No, one small village of indomitable Gauls.^^^^^^^^W

$ ip a
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:0d:93:9e:08:78 brd ff:ff:ff:ff:ff:ff
    inet6 2001:638:600:8810:d070:3a36:464e:b3db/64 scope global temporary dynamic 
       valid_lft 583732sec preferred_lft 64732sec
    inet6 2001:638:600:8810:d9f5:18f5:4fc1:9a20/64 scope global temporary deprecated dynamic 
       valid_lft 497938sec preferred_lft 0sec
    [...]

Same suffix? Certainly not with contemporary configurations (and
Linux did this quite on its own there). In fact, now that there is
almost v6-NAT in the kernel, I fear that users who are blinded by NAT
now make the problem worse by actually feeding perfectly good Privacy
Extension Addresses into a n:1-configured SNAT/MASQUERADE target
instead of a NETMAP.
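
To make the comparison concrete, here is a short sketch of the
MAC-derived (modified EUI-64, RFC 4291 Appendix A) interface identifier
that the quoted concern is about. The function name demo_eui64_suffix()
is hypothetical, and the output uses zero-padded groups rather than the
compressed form that `ip` prints.

```c
#include <stdint.h>
#include <stdio.h>

/* Derive the modified EUI-64 interface identifier from a 48-bit MAC
 * and format it as the low 64 bits of an IPv6 address.
 * `out` must hold at least 20 bytes. */
void demo_eui64_suffix(const uint8_t mac[6], char *out, size_t outlen)
{
	uint8_t iid[8] = {
		mac[0] ^ 0x02,	/* flip the universal/local bit */
		mac[1], mac[2],
		0xff, 0xfe,	/* inserted between the two MAC halves */
		mac[3], mac[4], mac[5],
	};

	snprintf(out, outlen, "%02x%02x:%02x%02x:%02x%02x:%02x%02x",
		 iid[0], iid[1], iid[2], iid[3],
		 iid[4], iid[5], iid[6], iid[7]);
}
```

For the link/ether address 00:0d:93:9e:08:78 shown above this yields
020d:93ff:fe9e:0878, which matches neither of the two temporary
addresses in the listing.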

> At home, at work, in a cafe or in
> a hotel during your vacation in Portugal.

^ permalink raw reply

