Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net-next 09/10] net/mlx4_en: Manage flow steering rules with ethtool
From: Or Gerlitz @ 2012-07-03  8:14 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: David Laight, Joe Perches, Ben Hutchings, davem, roland, yevgenyp,
	oren, netdev, Hadar Hen Zion, Amir Vadai
In-Reply-To: <m28vf2o0oa.fsf@igel.home>

On 7/2/2012 2:35 PM, Andreas Schwab wrote:
>
> 	field == 0 || field == (typeof field)~(typeof field)0
> You can avoid that by using (typeof field)-1.
>

OK, thanks everybody, we will take that path.

Or.

^ permalink raw reply

* [PATCH] qlge: fix endian issue
From: roy.qing.li @ 2012-07-03  8:47 UTC (permalink / raw)
  To: netdev; +Cc: ron.mercer

From: RongQing.Li <roy.qing.li@gmail.com>

commit 6d29b1ef introduces a bug, ntohs is __be16_to_cpu,
not cpu_to_be16.

We always use htons on IP_OFFSET and IP_MF, then compare
with network package.

Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
---
 drivers/net/ethernet/qlogic/qlge/qlge_main.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_main.c b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
index 09d8d33..7c520fa 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge_main.c
+++ b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
@@ -1546,7 +1546,7 @@ static void ql_process_mac_rx_page(struct ql_adapter *qdev,
 			struct iphdr *iph =
 				(struct iphdr *) ((u8 *)addr + ETH_HLEN);
 			if (!(iph->frag_off &
-				cpu_to_be16(IP_MF|IP_OFFSET))) {
+				htons(IP_MF|IP_OFFSET))) {
 				skb->ip_summed = CHECKSUM_UNNECESSARY;
 				netif_printk(qdev, rx_status, KERN_DEBUG,
 					     qdev->ndev,
@@ -1654,7 +1654,7 @@ static void ql_process_mac_rx_skb(struct ql_adapter *qdev,
 			/* Unfragmented ipv4 UDP frame. */
 			struct iphdr *iph = (struct iphdr *) skb->data;
 			if (!(iph->frag_off &
-				ntohs(IP_MF|IP_OFFSET))) {
+				htons(IP_MF|IP_OFFSET))) {
 				skb->ip_summed = CHECKSUM_UNNECESSARY;
 				netif_printk(qdev, rx_status, KERN_DEBUG,
 					     qdev->ndev,
@@ -1968,7 +1968,7 @@ static void ql_process_mac_split_rx_intr(struct ql_adapter *qdev,
 		/* Unfragmented ipv4 UDP frame. */
 			struct iphdr *iph = (struct iphdr *) skb->data;
 			if (!(iph->frag_off &
-				ntohs(IP_MF|IP_OFFSET))) {
+				htons(IP_MF|IP_OFFSET))) {
 				skb->ip_summed = CHECKSUM_UNNECESSARY;
 				netif_printk(qdev, rx_status, KERN_DEBUG, qdev->ndev,
 					     "TCP checksum done!\n");
-- 
1.7.1

^ permalink raw reply related

* Re: [PATCH net-next 09/10] net/mlx4_en: Manage flow steering rules with ethtool
From: Or Gerlitz @ 2012-07-03  8:56 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: davem, roland, Yevgeny Petrilin, Oren Duer, netdev,
	Hadar Hen Zion, Amir Vadai
In-Reply-To: <1341280041.2590.39.camel@bwh-desktop.uk.solarflarecom.com>

On 7/3/2012 4:47 AM, Ben Hutchings wrote:
> If the hardware can only match the VID then you need to validate that
> the mask is either 0 or 0xfff instead of 0 or 0xffff.

sure, will fix

^ permalink raw reply

* Re: [PATCH 00/12] Swap-over-NFS without deadlocking V8
From: Mel Gorman @ 2012-07-03  8:58 UTC (permalink / raw)
  To: Eric B Munson
  Cc: Andrew Morton, Linux-MM, Linux-Netdev, Linux-NFS, LKML,
	David Miller, Trond Myklebust, Neil Brown, Christoph Hellwig,
	Peter Zijlstra, Mike Christie, Sebastian Andrzej Siewior
In-Reply-To: <20120703001051.GA5508@mgebm.net>

On Mon, Jul 02, 2012 at 08:10:51PM -0400, Eric B Munson wrote:
> On Mon, 02 Jul 2012, Mel Gorman wrote:
> 
> > On Sun, Jul 01, 2012 at 01:22:54PM -0400, Eric B Munson wrote:
> > > On Fri, 29 Jun 2012, Mel Gorman wrote:
> > > 
> > > > Changelog since V7
> > > >   o Rebase to linux-next 20120629
> > > >   o bi->page_dma instead of bi->page in intel driver
> > > >   o Build fix for !CONFIG_NET					(sebastian)
> > > >   o Restore PF_MEMALLOC flags correctly in all cases		(jlayton)
> > > > 
> > > > Changelog since V6
> > > >   o Rebase to linux-next 20120622
> > > > 
> > > > Changelog since V5
> > > >   o Rebase to v3.5-rc3
> > > > 
> > > > Changelog since V4
> > > >   o Catch if SOCK_MEMALLOC flag is cleared with rmem tokens	(davem)
> > > > 
> > > > Changelog since V3
> > > >   o Rebase to 3.4-rc5
> > > >   o kmap pages for writing to swap				(akpm)
> > > >   o Move forward declaration to reduce chance of duplication	(akpm)
> > > > 
> > > > Changelog since V2
> > > >   o Nothing significant, just rebases. A radix tree lookup is replaced with
> > > >     a linear search would be the biggest rebase artifact
> > > > 
> > > > This patch series is based on top of "Swap-over-NBD without deadlocking v14"
> > > > as it depends on the same reservation of PF_MEMALLOC reserves logic.
> > > > 
> > > > When a user or administrator requires swap for their application, they
> > > > create a swap partition and file, format it with mkswap and activate it with
> > > > swapon. In diskless systems this is not an option so if swap if required
> > > > then swapping over the network is considered.  The two likely scenarios
> > > > are when blade servers are used as part of a cluster where the form factor
> > > > or maintenance costs do not allow the use of disks and thin clients.
> > > > 
> > > > The Linux Terminal Server Project recommends the use of the Network
> > > > Block Device (NBD) for swap but this is not always an option.  There is
> > > > no guarantee that the network attached storage (NAS) device is running
> > > > Linux or supports NBD. However, it is likely that it supports NFS so there
> > > > are users that want support for swapping over NFS despite any performance
> > > > concern. Some distributions currently carry patches that support swapping
> > > > over NFS but it would be preferable to support it in the mainline kernel.
> > > > 
> > > > Patch 1 avoids a stream-specific deadlock that potentially affects TCP.
> > > > 
> > > > Patch 2 is a small modification to SELinux to avoid using PFMEMALLOC
> > > > 	reserves.
> > > > 
> > > > Patch 3 adds three helpers for filesystems to handle swap cache pages.
> > > > 	For example, page_file_mapping() returns page->mapping for
> > > > 	file-backed pages and the address_space of the underlying
> > > > 	swap file for swap cache pages.
> > > > 
> > > > Patch 4 adds two address_space_operations to allow a filesystem
> > > > 	to pin all metadata relevant to a swapfile in memory. Upon
> > > > 	successful activation, the swapfile is marked SWP_FILE and
> > > > 	the address space operation ->direct_IO is used for writing
> > > > 	and ->readpage for reading in swap pages.
> > > > 
> > > > Patch 5 notes that patch 3 is bolting
> > > > 	filesystem-specific-swapfile-support onto the side and that
> > > > 	the default handlers have different information to what
> > > > 	is available to the filesystem. This patch refactors the
> > > > 	code so that there are generic handlers for each of the new
> > > > 	address_space operations.
> > > > 
> > > > Patch 6 adds an API to allow a vector of kernel addresses to be
> > > > 	translated to struct pages and pinned for IO.
> > > > 
> > > > Patch 7 adds support for using highmem pages for swap by kmapping
> > > > 	the pages before calling the direct_IO handler.
> > > > 
> > > > Patch 8 updates NFS to use the helpers from patch 3 where necessary.
> > > > 
> > > > Patch 9 avoids setting PF_private on PG_swapcache pages within NFS.
> > > > 
> > > > Patch 10 implements the new swapfile-related address_space operations
> > > > 	for NFS and teaches the direct IO handler how to manage
> > > > 	kernel addresses.
> > > > 
> > > > Patch 11 prevents page allocator recursions in NFS by using GFP_NOIO
> > > > 	where appropriate.
> > > > 
> > > > Patch 12 fixes a NULL pointer dereference that occurs when using
> > > > 	swap-over-NFS.
> > > > 
> > > > With the patches applied, it is possible to mount a swapfile that is on an
> > > > NFS filesystem. Swap performance is not great with a swap stress test taking
> > > > roughly twice as long to complete than if the swap device was backed by NBD.
> > > 
> > > To test this set I am using memory cgroups to force swap usage.  I am seeing
> > > the cgroup controller killing my processes instead of using the nfs swapfile.
> > > 
> > 
> > How sure are you that this is not a cgroup bug? For dirty file data on some
> > kernels, cgroups can prematurely kill processes if pages are not being
> > cleaned fast enough. I would not expect the same problem for anonymous
> > pages but it's worth considering. Please also test with a normal swapfile.
> > 
> > If OOM is disabled and the process hangs, try capturing a sysrq+t and
> > see where the process is stuck.
> > 
> 
> It looks like the problem is with cgroups, when I run without cgroups and limit
> memory on the boot command line everything works fine.  To test I limited the
> machine to 1G of ram then ran several memory benchmarks with work set sizes of
> 1.5G, all completed successfully with my swap file located on an NFS share.
> 
> Tested-by: Eric B Munson <emunson@mgebm.net>

Thanks a lot for testing.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH net-next 09/10] net/mlx4_en: Manage flow steering rules with ethtool
From: Or Gerlitz @ 2012-07-03  8:58 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: davem, roland, Yevgeny Petrilin, Oren Duer, netdev,
	Hadar Hen Zion, Amir Vadai
In-Reply-To: <1341280041.2590.39.camel@bwh-desktop.uk.solarflarecom.com>

On 7/3/2012 4:47 AM, Ben Hutchings wrote:
>> Under this logic, we can use the values and not the masks, isn't that?
> No, it's perfectly valid to specify a filter that matches, for example,
> a destination IP address of 0.0.0.0 with mask of 255.255.255.255.  So
> you really need to check the mask.  If your filter hardware doesn't
> support zero values for some fields then you'll need to reject them in
> mlx4_en_validate_flow.

Got it, will change to use masks all over the place, as you pointed out 
we need to do.

^ permalink raw reply

* Re: [PATCH net-next 09/10] net/mlx4_en: Manage flow steering rules with ethtool
From: Or Gerlitz @ 2012-07-03  9:00 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: davem, roland, yevgenyp, oren, netdev, Hadar Hen Zion, Amir Vadai
In-Reply-To: <1341158452.4852.107.camel@deadeye.wl.decadent.org.uk>

On 7/1/2012 7:00 PM, Ben Hutchings wrote:
>> +#define not_all_zeros_or_all_ones(field, type) \
>> >+	(field && (type)~field)
>> >+
>> >+static int mlx4_en_validate_flow(struct net_device *dev,
>> >+				 struct ethtool_rxnfc *cmd)
>> >+{
>> >+	struct ethtool_usrip4_spec *l3_mask;
>> >+	struct ethtool_tcpip4_spec *l4_mask;
>> >+	struct ethhdr *eth_mask;
>> >+	u64 full_mac = ~0ull;
>> >+	u64 zero_mac = 0;
>> >+
>> >+	if (cmd->fs.location >= MAX_NUM_OF_FS_RULES)
>> >+		return -EINVAL;
>> >+
>> >+	switch (cmd->fs.flow_type & ~FLOW_EXT) {
>> >+	case TCP_V4_FLOW:
>> >+	case UDP_V4_FLOW:
>> >+		if (cmd->fs.h_u.tcp_ip4_spec.tos)
>> >+			return -EOPNOTSUPP;
> I suspect that your filter ignores TOS, rather than only matching TOS ==
> 0, so you should actually be checking the corresponding field in the
> mask (fs.m_u). [...]

OK, thanks for pointing this over, will fix.

> >+		break;
> >+	case IP_USER_FLOW:
> >+		l3_mask = &cmd->fs.m_u.usr_ip4_spec;
> >+		if (cmd->fs.h_u.usr_ip4_spec.l4_4_bytes ||
> >+		    cmd->fs.h_u.usr_ip4_spec.tos ||
> I think this should be checking l4_4_bytes and tos in the mask.

OK

>
>> >+		    cmd->fs.h_u.usr_ip4_spec.proto ||
>> >+		    cmd->fs.h_u.usr_ip4_spec.ip_ver != ETH_RX_NFC_IP4 ||
>> >+		    (!cmd->fs.h_u.usr_ip4_spec.ip4src &&
>> >+		     !cmd->fs.h_u.usr_ip4_spec.ip4dst) ||
>> >+		    not_all_zeros_or_all_ones(l3_mask->ip4src, __be32) ||
>> >+		    not_all_zeros_or_all_ones(l3_mask->ip4dst, __be32))
>> >+			return -EOPNOTSUPP;
>> >+		break;
>> >+	case ETHER_FLOW:
>> >+		eth_mask = &cmd->fs.m_u.ether_spec;
>> >+		if (memcmp(eth_mask->h_source, &zero_mac, ETH_ALEN))
>> >+			return -EOPNOTSUPP;
>> >+		if (!memcmp(eth_mask->h_dest, &zero_mac, ETH_ALEN))
>> >+			return -EOPNOTSUPP;
> But in the next statement you test whether eth_mask->h_dest is either
> all-zeroes or all-ones.  Is all-zeroes valid or not?  I suspect you
> actually intend to reject the case where both h_dest and h_proto are masked out.

indeed, this code section can be better written, will fix for V1


>
>> >+		if (not_all_zeros_or_all_ones(eth_mask->h_proto, __be16) ||
>> >+		    (memcmp(eth_mask->h_dest, &zero_mac, ETH_ALEN) &&
>> >+		     memcmp(eth_mask->h_dest, &full_mac, ETH_ALEN)))
>> >+			return -EOPNOTSUPP;
>> >+		break;
>> >+	default:
>> >+		return -EOPNOTSUPP;
>> >+	}
>> >+
>> >+	if ((cmd->fs.flow_type & FLOW_EXT)) {
>> >+		if (cmd->fs.m_ext.vlan_etype ||
>> >+		    not_all_zeros_or_all_ones(cmd->fs.m_ext.vlan_tci,
>> >+					       __be16)) {
>> >+			return -EOPNOTSUPP;
>> >+		}
>> >+	}
>> >+
>> >+	return 0;
>> >+}

^ permalink raw reply

* Re: [PATCH] qlge: fix endian issue
From: David Miller @ 2012-07-03  9:22 UTC (permalink / raw)
  To: roy.qing.li; +Cc: netdev, ron.mercer
In-Reply-To: <1341305276-17053-1-git-send-email-roy.qing.li@gmail.com>

From: roy.qing.li@gmail.com
Date: Tue,  3 Jul 2012 16:47:56 +0800

> Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>

Is "RongQing.Li" really how you would write your name?

^ permalink raw reply

* [PATCH] netem: fix rate extension and drop accounting
From: Eric Dumazet @ 2012-07-03  9:25 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Hagen Paul Pfeifer, Yuchung Cheng, Andreas Terzis,
	Mark Gordon

From: Eric Dumazet <edumazet@google.com>

commit 7bc0f28c7a0c (netem: rate extension) did wrong maths when packet
is enqueued while queue is not empty.

Result is unexpected cumulative delays

# tc qd add dev eth0 root est 1sec 4sec netem delay 200ms rate 100kbit
# ping -i 0.1 172.30.42.18
PING 172.30.42.18 (172.30.42.18) 56(84) bytes of data.
64 bytes from 172.30.42.18: icmp_req=1 ttl=64 time=208 ms
64 bytes from 172.30.42.18: icmp_req=2 ttl=64 time=424 ms
64 bytes from 172.30.42.18: icmp_req=3 ttl=64 time=838 ms
64 bytes from 172.30.42.18: icmp_req=4 ttl=64 time=1142 ms
64 bytes from 172.30.42.18: icmp_req=5 ttl=64 time=1335 ms
64 bytes from 172.30.42.18: icmp_req=6 ttl=64 time=1949 ms
64 bytes from 172.30.42.18: icmp_req=7 ttl=64 time=2450 ms
64 bytes from 172.30.42.18: icmp_req=8 ttl=64 time=2840 ms
64 bytes from 172.30.42.18: icmp_req=9 ttl=64 time=3121 ms
64 bytes from 172.30.42.18: icmp_req=10 ttl=64 time=3291 ms
64 bytes from 172.30.42.18: icmp_req=11 ttl=64 time=3784 ms

This patch also fixes a double drop accounting in case packet is dropped
in tfifo_enqueue()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Andreas Terzis <aterzis@google.com>
Cc: Mark Gordon <msg@google.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
---
 net/sched/sch_netem.c |   14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index a2a95aa..e8b5ac3 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -368,7 +368,6 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 	/* We don't fill cb now as skb_unshare() may invalidate it */
 	struct netem_skb_cb *cb;
 	struct sk_buff *skb2;
-	int ret;
 	int count = 1;
 
 	/* Random duplication */
@@ -443,14 +442,14 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 				 * calculate this time bonus and substract
 				 * from delay.
 				 */
-				delay -= now - netem_skb_cb(skb_peek(list))->time_to_send;
+				delay -= netem_skb_cb(skb_peek(list))->time_to_send - now;
 				now = netem_skb_cb(skb_peek_tail(list))->time_to_send;
 			}
 		}
 
 		cb->time_to_send = now + delay;
 		++q->counter;
-		ret = tfifo_enqueue(skb, sch);
+		return tfifo_enqueue(skb, sch);
 	} else {
 		/*
 		 * Do re-ordering by putting one out of N packets at the front
@@ -462,16 +461,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		__skb_queue_head(&sch->q, skb);
 		sch->qstats.backlog += qdisc_pkt_len(skb);
 		sch->qstats.requeues++;
-		ret = NET_XMIT_SUCCESS;
-	}
-
-	if (ret != NET_XMIT_SUCCESS) {
-		if (net_xmit_drop_count(ret)) {
-			sch->qstats.drops++;
-			return ret;
-		}
 	}
-
 	return NET_XMIT_SUCCESS;
 }
 

^ permalink raw reply related

* Re: BNX2 MIPS06 firmware in tree is 6.2.1, driver wants 6.2.3
From: Eric Dumazet @ 2012-07-03  9:30 UTC (permalink / raw)
  To: Tony Vroon; +Cc: Michael Chan, LKML, netdev, David S. Miller, Anthony G. Basile
In-Reply-To: <4FF2B64A.5020002@linx.net>

On Tue, 2012-07-03 at 10:07 +0100, Tony Vroon wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 03/07/12 10:05, Eric Dumazet wrote:
> > You're supposed to upgrade firmware blobs on your own.
> 
> But the firmware is in the kernel tree still, should the outdated
> files be removed?

I believe firmware/README.AddingFirmware contains the rationale.

^ permalink raw reply

* Re: [PATCH] qlge: fix endian issue
From: RongQing Li @ 2012-07-03  9:44 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, ron.mercer
In-Reply-To: <20120703.022250.295045256493986282.davem@davemloft.net>

2012/7/3 David Miller <davem@davemloft.net>:
> From: roy.qing.li@gmail.com
> Date: Tue,  3 Jul 2012 16:47:56 +0800
>
>> Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
>
> Is "RongQing.Li" really how you would write your name?

Yes

Chinese name, Li RongQing

I maybe should not add the dot.

-RongQing

^ permalink raw reply

* [PATCH 0/19] Disconnect neigh from dst_entry
From: David Miller @ 2012-07-03  9:45 UTC (permalink / raw)
  To: netdev


This finally severs neighbour table entries from dst_entry enough that
we no longer depend upon them outside of the individual protocols.

Besides being a major step towards making routing cache removal
practical, it also means an end to the infamous "neighbour table
overflow" condition.

Routes in ipv4 no longer refer to neighbour table entries, they are
used on an as-needed basis during packet output in a refcount-less
manner.

Therefore garbage collection is trivial since almost nothing actually
holds onto neighbour table references.

On the routing cache removal side, this set of changes removes another
dependency upon rt->rt_dst.  The only one left is for inetpeer, and
once that is severed (I have a rough plan for it) rt->rt_dst is finally
without any use and we can construct ipv4 routes directly inside of
FIB table entries.

Signed-off-by: David S. Miller <davem@davemloft.net>

^ permalink raw reply

* [PATCH 01/19] ipv4: Make neigh lookups directly in output packet path.
From: David Miller @ 2012-07-03  9:45 UTC (permalink / raw)
  To: netdev


Do not use the dst cached neigh, we'll be getting rid of that.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/arp.h       |   28 +++++++++++++++++++---------
 include/net/neighbour.h |   11 +++++++++--
 net/core/neighbour.c    |   12 +++++++-----
 net/ipv4/ip_output.c    |   12 ++++++++----
 net/ipv4/route.c        |    6 +-----
 5 files changed, 44 insertions(+), 25 deletions(-)

diff --git a/include/net/arp.h b/include/net/arp.h
index 4a1f3fb..4617d98 100644
--- a/include/net/arp.h
+++ b/include/net/arp.h
@@ -15,24 +15,34 @@ static inline u32 arp_hashfn(u32 key, const struct net_device *dev, u32 hash_rnd
 	return val * hash_rnd;
 }
 
-static inline struct neighbour *__ipv4_neigh_lookup(struct net_device *dev, u32 key)
+static inline struct neighbour *__ipv4_neigh_lookup_noref(struct net_device *dev, u32 key)
 {
-	struct neigh_hash_table *nht;
+	struct neigh_hash_table *nht = rcu_dereference_bh(arp_tbl.nht);
 	struct neighbour *n;
 	u32 hash_val;
 
-	rcu_read_lock_bh();
-	nht = rcu_dereference_bh(arp_tbl.nht);
+	if (dev->flags & (IFF_LOOPBACK | IFF_POINTOPOINT))
+		key = 0;
+
 	hash_val = arp_hashfn(key, dev, nht->hash_rnd[0]) >> (32 - nht->hash_shift);
 	for (n = rcu_dereference_bh(nht->hash_buckets[hash_val]);
 	     n != NULL;
 	     n = rcu_dereference_bh(n->next)) {
-		if (n->dev == dev && *(u32 *)n->primary_key == key) {
-			if (!atomic_inc_not_zero(&n->refcnt))
-				n = NULL;
-			break;
-		}
+		if (n->dev == dev && *(u32 *)n->primary_key == key)
+			return n;
 	}
+
+	return NULL;
+}
+
+static inline struct neighbour *__ipv4_neigh_lookup(struct net_device *dev, u32 key)
+{
+	struct neighbour *n;
+
+	rcu_read_lock_bh();
+	n = __ipv4_neigh_lookup_noref(dev, key);
+	if (n && !atomic_inc_not_zero(&n->refcnt))
+		n = NULL;
 	rcu_read_unlock_bh();
 
 	return n;
diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index 6cdfeed..e1d18bd 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -202,9 +202,16 @@ extern struct neighbour *	neigh_lookup(struct neigh_table *tbl,
 extern struct neighbour *	neigh_lookup_nodev(struct neigh_table *tbl,
 						   struct net *net,
 						   const void *pkey);
-extern struct neighbour *	neigh_create(struct neigh_table *tbl,
+extern struct neighbour *	__neigh_create(struct neigh_table *tbl,
+					       const void *pkey,
+					       struct net_device *dev,
+					       bool want_ref);
+static inline struct neighbour *neigh_create(struct neigh_table *tbl,
 					     const void *pkey,
-					     struct net_device *dev);
+					     struct net_device *dev)
+{
+	return __neigh_create(tbl, pkey, dev, true);
+}
 extern void			neigh_destroy(struct neighbour *neigh);
 extern int			__neigh_event_send(struct neighbour *neigh, struct sk_buff *skb);
 extern int			neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new, 
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index d81d026..a793af9 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -474,8 +474,8 @@ struct neighbour *neigh_lookup_nodev(struct neigh_table *tbl, struct net *net,
 }
 EXPORT_SYMBOL(neigh_lookup_nodev);
 
-struct neighbour *neigh_create(struct neigh_table *tbl, const void *pkey,
-			       struct net_device *dev)
+struct neighbour *__neigh_create(struct neigh_table *tbl, const void *pkey,
+				 struct net_device *dev, bool want_ref)
 {
 	u32 hash_val;
 	int key_len = tbl->key_len;
@@ -535,14 +535,16 @@ struct neighbour *neigh_create(struct neigh_table *tbl, const void *pkey,
 	     n1 = rcu_dereference_protected(n1->next,
 			lockdep_is_held(&tbl->lock))) {
 		if (dev == n1->dev && !memcmp(n1->primary_key, pkey, key_len)) {
-			neigh_hold(n1);
+			if (want_ref)
+				neigh_hold(n1);
 			rc = n1;
 			goto out_tbl_unlock;
 		}
 	}
 
 	n->dead = 0;
-	neigh_hold(n);
+	if (want_ref)
+		neigh_hold(n);
 	rcu_assign_pointer(n->next,
 			   rcu_dereference_protected(nht->hash_buckets[hash_val],
 						     lockdep_is_held(&tbl->lock)));
@@ -558,7 +560,7 @@ out_neigh_release:
 	neigh_release(n);
 	goto out;
 }
-EXPORT_SYMBOL(neigh_create);
+EXPORT_SYMBOL(__neigh_create);
 
 static u32 pneigh_hash(const void *pkey, int key_len)
 {
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 2630900..6e9a266 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -170,6 +170,7 @@ static inline int ip_finish_output2(struct sk_buff *skb)
 	struct net_device *dev = dst->dev;
 	unsigned int hh_len = LL_RESERVED_SPACE(dev);
 	struct neighbour *neigh;
+	u32 nexthop;
 
 	if (rt->rt_type == RTN_MULTICAST) {
 		IP_UPD_PO_STATS(dev_net(dev), IPSTATS_MIB_OUTMCAST, skb->len);
@@ -191,15 +192,18 @@ static inline int ip_finish_output2(struct sk_buff *skb)
 		skb = skb2;
 	}
 
-	rcu_read_lock();
-	neigh = dst_get_neighbour_noref(dst);
+	rcu_read_lock_bh();
+	nexthop = rt->rt_gateway ? rt->rt_gateway : ip_hdr(skb)->daddr;
+	neigh = __ipv4_neigh_lookup_noref(dev, nexthop);
+	if (unlikely(!neigh))
+		neigh = __neigh_create(&arp_tbl, &nexthop, dev, false);
 	if (neigh) {
 		int res = neigh_output(neigh, skb);
 
-		rcu_read_unlock();
+		rcu_read_unlock_bh();
 		return res;
 	}
-	rcu_read_unlock();
+	rcu_read_unlock_bh();
 
 	net_dbg_ratelimited("%s: No header cache and no neighbour!\n",
 			    __func__);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 6a5afc7..2f40363 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1098,17 +1098,13 @@ static int slow_chain_length(const struct rtable *head)
 
 static struct neighbour *ipv4_neigh_lookup(const struct dst_entry *dst, const void *daddr)
 {
-	static const __be32 inaddr_any = 0;
 	struct net_device *dev = dst->dev;
 	const __be32 *pkey = daddr;
 	const struct rtable *rt;
 	struct neighbour *n;
 
 	rt = (const struct rtable *) dst;
-
-	if (dev->flags & (IFF_LOOPBACK | IFF_POINTOPOINT))
-		pkey = &inaddr_any;
-	else if (rt->rt_gateway)
+	if (rt->rt_gateway)
 		pkey = (const __be32 *) &rt->rt_gateway;
 
 	n = __ipv4_neigh_lookup(dev, *(__force u32 *)pkey);
-- 
1.7.10

^ permalink raw reply related

* [PATCH 02/19] ipv4: Don't report neigh uptodate state in rtcache procfs.
From: David Miller @ 2012-07-03  9:45 UTC (permalink / raw)
  To: netdev


Soon routes will not have a cached neigh attached, nor will we
be able to necessarily go directly to a neigh from an arbitrary
route.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/route.c |   12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 2f40363..bae3638 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -418,13 +418,7 @@ static int rt_cache_seq_show(struct seq_file *seq, void *v)
 			   "HHUptod\tSpecDst");
 	else {
 		struct rtable *r = v;
-		struct neighbour *n;
-		int len, HHUptod;
-
-		rcu_read_lock();
-		n = dst_get_neighbour_noref(&r->dst);
-		HHUptod = (n && (n->nud_state & NUD_CONNECTED)) ? 1 : 0;
-		rcu_read_unlock();
+		int len;
 
 		seq_printf(seq, "%s\t%08X\t%08X\t%8X\t%d\t%u\t%d\t"
 			      "%08X\t%d\t%u\t%u\t%02X\t%d\t%1d\t%08X%n",
@@ -438,9 +432,7 @@ static int rt_cache_seq_show(struct seq_file *seq, void *v)
 			(int)((dst_metric(&r->dst, RTAX_RTT) >> 3) +
 			      dst_metric(&r->dst, RTAX_RTTVAR)),
 			r->rt_key_tos,
-			-1,
-			HHUptod,
-			0, &len);
+			-1, 0, 0, &len);
 
 		seq_printf(seq, "%*s\n", 127 - len, "");
 	}
-- 
1.7.10

^ permalink raw reply related

* [PATCH 03/19] sunrpc: Don't do a dst_confirm() on an input routes.
From: David Miller @ 2012-07-03  9:46 UTC (permalink / raw)
  To: netdev


xs_udp_data_ready() is operating on received packets, and tries to
do a dst_confirm() on the dst attached to the SKB.

This isn't right, dst confirmation is for output routes, not input
routes.  It's for resetting the timers on the nexthop neighbour entry
for the route, indicating that we've got good evidence that we've
successfully reached it.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/sunrpc/xprtsock.c |    3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 890b03f..62d0dac 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1014,9 +1014,6 @@ static void xs_udp_data_ready(struct sock *sk, int len)
 
 	UDPX_INC_STATS_BH(sk, UDP_MIB_INDATAGRAMS);
 
-	/* Something worked... */
-	dst_confirm(skb_dst(skb));
-
 	xprt_adjust_cwnd(task, copied);
 	xprt_complete_rqst(task, copied);
 
-- 
1.7.10

^ permalink raw reply related

* [PATCH 04/19] net: Do delayed neigh confirmation.
From: David Miller @ 2012-07-03  9:46 UTC (permalink / raw)
  To: netdev


When a dst_confirm() happens, mark the confirmation as pending in the
dst.  Then on the next packet out, when we have the neigh in-hand, do
the update.

This removes the dependency in dst_confirm() of dst's having an
attached neigh.

While we're here, remove the explicit 'dst' NULL check, all except 2
or 3 call sites ensure it's not NULL.  So just fix those cases up.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/dst.h       |   29 +++++++++++++++++++++--------
 include/net/neighbour.h |   15 ---------------
 net/core/dst.c          |    3 ++-
 net/ipv4/ip_output.c    |    2 +-
 net/ipv4/tcp_input.c    |   19 +++++++++++++------
 net/ipv6/ip6_output.c   |    2 +-
 6 files changed, 38 insertions(+), 32 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index f0bf3b8..84e7a3f 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -51,7 +51,7 @@ struct dst_entry {
 	int			(*input)(struct sk_buff *);
 	int			(*output)(struct sk_buff *);
 
-	int			flags;
+	unsigned short		flags;
 #define DST_HOST		0x0001
 #define DST_NOXFRM		0x0002
 #define DST_NOPOLICY		0x0004
@@ -62,6 +62,8 @@ struct dst_entry {
 #define DST_FAKE_RTABLE		0x0080
 #define DST_XFRM_TUNNEL		0x0100
 
+	unsigned short		pending_confirm;
+
 	short			error;
 	short			obsolete;
 	unsigned short		header_len;	/* more space at head required */
@@ -371,7 +373,8 @@ static inline struct dst_entry *skb_dst_pop(struct sk_buff *skb)
 
 extern int dst_discard(struct sk_buff *skb);
 extern void *dst_alloc(struct dst_ops *ops, struct net_device *dev,
-		       int initial_ref, int initial_obsolete, int flags);
+		       int initial_ref, int initial_obsolete,
+		       unsigned short flags);
 extern void __dst_free(struct dst_entry *dst);
 extern struct dst_entry *dst_destroy(struct dst_entry *dst);
 
@@ -395,14 +398,24 @@ static inline void dst_rcu_free(struct rcu_head *head)
 
 static inline void dst_confirm(struct dst_entry *dst)
 {
-	if (dst) {
-		struct neighbour *n;
+	dst->pending_confirm = 1;
+}
 
-		rcu_read_lock();
-		n = dst_get_neighbour_noref(dst);
-		neigh_confirm(n);
-		rcu_read_unlock();
+static inline int dst_neigh_output(struct dst_entry *dst, struct neighbour *n,
+				   struct sk_buff *skb)
+{
+	struct hh_cache *hh;
+
+	if (unlikely(dst->pending_confirm)) {
+		n->confirmed = jiffies;
+		dst->pending_confirm = 0;
 	}
+
+	hh = &n->hh;
+	if ((n->nud_state & NUD_CONNECTED) && hh->hh_len)
+		return neigh_hh_output(hh, skb);
+	else
+		return n->output(n, skb);
 }
 
 static inline struct neighbour *dst_neigh_lookup(const struct dst_entry *dst, const void *daddr)
diff --git a/include/net/neighbour.h b/include/net/neighbour.h
index e1d18bd..344d898 100644
--- a/include/net/neighbour.h
+++ b/include/net/neighbour.h
@@ -309,12 +309,6 @@ static inline struct neighbour * neigh_clone(struct neighbour *neigh)
 
 #define neigh_hold(n)	atomic_inc(&(n)->refcnt)
 
-static inline void neigh_confirm(struct neighbour *neigh)
-{
-	if (neigh)
-		neigh->confirmed = jiffies;
-}
-
 static inline int neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
 {
 	unsigned long now = jiffies;
@@ -358,15 +352,6 @@ static inline int neigh_hh_output(struct hh_cache *hh, struct sk_buff *skb)
 	return dev_queue_xmit(skb);
 }
 
-static inline int neigh_output(struct neighbour *n, struct sk_buff *skb)
-{
-	struct hh_cache *hh = &n->hh;
-	if ((n->nud_state & NUD_CONNECTED) && hh->hh_len)
-		return neigh_hh_output(hh, skb);
-	else
-		return n->output(n, skb);
-}
-
 static inline struct neighbour *
 __neigh_lookup(struct neigh_table *tbl, const void *pkey, struct net_device *dev, int creat)
 {
diff --git a/net/core/dst.c b/net/core/dst.c
index 43d94ce..a6e19a2 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -152,7 +152,7 @@ EXPORT_SYMBOL(dst_discard);
 const u32 dst_default_metrics[RTAX_MAX];
 
 void *dst_alloc(struct dst_ops *ops, struct net_device *dev,
-		int initial_ref, int initial_obsolete, int flags)
+		int initial_ref, int initial_obsolete, unsigned short flags)
 {
 	struct dst_entry *dst;
 
@@ -188,6 +188,7 @@ void *dst_alloc(struct dst_ops *ops, struct net_device *dev,
 	dst->__use = 0;
 	dst->lastuse = jiffies;
 	dst->flags = flags;
+	dst->pending_confirm = 0;
 	dst->next = NULL;
 	if (!(flags & DST_NOCOUNT))
 		dst_entries_add(ops, 1);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 6e9a266..cc52679 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -198,7 +198,7 @@ static inline int ip_finish_output2(struct sk_buff *skb)
 	if (unlikely(!neigh))
 		neigh = __neigh_create(&arp_tbl, &nexthop, dev, false);
 	if (neigh) {
-		int res = neigh_output(neigh, skb);
+		int res = dst_neigh_output(dst, neigh, skb);
 
 		rcu_read_unlock_bh();
 		return res;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 8416f8a..ca0d0e7 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -740,13 +740,13 @@ void tcp_update_metrics(struct sock *sk)
 	if (sysctl_tcp_nometrics_save)
 		return;
 
-	dst_confirm(dst);
-
 	if (dst && (dst->flags & DST_HOST)) {
 		const struct inet_connection_sock *icsk = inet_csk(sk);
 		int m;
 		unsigned long rtt;
 
+		dst_confirm(dst);
+
 		if (icsk->icsk_backoff || !tp->srtt) {
 			/* This session failed to estimate rtt. Why?
 			 * Probably, no packets returned in time.
@@ -3869,9 +3869,11 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 			tcp_cong_avoid(sk, ack, prior_in_flight);
 	}
 
-	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP))
-		dst_confirm(__sk_dst_get(sk));
-
+	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag & FLAG_NOT_DUP)) {
+		struct dst_entry *dst = __sk_dst_get(sk);
+		if (dst)
+			dst_confirm(dst);
+	}
 	return 1;
 
 no_queue:
@@ -6140,9 +6142,14 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
 
 		case TCP_FIN_WAIT1:
 			if (tp->snd_una == tp->write_seq) {
+				struct dst_entry *dst;
+
 				tcp_set_state(sk, TCP_FIN_WAIT2);
 				sk->sk_shutdown |= SEND_SHUTDOWN;
-				dst_confirm(__sk_dst_get(sk));
+
+				dst = __sk_dst_get(sk);
+				if (dst)
+					dst_confirm(dst);
 
 				if (!sock_flag(sk, SOCK_DEAD))
 					/* Wake up lingering close() */
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index a233a7c..c94e4aa 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -125,7 +125,7 @@ static int ip6_finish_output2(struct sk_buff *skb)
 	rcu_read_lock();
 	neigh = dst_get_neighbour_noref(dst);
 	if (neigh) {
-		int res = neigh_output(neigh, skb);
+		int res = dst_neigh_output(dst, neigh, skb);
 
 		rcu_read_unlock();
 		return res;
-- 
1.7.10

^ permalink raw reply related

* [PATCH 05/19] net: Add optional SKB arg to dst_ops->neigh_lookup().
From: David Miller @ 2012-07-03  9:46 UTC (permalink / raw)
  To: netdev


Causes the handler to use the daddr in the ipv4/ipv6 header when
the route gateway is unspecified (local subnet).

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/dst.h         |    8 +++++++-
 include/net/dst_ops.h     |    4 +++-
 net/bridge/br_netfilter.c |    4 +++-
 net/decnet/dn_route.c     |    8 ++++++--
 net/ipv4/route.c          |   14 ++++++++++----
 net/ipv6/route.c          |   14 ++++++++++----
 net/xfrm/xfrm_policy.c    |    6 ++++--
 7 files changed, 43 insertions(+), 15 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 84e7a3f..295a705 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -420,7 +420,13 @@ static inline int dst_neigh_output(struct dst_entry *dst, struct neighbour *n,
 
 static inline struct neighbour *dst_neigh_lookup(const struct dst_entry *dst, const void *daddr)
 {
-	return dst->ops->neigh_lookup(dst, daddr);
+	return dst->ops->neigh_lookup(dst, NULL, daddr);
+}
+
+static inline struct neighbour *dst_neigh_lookup_skb(const struct dst_entry *dst,
+						     struct sk_buff *skb)
+{
+	return dst->ops->neigh_lookup(dst, skb, NULL);
 }
 
 static inline void dst_link_failure(struct sk_buff *skb)
diff --git a/include/net/dst_ops.h b/include/net/dst_ops.h
index 3682a0a..4badc86 100644
--- a/include/net/dst_ops.h
+++ b/include/net/dst_ops.h
@@ -26,7 +26,9 @@ struct dst_ops {
 	void			(*link_failure)(struct sk_buff *);
 	void			(*update_pmtu)(struct dst_entry *dst, u32 mtu);
 	int			(*local_out)(struct sk_buff *skb);
-	struct neighbour *	(*neigh_lookup)(const struct dst_entry *dst, const void *daddr);
+	struct neighbour *	(*neigh_lookup)(const struct dst_entry *dst,
+						struct sk_buff *skb,
+						const void *daddr);
 
 	struct kmem_cache	*kmem_cachep;
 
diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 20fa719..4378775 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -120,7 +120,9 @@ static u32 *fake_cow_metrics(struct dst_entry *dst, unsigned long old)
 	return NULL;
 }
 
-static struct neighbour *fake_neigh_lookup(const struct dst_entry *dst, const void *daddr)
+static struct neighbour *fake_neigh_lookup(const struct dst_entry *dst,
+					   struct sk_buff *skb,
+					   const void *daddr)
 {
 	return NULL;
 }
diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index 2493ed5..60e4c6e 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -117,7 +117,9 @@ static void dn_dst_destroy(struct dst_entry *);
 static struct dst_entry *dn_dst_negative_advice(struct dst_entry *);
 static void dn_dst_link_failure(struct sk_buff *);
 static void dn_dst_update_pmtu(struct dst_entry *dst, u32 mtu);
-static struct neighbour *dn_dst_neigh_lookup(const struct dst_entry *dst, const void *daddr);
+static struct neighbour *dn_dst_neigh_lookup(const struct dst_entry *dst,
+					     struct sk_buff *skb,
+					     const void *daddr);
 static int dn_route_input(struct sk_buff *);
 static void dn_run_flush(unsigned long dummy);
 
@@ -828,7 +830,9 @@ static unsigned int dn_dst_mtu(const struct dst_entry *dst)
 	return mtu ? : dst->dev->mtu;
 }
 
-static struct neighbour *dn_dst_neigh_lookup(const struct dst_entry *dst, const void *daddr)
+static struct neighbour *dn_dst_neigh_lookup(const struct dst_entry *dst,
+					     struct sk_buff *skb,
+					     const void *daddr)
 {
 	return __neigh_lookup_errno(&dn_neigh_table, daddr, dst->dev);
 }
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index bae3638..7453dfc 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -188,7 +188,9 @@ static u32 *ipv4_cow_metrics(struct dst_entry *dst, unsigned long old)
 	return p;
 }
 
-static struct neighbour *ipv4_neigh_lookup(const struct dst_entry *dst, const void *daddr);
+static struct neighbour *ipv4_neigh_lookup(const struct dst_entry *dst,
+					   struct sk_buff *skb,
+					   const void *daddr);
 
 static struct dst_ops ipv4_dst_ops = {
 	.family =		AF_INET,
@@ -1088,7 +1090,9 @@ static int slow_chain_length(const struct rtable *head)
 	return length >> FRACT_BITS;
 }
 
-static struct neighbour *ipv4_neigh_lookup(const struct dst_entry *dst, const void *daddr)
+static struct neighbour *ipv4_neigh_lookup(const struct dst_entry *dst,
+					   struct sk_buff *skb,
+					   const void *daddr)
 {
 	struct net_device *dev = dst->dev;
 	const __be32 *pkey = daddr;
@@ -1098,6 +1102,8 @@ static struct neighbour *ipv4_neigh_lookup(const struct dst_entry *dst, const vo
 	rt = (const struct rtable *) dst;
 	if (rt->rt_gateway)
 		pkey = (const __be32 *) &rt->rt_gateway;
+	else if (skb)
+		pkey = &ip_hdr(skb)->daddr;
 
 	n = __ipv4_neigh_lookup(dev, *(__force u32 *)pkey);
 	if (n)
@@ -1107,7 +1113,7 @@ static struct neighbour *ipv4_neigh_lookup(const struct dst_entry *dst, const vo
 
 static int rt_bind_neighbour(struct rtable *rt)
 {
-	struct neighbour *n = ipv4_neigh_lookup(&rt->dst, &rt->rt_gateway);
+	struct neighbour *n = ipv4_neigh_lookup(&rt->dst, NULL, &rt->rt_gateway);
 	if (IS_ERR(n))
 		return PTR_ERR(n);
 	dst_set_neighbour(&rt->dst, n);
@@ -1388,7 +1394,7 @@ static void check_peer_redir(struct dst_entry *dst, struct inet_peer *peer)
 
 	rt->rt_gateway = peer->redirect_learned.a4;
 
-	n = ipv4_neigh_lookup(&rt->dst, &rt->rt_gateway);
+	n = ipv4_neigh_lookup(&rt->dst, NULL, &rt->rt_gateway);
 	if (IS_ERR(n)) {
 		rt->rt_gateway = orig_gw;
 		return;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index c518e4e..4b581c6 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -120,21 +120,27 @@ static u32 *ipv6_cow_metrics(struct dst_entry *dst, unsigned long old)
 	return p;
 }
 
-static inline const void *choose_neigh_daddr(struct rt6_info *rt, const void *daddr)
+static inline const void *choose_neigh_daddr(struct rt6_info *rt,
+					     struct sk_buff *skb,
+					     const void *daddr)
 {
 	struct in6_addr *p = &rt->rt6i_gateway;
 
 	if (!ipv6_addr_any(p))
 		return (const void *) p;
+	else if (skb)
+		return &ipv6_hdr(skb)->daddr;
 	return daddr;
 }
 
-static struct neighbour *ip6_neigh_lookup(const struct dst_entry *dst, const void *daddr)
+static struct neighbour *ip6_neigh_lookup(const struct dst_entry *dst,
+					  struct sk_buff *skb,
+					  const void *daddr)
 {
 	struct rt6_info *rt = (struct rt6_info *) dst;
 	struct neighbour *n;
 
-	daddr = choose_neigh_daddr(rt, daddr);
+	daddr = choose_neigh_daddr(rt, skb, daddr);
 	n = __ipv6_neigh_lookup(&nd_tbl, dst->dev, daddr);
 	if (n)
 		return n;
@@ -1162,7 +1168,7 @@ struct dst_entry *icmp6_dst_alloc(struct net_device *dev,
 	if (neigh)
 		neigh_hold(neigh);
 	else {
-		neigh = ip6_neigh_lookup(&rt->dst, &fl6->daddr);
+		neigh = ip6_neigh_lookup(&rt->dst, NULL, &fl6->daddr);
 		if (IS_ERR(neigh)) {
 			in6_dev_put(idev);
 			dst_free(&rt->dst);
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index ccfbd32..a28a3f9 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -2404,9 +2404,11 @@ static unsigned int xfrm_mtu(const struct dst_entry *dst)
 	return mtu ? : dst_mtu(dst->path);
 }
 
-static struct neighbour *xfrm_neigh_lookup(const struct dst_entry *dst, const void *daddr)
+static struct neighbour *xfrm_neigh_lookup(const struct dst_entry *dst,
+					   struct sk_buff *skb,
+					   const void *daddr)
 {
-	return dst_neigh_lookup(dst->path, daddr);
+	return dst->path->ops->neigh_lookup(dst, skb, daddr);
 }
 
 int xfrm_policy_register_afinfo(struct xfrm_policy_afinfo *afinfo)
-- 
1.7.10

^ permalink raw reply related

* [PATCH 06/19] sch_teql: Convert over to dev_neigh_lookup_skb().
From: David Miller @ 2012-07-03  9:46 UTC (permalink / raw)
  To: netdev


Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/sched/sch_teql.c |   39 +++++++++++++--------------------------
 1 file changed, 13 insertions(+), 26 deletions(-)

diff --git a/net/sched/sch_teql.c b/net/sched/sch_teql.c
index ca0c296..6cd7174 100644
--- a/net/sched/sch_teql.c
+++ b/net/sched/sch_teql.c
@@ -67,7 +67,6 @@ struct teql_master {
 struct teql_sched_data {
 	struct Qdisc *next;
 	struct teql_master *m;
-	struct neighbour *ncache;
 	struct sk_buff_head q;
 };
 
@@ -134,7 +133,6 @@ teql_reset(struct Qdisc *sch)
 
 	skb_queue_purge(&dat->q);
 	sch->q.qlen = 0;
-	teql_neigh_release(xchg(&dat->ncache, NULL));
 }
 
 static void
@@ -166,7 +164,6 @@ teql_destroy(struct Qdisc *sch)
 					}
 				}
 				skb_queue_purge(&dat->q);
-				teql_neigh_release(xchg(&dat->ncache, NULL));
 				break;
 			}
 
@@ -225,21 +222,15 @@ static int teql_qdisc_init(struct Qdisc *sch, struct nlattr *opt)
 static int
 __teql_resolve(struct sk_buff *skb, struct sk_buff *skb_res,
 	       struct net_device *dev, struct netdev_queue *txq,
-	       struct neighbour *mn)
+	       struct dst_entry *dst)
 {
-	struct teql_sched_data *q = qdisc_priv(txq->qdisc);
-	struct neighbour *n = q->ncache;
+	struct neighbour *n;
+	int err = 0;
+
+	n = dst_neigh_lookup_skb(dst, skb);
+	if (!n)
+		return -ENOENT;
 
-	if (mn->tbl == NULL)
-		return -EINVAL;
-	if (n && n->tbl == mn->tbl &&
-	    memcmp(n->primary_key, mn->primary_key, mn->tbl->key_len) == 0) {
-		atomic_inc(&n->refcnt);
-	} else {
-		n = __neigh_lookup_errno(mn->tbl, mn->primary_key, dev);
-		if (IS_ERR(n))
-			return PTR_ERR(n);
-	}
 	if (neigh_event_send(n, skb_res) == 0) {
 		int err;
 		char haddr[MAX_ADDR_LEN];
@@ -248,15 +239,13 @@ __teql_resolve(struct sk_buff *skb, struct sk_buff *skb_res,
 		err = dev_hard_header(skb, dev, ntohs(skb->protocol), haddr,
 				      NULL, skb->len);
 
-		if (err < 0) {
-			neigh_release(n);
-			return -EINVAL;
-		}
-		teql_neigh_release(xchg(&q->ncache, n));
-		return 0;
+		if (err < 0)
+			err = -EINVAL;
+	} else {
+		err = (skb_res == NULL) ? -EAGAIN : 1;
 	}
 	neigh_release(n);
-	return (skb_res == NULL) ? -EAGAIN : 1;
+	return err;
 }
 
 static inline int teql_resolve(struct sk_buff *skb,
@@ -265,7 +254,6 @@ static inline int teql_resolve(struct sk_buff *skb,
 			       struct netdev_queue *txq)
 {
 	struct dst_entry *dst = skb_dst(skb);
-	struct neighbour *mn;
 	int res;
 
 	if (txq->qdisc == &noop_qdisc)
@@ -275,8 +263,7 @@ static inline int teql_resolve(struct sk_buff *skb,
 		return 0;
 
 	rcu_read_lock();
-	mn = dst_get_neighbour_noref(dst);
-	res = mn ? __teql_resolve(skb, skb_res, dev, txq, mn) : 0;
+	res = __teql_resolve(skb, skb_res, dev, txq, dst);
 	rcu_read_unlock();
 
 	return res;
-- 
1.7.10

^ permalink raw reply related

* [PATCH 07/19] ipoib: Convert over to dev_lookup_neigh_skb().
From: David Miller @ 2012-07-03  9:46 UTC (permalink / raw)
  To: netdev


Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c      |    4 +++-
 drivers/infiniband/ulp/ipoib/ipoib_multicast.c |   22 +++++++++++++---------
 2 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 3974c29..bbee4b2 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -715,7 +715,7 @@ static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	rcu_read_lock();
 	if (likely(skb_dst(skb))) {
-		n = dst_get_neighbour_noref(skb_dst(skb));
+		n = dst_neigh_lookup_skb(skb_dst(skb), skb);
 		if (!n) {
 			++dev->stats.tx_dropped;
 			dev_kfree_skb_any(skb);
@@ -797,6 +797,8 @@ static int ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		}
 	}
 unlock:
+	if (n)
+		neigh_release(n);
 	rcu_read_unlock();
 	return NETDEV_TX_OK;
 }
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index 20ebc6f..fbb95ee 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -720,16 +720,20 @@ out:
 
 		rcu_read_lock();
 		if (dst)
-			n = dst_get_neighbour_noref(dst);
-		if (n && !*to_ipoib_neigh(n)) {
-			struct ipoib_neigh *neigh = ipoib_neigh_alloc(n,
-								      skb->dev);
-
-			if (neigh) {
-				kref_get(&mcast->ah->ref);
-				neigh->ah	= mcast->ah;
-				list_add_tail(&neigh->list, &mcast->neigh_list);
+			n = dst_neigh_lookup_skb(dst, skb);
+		if (n) {
+			if (!*to_ipoib_neigh(n)) {
+				struct ipoib_neigh *neigh;
+
+				neigh = ipoib_neigh_alloc(n, skb->dev);
+				if (neigh) {
+					kref_get(&mcast->ah->ref);
+					neigh->ah	= mcast->ah;
+					list_add_tail(&neigh->list,
+						      &mcast->neigh_list);
+				}
 			}
+			neigh_release(n);
 		}
 		rcu_read_unlock();
 		spin_unlock_irqrestore(&priv->lock, flags);
-- 
1.7.10

^ permalink raw reply related

* [PATCH 08/19] qeth: Convert over to dst_neigh_lookup_skb().
From: David Miller @ 2012-07-03  9:46 UTC (permalink / raw)
  To: netdev


Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/s390/net/qeth_l3_main.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c
index 7be5e97..73ac63d 100644
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -2700,10 +2700,11 @@ int inline qeth_l3_get_cast_type(struct qeth_card *card, struct sk_buff *skb)
 	rcu_read_lock();
 	dst = skb_dst(skb);
 	if (dst)
-		n = dst_get_neighbour_noref(dst);
+		n = dst_neigh_lookup_skb(dst, skb);
 	if (n) {
 		cast_type = n->type;
 		rcu_read_unlock();
+		neigh_release(n);
 		if ((cast_type == RTN_BROADCAST) ||
 		    (cast_type == RTN_MULTICAST) ||
 		    (cast_type == RTN_ANYCAST))
-- 
1.7.10

^ permalink raw reply related

* [PATCH 09/19] cxgbi: Convert over to dst_neigh_lookup().
From: David Miller @ 2012-07-03  9:46 UTC (permalink / raw)
  To: netdev


Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/scsi/cxgbi/libcxgbi.c |   12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/cxgbi/libcxgbi.c b/drivers/scsi/cxgbi/libcxgbi.c
index d9253db..b44c1cf 100644
--- a/drivers/scsi/cxgbi/libcxgbi.c
+++ b/drivers/scsi/cxgbi/libcxgbi.c
@@ -494,7 +494,7 @@ static struct cxgbi_sock *cxgbi_check_route(struct sockaddr *dst_addr)
 		goto err_out;
 	}
 	dst = &rt->dst;
-	n = dst_get_neighbour_noref(dst);
+	n = dst_neigh_lookup(dst, &daddr->sin_addr.s_addr);
 	if (!n) {
 		err = -ENODEV;
 		goto rel_rt;
@@ -506,7 +506,7 @@ static struct cxgbi_sock *cxgbi_check_route(struct sockaddr *dst_addr)
 			&daddr->sin_addr.s_addr, ntohs(daddr->sin_port),
 			ndev->name);
 		err = -ENETUNREACH;
-		goto rel_rt;
+		goto rel_neigh;
 	}
 
 	if (ndev->flags & IFF_LOOPBACK) {
@@ -521,7 +521,7 @@ static struct cxgbi_sock *cxgbi_check_route(struct sockaddr *dst_addr)
 		pr_info("dst %pI4, %s, NOT cxgbi device.\n",
 			&daddr->sin_addr.s_addr, ndev->name);
 		err = -ENETUNREACH;
-		goto rel_rt;
+		goto rel_neigh;
 	}
 	log_debug(1 << CXGBI_DBG_SOCK,
 		"route to %pI4 :%u, ndev p#%d,%s, cdev 0x%p.\n",
@@ -531,7 +531,7 @@ static struct cxgbi_sock *cxgbi_check_route(struct sockaddr *dst_addr)
 	csk = cxgbi_sock_create(cdev);
 	if (!csk) {
 		err = -ENOMEM;
-		goto rel_rt;
+		goto rel_neigh;
 	}
 	csk->cdev = cdev;
 	csk->port_id = port;
@@ -541,9 +541,13 @@ static struct cxgbi_sock *cxgbi_check_route(struct sockaddr *dst_addr)
 	csk->daddr.sin_port = daddr->sin_port;
 	csk->daddr.sin_family = daddr->sin_family;
 	csk->saddr.sin_addr.s_addr = fl4.saddr;
+	neigh_release(n);
 
 	return csk;
 
+rel_neigh:
+	neigh_release(n);
+
 rel_rt:
 	ip_rt_put(rt);
 	if (csk)
-- 
1.7.10

^ permalink raw reply related

* [PATCH 10/19] cxgb4i: Convert over to dst_neigh_lookup().
From: David Miller @ 2012-07-03  9:46 UTC (permalink / raw)
  To: netdev


Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
index 5a4a3bf..cc9a068 100644
--- a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
+++ b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
@@ -1142,7 +1142,7 @@ static int init_act_open(struct cxgbi_sock *csk)
 	cxgbi_sock_set_flag(csk, CTPF_HAS_ATID);
 	cxgbi_sock_get(csk);
 
-	n = dst_get_neighbour_noref(csk->dst);
+	n = dst_neigh_lookup(csk->dst, &csk->daddr.sin_addr.s_addr);
 	if (!n) {
 		pr_err("%s, can't get neighbour of csk->dst.\n", ndev->name);
 		goto rel_resource;
@@ -1182,9 +1182,12 @@ static int init_act_open(struct cxgbi_sock *csk)
 
 	cxgbi_sock_set_state(csk, CTP_ACTIVE_OPEN);
 	send_act_open_req(csk, skb, csk->l2t);
+	neigh_release(n);
 	return 0;
 
 rel_resource:
+	if (n)
+		neigh_release(n);
 	if (skb)
 		__kfree_skb(skb);
 	return -EINVAL;
-- 
1.7.10

^ permalink raw reply related

* [PATCH 11/19] br_netfilter: Convert to dst_neigh_lookup_skb().
From: David Miller @ 2012-07-03  9:46 UTC (permalink / raw)
  To: netdev


Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/bridge/br_netfilter.c |   36 +++++++++++++++++++++++-------------
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 4378775..b98d3d7 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -375,19 +375,29 @@ static int br_nf_pre_routing_finish_bridge(struct sk_buff *skb)
 	if (!skb->dev)
 		goto free_skb;
 	dst = skb_dst(skb);
-	neigh = dst_get_neighbour_noref(dst);
-	if (neigh->hh.hh_len) {
-		neigh_hh_bridge(&neigh->hh, skb);
-		skb->dev = nf_bridge->physindev;
-		return br_handle_frame_finish(skb);
-	} else {
-		/* the neighbour function below overwrites the complete
-		 * MAC header, so we save the Ethernet source address and
-		 * protocol number. */
-		skb_copy_from_linear_data_offset(skb, -(ETH_HLEN-ETH_ALEN), skb->nf_bridge->data, ETH_HLEN-ETH_ALEN);
-		/* tell br_dev_xmit to continue with forwarding */
-		nf_bridge->mask |= BRNF_BRIDGED_DNAT;
-		return neigh->output(neigh, skb);
+	neigh = dst_neigh_lookup_skb(dst, skb);
+	if (neigh) {
+		int ret;
+
+		if (neigh->hh.hh_len) {
+			neigh_hh_bridge(&neigh->hh, skb);
+			skb->dev = nf_bridge->physindev;
+			ret = br_handle_frame_finish(skb);
+		} else {
+			/* the neighbour function below overwrites the complete
+			 * MAC header, so we save the Ethernet source address and
+			 * protocol number.
+			 */
+			skb_copy_from_linear_data_offset(skb,
+							 -(ETH_HLEN-ETH_ALEN),
+							 skb->nf_bridge->data,
+							 ETH_HLEN-ETH_ALEN);
+			/* tell br_dev_xmit to continue with forwarding */
+			nf_bridge->mask |= BRNF_BRIDGED_DNAT;
+			ret = neigh->output(neigh, skb);
+		}
+		neigh_release(neigh);
+		return ret;
 	}
 free_skb:
 	kfree_skb(skb);
-- 
1.7.10

^ permalink raw reply related

* [PATCH 12/19] neigh: Convert over to dst_neigh_lookup_skb().
From: David Miller @ 2012-07-03  9:46 UTC (permalink / raw)
  To: netdev


Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/core/neighbour.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index a793af9..eb3efdc 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1202,9 +1202,15 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new,
 
 			rcu_read_lock();
 			/* On shaper/eql skb->dst->neighbour != neigh :( */
-			if (dst && (n2 = dst_get_neighbour_noref(dst)) != NULL)
-				n1 = n2;
+			n2 = NULL;
+			if (dst) {
+				n2 = dst_neigh_lookup_skb(dst, skb);
+				if (n2)
+					n1 = n2;
+			}
 			n1->output(n1, skb);
+			if (n2)
+				neigh_release(n2);
 			rcu_read_unlock();
 
 			write_lock_bh(&neigh->lock);
-- 
1.7.10

^ permalink raw reply related

* [PATCH 13/19] decnet: Use neighbours privately in dn_route struct.
From: David Miller @ 2012-07-03  9:46 UTC (permalink / raw)
  To: netdev


This allows an easy conversion away from dst_get_neighbour*().

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/dn_route.h |    2 ++
 net/decnet/dn_neigh.c  |    2 +-
 net/decnet/dn_route.c  |   36 +++++++++++++++++++++++++++++-------
 3 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/include/net/dn_route.h b/include/net/dn_route.h
index c507e05..4f7d6a1 100644
--- a/include/net/dn_route.h
+++ b/include/net/dn_route.h
@@ -67,6 +67,8 @@ extern void dn_rt_cache_flush(int delay);
 struct dn_route {
 	struct dst_entry dst;
 
+	struct neighbour *n;
+
 	struct flowidn fld;
 
 	__le16 rt_saddr;
diff --git a/net/decnet/dn_neigh.c b/net/decnet/dn_neigh.c
index 8e9a35b..3aede1b 100644
--- a/net/decnet/dn_neigh.c
+++ b/net/decnet/dn_neigh.c
@@ -202,7 +202,7 @@ static int dn_neigh_output_packet(struct sk_buff *skb)
 {
 	struct dst_entry *dst = skb_dst(skb);
 	struct dn_route *rt = (struct dn_route *)dst;
-	struct neighbour *neigh = dst_get_neighbour_noref(dst);
+	struct neighbour *neigh = rt->n;
 	struct net_device *dev = neigh->dev;
 	char mac_addr[ETH_ALEN];
 	unsigned int seq;
diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index 60e4c6e..6e74b3f 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -114,6 +114,7 @@ static struct dst_entry *dn_dst_check(struct dst_entry *, __u32);
 static unsigned int dn_dst_default_advmss(const struct dst_entry *dst);
 static unsigned int dn_dst_mtu(const struct dst_entry *dst);
 static void dn_dst_destroy(struct dst_entry *);
+static void dn_dst_ifdown(struct dst_entry *, struct net_device *dev, int how);
 static struct dst_entry *dn_dst_negative_advice(struct dst_entry *);
 static void dn_dst_link_failure(struct sk_buff *);
 static void dn_dst_update_pmtu(struct dst_entry *dst, u32 mtu);
@@ -140,6 +141,7 @@ static struct dst_ops dn_dst_ops = {
 	.mtu =			dn_dst_mtu,
 	.cow_metrics =		dst_cow_metrics_generic,
 	.destroy =		dn_dst_destroy,
+	.ifdown =		dn_dst_ifdown,
 	.negative_advice =	dn_dst_negative_advice,
 	.link_failure =		dn_dst_link_failure,
 	.update_pmtu =		dn_dst_update_pmtu,
@@ -148,9 +150,27 @@ static struct dst_ops dn_dst_ops = {
 
 static void dn_dst_destroy(struct dst_entry *dst)
 {
+	struct dn_route *rt = (struct dn_route *) dst;
+
+	if (rt->n)
+		neigh_release(rt->n);
 	dst_destroy_metrics_generic(dst);
 }
 
+static void dn_dst_ifdown(struct dst_entry *dst, struct net_device *dev, int how)
+{
+	if (how) {
+		struct dn_route *rt = (struct dn_route *) dst;
+		struct neighbour *n = rt->n;
+
+		if (n && n->dev == dev) {
+			n->dev = dev_net(dev)->loopback_dev;
+			dev_hold(n->dev);
+			dev_put(dev);
+		}
+	}
+}
+
 static __inline__ unsigned int dn_hash(__le16 src, __le16 dst)
 {
 	__u16 tmp = (__u16 __force)(src ^ dst);
@@ -246,7 +266,8 @@ static int dn_dst_gc(struct dst_ops *ops)
  */
 static void dn_dst_update_pmtu(struct dst_entry *dst, u32 mtu)
 {
-	struct neighbour *n = dst_get_neighbour_noref(dst);
+	struct dn_route *rt = (struct dn_route *) dst;
+	struct neighbour *n = rt->n;
 	u32 min_mtu = 230;
 	struct dn_dev *dn;
 
@@ -715,7 +736,8 @@ out:
 static int dn_to_neigh_output(struct sk_buff *skb)
 {
 	struct dst_entry *dst = skb_dst(skb);
-	struct neighbour *n = dst_get_neighbour_noref(dst);
+	struct dn_route *rt = (struct dn_route *) dst;
+	struct neighbour *n = rt->n;
 
 	return n->output(n, skb);
 }
@@ -729,7 +751,7 @@ static int dn_output(struct sk_buff *skb)
 
 	int err = -EINVAL;
 
-	if (dst_get_neighbour_noref(dst) == NULL)
+	if (rt->n == NULL)
 		goto error;
 
 	skb->dev = dev;
@@ -852,11 +874,11 @@ static int dn_rt_set_next_hop(struct dn_route *rt, struct dn_fib_res *res)
 	}
 	rt->rt_type = res->type;
 
-	if (dev != NULL && dst_get_neighbour_noref(&rt->dst) == NULL) {
+	if (dev != NULL && rt->n == NULL) {
 		n = __neigh_lookup_errno(&dn_neigh_table, &rt->rt_gateway, dev);
 		if (IS_ERR(n))
 			return PTR_ERR(n);
-		dst_set_neighbour(&rt->dst, n);
+		rt->n = n;
 	}
 
 	if (dst_metric(&rt->dst, RTAX_MTU) > rt->dst.dev->mtu)
@@ -1163,7 +1185,7 @@ make_route:
 	rt->rt_dst_map    = fld.daddr;
 	rt->rt_src_map    = fld.saddr;
 
-	dst_set_neighbour(&rt->dst, neigh);
+	rt->n = neigh;
 	neigh = NULL;
 
 	rt->dst.lastuse = jiffies;
@@ -1433,7 +1455,7 @@ make_route:
 	rt->fld.flowidn_iif  = in_dev->ifindex;
 	rt->fld.flowidn_mark = fld.flowidn_mark;
 
-	dst_set_neighbour(&rt->dst, neigh);
+	rt->n = neigh;
 	rt->dst.lastuse = jiffies;
 	rt->dst.output = dn_rt_bug;
 	switch (res.type) {
-- 
1.7.10

^ permalink raw reply related

* [PATCH 14/19] net: Pass neighbours into the NETEVENT_REDIRECT event.
From: David Miller @ 2012-07-03  9:46 UTC (permalink / raw)
  To: netdev


Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c |   23 ++++++++------------
 include/net/netevent.h                             |    3 +++
 net/ipv6/route.c                                   |    6 ++++-
 3 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
index 55cf72a..633c602 100644
--- a/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
+++ b/drivers/net/ethernet/chelsio/cxgb3/cxgb3_offload.c
@@ -62,7 +62,8 @@ static const unsigned int MAX_ATIDS = 64 * 1024;
 static const unsigned int ATID_BASE = 0x10000;
 
 static void cxgb_neigh_update(struct neighbour *neigh);
-static void cxgb_redirect(struct dst_entry *old, struct dst_entry *new);
+static void cxgb_redirect(struct dst_entry *old, struct neighbour *old_neigh,
+			  struct dst_entry *new, struct neighbour *new_neigh);
 
 static inline int offload_activated(struct t3cdev *tdev)
 {
@@ -968,8 +969,9 @@ static int nb_callback(struct notifier_block *self, unsigned long event,
 	}
 	case (NETEVENT_REDIRECT):{
 		struct netevent_redirect *nr = ctx;
-		cxgb_redirect(nr->old, nr->new);
-		cxgb_neigh_update(dst_get_neighbour_noref(nr->new));
+		cxgb_redirect(nr->old, nr->old_neigh,
+			      nr->new, nr->new_neigh);
+		cxgb_neigh_update(nr->new_neigh);
 		break;
 	}
 	default:
@@ -1107,10 +1109,10 @@ static void set_l2t_ix(struct t3cdev *tdev, u32 tid, struct l2t_entry *e)
 	tdev->send(tdev, skb);
 }
 
-static void cxgb_redirect(struct dst_entry *old, struct dst_entry *new)
+static void cxgb_redirect(struct dst_entry *old, struct neighbour *old_neigh,
+			  struct dst_entry *new, struct neighbour *new_neigh)
 {
 	struct net_device *olddev, *newdev;
-	struct neighbour *n;
 	struct tid_info *ti;
 	struct t3cdev *tdev;
 	u32 tid;
@@ -1118,15 +1120,8 @@ static void cxgb_redirect(struct dst_entry *old, struct dst_entry *new)
 	struct l2t_entry *e;
 	struct t3c_tid_entry *te;
 
-	n = dst_get_neighbour_noref(old);
-	if (!n)
-		return;
-	olddev = n->dev;
-
-	n = dst_get_neighbour_noref(new);
-	if (!n)
-		return;
-	newdev = n->dev;
+	olddev = old_neigh->dev;
+	newdev = new_neigh->dev;
 
 	if (!is_offloading(olddev))
 		return;
diff --git a/include/net/netevent.h b/include/net/netevent.h
index 086f8a5..56e9862 100644
--- a/include/net/netevent.h
+++ b/include/net/netevent.h
@@ -12,10 +12,13 @@
  */
 
 struct dst_entry;
+struct neighbour;
 
 struct netevent_redirect {
 	struct dst_entry *old;
+	struct neighbour *old_neigh;
 	struct dst_entry *new;
+	struct neighbour *new_neigh;
 };
 
 enum netevent_notif_type {
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 4b581c6..24034d7 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1687,6 +1687,7 @@ void rt6_redirect(const struct in6_addr *dest, const struct in6_addr *src,
 	struct rt6_info *rt, *nrt = NULL;
 	struct netevent_redirect netevent;
 	struct net *net = dev_net(neigh->dev);
+	struct neighbour *old_neigh;
 
 	rt = ip6_route_redirect(dest, src, saddr, neigh->dev);
 
@@ -1714,7 +1715,8 @@ void rt6_redirect(const struct in6_addr *dest, const struct in6_addr *src,
 	dst_confirm(&rt->dst);
 
 	/* Duplicate redirect: silently ignore. */
-	if (neigh == dst_get_neighbour_noref_raw(&rt->dst))
+	old_neigh = dst_get_neighbour_noref_raw(&rt->dst);
+	if (neigh == old_neigh)
 		goto out;
 
 	nrt = ip6_rt_copy(rt, dest);
@@ -1732,7 +1734,9 @@ void rt6_redirect(const struct in6_addr *dest, const struct in6_addr *src,
 		goto out;
 
 	netevent.old = &rt->dst;
+	netevent.old_neigh = old_neigh;
 	netevent.new = &nrt->dst;
+	netevent.new_neigh = neigh;
 	call_netevent_notifiers(NETEVENT_REDIRECT, &netevent);
 
 	if (rt->rt6i_flags & RTF_CACHE) {
-- 
1.7.10

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox