Netdev List


* Re: Netconsole crash on 2.6.38-rc3
From: Greg KH @ 2011-02-18  3:02 UTC
  To: Sarah Sharp; +Cc: David S. Miller, netdev
In-Reply-To: <20110218012847.GA8980@xanatos>

On Thu, Feb 17, 2011 at 05:28:47PM -0800, Sarah Sharp wrote:
> I'm trying to debug an xHCI driver crash on 2.6.38-rc3, and netconsole
> is crashing when I try to load it.  I will try to update to 2.6.38-rc5,
> but I'm sort of stuck on rc3 since Greg KH's USB tree is based on that.

No it isn't, it's synced up with 2.6.38-rc5 at the moment, which was
required to handle some merge conflicts.  You might want to update your
version :)

thanks,

greg k-h


* Re: Mass udp flow reboot linux with RealTek RTL-8169 Gigabit
From: Seblu @ 2011-02-18  2:54 UTC
  To: Francois Romieu; +Cc: Eric Dumazet, lkml, netdev, Ivan Vecera
In-Reply-To: <20110213203417.GA11442@electric-eye.fr.zoreil.com>

[-- Attachment #1: Type: text/plain, Size: 571 bytes --]

On Sun, Feb 13, 2011 at 9:34 PM, Francois Romieu <romieu@fr.zoreil.com> wrote:
> Seblu <seblu@seblu.net> :
> [...]
>> > The NIC seems to be reset frequently, but the host stopped rebooting. \o//
>> OK, after about 1 hour of iperf, the host rebooted.
>
> Can you apply the patch below on top of 2.6.38-rc4 ?
>

I've applied your patch on top of 2.6.38-rc5. The host rebooted 2 minutes
after the UDP flow started. Since that reboot, the host has stayed up for
2 hours under a 1 Gbit/s UDP flow.

I have attached the dmesg output captured before the reboot. Do you need
anything else?

-- 
Sébastien Luttringer
www.seblu.net

[-- Attachment #2: dmesg.2.6.38-rc5-seblu.xz --]
[-- Type: application/x-xz, Size: 12392 bytes --]


* [PATCH] bonding: bond_select_queue off by one
From: Phil Oester @ 2011-02-18  2:07 UTC
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 506 bytes --]

The bonding driver's bond_select_queue function simply returns
skb->queue_mapping.  However, queue_mapping can be 16 for queue #16.
This floods syslog with the following message:

kernel: bondx selects TX queue 16, but real number of TX queues is 16

ndo_select_queue wants a zero-based number, so the bonding driver needs
to subtract one to return the proper queue number.  Also fix the grammar
in a comment while in the vicinity.
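
To illustrate the off-by-one, here is a minimal stand-alone C sketch; it is
not the driver code itself.  It assumes, as the description above implies,
that queue_mapping holds a one-based value, with 0 meaning that no queue was
recorded.  struct sketch_skb and sketch_select_queue() are hypothetical names
used only for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for the single sk_buff field this sketch needs. */
struct sketch_skb {
	uint16_t queue_mapping;	/* assumed: 0 = unset, otherwise queue number (one-based) */
};

/*
 * Same expression as the patched return statement: convert the stored
 * one-based value to the zero-based index the caller expects, and fall
 * back to queue 0 when nothing was recorded.
 */
uint16_t sketch_select_queue(const struct sketch_skb *skb)
{
	return skb->queue_mapping ? skb->queue_mapping - 1 : 0;
}
```

With a stored value of 16 and 16 real TX queues, the sketch returns index 15,
which stays inside the valid range 0..15; returning 16 unchanged would not.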

Phil Oester

Signed-off-by: Phil Oester <kernel@linuxace.com>



[-- Attachment #2: patch-bond-txq --]
[-- Type: text/plain, Size: 691 bytes --]

--- linux-2.6/drivers/net/bonding/bond_main.c.orig	2011-01-30 09:15:09.813843817 -0800
+++ linux-2.6/drivers/net/bonding/bond_main.c	2011-02-17 18:02:46.919050909 -0800
@@ -4537,11 +4537,11 @@
 {
 	/*
 	 * This helper function exists to help dev_pick_tx get the correct
-	 * destination queue.  Using a helper function skips the a call to
+	 * destination queue.  Using a helper function skips a call to
 	 * skb_tx_hash and will put the skbs in the queue we expect on their
 	 * way down to the bonding driver.
 	 */
-	return skb->queue_mapping;
+	return skb->queue_mapping ? skb->queue_mapping - 1 : 0;
 }
 
 static netdev_tx_t bond_start_xmit(struct sk_buff *skb, struct net_device *dev)


* Netconsole crash on 2.6.38-rc3
From: Sarah Sharp @ 2011-02-18  1:28 UTC
  To: David S. Miller; +Cc: netdev, Greg KH

[-- Attachment #1: Type: text/plain, Size: 803 bytes --]

I'm trying to debug an xHCI driver crash on 2.6.38-rc3, and netconsole
is crashing when I try to load it.  I will try to update to 2.6.38-rc5,
but I'm sort of stuck on rc3 since Greg KH's USB tree is based on that.

Attached are the two scripts I use to set up my box and call netconsole.
netconsole-on-network.sh is called first, followed by
netconsole-ending-on-network.sh.

When I invoked the netconsole-ending-on-network.sh script, netconsole
failed to load with an error about having the wrong ethernet device.  My
ethernet device apparently migrated from eth1 to eth0 on that box.

After modifying my script, unloading the netconsole driver, and
re-running netconsole-ending-on-network.sh, I got a "Killed" message
with the attached trace in dmesg.

Is this a known bug on 2.6.38-rc3?

Sarah Sharp

[-- Attachment #2: netconsole-crash.txt --]
[-- Type: text/plain, Size: 3918 bytes --]

[   30.336011] eth0: no IPv6 routers present
[   62.165508] netconsole: local port 6665
[   62.165512] netconsole: local IP 0.0.0.0
[   62.165513] netconsole: interface 'eth1'
[   62.165514] netconsole: remote port 6666
[   62.165515] netconsole: remote IP 192.168.1.138
[   62.165517] netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
[   62.165518] netconsole: eth1 doesn't exist, aborting.
[   62.165520] netconsole: cleaning up
[   98.791662] netconsole: local port 6665
[   98.791666] netconsole: local IP 0.0.0.0
[   98.791667] netconsole: interface 'eth0'
[   98.791668] netconsole: remote port 6666
[   98.791669] netconsole: remote IP 192.168.1.138
[   98.791671] netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
[   98.791673] netconsole: local IP 192.168.1.8
[   98.791690] BUG: unable to handle kernel NULL pointer dereference at           (null)
[   98.791693] IP: [<ffffffff81131977>] d_delete+0x47/0x180
[   98.791698] PGD 221b45067 PUD 221bce067 PMD 0 
[   98.791701] Oops: 0000 [#1] SMP 
[   98.791702] last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host1/target1:0:0/1:0:0:0/block/sda/sda1/stat
[   98.791705] CPU 0 
[   98.791706] Modules linked in: netconsole(+) i915 drm_kms_helper drm binfmt_misc i2c_algo_bit ppdev bridge stp bnep video lp parport snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device snd soundcore usbhid pcspkr snd_page_alloc intel_agp intel_gtt iTCO_wdt iTCO_vendor_support ehci_hcd uhci_hcd usbcore floppy
[   98.791726] 
[   98.791728] Pid: 3337, comm: modprobe Not tainted 2.6.38-rc3+ #179 P5Q-EM/System Product Name
[   98.791730] RIP: 0010:[<ffffffff81131977>]  [<ffffffff81131977>] d_delete+0x47/0x180
[   98.791732] RSP: 0018:ffff880221b8fe68  EFLAGS: 00010246
[   98.791734] RAX: 0000000000000202 RBX: ffff8802255e2180 RCX: ffffffff81ababe0
[   98.791735] RDX: 0000000000000000 RSI: ffff8802255e21b8 RDI: ffff8802255e21dc
[   98.791737] RBP: ffff880221b8fe88 R08: ffff8800cda14420 R09: 0000000000000000
[   98.791738] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[   98.791740] R13: ffff8802255e21dc R14: 0000000000000000 R15: ffff880221b8fee8
[   98.791742] FS:  00007fa4795896f0(0000) GS:ffff8800cda00000(0000) knlGS:0000000000000000
[   98.791744] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[   98.791745] CR2: 0000000000000000 CR3: 0000000221b98000 CR4: 00000000000406b0
[   98.791747] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   98.791749] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[   98.791750] Process modprobe (pid: 3337, threadinfo ffff880221b8e000, task ffff88021cc243b0)
[   98.791752] Stack:
[   98.791752]  ffff8802255e2180 00000000ffffffef ffffffffa01b80a0 ffff880221bd9c68
[   98.791755]  ffff880221b8fec8 ffffffff8118bf33 0000000a685497ea ffffffffa01b80a8
[   98.791758]  ffff880221b8fec8 ffff880221bd9c00 0000000000000000 0000000000000000
[   98.791760] Call Trace:
[   98.791765]  [<ffffffff8118bf33>] configfs_register_subsystem+0x103/0x1c0
[   98.791768]  [<ffffffffa0006260>] init_netconsole+0x260/0x1000 [netconsole]
[   98.791771]  [<ffffffffa0006000>] ? init_netconsole+0x0/0x1000 [netconsole]
[   98.791775]  [<ffffffff810001de>] do_one_initcall+0x3e/0x170
[   98.791778]  [<ffffffff81086e26>] sys_init_module+0x106/0x280
[   98.791780]  [<ffffffff81002dbb>] system_call_fastpath+0x16/0x1b
[   98.791781] Code: 84 83 00 00 00 49 8d 7c 24 20 e8 45 c4 3f 00 85 c0 0f 1f 00 75 6e 41 fe 45 00 f3 90 4c 89 ef 45 31 f6 e8 5d c4 3f 00 4c 8b 63 30 <41> 0f b7 04 24 25 00 f0 00 00 3d 00 40 00 00 41 0f 94 c6 83 7b 
[   98.791799] RIP  [<ffffffff81131977>] d_delete+0x47/0x180
[   98.791801]  RSP <ffff880221b8fe68>
[   98.791802] CR2: 0000000000000000
[   98.791804] ---[ end trace 0bbba7195aad7eaa ]---

[-- Attachment #3: netconsole-on-network.sh --]
[-- Type: application/x-sh, Size: 146 bytes --]

[-- Attachment #4: netconsole-ending-on-network.sh --]
[-- Type: application/x-sh, Size: 88 bytes --]


* linux-next: manual merge of the net tree with the net-current tree
From: Stephen Rothwell @ 2011-02-18  1:20 UTC
  To: David Miller, netdev
  Cc: linux-next, linux-kernel, Jesse Brandeburg, Jeff Kirsher

Hi all,

Today's linux-next merge of the net tree got a conflict in
drivers/net/e1000e/netdev.c between commit
713b3c9e4c1a6da6b45da6474ed554ed0a48de69 ("e1000e: flush all writebacks
before unload") from the net-current tree and commit
67fd4fcb78a7ced369a6bd8a131ec8c65ebd2bbb ("e1000e: convert to stats64")
from the net tree.

Just context changes.  I fixed it up (see below) and can carry the fix as
necessary.
-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

diff --cc drivers/net/e1000e/netdev.c
index 3fa110d,7cedfeb..0000000
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@@ -3344,21 -3335,8 +3341,23 @@@ int e1000e_up(struct e1000_adapter *ada
  	return 0;
  }
  
 +static void e1000e_flush_descriptors(struct e1000_adapter *adapter)
 +{
 +	struct e1000_hw *hw = &adapter->hw;
 +
 +	if (!(adapter->flags2 & FLAG2_DMA_BURST))
 +		return;
 +
 +	/* flush pending descriptor writebacks to memory */
 +	ew32(TIDV, adapter->tx_int_delay | E1000_TIDV_FPD);
 +	ew32(RDTR, adapter->rx_int_delay | E1000_RDTR_FPD);
 +
 +	/* execute the writes immediately */
 +	e1e_flush();
 +}
 +
+ static void e1000e_update_stats(struct e1000_adapter *adapter);
+ 
  void e1000e_down(struct e1000_adapter *adapter)
  {
  	struct net_device *netdev = adapter->netdev;
@@@ -4179,11 -4154,7 +4186,10 @@@ static void e1000_watchdog_task(struct 
  	struct e1000_ring *tx_ring = adapter->tx_ring;
  	struct e1000_hw *hw = &adapter->hw;
  	u32 link, tctl;
- 	int tx_pending = 0;
  
 +	if (test_bit(__E1000_DOWN, &adapter->state))
 +		return;
 +
  	link = e1000e_has_link(adapter);
  	if ((netif_carrier_ok(netdev)) && link) {
  		/* Cancel scheduled suspend requests. */


* [RFC PATCH 3/3] ipv4: Set DST_NOCACHE in rt_dst_alloc().
From: David Miller @ 2011-02-18  0:34 UTC
  To: netdev


Instead of using a read/modify/write in rt_finalize().
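
As a rough stand-alone illustration of the change (not the kernel code), the
C sketch below contrasts setting a flag with a later read/modify/write against
setting every flag in one store at allocation time.  The SKETCH_* flag values,
struct sketch_dst, and the sketch_* function names are hypothetical and chosen
only for this example.

```c
#include <stdbool.h>

/* Illustrative flag bits only; the kernel's real values are defined elsewhere. */
enum {
	SKETCH_DST_HOST     = 1 << 0,
	SKETCH_DST_NOPOLICY = 1 << 1,
	SKETCH_DST_NOXFRM   = 1 << 2,
	SKETCH_DST_NOCACHE  = 1 << 3,
};

/* Hypothetical stand-in for the flags field of a dst entry. */
struct sketch_dst {
	int flags;
};

/* Before: allocation stores most of the flags ... */
void sketch_alloc_old(struct sketch_dst *dst, bool nopolicy, bool noxfrm)
{
	dst->flags = SKETCH_DST_HOST |
		     (nopolicy ? SKETCH_DST_NOPOLICY : 0) |
		     (noxfrm ? SKETCH_DST_NOXFRM : 0);
}

/* ... and a later step reads, modifies, and writes the field again. */
void sketch_finalize_old(struct sketch_dst *dst)
{
	dst->flags |= SKETCH_DST_NOCACHE;
}

/* After: every flag is known at allocation time, so one store is enough. */
void sketch_alloc_new(struct sketch_dst *dst, bool nopolicy, bool noxfrm)
{
	dst->flags = SKETCH_DST_NOCACHE | SKETCH_DST_HOST |
		     (nopolicy ? SKETCH_DST_NOPOLICY : 0) |
		     (noxfrm ? SKETCH_DST_NOXFRM : 0);
}
```

Both paths end with the same flag word; the second simply avoids touching the
field twice.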

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/route.c |   11 +++++------
 1 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 488094d..01b27ff 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -500,11 +500,6 @@ static int rt_garbage_collect(struct dst_ops *ops)
 
 static int rt_finalize(struct rtable *rt, struct rtable **rp, struct sk_buff *skb)
 {
-	/* To avoid expensive rcu stuff for this uncached dst, we set
-	 * DST_NOCACHE so that dst_release() can free dst without
-	 * waiting a grace period.
-	 */
-	rt->dst.flags |= DST_NOCACHE;
 	if (rt->rt_type == RTN_UNICAST || rt_is_output_route(rt)) {
 		int err = arp_bind_neighbour(&rt->dst);
 		if (err) {
@@ -1114,7 +1109,11 @@ static struct rtable *rt_dst_alloc(bool nopolicy, bool noxfrm)
 	if (rt) {
 		rt->dst.obsolete = -1;
 
-		rt->dst.flags = DST_HOST |
+		/* To avoid expensive rcu stuff for this uncached dst, we set
+		 * DST_NOCACHE so that dst_release() can free dst without
+		 * waiting a grace period.
+		 */
+		rt->dst.flags = DST_NOCACHE | DST_HOST |
 			(nopolicy ? DST_NOPOLICY : 0) |
 			(noxfrm ? DST_NOXFRM : 0);
 	}
-- 
1.7.4.1



* [RFC PATCH 2/3] ipv4: Kill ip_route_input_noref().
From: David Miller @ 2011-02-18  0:34 UTC
  To: netdev


The "noref" argument to ip_route_input_common() is now always ignored
because we do not cache routes, and in that case we must always grab
a reference to the resulting 'dst'.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/route.h    |   16 ++--------------
 net/ipv4/arp.c         |    2 +-
 net/ipv4/ip_input.c    |    4 ++--
 net/ipv4/route.c       |    6 +++---
 net/ipv4/xfrm4_input.c |    4 ++--
 5 files changed, 10 insertions(+), 22 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index fcf1b11..c403a69 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -121,20 +121,8 @@ extern int		__ip_route_output_key(struct net *, struct rtable **, const struct f
 extern int		ip_route_output_key(struct net *, struct rtable **, struct flowi *flp);
 extern int		ip_route_output_flow(struct net *, struct rtable **rp, struct flowi *flp, struct sock *sk, int flags);
 
-extern int ip_route_input_common(struct sk_buff *skb, __be32 dst, __be32 src,
-				 u8 tos, struct net_device *devin, bool noref);
-
-static inline int ip_route_input(struct sk_buff *skb, __be32 dst, __be32 src,
-				 u8 tos, struct net_device *devin)
-{
-	return ip_route_input_common(skb, dst, src, tos, devin, false);
-}
-
-static inline int ip_route_input_noref(struct sk_buff *skb, __be32 dst, __be32 src,
-				       u8 tos, struct net_device *devin)
-{
-	return ip_route_input_common(skb, dst, src, tos, devin, true);
-}
+extern int ip_route_input(struct sk_buff *skb, __be32 dst, __be32 src,
+			  u8 tos, struct net_device *devin);
 
 extern unsigned short	ip_rt_frag_needed(struct net *net, struct iphdr *iph, unsigned short new_mtu, struct net_device *dev);
 extern void		ip_rt_send_redirect(struct sk_buff *skb);
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 7927589..555b412 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -873,7 +873,7 @@ static int arp_process(struct sk_buff *skb)
 	}
 
 	if (arp->ar_op == htons(ARPOP_REQUEST) &&
-	    ip_route_input_noref(skb, tip, sip, 0, dev) == 0) {
+	    ip_route_input(skb, tip, sip, 0, dev) == 0) {
 
 		rt = skb_rtable(skb);
 		addr_type = rt->rt_type;
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index d7b2b09..577eb45 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -324,8 +324,8 @@ static int ip_rcv_finish(struct sk_buff *skb)
 	 *	how the packet travels inside Linux networking.
 	 */
 	if (skb_dst(skb) == NULL) {
-		int err = ip_route_input_noref(skb, iph->daddr, iph->saddr,
-					       iph->tos, skb->dev);
+		int err = ip_route_input(skb, iph->daddr, iph->saddr,
+					 iph->tos, skb->dev);
 		if (unlikely(err)) {
 			if (err == -EHOSTUNREACH)
 				IP_INC_STATS_BH(dev_net(skb->dev),
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index f74149c..488094d 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1520,8 +1520,8 @@ martian_source_keep_err:
 	goto out;
 }
 
-int ip_route_input_common(struct sk_buff *skb, __be32 daddr, __be32 saddr,
-			   u8 tos, struct net_device *dev, bool noref)
+int ip_route_input(struct sk_buff *skb, __be32 daddr, __be32 saddr,
+		   u8 tos, struct net_device *dev)
 {
 	int res;
 
@@ -1564,7 +1564,7 @@ int ip_route_input_common(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	rcu_read_unlock();
 	return res;
 }
-EXPORT_SYMBOL(ip_route_input_common);
+EXPORT_SYMBOL(ip_route_input);
 
 /* called with rcu_read_lock() */
 static struct rtable *__mkroute_output(const struct fib_result *res,
diff --git a/net/ipv4/xfrm4_input.c b/net/ipv4/xfrm4_input.c
index 06814b6..58d23a5 100644
--- a/net/ipv4/xfrm4_input.c
+++ b/net/ipv4/xfrm4_input.c
@@ -27,8 +27,8 @@ static inline int xfrm4_rcv_encap_finish(struct sk_buff *skb)
 	if (skb_dst(skb) == NULL) {
 		const struct iphdr *iph = ip_hdr(skb);
 
-		if (ip_route_input_noref(skb, iph->daddr, iph->saddr,
-					 iph->tos, skb->dev))
+		if (ip_route_input(skb, iph->daddr, iph->saddr,
+				   iph->tos, skb->dev))
 			goto drop;
 	}
 	return dst_input(skb);
-- 
1.7.4.1



* [RFC PATCH 1/3] ipv4: Delete routing cache.
From: David Miller @ 2011-02-18  0:34 UTC
  To: netdev


Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/route.h     |    1 -
 net/ipv4/fib_frontend.c |    5 -
 net/ipv4/route.c        |  903 ++---------------------------------------------
 3 files changed, 24 insertions(+), 885 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index bf790c1..fcf1b11 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -117,7 +117,6 @@ extern int		ip_rt_init(void);
 extern void		ip_rt_redirect(__be32 old_gw, __be32 dst, __be32 new_gw,
 				       __be32 src, struct net_device *dev);
 extern void		rt_cache_flush(struct net *net, int how);
-extern void		rt_cache_flush_batch(struct net *net);
 extern int		__ip_route_output_key(struct net *, struct rtable **, const struct flowi *flp);
 extern int		ip_route_output_key(struct net *, struct rtable **, struct flowi *flp);
 extern int		ip_route_output_flow(struct net *, struct rtable **rp, struct flowi *flp, struct sock *sk, int flags);
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 2a49c06..694145c 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -978,11 +978,6 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
 		rt_cache_flush(dev_net(dev), 0);
 		break;
 	case NETDEV_UNREGISTER_BATCH:
-		/* The batch unregister is only called on the first
-		 * device in the list of devices being unregistered.
-		 * Therefore we should not pass dev_net(dev) in here.
-		 */
-		rt_cache_flush_batch(NULL);
 		break;
 	}
 	return NOTIFY_DONE;
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 2facde0..f74149c 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -129,7 +129,6 @@ static int ip_rt_gc_elasticity __read_mostly	= 8;
 static int ip_rt_mtu_expires __read_mostly	= 10 * 60 * HZ;
 static int ip_rt_min_pmtu __read_mostly		= 512 + 20 + 20;
 static int ip_rt_min_advmss __read_mostly	= 256;
-static int rt_chain_length_max __read_mostly	= 20;
 
 /*
  *	Interface to generic destination cache.
@@ -222,184 +221,30 @@ const __u8 ip_tos2prio[16] = {
 };
 
 
-/*
- * Route cache.
- */
-
-/* The locking scheme is rather straight forward:
- *
- * 1) Read-Copy Update protects the buckets of the central route hash.
- * 2) Only writers remove entries, and they hold the lock
- *    as they look at rtable reference counts.
- * 3) Only readers acquire references to rtable entries,
- *    they do so with atomic increments and with the
- *    lock held.
- */
-
-struct rt_hash_bucket {
-	struct rtable __rcu	*chain;
-};
-
-#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || \
-	defined(CONFIG_PROVE_LOCKING)
-/*
- * Instead of using one spinlock for each rt_hash_bucket, we use a table of spinlocks
- * The size of this table is a power of two and depends on the number of CPUS.
- * (on lockdep we have a quite big spinlock_t, so keep the size down there)
- */
-#ifdef CONFIG_LOCKDEP
-# define RT_HASH_LOCK_SZ	256
-#else
-# if NR_CPUS >= 32
-#  define RT_HASH_LOCK_SZ	4096
-# elif NR_CPUS >= 16
-#  define RT_HASH_LOCK_SZ	2048
-# elif NR_CPUS >= 8
-#  define RT_HASH_LOCK_SZ	1024
-# elif NR_CPUS >= 4
-#  define RT_HASH_LOCK_SZ	512
-# else
-#  define RT_HASH_LOCK_SZ	256
-# endif
-#endif
-
-static spinlock_t	*rt_hash_locks;
-# define rt_hash_lock_addr(slot) &rt_hash_locks[(slot) & (RT_HASH_LOCK_SZ - 1)]
-
-static __init void rt_hash_lock_init(void)
-{
-	int i;
-
-	rt_hash_locks = kmalloc(sizeof(spinlock_t) * RT_HASH_LOCK_SZ,
-			GFP_KERNEL);
-	if (!rt_hash_locks)
-		panic("IP: failed to allocate rt_hash_locks\n");
-
-	for (i = 0; i < RT_HASH_LOCK_SZ; i++)
-		spin_lock_init(&rt_hash_locks[i]);
-}
-#else
-# define rt_hash_lock_addr(slot) NULL
-
-static inline void rt_hash_lock_init(void)
-{
-}
-#endif
-
-static struct rt_hash_bucket 	*rt_hash_table __read_mostly;
-static unsigned			rt_hash_mask __read_mostly;
-static unsigned int		rt_hash_log  __read_mostly;
-
 static DEFINE_PER_CPU(struct rt_cache_stat, rt_cache_stat);
 #define RT_CACHE_STAT_INC(field) __this_cpu_inc(rt_cache_stat.field)
 
-static inline unsigned int rt_hash(__be32 daddr, __be32 saddr, int idx,
-				   int genid)
-{
-	return jhash_3words((__force u32)daddr, (__force u32)saddr,
-			    idx, genid)
-		& rt_hash_mask;
-}
-
 static inline int rt_genid(struct net *net)
 {
 	return atomic_read(&net->ipv4.rt_genid);
 }
 
 #ifdef CONFIG_PROC_FS
-struct rt_cache_iter_state {
-	struct seq_net_private p;
-	int bucket;
-	int genid;
-};
-
-static struct rtable *rt_cache_get_first(struct seq_file *seq)
-{
-	struct rt_cache_iter_state *st = seq->private;
-	struct rtable *r = NULL;
-
-	for (st->bucket = rt_hash_mask; st->bucket >= 0; --st->bucket) {
-		if (!rcu_dereference_raw(rt_hash_table[st->bucket].chain))
-			continue;
-		rcu_read_lock_bh();
-		r = rcu_dereference_bh(rt_hash_table[st->bucket].chain);
-		while (r) {
-			if (dev_net(r->dst.dev) == seq_file_net(seq) &&
-			    r->rt_genid == st->genid)
-				return r;
-			r = rcu_dereference_bh(r->dst.rt_next);
-		}
-		rcu_read_unlock_bh();
-	}
-	return r;
-}
-
-static struct rtable *__rt_cache_get_next(struct seq_file *seq,
-					  struct rtable *r)
-{
-	struct rt_cache_iter_state *st = seq->private;
-
-	r = rcu_dereference_bh(r->dst.rt_next);
-	while (!r) {
-		rcu_read_unlock_bh();
-		do {
-			if (--st->bucket < 0)
-				return NULL;
-		} while (!rcu_dereference_raw(rt_hash_table[st->bucket].chain));
-		rcu_read_lock_bh();
-		r = rcu_dereference_bh(rt_hash_table[st->bucket].chain);
-	}
-	return r;
-}
-
-static struct rtable *rt_cache_get_next(struct seq_file *seq,
-					struct rtable *r)
-{
-	struct rt_cache_iter_state *st = seq->private;
-	while ((r = __rt_cache_get_next(seq, r)) != NULL) {
-		if (dev_net(r->dst.dev) != seq_file_net(seq))
-			continue;
-		if (r->rt_genid == st->genid)
-			break;
-	}
-	return r;
-}
-
-static struct rtable *rt_cache_get_idx(struct seq_file *seq, loff_t pos)
-{
-	struct rtable *r = rt_cache_get_first(seq);
-
-	if (r)
-		while (pos && (r = rt_cache_get_next(seq, r)))
-			--pos;
-	return pos ? NULL : r;
-}
-
 static void *rt_cache_seq_start(struct seq_file *seq, loff_t *pos)
 {
-	struct rt_cache_iter_state *st = seq->private;
 	if (*pos)
-		return rt_cache_get_idx(seq, *pos - 1);
-	st->genid = rt_genid(seq_file_net(seq));
+		return NULL;
 	return SEQ_START_TOKEN;
 }
 
 static void *rt_cache_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
-	struct rtable *r;
-
-	if (v == SEQ_START_TOKEN)
-		r = rt_cache_get_first(seq);
-	else
-		r = rt_cache_get_next(seq, v);
 	++*pos;
-	return r;
+	return NULL;
 }
 
 static void rt_cache_seq_stop(struct seq_file *seq, void *v)
 {
-	if (v && v != SEQ_START_TOKEN)
-		rcu_read_unlock_bh();
 }
 
 static int rt_cache_seq_show(struct seq_file *seq, void *v)
@@ -409,29 +254,6 @@ static int rt_cache_seq_show(struct seq_file *seq, void *v)
 			   "Iface\tDestination\tGateway \tFlags\t\tRefCnt\tUse\t"
 			   "Metric\tSource\t\tMTU\tWindow\tIRTT\tTOS\tHHRef\t"
 			   "HHUptod\tSpecDst");
-	else {
-		struct rtable *r = v;
-		int len;
-
-		seq_printf(seq, "%s\t%08X\t%08X\t%8X\t%d\t%u\t%d\t"
-			      "%08X\t%d\t%u\t%u\t%02X\t%d\t%1d\t%08X%n",
-			r->dst.dev ? r->dst.dev->name : "*",
-			(__force u32)r->rt_dst,
-			(__force u32)r->rt_gateway,
-			r->rt_flags, atomic_read(&r->dst.__refcnt),
-			r->dst.__use, 0, (__force u32)r->rt_src,
-			dst_metric_advmss(&r->dst) + 40,
-			dst_metric(&r->dst, RTAX_WINDOW),
-			(int)((dst_metric(&r->dst, RTAX_RTT) >> 3) +
-			      dst_metric(&r->dst, RTAX_RTTVAR)),
-			r->fl.fl4_tos,
-			r->dst.hh ? atomic_read(&r->dst.hh->hh_refcnt) : -1,
-			r->dst.hh ? (r->dst.hh->hh_output ==
-				       dev_queue_xmit) : 0,
-			r->rt_spec_dst, &len);
-
-		seq_printf(seq, "%*s\n", 127 - len, "");
-	}
 	return 0;
 }
 
@@ -444,8 +266,7 @@ static const struct seq_operations rt_cache_seq_ops = {
 
 static int rt_cache_seq_open(struct inode *inode, struct file *file)
 {
-	return seq_open_net(inode, file, &rt_cache_seq_ops,
-			sizeof(struct rt_cache_iter_state));
+	return seq_open_net(inode, file, &rt_cache_seq_ops, 0);
 }
 
 static const struct file_operations rt_cache_seq_fops = {
@@ -643,184 +464,12 @@ static inline int ip_rt_proc_init(void)
 }
 #endif /* CONFIG_PROC_FS */
 
-static inline void rt_free(struct rtable *rt)
-{
-	call_rcu_bh(&rt->dst.rcu_head, dst_rcu_free);
-}
-
-static inline void rt_drop(struct rtable *rt)
-{
-	ip_rt_put(rt);
-	call_rcu_bh(&rt->dst.rcu_head, dst_rcu_free);
-}
-
-static inline int rt_fast_clean(struct rtable *rth)
-{
-	/* Kill broadcast/multicast entries very aggresively, if they
-	   collide in hash table with more useful entries */
-	return (rth->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST)) &&
-		rt_is_input_route(rth) && rth->dst.rt_next;
-}
-
-static inline int rt_valuable(struct rtable *rth)
-{
-	return (rth->rt_flags & (RTCF_REDIRECTED | RTCF_NOTIFY)) ||
-		(rth->peer && rth->peer->pmtu_expires);
-}
-
-static int rt_may_expire(struct rtable *rth, unsigned long tmo1, unsigned long tmo2)
-{
-	unsigned long age;
-	int ret = 0;
-
-	if (atomic_read(&rth->dst.__refcnt))
-		goto out;
-
-	age = jiffies - rth->dst.lastuse;
-	if ((age <= tmo1 && !rt_fast_clean(rth)) ||
-	    (age <= tmo2 && rt_valuable(rth)))
-		goto out;
-	ret = 1;
-out:	return ret;
-}
-
-/* Bits of score are:
- * 31: very valuable
- * 30: not quite useless
- * 29..0: usage counter
- */
-static inline u32 rt_score(struct rtable *rt)
-{
-	u32 score = jiffies - rt->dst.lastuse;
-
-	score = ~score & ~(3<<30);
-
-	if (rt_valuable(rt))
-		score |= (1<<31);
-
-	if (rt_is_output_route(rt) ||
-	    !(rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST|RTCF_LOCAL)))
-		score |= (1<<30);
-
-	return score;
-}
-
-static inline bool rt_caching(const struct net *net)
-{
-	return net->ipv4.current_rt_cache_rebuild_count <=
-		net->ipv4.sysctl_rt_cache_rebuild_count;
-}
-
-static inline bool compare_hash_inputs(const struct flowi *fl1,
-					const struct flowi *fl2)
-{
-	return ((((__force u32)fl1->fl4_dst ^ (__force u32)fl2->fl4_dst) |
-		((__force u32)fl1->fl4_src ^ (__force u32)fl2->fl4_src) |
-		(fl1->iif ^ fl2->iif)) == 0);
-}
-
-static inline int compare_keys(struct flowi *fl1, struct flowi *fl2)
-{
-	return (((__force u32)fl1->fl4_dst ^ (__force u32)fl2->fl4_dst) |
-		((__force u32)fl1->fl4_src ^ (__force u32)fl2->fl4_src) |
-		(fl1->mark ^ fl2->mark) |
-		(*(u16 *)&fl1->fl4_tos ^ *(u16 *)&fl2->fl4_tos) |
-		(fl1->oif ^ fl2->oif) |
-		(fl1->iif ^ fl2->iif)) == 0;
-}
-
-static inline int compare_netns(struct rtable *rt1, struct rtable *rt2)
-{
-	return net_eq(dev_net(rt1->dst.dev), dev_net(rt2->dst.dev));
-}
-
 static inline int rt_is_expired(struct rtable *rth)
 {
 	return rth->rt_genid != rt_genid(dev_net(rth->dst.dev));
 }
 
 /*
- * Perform a full scan of hash table and free all entries.
- * Can be called by a softirq or a process.
- * In the later case, we want to be reschedule if necessary
- */
-static void rt_do_flush(struct net *net, int process_context)
-{
-	unsigned int i;
-	struct rtable *rth, *next;
-
-	for (i = 0; i <= rt_hash_mask; i++) {
-		struct rtable __rcu **pprev;
-		struct rtable *list;
-
-		if (process_context && need_resched())
-			cond_resched();
-		rth = rcu_dereference_raw(rt_hash_table[i].chain);
-		if (!rth)
-			continue;
-
-		spin_lock_bh(rt_hash_lock_addr(i));
-
-		list = NULL;
-		pprev = &rt_hash_table[i].chain;
-		rth = rcu_dereference_protected(*pprev,
-			lockdep_is_held(rt_hash_lock_addr(i)));
-
-		while (rth) {
-			next = rcu_dereference_protected(rth->dst.rt_next,
-				lockdep_is_held(rt_hash_lock_addr(i)));
-
-			if (!net ||
-			    net_eq(dev_net(rth->dst.dev), net)) {
-				rcu_assign_pointer(*pprev, next);
-				rcu_assign_pointer(rth->dst.rt_next, list);
-				list = rth;
-			} else {
-				pprev = &rth->dst.rt_next;
-			}
-			rth = next;
-		}
-
-		spin_unlock_bh(rt_hash_lock_addr(i));
-
-		for (; list; list = next) {
-			next = rcu_dereference_protected(list->dst.rt_next, 1);
-			rt_free(list);
-		}
-	}
-}
-
-/*
- * While freeing expired entries, we compute average chain length
- * and standard deviation, using fixed-point arithmetic.
- * This to have an estimation of rt_chain_length_max
- *  rt_chain_length_max = max(elasticity, AVG + 4*SD)
- * We use 3 bits for frational part, and 29 (or 61) for magnitude.
- */
-
-#define FRACT_BITS 3
-#define ONE (1UL << FRACT_BITS)
-
-/*
- * Given a hash chain and an item in this hash chain,
- * find if a previous entry has the same hash_inputs
- * (but differs on tos, mark or oif)
- * Returns 0 if an alias is found.
- * Returns ONE if rth has no alias before itself.
- */
-static int has_noalias(const struct rtable *head, const struct rtable *rth)
-{
-	const struct rtable *aux = head;
-
-	while (aux != rth) {
-		if (compare_hash_inputs(&aux->fl, &rth->fl))
-			return 0;
-		aux = rcu_dereference_protected(aux->dst.rt_next, 1);
-	}
-	return ONE;
-}
-
-/*
  * Pertubation of rt_genid by a small quantity [1..256]
  * Using 8 bits of shuffling ensure we can call rt_cache_invalidate()
  * many times (2^24) without giving recent rt_genid.
@@ -841,366 +490,32 @@ static void rt_cache_invalidate(struct net *net)
 void rt_cache_flush(struct net *net, int delay)
 {
 	rt_cache_invalidate(net);
-	if (delay >= 0)
-		rt_do_flush(net, !in_softirq());
 }
 
-/* Flush previous cache invalidated entries from the cache */
-void rt_cache_flush_batch(struct net *net)
-{
-	rt_do_flush(net, !in_softirq());
-}
-
-static void rt_emergency_hash_rebuild(struct net *net)
-{
-	if (net_ratelimit())
-		printk(KERN_WARNING "Route hash chain too long!\n");
-	rt_cache_invalidate(net);
-}
-
-/*
-   Short description of GC goals.
-
-   We want to build algorithm, which will keep routing cache
-   at some equilibrium point, when number of aged off entries
-   is kept approximately equal to newly generated ones.
-
-   Current expiration strength is variable "expire".
-   We try to adjust it dynamically, so that if networking
-   is idle expires is large enough to keep enough of warm entries,
-   and when load increases it reduces to limit cache size.
- */
-
 static int rt_garbage_collect(struct dst_ops *ops)
 {
-	static unsigned long expire = RT_GC_TIMEOUT;
-	static unsigned long last_gc;
-	static int rover;
-	static int equilibrium;
-	struct rtable *rth;
-	struct rtable __rcu **rthp;
-	unsigned long now = jiffies;
-	int goal;
-	int entries = dst_entries_get_fast(&ipv4_dst_ops);
-
-	/*
-	 * Garbage collection is pretty expensive,
-	 * do not make it too frequently.
-	 */
-
 	RT_CACHE_STAT_INC(gc_total);
-
-	if (now - last_gc < ip_rt_gc_min_interval &&
-	    entries < ip_rt_max_size) {
-		RT_CACHE_STAT_INC(gc_ignored);
-		goto out;
-	}
-
-	entries = dst_entries_get_slow(&ipv4_dst_ops);
-	/* Calculate number of entries, which we want to expire now. */
-	goal = entries - (ip_rt_gc_elasticity << rt_hash_log);
-	if (goal <= 0) {
-		if (equilibrium < ipv4_dst_ops.gc_thresh)
-			equilibrium = ipv4_dst_ops.gc_thresh;
-		goal = entries - equilibrium;
-		if (goal > 0) {
-			equilibrium += min_t(unsigned int, goal >> 1, rt_hash_mask + 1);
-			goal = entries - equilibrium;
-		}
-	} else {
-		/* We are in dangerous area. Try to reduce cache really
-		 * aggressively.
-		 */
-		goal = max_t(unsigned int, goal >> 1, rt_hash_mask + 1);
-		equilibrium = entries - goal;
-	}
-
-	if (now - last_gc >= ip_rt_gc_min_interval)
-		last_gc = now;
-
-	if (goal <= 0) {
-		equilibrium += goal;
-		goto work_done;
-	}
-
-	do {
-		int i, k;
-
-		for (i = rt_hash_mask, k = rover; i >= 0; i--) {
-			unsigned long tmo = expire;
-
-			k = (k + 1) & rt_hash_mask;
-			rthp = &rt_hash_table[k].chain;
-			spin_lock_bh(rt_hash_lock_addr(k));
-			while ((rth = rcu_dereference_protected(*rthp,
-					lockdep_is_held(rt_hash_lock_addr(k)))) != NULL) {
-				if (!rt_is_expired(rth) &&
-					!rt_may_expire(rth, tmo, expire)) {
-					tmo >>= 1;
-					rthp = &rth->dst.rt_next;
-					continue;
-				}
-				*rthp = rth->dst.rt_next;
-				rt_free(rth);
-				goal--;
-			}
-			spin_unlock_bh(rt_hash_lock_addr(k));
-			if (goal <= 0)
-				break;
-		}
-		rover = k;
-
-		if (goal <= 0)
-			goto work_done;
-
-		/* Goal is not achieved. We stop process if:
-
-		   - if expire reduced to zero. Otherwise, expire is halfed.
-		   - if table is not full.
-		   - if we are called from interrupt.
-		   - jiffies check is just fallback/debug loop breaker.
-		     We will not spin here for long time in any case.
-		 */
-
-		RT_CACHE_STAT_INC(gc_goal_miss);
-
-		if (expire == 0)
-			break;
-
-		expire >>= 1;
-#if RT_CACHE_DEBUG >= 2
-		printk(KERN_DEBUG "expire>> %u %d %d %d\n", expire,
-				dst_entries_get_fast(&ipv4_dst_ops), goal, i);
-#endif
-
-		if (dst_entries_get_fast(&ipv4_dst_ops) < ip_rt_max_size)
-			goto out;
-	} while (!in_softirq() && time_before_eq(jiffies, now));
-
-	if (dst_entries_get_fast(&ipv4_dst_ops) < ip_rt_max_size)
-		goto out;
-	if (dst_entries_get_slow(&ipv4_dst_ops) < ip_rt_max_size)
-		goto out;
-	if (net_ratelimit())
-		printk(KERN_WARNING "dst cache overflow\n");
-	RT_CACHE_STAT_INC(gc_dst_overflow);
-	return 1;
-
-work_done:
-	expire += ip_rt_gc_min_interval;
-	if (expire > ip_rt_gc_timeout ||
-	    dst_entries_get_fast(&ipv4_dst_ops) < ipv4_dst_ops.gc_thresh ||
-	    dst_entries_get_slow(&ipv4_dst_ops) < ipv4_dst_ops.gc_thresh)
-		expire = ip_rt_gc_timeout;
-#if RT_CACHE_DEBUG >= 2
-	printk(KERN_DEBUG "expire++ %u %d %d %d\n", expire,
-			dst_entries_get_fast(&ipv4_dst_ops), goal, rover);
-#endif
-out:	return 0;
-}
-
-/*
- * Returns number of entries in a hash chain that have different hash_inputs
- */
-static int slow_chain_length(const struct rtable *head)
-{
-	int length = 0;
-	const struct rtable *rth = head;
-
-	while (rth) {
-		length += has_noalias(head, rth);
-		rth = rcu_dereference_protected(rth->dst.rt_next, 1);
-	}
-	return length >> FRACT_BITS;
+	return 0;
 }
 
-static int rt_intern_hash(unsigned hash, struct rtable *rt,
-			  struct rtable **rp, struct sk_buff *skb, int ifindex)
+static int rt_finalize(struct rtable *rt, struct rtable **rp, struct sk_buff *skb)
 {
-	struct rtable	*rth, *cand;
-	struct rtable __rcu **rthp, **candp;
-	unsigned long	now;
-	u32 		min_score;
-	int		chain_length;
-	int attempts = !in_softirq();
-
-restart:
-	chain_length = 0;
-	min_score = ~(u32)0;
-	cand = NULL;
-	candp = NULL;
-	now = jiffies;
-
-	if (!rt_caching(dev_net(rt->dst.dev))) {
-		/*
-		 * If we're not caching, just tell the caller we
-		 * were successful and don't touch the route.  The
-		 * caller hold the sole reference to the cache entry, and
-		 * it will be released when the caller is done with it.
-		 * If we drop it here, the callers have no way to resolve routes
-		 * when we're not caching.  Instead, just point *rp at rt, so
-		 * the caller gets a single use out of the route
-		 * Note that we do rt_free on this new route entry, so that
-		 * once its refcount hits zero, we are still able to reap it
-		 * (Thanks Alexey)
-		 * Note: To avoid expensive rcu stuff for this uncached dst,
-		 * we set DST_NOCACHE so that dst_release() can free dst without
-		 * waiting a grace period.
-		 */
-
-		rt->dst.flags |= DST_NOCACHE;
-		if (rt->rt_type == RTN_UNICAST || rt_is_output_route(rt)) {
-			int err = arp_bind_neighbour(&rt->dst);
-			if (err) {
-				if (net_ratelimit())
-					printk(KERN_WARNING
-					    "Neighbour table failure & not caching routes.\n");
-				ip_rt_put(rt);
-				return err;
-			}
-		}
-
-		goto skip_hashing;
-	}
-
-	rthp = &rt_hash_table[hash].chain;
-
-	spin_lock_bh(rt_hash_lock_addr(hash));
-	while ((rth = rcu_dereference_protected(*rthp,
-			lockdep_is_held(rt_hash_lock_addr(hash)))) != NULL) {
-		if (rt_is_expired(rth)) {
-			*rthp = rth->dst.rt_next;
-			rt_free(rth);
-			continue;
-		}
-		if (compare_keys(&rth->fl, &rt->fl) && compare_netns(rth, rt)) {
-			/* Put it first */
-			*rthp = rth->dst.rt_next;
-			/*
-			 * Since lookup is lockfree, the deletion
-			 * must be visible to another weakly ordered CPU before
-			 * the insertion at the start of the hash chain.
-			 */
-			rcu_assign_pointer(rth->dst.rt_next,
-					   rt_hash_table[hash].chain);
-			/*
-			 * Since lookup is lockfree, the update writes
-			 * must be ordered for consistency on SMP.
-			 */
-			rcu_assign_pointer(rt_hash_table[hash].chain, rth);
-
-			dst_use(&rth->dst, now);
-			spin_unlock_bh(rt_hash_lock_addr(hash));
-
-			rt_drop(rt);
-			if (rp)
-				*rp = rth;
-			else
-				skb_dst_set(skb, &rth->dst);
-			return 0;
-		}
-
-		if (!atomic_read(&rth->dst.__refcnt)) {
-			u32 score = rt_score(rth);
-
-			if (score <= min_score) {
-				cand = rth;
-				candp = rthp;
-				min_score = score;
-			}
-		}
-
-		chain_length++;
-
-		rthp = &rth->dst.rt_next;
-	}
-
-	if (cand) {
-		/* ip_rt_gc_elasticity used to be average length of chain
-		 * length, when exceeded gc becomes really aggressive.
-		 *
-		 * The second limit is less certain. At the moment it allows
-		 * only 2 entries per bucket. We will see.
-		 */
-		if (chain_length > ip_rt_gc_elasticity) {
-			*candp = cand->dst.rt_next;
-			rt_free(cand);
-		}
-	} else {
-		if (chain_length > rt_chain_length_max &&
-		    slow_chain_length(rt_hash_table[hash].chain) > rt_chain_length_max) {
-			struct net *net = dev_net(rt->dst.dev);
-			int num = ++net->ipv4.current_rt_cache_rebuild_count;
-			if (!rt_caching(net)) {
-				printk(KERN_WARNING "%s: %d rebuilds is over limit, route caching disabled\n",
-					rt->dst.dev->name, num);
-			}
-			rt_emergency_hash_rebuild(net);
-			spin_unlock_bh(rt_hash_lock_addr(hash));
-
-			hash = rt_hash(rt->fl.fl4_dst, rt->fl.fl4_src,
-					ifindex, rt_genid(net));
-			goto restart;
-		}
-	}
-
-	/* Try to bind route to arp only if it is output
-	   route or unicast forwarding path.
+	/* To avoid expensive rcu stuff for this uncached dst, we set
+	 * DST_NOCACHE so that dst_release() can free dst without
+	 * waiting a grace period.
 	 */
+	rt->dst.flags |= DST_NOCACHE;
 	if (rt->rt_type == RTN_UNICAST || rt_is_output_route(rt)) {
 		int err = arp_bind_neighbour(&rt->dst);
 		if (err) {
-			spin_unlock_bh(rt_hash_lock_addr(hash));
-
-			if (err != -ENOBUFS) {
-				rt_drop(rt);
-				return err;
-			}
-
-			/* Neighbour tables are full and nothing
-			   can be released. Try to shrink route cache,
-			   it is most likely it holds some neighbour records.
-			 */
-			if (attempts-- > 0) {
-				int saved_elasticity = ip_rt_gc_elasticity;
-				int saved_int = ip_rt_gc_min_interval;
-				ip_rt_gc_elasticity	= 1;
-				ip_rt_gc_min_interval	= 0;
-				rt_garbage_collect(&ipv4_dst_ops);
-				ip_rt_gc_min_interval	= saved_int;
-				ip_rt_gc_elasticity	= saved_elasticity;
-				goto restart;
-			}
-
 			if (net_ratelimit())
-				printk(KERN_WARNING "ipv4: Neighbour table overflow.\n");
-			rt_drop(rt);
-			return -ENOBUFS;
+				printk(KERN_WARNING
+				       "Neighbour table failure & not caching routes.\n");
+			ip_rt_put(rt);
+			return err;
 		}
 	}
 
-	rt->dst.rt_next = rt_hash_table[hash].chain;
-
-#if RT_CACHE_DEBUG >= 2
-	if (rt->dst.rt_next) {
-		struct rtable *trt;
-		printk(KERN_DEBUG "rt_cache @%02x: %pI4",
-		       hash, &rt->rt_dst);
-		for (trt = rt->dst.rt_next; trt; trt = trt->dst.rt_next)
-			printk(" . %pI4", &trt->rt_dst);
-		printk("\n");
-	}
-#endif
-	/*
-	 * Since lookup is lockfree, we must make sure
-	 * previous writes to rt are comitted to memory
-	 * before making rt visible to other CPUS.
-	 */
-	rcu_assign_pointer(rt_hash_table[hash].chain, rt);
-
-	spin_unlock_bh(rt_hash_lock_addr(hash));
-
-skip_hashing:
 	if (rp)
 		*rp = rt;
 	else
@@ -1270,26 +585,6 @@ void __ip_select_ident(struct iphdr *iph, struct dst_entry *dst, int more)
 }
 EXPORT_SYMBOL(__ip_select_ident);
 
-static void rt_del(unsigned hash, struct rtable *rt)
-{
-	struct rtable __rcu **rthp;
-	struct rtable *aux;
-
-	rthp = &rt_hash_table[hash].chain;
-	spin_lock_bh(rt_hash_lock_addr(hash));
-	ip_rt_put(rt);
-	while ((aux = rcu_dereference_protected(*rthp,
-			lockdep_is_held(rt_hash_lock_addr(hash)))) != NULL) {
-		if (aux == rt || rt_is_expired(aux)) {
-			*rthp = aux->dst.rt_next;
-			rt_free(aux);
-			continue;
-		}
-		rthp = &aux->dst.rt_next;
-	}
-	spin_unlock_bh(rt_hash_lock_addr(hash));
-}
-
 /* called in rcu_read_lock() section */
 void ip_rt_redirect(__be32 old_gw, __be32 daddr, __be32 new_gw,
 		    __be32 saddr, struct net_device *dev)
@@ -1348,14 +643,11 @@ static struct dst_entry *ipv4_negative_advice(struct dst_entry *dst)
 			ip_rt_put(rt);
 			ret = NULL;
 		} else if (rt->rt_flags & RTCF_REDIRECTED) {
-			unsigned hash = rt_hash(rt->fl.fl4_dst, rt->fl.fl4_src,
-						rt->fl.oif,
-						rt_genid(dev_net(dst->dev)));
 #if RT_CACHE_DEBUG >= 1
 			printk(KERN_DEBUG "ipv4_negative_advice: redirect to %pI4/%02x dropped\n",
-				&rt->rt_dst, rt->fl.fl4_tos);
+			       &rt->rt_dst, rt->fl.fl4_tos);
 #endif
-			rt_del(hash, rt);
+			ip_rt_put(rt);
 			ret = NULL;
 		} else if (rt->peer &&
 			   rt->peer->pmtu_expires &&
@@ -1833,7 +1125,6 @@ static struct rtable *rt_dst_alloc(bool nopolicy, bool noxfrm)
 static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 				u8 tos, struct net_device *dev, int our)
 {
-	unsigned int hash;
 	struct rtable *rth;
 	__be32 spec_dst;
 	struct in_device *in_dev = __in_dev_get_rcu(dev);
@@ -1895,8 +1186,7 @@ static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 #endif
 	RT_CACHE_STAT_INC(in_slow_mc);
 
-	hash = rt_hash(daddr, saddr, dev->ifindex, rt_genid(dev_net(dev)));
-	return rt_intern_hash(hash, rth, NULL, skb, dev->ifindex);
+	return rt_finalize(rth, NULL, skb);
 
 e_nobufs:
 	return -ENOBUFS;
@@ -2036,7 +1326,6 @@ static int ip_mkroute_input(struct sk_buff *skb,
 {
 	struct rtable* rth = NULL;
 	int err;
-	unsigned hash;
 
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
 	if (res->fi && res->fi->fib_nhs > 1 && fl->oif == 0)
@@ -2049,9 +1338,7 @@ static int ip_mkroute_input(struct sk_buff *skb,
 		return err;
 
 	/* put it into the cache */
-	hash = rt_hash(daddr, saddr, fl->iif,
-		       rt_genid(dev_net(rth->dst.dev)));
-	return rt_intern_hash(hash, rth, NULL, skb, fl->iif);
+	return rt_finalize(rth, NULL, skb);
 }
 
 /*
@@ -2079,7 +1366,6 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 	unsigned	flags = 0;
 	u32		itag = 0;
 	struct rtable * rth;
-	unsigned	hash;
 	__be32		spec_dst;
 	int		err = -EINVAL;
 	struct net    * net = dev_net(dev);
@@ -2193,8 +1479,7 @@ local_input:
 		rth->rt_flags 	&= ~RTCF_LOCAL;
 	}
 	rth->rt_type	= res.type;
-	hash = rt_hash(daddr, saddr, fl.iif, rt_genid(net));
-	err = rt_intern_hash(hash, rth, NULL, skb, fl.iif);
+	err = rt_finalize(rth, NULL, skb);
 	goto out;
 
 no_route:
@@ -2238,47 +1523,10 @@ martian_source_keep_err:
 int ip_route_input_common(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 			   u8 tos, struct net_device *dev, bool noref)
 {
-	struct rtable * rth;
-	unsigned	hash;
-	int iif = dev->ifindex;
-	struct net *net;
 	int res;
 
-	net = dev_net(dev);
-
 	rcu_read_lock();
 
-	if (!rt_caching(net))
-		goto skip_cache;
-
-	tos &= IPTOS_RT_MASK;
-	hash = rt_hash(daddr, saddr, iif, rt_genid(net));
-
-	for (rth = rcu_dereference(rt_hash_table[hash].chain); rth;
-	     rth = rcu_dereference(rth->dst.rt_next)) {
-		if ((((__force u32)rth->fl.fl4_dst ^ (__force u32)daddr) |
-		     ((__force u32)rth->fl.fl4_src ^ (__force u32)saddr) |
-		     (rth->fl.iif ^ iif) |
-		     rth->fl.oif |
-		     (rth->fl.fl4_tos ^ tos)) == 0 &&
-		    rth->fl.mark == skb->mark &&
-		    net_eq(dev_net(rth->dst.dev), net) &&
-		    !rt_is_expired(rth)) {
-			if (noref) {
-				dst_use_noref(&rth->dst, jiffies);
-				skb_dst_set_noref(skb, &rth->dst);
-			} else {
-				dst_use(&rth->dst, jiffies);
-				skb_dst_set(skb, &rth->dst);
-			}
-			RT_CACHE_STAT_INC(in_hit);
-			rcu_read_unlock();
-			return 0;
-		}
-		RT_CACHE_STAT_INC(in_hlist_search);
-	}
-
-skip_cache:
 	/* Multicast recognition logic is moved from route cache to here.
 	   The problem was that too many Ethernet cards have broken/missing
 	   hardware multicast filters :-( As result the host on multicasting
@@ -2419,11 +1667,10 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
 
 /*
  * Major route resolver routine.
- * called with rcu_read_lock();
  */
 
-static int ip_route_output_slow(struct net *net, struct rtable **rp,
-				const struct flowi *oldflp)
+int __ip_route_output_key(struct net *net, struct rtable **rp,
+			  const struct flowi *oldflp)
 {
 	u32 tos	= RT_FL_TOS(oldflp);
 	struct flowi fl = { .fl4_dst = oldflp->fl4_dst,
@@ -2600,55 +1847,13 @@ make_route:
 	rth = __mkroute_output(&res, &fl, oldflp, dev_out, flags);
 	if (IS_ERR(rth))
 		err = PTR_ERR(rth);
-	else {
-		unsigned int hash;
-
-		hash = rt_hash(oldflp->fl4_dst, oldflp->fl4_src, oldflp->oif,
-			       rt_genid(dev_net(dev_out)));
-		err = rt_intern_hash(hash, rth, rp, NULL, oldflp->oif);
-	}
+	else
+		err = rt_finalize(rth, rp, NULL);
 
 out:
 	rcu_read_unlock();
 	return err;
 }
-
-int __ip_route_output_key(struct net *net, struct rtable **rp,
-			  const struct flowi *flp)
-{
-	struct rtable *rth;
-	unsigned int hash;
-
-	if (!rt_caching(net))
-		goto slow_output;
-
-	hash = rt_hash(flp->fl4_dst, flp->fl4_src, flp->oif, rt_genid(net));
-
-	rcu_read_lock_bh();
-	for (rth = rcu_dereference_bh(rt_hash_table[hash].chain); rth;
-		rth = rcu_dereference_bh(rth->dst.rt_next)) {
-		if (rth->fl.fl4_dst == flp->fl4_dst &&
-		    rth->fl.fl4_src == flp->fl4_src &&
-		    rt_is_output_route(rth) &&
-		    rth->fl.oif == flp->oif &&
-		    rth->fl.mark == flp->mark &&
-		    !((rth->fl.fl4_tos ^ flp->fl4_tos) &
-			    (IPTOS_RT_MASK | RTO_ONLINK)) &&
-		    net_eq(dev_net(rth->dst.dev), net) &&
-		    !rt_is_expired(rth)) {
-			dst_use(&rth->dst, jiffies);
-			RT_CACHE_STAT_INC(out_hit);
-			rcu_read_unlock_bh();
-			*rp = rth;
-			return 0;
-		}
-		RT_CACHE_STAT_INC(out_hlist_search);
-	}
-	rcu_read_unlock_bh();
-
-slow_output:
-	return ip_route_output_slow(net, rp, flp);
-}
 EXPORT_SYMBOL_GPL(__ip_route_output_key);
 
 static struct dst_entry *ipv4_blackhole_dst_check(struct dst_entry *dst, u32 cookie)
@@ -2942,43 +2147,6 @@ errout_free:
 
 int ip_rt_dump(struct sk_buff *skb,  struct netlink_callback *cb)
 {
-	struct rtable *rt;
-	int h, s_h;
-	int idx, s_idx;
-	struct net *net;
-
-	net = sock_net(skb->sk);
-
-	s_h = cb->args[0];
-	if (s_h < 0)
-		s_h = 0;
-	s_idx = idx = cb->args[1];
-	for (h = s_h; h <= rt_hash_mask; h++, s_idx = 0) {
-		if (!rt_hash_table[h].chain)
-			continue;
-		rcu_read_lock_bh();
-		for (rt = rcu_dereference_bh(rt_hash_table[h].chain), idx = 0; rt;
-		     rt = rcu_dereference_bh(rt->dst.rt_next), idx++) {
-			if (!net_eq(dev_net(rt->dst.dev), net) || idx < s_idx)
-				continue;
-			if (rt_is_expired(rt))
-				continue;
-			skb_dst_set_noref(skb, &rt->dst);
-			if (rt_fill_info(net, skb, NETLINK_CB(cb->skb).pid,
-					 cb->nlh->nlmsg_seq, RTM_NEWROUTE,
-					 1, NLM_F_MULTI) <= 0) {
-				skb_dst_drop(skb);
-				rcu_read_unlock_bh();
-				goto done;
-			}
-			skb_dst_drop(skb);
-		}
-		rcu_read_unlock_bh();
-	}
-
-done:
-	cb->args[0] = h;
-	cb->args[1] = idx;
 	return skb->len;
 }
 
@@ -3211,16 +2379,6 @@ static __net_initdata struct pernet_operations rt_genid_ops = {
 struct ip_rt_acct __percpu *ip_rt_acct __read_mostly;
 #endif /* CONFIG_IP_ROUTE_CLASSID */
 
-static __initdata unsigned long rhash_entries;
-static int __init set_rhash_entries(char *str)
-{
-	if (!str)
-		return 0;
-	rhash_entries = simple_strtoul(str, &str, 0);
-	return 1;
-}
-__setup("rhash_entries=", set_rhash_entries);
-
 int __init ip_rt_init(void)
 {
 	int rc = 0;
@@ -3243,21 +2401,8 @@ int __init ip_rt_init(void)
 	if (dst_entries_init(&ipv4_dst_blackhole_ops) < 0)
 		panic("IP: failed to allocate ipv4_dst_blackhole_ops counter\n");
 
-	rt_hash_table = (struct rt_hash_bucket *)
-		alloc_large_system_hash("IP route cache",
-					sizeof(struct rt_hash_bucket),
-					rhash_entries,
-					(totalram_pages >= 128 * 1024) ?
-					15 : 17,
-					0,
-					&rt_hash_log,
-					&rt_hash_mask,
-					rhash_entries ? 0 : 512 * 1024);
-	memset(rt_hash_table, 0, (rt_hash_mask + 1) * sizeof(struct rt_hash_bucket));
-	rt_hash_lock_init();
-
-	ipv4_dst_ops.gc_thresh = (rt_hash_mask + 1);
-	ip_rt_max_size = (rt_hash_mask + 1) * 16;
+	ipv4_dst_ops.gc_thresh = ~0;
+	ip_rt_max_size = INT_MAX;
 
 	devinet_init();
 	ip_fib_init();
-- 
1.7.4.1



* [RFC PATCH 0/3] route cache deletion and cleanups
From: David Miller @ 2011-02-18  0:34 UTC (permalink / raw)
  To: netdev


Here is a respin of the route cache deletion patch, with some minor
cleanups that become possible only afterwards.

Enjoy.


* Re: [net-next-2.6 PATCH] enic: Always use single transmit and single receive hardware queues per device
From: David Miller @ 2011-02-18  0:13 UTC (permalink / raw)
  To: vkolluri; +Cc: netdev
In-Reply-To: <20110217235719.6978.78272.stgit@savbu-pc100.cisco.com>

From: Vasanthy Kolluri <vkolluri@cisco.com>
Date: Thu, 17 Feb 2011 15:57:19 -0800

> From: Vasanthy Kolluri <vkolluri@cisco.com>
> 
> We believe that our earlier patch for supporting multiple hardware receive queues per enic device requires more internal testing. At this point, we think that it's best to disable the use of multiple receive queues. The current patch provides an effective means for the same.
> 
> Also, we continue to disallow multiple hardware transmit queues per device. But change the way we enforce this in order to maintain consistency with the way receive queues are handled.
> 
> Signed-off-by: Christian Benvenuti <benve@cisco.com>
> Signed-off-by: Danny Guo <dannguo@cisco.com>
> Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
> Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
> Signed-off-by: David Wang <dwang2@cisco.com>

Applied.


* [PATCH 7/7] ipv4: Use const'ify fib_result deep in the route call chains.
From: David Miller @ 2011-02-18  0:12 UTC (permalink / raw)
  To: netdev


The only troublesome bit here is __mkroute_output(), which wants
to override res->fi and res->type; compute those in local
variables instead.
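
For readers following along, the idea can be sketched in userspace C.
This is only an illustration under stated assumptions: every name with
an _sk suffix is a hypothetical stand-in, not a kernel definition, and
the enum values are placeholders.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel definitions. */
enum { RTN_UNICAST_SK = 1, RTN_BROADCAST_SK = 3 };

struct fib_info_sk { int unused; };

struct fib_result_sk {
	unsigned char type;
	struct fib_info_sk *fi;
};

struct route_sk {
	unsigned short type;
	struct fib_info_sk *fi;
};

/* res is const: overrides go into locals, the caller's result is untouched. */
static struct route_sk mkroute_sk(const struct fib_result_sk *res, int is_bcast)
{
	struct fib_info_sk *fi;
	unsigned short type;
	struct route_sk rt;

	assert(res != NULL);
	fi = res->fi;		/* local copy that may be overridden */
	type = res->type;	/* local copy that may be overridden */

	if (is_bcast) {
		type = RTN_BROADCAST_SK;
		fi = NULL;
	}

	rt.type = type;
	rt.fi = fi;
	return rt;
}
```

The point of the sketch is that the lookup result stays unchanged after
the call, so the same result can be passed down a const call chain.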

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/ip_fib.h |    2 +-
 net/ipv4/fib_rules.c |    2 +-
 net/ipv4/route.c     |   32 +++++++++++++++++---------------
 3 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index b3019d8..523a170 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -202,7 +202,7 @@ extern int __net_init fib4_rules_init(struct net *net);
 extern void __net_exit fib4_rules_exit(struct net *net);
 
 #ifdef CONFIG_IP_ROUTE_CLASSID
-extern u32 fib_rules_tclass(struct fib_result *res);
+extern u32 fib_rules_tclass(const struct fib_result *res);
 #endif
 
 extern int fib_lookup(struct net *n, struct flowi *flp, struct fib_result *res);
diff --git a/net/ipv4/fib_rules.c b/net/ipv4/fib_rules.c
index 9cefe72..3018efb 100644
--- a/net/ipv4/fib_rules.c
+++ b/net/ipv4/fib_rules.c
@@ -47,7 +47,7 @@ struct fib4_rule {
 };
 
 #ifdef CONFIG_IP_ROUTE_CLASSID
-u32 fib_rules_tclass(struct fib_result *res)
+u32 fib_rules_tclass(const struct fib_result *res)
 {
 	return res->r ? ((struct fib4_rule *) res->r)->tclassid : 0;
 }
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 9841543..2facde0 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1787,10 +1787,10 @@ static void rt_init_metrics(struct rtable *rt, struct fib_info *fi)
 	}
 }
 
-static void rt_set_nexthop(struct rtable *rt, struct fib_result *res, u32 itag)
+static void rt_set_nexthop(struct rtable *rt, const struct fib_result *res,
+			   struct fib_info *fi, u16 type, u32 itag)
 {
 	struct dst_entry *dst = &rt->dst;
-	struct fib_info *fi = res->fi;
 
 	if (fi) {
 		if (FIB_RES_GW(*res) &&
@@ -1813,7 +1813,7 @@ static void rt_set_nexthop(struct rtable *rt, struct fib_result *res, u32 itag)
 #endif
 	set_class_tag(rt, itag);
 #endif
-	rt->rt_type = res->type;
+	rt->rt_type = type;
 }
 
 static struct rtable *rt_dst_alloc(bool nopolicy, bool noxfrm)
@@ -1939,7 +1939,7 @@ static void ip_handle_martian_source(struct net_device *dev,
 
 /* called in rcu_read_lock() section */
 static int __mkroute_input(struct sk_buff *skb,
-			   struct fib_result *res,
+			   const struct fib_result *res,
 			   struct in_device *in_dev,
 			   __be32 daddr, __be32 saddr, u32 tos,
 			   struct rtable **result)
@@ -2018,7 +2018,7 @@ static int __mkroute_input(struct sk_buff *skb,
 	rth->dst.output = ip_output;
 	rth->rt_genid = rt_genid(dev_net(rth->dst.dev));
 
-	rt_set_nexthop(rth, res, itag);
+	rt_set_nexthop(rth, res, res->fi, res->type, itag);
 
 	rth->rt_flags = flags;
 
@@ -2319,23 +2319,25 @@ skip_cache:
 EXPORT_SYMBOL(ip_route_input_common);
 
 /* called with rcu_read_lock() */
-static struct rtable *__mkroute_output(struct fib_result *res,
+static struct rtable *__mkroute_output(const struct fib_result *res,
 				       const struct flowi *fl,
 				       const struct flowi *oldflp,
 				       struct net_device *dev_out,
 				       unsigned int flags)
 {
+	struct fib_info *fi = res->fi;
 	u32 tos = RT_FL_TOS(oldflp);
 	struct in_device *in_dev;
+	u16 type = res->type;
 	struct rtable *rth;
 
 	if (ipv4_is_loopback(fl->fl4_src) && !(dev_out->flags & IFF_LOOPBACK))
 		return ERR_PTR(-EINVAL);
 
 	if (ipv4_is_lbcast(fl->fl4_dst))
-		res->type = RTN_BROADCAST;
+		type = RTN_BROADCAST;
 	else if (ipv4_is_multicast(fl->fl4_dst))
-		res->type = RTN_MULTICAST;
+		type = RTN_MULTICAST;
 	else if (ipv4_is_zeronet(fl->fl4_dst))
 		return ERR_PTR(-EINVAL);
 
@@ -2346,10 +2348,10 @@ static struct rtable *__mkroute_output(struct fib_result *res,
 	if (!in_dev)
 		return ERR_PTR(-EINVAL);
 
-	if (res->type == RTN_BROADCAST) {
+	if (type == RTN_BROADCAST) {
 		flags |= RTCF_BROADCAST | RTCF_LOCAL;
-		res->fi = NULL;
-	} else if (res->type == RTN_MULTICAST) {
+		fi = NULL;
+	} else if (type == RTN_MULTICAST) {
 		flags |= RTCF_MULTICAST | RTCF_LOCAL;
 		if (!ip_check_mc(in_dev, oldflp->fl4_dst, oldflp->fl4_src,
 				 oldflp->proto))
@@ -2358,8 +2360,8 @@ static struct rtable *__mkroute_output(struct fib_result *res,
 		 * default one, but do not gateway in this case.
 		 * Yes, it is hack.
 		 */
-		if (res->fi && res->prefixlen < 4)
-			res->fi = NULL;
+		if (fi && res->prefixlen < 4)
+			fi = NULL;
 	}
 
 	rth = rt_dst_alloc(IN_DEV_CONF_GET(in_dev, NOPOLICY),
@@ -2399,7 +2401,7 @@ static struct rtable *__mkroute_output(struct fib_result *res,
 			RT_CACHE_STAT_INC(out_slow_mc);
 		}
 #ifdef CONFIG_IP_MROUTE
-		if (res->type == RTN_MULTICAST) {
+		if (type == RTN_MULTICAST) {
 			if (IN_DEV_MFORWARD(in_dev) &&
 			    !ipv4_is_local_multicast(oldflp->fl4_dst)) {
 				rth->dst.input = ip_mr_input;
@@ -2409,7 +2411,7 @@ static struct rtable *__mkroute_output(struct fib_result *res,
 #endif
 	}
 
-	rt_set_nexthop(rth, res, 0);
+	rt_set_nexthop(rth, res, fi, type, 0);
 
 	rth->rt_flags = flags;
 	return rth;
-- 
1.7.4.1



* [PATCH 6/7] ipv4: Mark fib_combine_itag()'s 'res' arg as const.
From: David Miller @ 2011-02-18  0:12 UTC (permalink / raw)
  To: netdev


Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/ip_fib.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 08b46b8..b3019d8 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -232,7 +232,7 @@ extern void fib_select_multipath(const struct flowi *flp, struct fib_result *res
 extern void fib_trie_init(void);
 extern struct fib_table *fib_trie_table(u32 id);
 
-static inline void fib_combine_itag(u32 *itag, struct fib_result *res)
+static inline void fib_combine_itag(u32 *itag, const struct fib_result *res)
 {
 #ifdef CONFIG_IP_ROUTE_CLASSID
 #ifdef CONFIG_IP_MULTIPLE_TABLES
-- 
1.7.4.1



* [PATCH 5/7] ipv4: Avoid use of signed integers in fib_trie code.
From: David Miller @ 2011-02-18  0:12 UTC (permalink / raw)
  To: netdev

GCC emits all kinds of crazy zero extensions when we go from signed
int, to unsigned short, etc. etc.

This transformation has to be legal because:

1) In tkey_extract_bits() and mask_pfx(), the values are used to
   perform shifts, for which negative shift counts are undefined in C.

2) In fib_table_lookup() we perform comparisons with unsigned
   values, constants, and additions.  None of which should
   encounter negative values.
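
A userspace sketch of the two helpers with unsigned arguments, for
illustration only. It assumes t_key is a 32-bit unsigned type; the
assert() on 'bits' is an addition of this sketch, not something the
kernel code does.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t t_key;			/* assumption: 32-bit key */
#define KEYLENGTH (8 * sizeof(t_key))

/* Keep the top 'l' bits of 'k', clear the rest. */
static inline t_key mask_pfx(t_key k, unsigned int l)
{
	return (l == 0) ? 0 : k >> (KEYLENGTH - l) << (KEYLENGTH - l);
}

/* Extract 'bits' bits of 'a', starting 'offset' bits from the top. */
static inline t_key tkey_extract_bits(t_key a, unsigned int offset,
				      unsigned int bits)
{
	/* Sketch-only guard: a shift by KEYLENGTH or more is undefined. */
	assert(bits >= 1 && bits <= KEYLENGTH);

	if (offset < KEYLENGTH)
		return ((t_key)(a << offset)) >> (KEYLENGTH - bits);
	else
		return 0;
}
```

With unsigned counts there is no sign to extend, and a shift count can
never be negative, which is the case C leaves undefined.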

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/fib_trie.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 1eae90b..edf3b09 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -217,12 +217,12 @@ static inline int tnode_child_length(const struct tnode *tn)
 	return 1 << tn->bits;
 }

-static inline t_key mask_pfx(t_key k, unsigned short l)
+static inline t_key mask_pfx(t_key k, unsigned int l)
 {
 	return (l == 0) ? 0 : k >> (KEYLENGTH-l) << (KEYLENGTH-l);
 }

-static inline t_key tkey_extract_bits(t_key a, int offset, int bits)
+static inline t_key tkey_extract_bits(t_key a, unsigned int offset, unsigned int bits)
 {
 	if (offset < KEYLENGTH)
 		return ((t_key)(a << offset)) >> (KEYLENGTH - bits);
@@ -1378,11 +1378,11 @@ int fib_table_lookup(struct fib_table *tb, const struct flowi *flp,
 	int ret;
 	struct rt_trie_node *n;
 	struct tnode *pn;
-	int pos, bits;
+	unsigned int pos, bits;
 	t_key key = ntohl(flp->fl4_dst);
-	int chopped_off;
+	unsigned int chopped_off;
 	t_key cindex = 0;
-	int current_prefix_length = KEYLENGTH;
+	unsigned int current_prefix_length = KEYLENGTH;
 	struct tnode *cn;
 	t_key pref_mismatch;

-- 
1.7.4.1


* [PATCH 4/7] net: Add initial_ref arg to dst_alloc().
From: David Miller @ 2011-02-18  0:12 UTC (permalink / raw)
  To: netdev


This allows avoiding multiple writes to the initial __refcnt.

The simplest cases of wanting an initial reference of "1"
in ipv4 and ipv6 have been converted; the rest have been left
alone and kept at the existing "0".
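
As a rough userspace sketch of the idea (C11 atomics and calloc()
standing in for the kernel's atomic_t and slab allocator; the struct and
function names here are hypothetical):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical, cut-down stand-in for struct dst_entry. */
struct dst_entry_sk {
	atomic_int refcnt;
	int obsolete;
};

/*
 * The allocator takes the initial reference count, so the counter is
 * written once instead of being set to 0 and then overwritten by the
 * caller.
 */
static struct dst_entry_sk *dst_alloc_sk(int initial_ref)
{
	struct dst_entry_sk *dst;

	assert(initial_ref >= 0);
	dst = calloc(1, sizeof(*dst));
	if (!dst)
		return NULL;
	atomic_init(&dst->refcnt, initial_ref);
	return dst;
}
```

A caller that wants to own the entry right away passes 1; one that
takes its reference later passes 0, matching the old behaviour.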

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/net/dst.h      |    2 +-
 net/core/dst.c         |    4 ++--
 net/decnet/dn_route.c  |    4 ++--
 net/ipv4/route.c       |    7 ++-----
 net/ipv6/route.c       |    5 ++---
 net/xfrm/xfrm_policy.c |    2 +-
 6 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index e01855d..23b564d 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -352,7 +352,7 @@ static inline struct dst_entry *skb_dst_pop(struct sk_buff *skb)
 }
 
 extern int dst_discard(struct sk_buff *skb);
-extern void * dst_alloc(struct dst_ops * ops);
+extern void *dst_alloc(struct dst_ops * ops, int initial_ref);
 extern void __dst_free(struct dst_entry * dst);
 extern struct dst_entry *dst_destroy(struct dst_entry * dst);
 
diff --git a/net/core/dst.c b/net/core/dst.c
index c1674fd..91104d3 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -166,7 +166,7 @@ EXPORT_SYMBOL(dst_discard);
 
 const u32 dst_default_metrics[RTAX_MAX];
 
-void *dst_alloc(struct dst_ops *ops)
+void *dst_alloc(struct dst_ops *ops, int initial_ref)
 {
 	struct dst_entry *dst;
 
@@ -177,7 +177,7 @@ void *dst_alloc(struct dst_ops *ops)
 	dst = kmem_cache_zalloc(ops->kmem_cachep, GFP_ATOMIC);
 	if (!dst)
 		return NULL;
-	atomic_set(&dst->__refcnt, 0);
+	atomic_set(&dst->__refcnt, initial_ref);
 	dst->ops = ops;
 	dst->lastuse = jiffies;
 	dst->path = dst;
diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index 42c9c62..06c054d 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -1122,7 +1122,7 @@ make_route:
 	if (dev_out->flags & IFF_LOOPBACK)
 		flags |= RTCF_LOCAL;
 
-	rt = dst_alloc(&dn_dst_ops);
+	rt = dst_alloc(&dn_dst_ops, 0);
 	if (rt == NULL)
 		goto e_nobufs;
 
@@ -1383,7 +1383,7 @@ static int dn_route_input_slow(struct sk_buff *skb)
 	}
 
 make_route:
-	rt = dst_alloc(&dn_dst_ops);
+	rt = dst_alloc(&dn_dst_ops, 0);
 	if (rt == NULL)
 		goto e_nobufs;
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 79a2871..9841543 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1818,12 +1818,10 @@ static void rt_set_nexthop(struct rtable *rt, struct fib_result *res, u32 itag)
 
 static struct rtable *rt_dst_alloc(bool nopolicy, bool noxfrm)
 {
-	struct rtable *rt = dst_alloc(&ipv4_dst_ops);
+	struct rtable *rt = dst_alloc(&ipv4_dst_ops, 1);
 	if (rt) {
 		rt->dst.obsolete = -1;
 
-		atomic_set(&rt->dst.__refcnt, 1);
-
 		rt->dst.flags = DST_HOST |
 			(nopolicy ? DST_NOPOLICY : 0) |
 			(noxfrm ? DST_NOXFRM : 0);
@@ -2679,12 +2677,11 @@ static int ipv4_dst_blackhole(struct net *net, struct rtable **rp, struct flowi
 {
 	struct rtable *ort = *rp;
 	struct rtable *rt = (struct rtable *)
-		dst_alloc(&ipv4_dst_blackhole_ops);
+		dst_alloc(&ipv4_dst_blackhole_ops, 1);
 
 	if (rt) {
 		struct dst_entry *new = &rt->dst;
 
-		atomic_set(&new->__refcnt, 1);
 		new->__use = 1;
 		new->input = dst_discard;
 		new->output = dst_discard;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index ad8556e..7946b53 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -221,7 +221,7 @@ static struct rt6_info ip6_blk_hole_entry_template = {
 /* allocate dst with ip6_dst_ops */
 static inline struct rt6_info *ip6_dst_alloc(struct dst_ops *ops)
 {
-	return (struct rt6_info *)dst_alloc(ops);
+	return (struct rt6_info *)dst_alloc(ops, 0);
 }
 
 static void ip6_dst_destroy(struct dst_entry *dst)
@@ -873,13 +873,12 @@ int ip6_dst_blackhole(struct sock *sk, struct dst_entry **dstp, struct flowi *fl
 {
 	struct rt6_info *ort = (struct rt6_info *) *dstp;
 	struct rt6_info *rt = (struct rt6_info *)
-		dst_alloc(&ip6_dst_blackhole_ops);
+		dst_alloc(&ip6_dst_blackhole_ops, 1);
 	struct dst_entry *new = NULL;
 
 	if (rt) {
 		new = &rt->dst;
 
-		atomic_set(&new->__refcnt, 1);
 		new->__use = 1;
 		new->input = dst_discard;
 		new->output = dst_discard;
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 8b3ef40..3f1257a 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1340,7 +1340,7 @@ static inline struct xfrm_dst *xfrm_alloc_dst(struct net *net, int family)
 	default:
 		BUG();
 	}
-	xdst = dst_alloc(dst_ops) ?: ERR_PTR(-ENOBUFS);
+	xdst = dst_alloc(dst_ops, 0) ?: ERR_PTR(-ENOBUFS);
 	xfrm_policy_put_afinfo(afinfo);
 
 	xdst->flo.ops = &xfrm_bundle_fc_ops;
-- 
1.7.4.1



* [PATCH 3/7] ipv4: Consolidate ipv4 dst allocation logic.
From: David Miller @ 2011-02-18  0:12 UTC (permalink / raw)
  To: netdev


This also allows us to combine all the dst->flags settings and avoid
read/modify/write sequences to this struct member.
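
The flags part can be sketched on its own in userspace C. The _SK
constants below are placeholder bit values chosen for the example, not
taken from the kernel headers, and rt_flags_sk() is a hypothetical
helper.

```c
#include <stdbool.h>

/* Placeholder bit values for illustration only. */
#define DST_HOST_SK	0x0001u
#define DST_NOXFRM_SK	0x0002u
#define DST_NOPOLICY_SK	0x0004u

/*
 * Build the whole flags word in one expression so that the struct
 * member is stored once, rather than assigned and then updated with
 * |= for each optional flag.
 */
static unsigned int rt_flags_sk(bool nopolicy, bool noxfrm)
{
	return DST_HOST_SK |
		(nopolicy ? DST_NOPOLICY_SK : 0) |
		(noxfrm ? DST_NOXFRM_SK : 0);
}
```

Each call site then only has to say which optional flags it wants.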

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/route.c |   52 +++++++++++++++++++++-------------------------------
 1 files changed, 21 insertions(+), 31 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index b2b3c9e..79a2871 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1816,6 +1816,21 @@ static void rt_set_nexthop(struct rtable *rt, struct fib_result *res, u32 itag)
 	rt->rt_type = res->type;
 }
 
+static struct rtable *rt_dst_alloc(bool nopolicy, bool noxfrm)
+{
+	struct rtable *rt = dst_alloc(&ipv4_dst_ops);
+	if (rt) {
+		rt->dst.obsolete = -1;
+
+		atomic_set(&rt->dst.__refcnt, 1);
+
+		rt->dst.flags = DST_HOST |
+			(nopolicy ? DST_NOPOLICY : 0) |
+			(noxfrm ? DST_NOXFRM : 0);
+	}
+	return rt;
+}
+
 /* called in rcu_read_lock() section */
 static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 				u8 tos, struct net_device *dev, int our)
@@ -1846,17 +1861,12 @@ static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr,
 		if (err < 0)
 			goto e_err;
 	}
-	rth = dst_alloc(&ipv4_dst_ops);
+	rth = rt_dst_alloc(IN_DEV_CONF_GET(in_dev, NOPOLICY), false);
 	if (!rth)
 		goto e_nobufs;
 
 	rth->dst.output = ip_rt_bug;
-	rth->dst.obsolete = -1;
 
-	atomic_set(&rth->dst.__refcnt, 1);
-	rth->dst.flags= DST_HOST;
-	if (IN_DEV_CONF_GET(in_dev, NOPOLICY))
-		rth->dst.flags |= DST_NOPOLICY;
 	rth->fl.fl4_dst	= daddr;
 	rth->rt_dst	= daddr;
 	rth->fl.fl4_tos	= tos;
@@ -1985,19 +1995,13 @@ static int __mkroute_input(struct sk_buff *skb,
 		}
 	}
 
-
-	rth = dst_alloc(&ipv4_dst_ops);
+	rth = rt_dst_alloc(IN_DEV_CONF_GET(in_dev, NOPOLICY),
+			   IN_DEV_CONF_GET(out_dev, NOXFRM));
 	if (!rth) {
 		err = -ENOBUFS;
 		goto cleanup;
 	}
 
-	atomic_set(&rth->dst.__refcnt, 1);
-	rth->dst.flags= DST_HOST;
-	if (IN_DEV_CONF_GET(in_dev, NOPOLICY))
-		rth->dst.flags |= DST_NOPOLICY;
-	if (IN_DEV_CONF_GET(out_dev, NOXFRM))
-		rth->dst.flags |= DST_NOXFRM;
 	rth->fl.fl4_dst	= daddr;
 	rth->rt_dst	= daddr;
 	rth->fl.fl4_tos	= tos;
@@ -2012,7 +2016,6 @@ static int __mkroute_input(struct sk_buff *skb,
 	rth->fl.oif 	= 0;
 	rth->rt_spec_dst= spec_dst;
 
-	rth->dst.obsolete = -1;
 	rth->dst.input = ip_forward;
 	rth->dst.output = ip_output;
 	rth->rt_genid = rt_genid(dev_net(rth->dst.dev));
@@ -2162,18 +2165,13 @@ brd_input:
 	RT_CACHE_STAT_INC(in_brd);
 
 local_input:
-	rth = dst_alloc(&ipv4_dst_ops);
+	rth = rt_dst_alloc(IN_DEV_CONF_GET(in_dev, NOPOLICY), false);
 	if (!rth)
 		goto e_nobufs;
 
 	rth->dst.output= ip_rt_bug;
-	rth->dst.obsolete = -1;
 	rth->rt_genid = rt_genid(net);
 
-	atomic_set(&rth->dst.__refcnt, 1);
-	rth->dst.flags= DST_HOST;
-	if (IN_DEV_CONF_GET(in_dev, NOPOLICY))
-		rth->dst.flags |= DST_NOPOLICY;
 	rth->fl.fl4_dst	= daddr;
 	rth->rt_dst	= daddr;
 	rth->fl.fl4_tos	= tos;
@@ -2366,18 +2364,11 @@ static struct rtable *__mkroute_output(struct fib_result *res,
 			res->fi = NULL;
 	}
 
-
-	rth = dst_alloc(&ipv4_dst_ops);
+	rth = rt_dst_alloc(IN_DEV_CONF_GET(in_dev, NOPOLICY),
+			   IN_DEV_CONF_GET(in_dev, NOXFRM));
 	if (!rth)
 		return ERR_PTR(-ENOBUFS);
 
-	atomic_set(&rth->dst.__refcnt, 1);
-	rth->dst.flags= DST_HOST;
-	if (IN_DEV_CONF_GET(in_dev, NOXFRM))
-		rth->dst.flags |= DST_NOXFRM;
-	if (IN_DEV_CONF_GET(in_dev, NOPOLICY))
-		rth->dst.flags |= DST_NOPOLICY;
-
 	rth->fl.fl4_dst	= oldflp->fl4_dst;
 	rth->fl.fl4_tos	= tos;
 	rth->fl.fl4_src	= oldflp->fl4_src;
@@ -2394,7 +2385,6 @@ static struct rtable *__mkroute_output(struct fib_result *res,
 	rth->rt_spec_dst= fl->fl4_src;
 
 	rth->dst.output=ip_output;
-	rth->dst.obsolete = -1;
 	rth->rt_genid = rt_genid(dev_net(dev_out));
 
 	RT_CACHE_STAT_INC(out_slow_tot);
-- 
1.7.4.1


^ permalink raw reply related

* [PATCH 2/7] ipv4: Move rcu_read_{lock,unlock}() into ip_route_output_slow().
From: David Miller @ 2011-02-18  0:11 UTC (permalink / raw)
  To: netdev


Simplifies tail of __ip_route_output_key().

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/route.c |   13 ++++++-------
 1 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 849be48..b2b3c9e 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2456,6 +2456,7 @@ static int ip_route_output_slow(struct net *net, struct rtable **rp,
 	res.r		= NULL;
 #endif
 
+	rcu_read_lock();
 	if (oldflp->fl4_src) {
 		err = -EINVAL;
 		if (ipv4_is_multicast(oldflp->fl4_src) ||
@@ -2617,15 +2618,16 @@ make_route:
 		err = rt_intern_hash(hash, rth, rp, NULL, oldflp->oif);
 	}
 
-out:	return err;
+out:
+	rcu_read_unlock();
+	return err;
 }
 
 int __ip_route_output_key(struct net *net, struct rtable **rp,
 			  const struct flowi *flp)
 {
-	unsigned int hash;
-	int res;
 	struct rtable *rth;
+	unsigned int hash;
 
 	if (!rt_caching(net))
 		goto slow_output;
@@ -2655,10 +2657,7 @@ int __ip_route_output_key(struct net *net, struct rtable **rp,
 	rcu_read_unlock_bh();
 
 slow_output:
-	rcu_read_lock();
-	res = ip_route_output_slow(net, rp, flp);
-	rcu_read_unlock();
-	return res;
+	return ip_route_output_slow(net, rp, flp);
 }
 EXPORT_SYMBOL_GPL(__ip_route_output_key);
 
-- 
1.7.4.1


^ permalink raw reply related

* [PATCH 1/7] ipv4: Simplify output route creation call sequence.
From: David Miller @ 2011-02-18  0:11 UTC (permalink / raw)
  To: netdev


There's a lot of redundancy and unnecessary stack frames
in the output route creation path.

1) Make __mkroute_output() return error pointers.

2) Eliminate ip_mkroute_output() entirely, made possible by #1.

3) Call __mkroute_output() directly and handle the returned error
   pointers in ip_route_output_slow().
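
For readers unfamiliar with the error-pointer convention used in steps 1 and 3, here is a minimal userspace sketch of the idea. The demo_* helpers are stand-ins written for this example and only approximate the kernel's ERR_PTR(), IS_ERR() and PTR_ERR(); demo_mkroute() and demo_route_output() are hypothetical and far simpler than the real functions.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption for the sketch: errno values fit in the top 4095 addresses. */
#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline void *demo_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long demo_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if the pointer falls in the range reserved for error codes. */
static inline int demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_route {
	int id;
};

/* Returns either a valid route or an encoded error, never NULL. */
static struct demo_route *demo_mkroute(int valid)
{
	struct demo_route *rt;

	if (!valid)
		return demo_err_ptr(-EINVAL);

	rt = malloc(sizeof(*rt));
	if (!rt)
		return demo_err_ptr(-ENOBUFS);

	rt->id = 1;
	return rt;
}

/* Caller: one return value carries both the object and the error. */
static int demo_route_output(int valid, struct demo_route **rp)
{
	struct demo_route *rt = demo_mkroute(valid);

	if (demo_is_err(rt))
		return (int)demo_ptr_err(rt);

	*rp = rt;
	return 0;
}
```

Because the result and the error travel in one return value, the separate output parameter and the intermediate wrapper are no longer needed.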

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/route.c |   58 +++++++++++++++++++++--------------------------------
 1 files changed, 23 insertions(+), 35 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 756f544..849be48 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2323,33 +2323,32 @@ skip_cache:
 EXPORT_SYMBOL(ip_route_input_common);
 
 /* called with rcu_read_lock() */
-static int __mkroute_output(struct rtable **result,
-			    struct fib_result *res,
-			    const struct flowi *fl,
-			    const struct flowi *oldflp,
-			    struct net_device *dev_out,
-			    unsigned flags)
+static struct rtable *__mkroute_output(struct fib_result *res,
+				       const struct flowi *fl,
+				       const struct flowi *oldflp,
+				       struct net_device *dev_out,
+				       unsigned int flags)
 {
-	struct rtable *rth;
-	struct in_device *in_dev;
 	u32 tos = RT_FL_TOS(oldflp);
+	struct in_device *in_dev;
+	struct rtable *rth;
 
 	if (ipv4_is_loopback(fl->fl4_src) && !(dev_out->flags & IFF_LOOPBACK))
-		return -EINVAL;
+		return ERR_PTR(-EINVAL);
 
 	if (ipv4_is_lbcast(fl->fl4_dst))
 		res->type = RTN_BROADCAST;
 	else if (ipv4_is_multicast(fl->fl4_dst))
 		res->type = RTN_MULTICAST;
 	else if (ipv4_is_zeronet(fl->fl4_dst))
-		return -EINVAL;
+		return ERR_PTR(-EINVAL);
 
 	if (dev_out->flags & IFF_LOOPBACK)
 		flags |= RTCF_LOCAL;
 
 	in_dev = __in_dev_get_rcu(dev_out);
 	if (!in_dev)
-		return -EINVAL;
+		return ERR_PTR(-EINVAL);
 
 	if (res->type == RTN_BROADCAST) {
 		flags |= RTCF_BROADCAST | RTCF_LOCAL;
@@ -2370,7 +2369,7 @@ static int __mkroute_output(struct rtable **result,
 
 	rth = dst_alloc(&ipv4_dst_ops);
 	if (!rth)
-		return -ENOBUFS;
+		return ERR_PTR(-ENOBUFS);
 
 	atomic_set(&rth->dst.__refcnt, 1);
 	rth->dst.flags= DST_HOST;
@@ -2425,28 +2424,7 @@ static int __mkroute_output(struct rtable **result,
 	rt_set_nexthop(rth, res, 0);
 
 	rth->rt_flags = flags;
-	*result = rth;
-	return 0;
-}
-
-/* called with rcu_read_lock() */
-static int ip_mkroute_output(struct rtable **rp,
-			     struct fib_result *res,
-			     const struct flowi *fl,
-			     const struct flowi *oldflp,
-			     struct net_device *dev_out,
-			     unsigned flags)
-{
-	struct rtable *rth = NULL;
-	int err = __mkroute_output(&rth, res, fl, oldflp, dev_out, flags);
-	unsigned hash;
-	if (err == 0) {
-		hash = rt_hash(oldflp->fl4_dst, oldflp->fl4_src, oldflp->oif,
-			       rt_genid(dev_net(dev_out)));
-		err = rt_intern_hash(hash, rth, rp, NULL, oldflp->oif);
-	}
-
-	return err;
+	return rth;
 }
 
 /*
@@ -2469,6 +2447,7 @@ static int ip_route_output_slow(struct net *net, struct rtable **rp,
 	struct fib_result res;
 	unsigned int flags = 0;
 	struct net_device *dev_out = NULL;
+	struct rtable *rth;
 	int err;
 
 
@@ -2627,7 +2606,16 @@ static int ip_route_output_slow(struct net *net, struct rtable **rp,
 
 
 make_route:
-	err = ip_mkroute_output(rp, &res, &fl, oldflp, dev_out, flags);
+	rth = __mkroute_output(&res, &fl, oldflp, dev_out, flags);
+	if (IS_ERR(rth))
+		err = PTR_ERR(rth);
+	else {
+		unsigned int hash;
+
+		hash = rt_hash(oldflp->fl4_dst, oldflp->fl4_src, oldflp->oif,
+			       rt_genid(dev_net(dev_out)));
+		err = rt_intern_hash(hash, rth, rp, NULL, oldflp->oif);
+	}
 
 out:	return err;
 }
-- 
1.7.4.1


^ permalink raw reply related

* [PATCH 0/7] IPV4 routing path simplifications
From: David Miller @ 2011-02-18  0:11 UTC (permalink / raw)
  To: netdev

These cleanups and simplifications are based upon work I have been
doing over the past day analyzing why slow path route resolution
has so much overhead.

They are close to trivial and certainly not controversial so I've
added them to net-next-2.6

I know a new spin of the routing cache deletion patch is necessary so
that it applies cleanly relative to this stuff, and I'll post that in
a little bit.

Thanks.

^ permalink raw reply

* Re: [PATCH v2 0/5] Panda: Support for WLAN on WL127x
From: Tony Lindgren @ 2011-02-18  0:03 UTC (permalink / raw)
  To: Panduranga Mallireddy
  Cc: coelho, netdev, linux-omap, linux-mmc, ohad, benzyg,
	pradeepgurumath, vishalm, x-boudet, naveen_jain, pavan_savoy,
	manjunatha_halli, cjb
In-Reply-To: <1297759236-25323-1-git-send-email-panduranga_mallireddy@ti.com>

* Panduranga Mallireddy <panduranga_mallireddy@ti.com> [110215 00:13]:
> Fixes from V1:
> 1. Removing the pull up of WLAN IRQ line, since it is always held up by wl127x device.
> 
> Adding support for WLAN on Panda board using wl12xx and mac80211 drivers

Thanks, adding these to devel-board for the upcoming merge window.

Tony

^ permalink raw reply

* [net-next-2.6 PATCH] enic: Always use single transmit and single receive hardware queues per device
From: Vasanthy Kolluri @ 2011-02-17 23:57 UTC (permalink / raw)
  To: davem; +Cc: netdev

From: Vasanthy Kolluri <vkolluri@cisco.com>

We believe that our earlier patch for supporting multiple hardware receive queues per enic device requires more internal testing. At this point, we think it is best to disable the use of multiple receive queues. This patch does so by limiting ENIC_RQ_MAX to 1.

We also continue to disallow multiple hardware transmit queues per device, but change the way we enforce this so that it is consistent with the way receive queues are handled.

Signed-off-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Danny Guo <dannguo@cisco.com>
Signed-off-by: Vasanthy Kolluri <vkolluri@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David Wang <dwang2@cisco.com>
---
 drivers/net/enic/enic.h      |    6 +++---
 drivers/net/enic/enic_main.c |    2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)


diff --git a/drivers/net/enic/enic.h b/drivers/net/enic/enic.h
index 2ac891b..aee5256 100644
--- a/drivers/net/enic/enic.h
+++ b/drivers/net/enic/enic.h
@@ -32,13 +32,13 @@
 
 #define DRV_NAME		"enic"
 #define DRV_DESCRIPTION		"Cisco VIC Ethernet NIC Driver"
-#define DRV_VERSION		"2.1.1.8"
+#define DRV_VERSION		"2.1.1.9"
 #define DRV_COPYRIGHT		"Copyright 2008-2011 Cisco Systems, Inc"
 
 #define ENIC_BARS_MAX		6
 
-#define ENIC_WQ_MAX		8
-#define ENIC_RQ_MAX		8
+#define ENIC_WQ_MAX		1
+#define ENIC_RQ_MAX		1
 #define ENIC_CQ_MAX		(ENIC_WQ_MAX + ENIC_RQ_MAX)
 #define ENIC_INTR_MAX		(ENIC_CQ_MAX + 2)
 
diff --git a/drivers/net/enic/enic_main.c b/drivers/net/enic/enic_main.c
index d1aa807..4f1710e 100644
--- a/drivers/net/enic/enic_main.c
+++ b/drivers/net/enic/enic_main.c
@@ -2080,7 +2080,7 @@ static void enic_reset(struct work_struct *work)
 static int enic_set_intr_mode(struct enic *enic)
 {
 	unsigned int n = min_t(unsigned int, enic->rq_count, ENIC_RQ_MAX);
-	unsigned int m = 1;
+	unsigned int m = min_t(unsigned int, enic->wq_count, ENIC_WQ_MAX);
 	unsigned int i;
 
 	/* Set interrupt mode (INTx, MSI, MSI-X) depending


^ permalink raw reply related

* Re: IGMP and rwlock: Dead ocurred again on TILEPro
From: Chris Metcalf @ 2011-02-17 23:18 UTC (permalink / raw)
  To: David Miller; +Cc: xiyou.wangcong, cypher.w, linux-kernel, eric.dumazet, netdev
In-Reply-To: <20110217.151147.35033921.davem@davemloft.net>

On 2/17/2011 6:11 PM, David Miller wrote:
> From: Chris Metcalf <cmetcalf@tilera.com>
> Date: Thu, 17 Feb 2011 18:04:13 -0500
>
>> On 2/17/2011 5:53 PM, David Miller wrote:
>>> From: Chris Metcalf <cmetcalf@tilera.com>
>>> Date: Thu, 17 Feb 2011 17:49:46 -0500
>>>
>>>> The fix is to disable interrupts for the arch_read_lock family of methods. 
>>> How does that help handle the race when it happens between different
>>> cpus, instead of between IRQ and non-IRQ context on the same CPU?
>> There's no race in that case, since the lock code properly backs off and
>> retries until the other cpu frees it.  The distinction here is that the
>> non-IRQ context is "wedged" by the IRQ context.
>>
>>> Why don't you just use the generic spinlock based rwlock code on Tile,
>>> since that is all that your atomic instructions can handle
>>> sufficiently?
>> The tile-specific code encodes reader/writer information in the same 32-bit
>> word that the test-and-set instruction manipulates, so it's more efficient
>> both in space and time.  This may not really matter for rwlocks, since no
>> one cares much about them any more, but that was the motivation.
> Ok, but IRQ disabling is going to be very expensive.

The interrupt architecture on Tile allows a write to a special-purpose
register to put you into a "critical section" where no interrupts or faults
are delivered.  So we just need to bracket the read_lock operations with
two SPR writes; each takes six machine cycles, so we're only adding 12
cycles to the total cost of taking or releasing a read lock on an rwlock.
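
A toy model of the bracketing described above, for illustration only. The demo_* names are hypothetical and the special-purpose register is modelled with a plain variable; nothing here is the actual Tile code or its SPR interface. The only figures taken from this message are the two writes per operation and the six cycles per write.

```c
#include <stdbool.h>

/* Models the "critical section" state that the SPR write would toggle. */
static int demo_critical_depth;
/* Counts the modelled SPR writes so the added cost can be computed. */
static int demo_spr_writes;

static void demo_enter_critical(void)
{
	demo_critical_depth = 1;
	demo_spr_writes++;
}

static void demo_exit_critical(void)
{
	demo_critical_depth = 0;
	demo_spr_writes++;
}

static int demo_readers;
static bool demo_locked_inside_critical;

/* The lock-word update is bracketed by the two modelled SPR writes, so
 * no interrupt handler could run between its individual steps. */
static void demo_read_lock(void)
{
	demo_enter_critical();
	demo_readers++;	/* stands in for the multi-step lock-word update */
	demo_locked_inside_critical = (demo_critical_depth == 1);
	demo_exit_critical();
}

/* Six cycles per SPR write, as stated in the message above. */
#define DEMO_CYCLES_PER_SPR_WRITE 6

static int demo_added_cycles(void)
{
	return demo_spr_writes * DEMO_CYCLES_PER_SPR_WRITE;
}
```

One call to demo_read_lock() performs two modelled SPR writes, which matches the 2 x 6 = 12 cycle estimate.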

-- 
Chris Metcalf, Tilera Corp.
http://www.tilera.com

^ permalink raw reply

* Re: IGMP and rwlock: Dead ocurred again on TILEPro
From: David Miller @ 2011-02-17 23:11 UTC (permalink / raw)
  To: cmetcalf; +Cc: xiyou.wangcong, cypher.w, linux-kernel, eric.dumazet, netdev
In-Reply-To: <4D5DA96D.5060200@tilera.com>

From: Chris Metcalf <cmetcalf@tilera.com>
Date: Thu, 17 Feb 2011 18:04:13 -0500

> On 2/17/2011 5:53 PM, David Miller wrote:
>> From: Chris Metcalf <cmetcalf@tilera.com>
>> Date: Thu, 17 Feb 2011 17:49:46 -0500
>>
>>> The fix is to disable interrupts for the arch_read_lock family of methods. 
>> How does that help handle the race when it happens between different
>> cpus, instead of between IRQ and non-IRQ context on the same CPU?
> 
> There's no race in that case, since the lock code properly backs off and
> retries until the other cpu frees it.  The distinction here is that the
> non-IRQ context is "wedged" by the IRQ context.
> 
>> Why don't you just use the generic spinlock based rwlock code on Tile,
>> since that is all that your atomic instructions can handle
>> sufficiently?
> 
> The tile-specific code encodes reader/writer information in the same 32-bit
> word that the test-and-set instruction manipulates, so it's more efficient
> both in space and time.  This may not really matter for rwlocks, since no
> one cares much about them any more, but that was the motivation.

Ok, but IRQ disabling is going to be very expensive.

^ permalink raw reply

* Re: IGMP and rwlock: Dead ocurred again on TILEPro
From: Chris Metcalf @ 2011-02-17 23:04 UTC (permalink / raw)
  To: David Miller; +Cc: xiyou.wangcong, cypher.w, linux-kernel, eric.dumazet, netdev
In-Reply-To: <20110217.145333.232751283.davem@davemloft.net>

On 2/17/2011 5:53 PM, David Miller wrote:
> From: Chris Metcalf <cmetcalf@tilera.com>
> Date: Thu, 17 Feb 2011 17:49:46 -0500
>
>> The fix is to disable interrupts for the arch_read_lock family of methods. 
> How does that help handle the race when it happens between different
> cpus, instead of between IRQ and non-IRQ context on the same CPU?

There's no race in that case, since the lock code properly backs off and
retries until the other cpu frees it.  The distinction here is that the
non-IRQ context is "wedged" by the IRQ context.

> Why don't you just use the generic spinlock based rwlock code on Tile,
> since that is all that your atomic instructions can handle
> sufficiently?

The tile-specific code encodes reader/writer information in the same 32-bit
word that the test-and-set instruction manipulates, so it's more efficient
both in space and time.  This may not really matter for rwlocks, since no
one cares much about them any more, but that was the motivation.
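
To make the "same 32-bit word" idea concrete, here is a small portable sketch. It is not the Tile implementation: the bit layout and the demo_* names are assumptions for the example, and C11 compare-and-exchange stands in for the test-and-set instruction.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: top bit = writer held, low 31 bits = reader count. */
#define DEMO_WRITER_BIT 0x80000000u

typedef struct {
	_Atomic uint32_t word;
} demo_rwlock_t;

/* Take a read lock unless a writer holds the word. */
static bool demo_read_trylock(demo_rwlock_t *l)
{
	uint32_t old = atomic_load(&l->word);

	while (!(old & DEMO_WRITER_BIT)) {
		/* On failure, old is reloaded and the writer bit rechecked. */
		if (atomic_compare_exchange_weak(&l->word, &old, old + 1))
			return true;
	}
	return false;
}

static void demo_read_unlock(demo_rwlock_t *l)
{
	atomic_fetch_sub(&l->word, 1);
}

/* A writer needs the word to be completely idle (no readers, no writer). */
static bool demo_write_trylock(demo_rwlock_t *l)
{
	uint32_t expected = 0;

	return atomic_compare_exchange_strong(&l->word, &expected,
					      DEMO_WRITER_BIT);
}

static void demo_write_unlock(demo_rwlock_t *l)
{
	atomic_store(&l->word, 0);
}
```

A single word holds the whole lock state, so no separate spinlock or counter is needed; that is the space saving referred to above.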

-- 
Chris Metcalf, Tilera Corp.
http://www.tilera.com

^ permalink raw reply

* Re: [PATCH v6 0/9] net: Unified offload configuration
From: David Miller @ 2011-02-17 22:56 UTC (permalink / raw)
  To: mirq-linux; +Cc: netdev, bhutchings
In-Reply-To: <cover.1297824704.git.mirq-linux@rere.qmqm.pl>

From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Date: Wed, 16 Feb 2011 03:59:16 +0100 (CET)

> Here's a v6 of the ethtool unification patch series.
> 
> What's in it?
>  1..4:
> 	cleanups for the core patches
>  5:
> 	the patch - implement unified ethtool setting ops
>  6..7:
> 	implement interoperation between old and new ethtool ops
>  8:
> 	include RX checksum in features and plug it into new framework
>  9:
> 	convert loopback device to new framework
> 
> What is it good for?
>  - unifies driver behaviour wrt hardware offloads
>  - removes a lot of boilerplate code from drivers
>  - allows better fine-grained control over used offloads

Applied to net-next-2.6, please send any bug fixes relative to this.

Please get rid of that annoying message spit out by netif_features_change(),
it's just spam.  If we want notifications for stuff like this, use a
non-unicast netlink message so those who want to hear it can do so.

^ permalink raw reply

* Re: IGMP and rwlock: Dead ocurred again on TILEPro
From: David Miller @ 2011-02-17 22:53 UTC (permalink / raw)
  To: cmetcalf; +Cc: xiyou.wangcong, cypher.w, linux-kernel, eric.dumazet, netdev
In-Reply-To: <4D5DA60A.8080201@tilera.com>

From: Chris Metcalf <cmetcalf@tilera.com>
Date: Thu, 17 Feb 2011 17:49:46 -0500

> The fix is to disable interrupts for the arch_read_lock family of methods. 

How does that help handle the race when it happens between different
cpus, instead of between IRQ and non-IRQ context on the same CPU?

Why don't you just use the generic spinlock based rwlock code on Tile,
since that is all that your atomic instructions can handle
sufficiently?

^ permalink raw reply
