Netdev List

* Re: [PATCH net-next] bnx2x: Do Tx handling in a separate tasklet.
From: David Miller @ 2009-10-26 22:28 UTC (permalink / raw)
  To: vladz; +Cc: eilong, netdev
In-Reply-To: <8628FE4E7912BF47A96AE7DD7BAC0AADCB2CFF1C42@SJEXCHCCR02.corp.ad.broadcom.com>

From: "Vladislav Zolotarov" <vladz@broadcom.com>
Date: Mon, 26 Oct 2009 07:42:27 -0700

> The separation of Tx and Rx interrupt handling gives us the
> possibility to properly affinitize the Rx (heavy CPU consuming task)
> and Tx (low CPU consuming task) and to ensure that Tx work is done
> not long after the Tx interrupt without interference of Rx work thus
> letting the user benefit from Tx coalescing configuration in order
> to achieve the best performance in each specific scenario. This is
> most important in heavy load scenarios with mixed traffic (UDP + TCP
> for instance). If we didn't separate Tx and Rx interrupt handling, the Tx
> coalescing configuration would not be worth much.

There are other issues:

1) Actually, it makes sense to do TX and RX work together, since TX
   packet liberation makes fresh CPU local packets available for
   responses generated by RX packet reception.

2) TX packet liberation is not cheap in CPU terms.  It has to perform
   many atomic instructions, reference socket state, enter the SLAB
   allocator, potentially liberate netfilter state, etc.

Using NAPI also moves the TX freeing into softirq context.

Freeing from a hardirq is more expensive.  From hardirq context the
free just puts the SKB on a list and schedules a softirq; the real
SKB free work is then done from the softirq.

You avoid this needless SKB list management and softirq scheduling
if you free from softirq context in the first place, which is why
using NAPI makes sense here.
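
The hardirq detour described above can be sketched as a small user-space
toy model.  This is only an illustration under stated assumptions: every
name in it (toy_skb, toy_cpu, the toy_* functions) is hypothetical, it is
not kernel code, and the counters only show which bookkeeping steps each
path performs; they are not measurements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an SKB; not the kernel's struct sk_buff. */
struct toy_skb {
	struct toy_skb *next;
};

/* Hypothetical per-CPU state: a deferred-free list plus counters. */
struct toy_cpu {
	struct toy_skb *completion_queue;	/* buffers waiting for softirq */
	int softirq_pending;			/* set while deferred work exists */
	unsigned long list_ops;			/* list insertions and removals */
	unsigned long softirq_raises;		/* times a softirq was scheduled */
	unsigned long freed;			/* buffers actually released */
};

struct toy_skb *toy_alloc(void)
{
	struct toy_skb *skb = calloc(1, sizeof(*skb));

	assert(skb != NULL);
	return skb;
}

/* The real free work; in this model it only runs in softirq context. */
static void toy_free_now(struct toy_cpu *cpu, struct toy_skb *skb)
{
	free(skb);
	cpu->freed++;
}

/* Free requested from hardirq: queue the buffer, schedule a softirq. */
void toy_free_from_hardirq(struct toy_cpu *cpu, struct toy_skb *skb)
{
	skb->next = cpu->completion_queue;
	cpu->completion_queue = skb;
	cpu->list_ops++;
	if (!cpu->softirq_pending) {
		cpu->softirq_pending = 1;
		cpu->softirq_raises++;
	}
}

/* Softirq handler: drain the deferred list and do the real frees. */
void toy_run_softirq(struct toy_cpu *cpu)
{
	struct toy_skb *skb = cpu->completion_queue;

	cpu->completion_queue = NULL;
	cpu->softirq_pending = 0;
	while (skb) {
		struct toy_skb *next = skb->next;

		cpu->list_ops++;
		toy_free_now(cpu, skb);
		skb = next;
	}
}

/* Free requested from softirq (the NAPI poll case): no detour. */
void toy_free_from_softirq(struct toy_cpu *cpu, struct toy_skb *skb)
{
	toy_free_now(cpu, skb);
}
```

In this model, freeing N buffers from hardirq costs 2*N list operations
plus at least one softirq raise before anything is released, while the
softirq path releases each buffer immediately with no list work.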

* Re: [PATCH] TI DaVinci EMAC: Minor macro related updates
From: Jean-Christophe PLAGNIOL-VILLARD @ 2009-10-26 22:07 UTC (permalink / raw)
  To: Chaithrika U S; +Cc: netdev, davem, davinci-linux-open-source
In-Reply-To: <1254428719-13960-1-git-send-email-chaithrika@ti.com>

On 16:25 Thu 01 Oct     , Chaithrika U S wrote:
> Use BIT for macro definitions wherever possible, remove
> unused and redundant macros.
> 
> Signed-off-by: Chaithrika U S <chaithrika@ti.com>
> ---
> Applies to Linus' kernel tree
Do you plan to send a new version soon?  The current DaVinci EMAC
driver does not build on v2.6.32-rc5.

Best Regards,
J.

* [PATCH] dcache: better name hash function
From: Stephen Hemminger <shemminger@vyatta.com>, Al Viro @ 2009-10-26 22:36 UTC (permalink / raw)
  To: Andrew Morton, Linus Torvalds; +Cc: Octavian Purdila, netdev, linux-kernel
In-Reply-To: <20091025214357.666350d2@nehalam>

Experiments by Octavian with large numbers of network devices showed
that name_hash does not distribute values evenly, which causes
performance penalties.  The name hashing function is used by dcache
et al., so let's just choose a better one.

Additional standalone tests on 10,000,000 consecutive names
using many different algorithms show fnv as the winner.
It is faster than the current full_name_hash and has almost ideal
dispersion.
string10 is slightly faster, but only works for names like ppp0, ppp1, ...

Algorithm             Time       Ratio       Max   StdDev
string10             0.238201       1.00      2444   0.02
fnv32                0.240595       1.00      2576   1.05
fnv64                0.241224       1.00      2556   0.69
SuperFastHash        0.272872       1.00      2871   2.15
string_hash17        0.295160       1.00      2484   0.40
jhash_string         0.300925       1.00      2606   1.00
crc                  1.606741       1.00      2474   0.29
md5_string           2.424771       1.00      2644   0.99
djb2                 0.275424       1.15      3821  19.04
string_hash31        0.264806       1.21      4097  22.78
sdbm                 0.371136       2.87     13016  67.54
elf                  0.371279       3.59      9990  79.50
pjw                  0.401172       3.59      9990  79.50
full_name_hash       0.285851      13.09     35174 171.81
kr_hash              0.245068     124.84    468448 549.89
fletcher             0.267664     124.84    468448 549.89
adler32              0.640668     124.84    468448 549.89
xor                  0.220545     213.82    583189 720.85
lastchar             0.194604     409.57   1000000 998.78

Time is in seconds.
Ratio is the number of probes required to look up all values, relative
  to an ideal hash.
Max is the longest chain.

Reported-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/include/linux/dcache.h	2009-10-26 14:58:45.220347300 -0700
+++ b/include/linux/dcache.h	2009-10-26 15:12:15.004160122 -0700
@@ -45,15 +45,28 @@ struct dentry_stat_t {
 };
 extern struct dentry_stat_t dentry_stat;
 
-/* Name hashing routines. Initial hash value */
-/* Hash courtesy of the R5 hash in reiserfs modulo sign bits */
-#define init_name_hash()		0
+/*
+ * Fowler / Noll / Vo (FNV) Hash
+ * see: http://www.isthe.com/chongo/tech/comp/fnv/
+ */
+#ifdef CONFIG_64BIT
+#define FNV_PRIME  1099511628211ull
+#define FNV1_INIT  14695981039346656037ull
+#else
+#define FNV_PRIME  16777619u
+#define FNV1_INIT  2166136261u
+#endif
+
+#define init_name_hash()	FNV1_INIT
 
-/* partial hash update function. Assume roughly 4 bits per character */
+/* partial hash update function. */
 static inline unsigned long
-partial_name_hash(unsigned long c, unsigned long prevhash)
+partial_name_hash(unsigned char c, unsigned long prevhash)
 {
-	return (prevhash + (c << 4) + (c >> 4)) * 11;
+	prevhash ^= c;
+	prevhash *= FNV_PRIME;
+
+	return prevhash;
 }
 
 /*
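
A user-space sketch of the hashing step in the patch above, using the
32-bit constants from the !CONFIG_64BIT branch.  The function names
(fnv32_step, fnv32_name_hash, fnv32_bucket) are hypothetical, and the
power-of-two bucket reduction is an assumption made for illustration; it
is not how dcache folds the hash.  Note that the XOR-then-multiply order
is the variant the FNV description calls FNV-1a; FNV-1 and FNV-1a share
the same initial value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV constants, matching the !CONFIG_64BIT branch of the patch. */
#define FNV32_PRIME 16777619u
#define FNV32_INIT  2166136261u

/* One step, in the same order as the patched partial_name_hash(). */
uint32_t fnv32_step(unsigned char c, uint32_t hash)
{
	hash ^= c;
	hash *= FNV32_PRIME;
	return hash;
}

/* Hash a NUL-terminated name, one byte at a time. */
uint32_t fnv32_name_hash(const char *name)
{
	uint32_t hash = FNV32_INIT;

	assert(name != NULL);
	while (*name)
		hash = fnv32_step((unsigned char)*name++, hash);
	return hash;
}

/* Reduce to a bucket index; nbuckets must be a power of two (assumption). */
uint32_t fnv32_bucket(const char *name, uint32_t nbuckets)
{
	assert(nbuckets != 0 && (nbuckets & (nbuckets - 1)) == 0);
	return fnv32_name_hash(name) & (nbuckets - 1);
}
```

Because the prime is odd, the multiply is a bijection modulo 2^32, so two
names that differ only in their last byte (ppp0, ppp1) always get
different 32-bit hash values; collisions can only appear when the value
is reduced to a bucket index.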

* Re: [PATCH]NET/KS8695: add support NAPI for Rx
From: David Miller @ 2009-10-26 22:43 UTC (permalink / raw)
  To: bhutchings; +Cc: figo1802, dsilvers, netdev, vince, ben
In-Reply-To: <1256575746.2783.37.camel@achroite>

From: Ben Hutchings <bhutchings@solarflare.com>
Date: Mon, 26 Oct 2009 16:49:06 +0000

>> @@ -152,6 +156,10 @@ struct ks8695_priv {
>>  	enum ks8695_dtype dtype;
>>  	void __iomem *io_regs;
>>  
>> +	#ifdef KS8695NET_NAPI
>> +	struct napi_struct	napi;
>> +	#endif
>> +
> 
> NAPI is well-established and there should be no need to make it
> optional.  So far as I'm aware, all other drivers that had it as an
> option now use it unconditionally.

I absolutely refuse to apply this patch with the CPP conditional
present.  If you convert to NAPI, make it unconditional.

* Re: [PATCH v1 1/7] gianfar: Add per queue structure support
From: David Miller @ 2009-10-26 23:07 UTC (permalink / raw)
  To: sandeep.kumar; +Cc: netdev
In-Reply-To: <1256574433500-git-send-email-sandeep.kumar@freescale.com>

This patch set still doesn't apply cleanly to net-next-2.6,
in particular patch #4 fails to apply:

Applying: fsl_pq_mdio: Add Suport for etsec2.0 devices.
error: patch failed: drivers/net/fsl_pq_mdio.c:405
error: drivers/net/fsl_pq_mdio.c: patch does not apply
Patch failed at 0004 fsl_pq_mdio: Add Suport for etsec2.0 devices.
When you have resolved this problem run "git am --resolved".
If you would prefer to skip this patch, instead run "git am --skip".
To restore the original branch and stop patching run "git am --abort".

Please respin this, and also please CC: on all of your patch
postings as that helps me sort things.

Thanks.

* Re: [net-2.6 PATCH 1/3] igb: fix memory leak when setting ring size while interface is down
From: David Miller @ 2009-10-26 23:09 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, alexander.h.duyck
In-Reply-To: <20091026213147.9993.9778.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 26 Oct 2009 14:31:47 -0700

> From: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> Changing ring sizes while the interface was down was causing a double
> allocation of the receive and transmit rings.  This issue is amplified when
> there are multiple rings enabled.  To prevent this we need to add an
> additional check which will just update the ring counts when the interface
> is not up and skip the allocation steps.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

* Re: [net-2.6 PATCH 2/3] ixgbe: fix memory leak when resizing rings while interface is down
From: David Miller @ 2009-10-26 23:09 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, alexander.h.duyck
In-Reply-To: <20091026213205.9993.32014.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 26 Oct 2009 14:32:05 -0700

> From: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> This patch resolves a memory leak that occurs when you resize the rings via
> the ethtool -G option while the interface is down.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

* Re: [net-2.6 PATCH 3/3] igbvf: fix memory leak when ring size changed while interface down
From: David Miller @ 2009-10-26 23:09 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, alexander.h.duyck
In-Reply-To: <20091026213225.9993.25681.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 26 Oct 2009 14:32:25 -0700

> From: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> This patch resolves a memory leak which occurs while changing the ring size
> while the interface is down.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

* Re: [net-2.6 PATCH 1/5] e1000e: clear PHY wakeup bit after LCD reset on 82577/82578
From: David Miller @ 2009-10-26 23:17 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, bruce.w.allan
In-Reply-To: <20091026212242.9682.25442.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 26 Oct 2009 14:22:47 -0700

> From: Bruce Allan <bruce.w.allan@intel.com>
> 
> Performing a dummy read of the PHY Wakeup Control (WUC) register clears the
> wakeup enable bit set by a PHY reset.  If this bit remains set, link
> problems may occur.
> 
> Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

* Re: [net-2.6 PATCH 2/5] e1000e: increase swflag acquisition timeout for ICHx/PCH
From: David Miller @ 2009-10-26 23:17 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, bruce.w.allan
In-Reply-To: <20091026212306.9682.73519.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 26 Oct 2009 14:23:06 -0700

> From: Bruce Allan <bruce.w.allan@intel.com>
> 
> In some conditions (e.g. when AMT is enabled on the system), it is possible
> to take an extended period of time for the driver to acquire the sw/fw/hw
> hardware semaphore used to protect against concurrent access of a shared
> resource (e.g. PHY registers).  This could cause PHY registers to not get
> configured properly resulting in link issues.
> 
> Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

* Re: [net-2.6 PATCH 3/5] e1000e: 82577/82578 requires a different method to configure LPLU
From: David Miller @ 2009-10-26 23:17 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, bruce.w.allan
In-Reply-To: <20091026212325.9682.22270.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 26 Oct 2009 14:23:25 -0700

> From: Bruce Allan <bruce.w.allan@intel.com>
> 
> Unlike previous ICHx-based parts, the PCH-based parts (82577/82578) require
> LPLU (Low Power Link Up, or "reverse auto-negotiation") to be configured in
> the PHY rather than the MAC.
> 
> Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

* Re: [net-2.6 PATCH 4/5] e1000e: separate mutex usage between NVM and PHY/CSR register for ICHx/PCH
From: David Miller @ 2009-10-26 23:17 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, bruce.w.allan
In-Reply-To: <20091026212343.9682.56885.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 26 Oct 2009 14:23:43 -0700

> From: Bruce Allan <bruce.w.allan@intel.com>
> 
> Accesses to NVM and PHY/CSR registers on ICHx/PCH-based parts are protected
> from concurrent accesses with a mutex that is acquired when the access is
> initiated and released when the access has completed.  However, the two
> types of accesses should not be protected by the same mutex because the
> driver may have to access the NVM while already holding the mutex over
> several consecutive PHY/CSR accesses which would result in livelock.
> 
> Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

* Re: [net-2.6 PATCH 5/5] e1000e: allow for swflag to be held over consecutive PHY accesses
From: David Miller @ 2009-10-26 23:17 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, gospo, bruce.w.allan
In-Reply-To: <20091026212401.9682.51970.stgit@localhost.localdomain>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 26 Oct 2009 14:24:02 -0700

> From: Bruce Allan <bruce.w.allan@intel.com>
> 
> PCH-based parts (82577/82578) and some ICH8-based parts (82566) need to
> hold the swflag (sw/fw/hw hardware semaphore) over consecutive PHY accesses
> in order to perform sw-driven PHY configuration during initialization to
> work around known hardware issues (see follow-on patch).  This patch
> provides new PHY read/write functions (and function pointers) that will
> allow accessing the PHY registers assuming the swflag has already been
> acquired.  The actual PHY register access code has moved into helper
> functions that are called with a flag indicating whether or not the swflag
> has already been acquired and acquires/releases it if not.
> 
> The functions called from within the updated PHY access functions had to be
> updated to assume the swflag was already acquired, and other functions that
> called those functions were also updated to acquire/release the swflag.
> 
> Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Applied.

* Re: [PATCH] PPPoE: Fix flush/close races.
From: David Miller @ 2009-10-26 23:23 UTC (permalink / raw)
  To: denys; +Cc: gorcunov, mostrows, linux-ppp, netdev, eric.dumazet
In-Reply-To: <200910262205.38108.denys@visp.net.lb>

From: Denys Fedoryschenko <denys@visp.net.lb>
Date: Mon, 26 Oct 2009 22:05:37 +0200

> On Monday 26 October 2009 21:59:33 Cyrill Gorcunov wrote:
>> [Michal Ostrowski - Mon, Oct 26, 2009 at 02:51:52PM -0500]
>>
>>
>> Thanks a lot Michal!
>>
>> I think we should add as well
>>
>> Reported-by: Denys Fedoryschenko <denys@visp.net.lb>
>> Tested-by: Denys Fedoryschenko <denys@visp.net.lb>
>>
>> 	-- Cyrill
> 
> Yes, till now everything working perfectly. Confirming :-)

Applied, thanks everyone.

* Re: [PATCH 0/5] Candidate fix for increased number of GFP_ATOMIC failures V2
From: Frans Pop @ 2009-10-26 23:45 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker,
	Rafael J. Wysocki, David Miller, Reinette Chatre, Kalle Valo,
	David Rientjes, KOSAKI Motohiro, Mohamed Abbas, Jens Axboe,
	John W. Linville, Pekka Enberg, Bartlomiej Zolnierkiewicz,
	Greg Kroah-Hartman, Stephan von Krawczynski, Kernel Testers List,
	netdev, linux-kernel, linux-mm@kvack.org
In-Reply-To: <200910262317.55960.elendil@planet.nl>

On Monday 26 October 2009, Frans Pop wrote:
> Detailed test results follow. I've done 2 test runs with each kernel (3
> for the last).

Forgot to mention that each run was after a reboot, so they are not 
interdependent.

* [PATCH] sh_eth: Add asm/cacheflush.h
From: Nobuhiro Iwamatsu @ 2009-10-26 23:49 UTC (permalink / raw)
  To: netdev; +Cc: Linux-sh, Paul Mundt

Include asm/cacheflush.h, because the declaration of
__flush_purge_region moved there.

Signed-off-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>
---
 drivers/net/sh_eth.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/sh_eth.c b/drivers/net/sh_eth.c
index f49d080..528b912 100644
--- a/drivers/net/sh_eth.c
+++ b/drivers/net/sh_eth.c
@@ -30,6 +30,7 @@
 #include <linux/phy.h>
 #include <linux/cache.h>
 #include <linux/io.h>
+#include <asm/cacheflush.h>

 #include "sh_eth.h"

-- 
1.6.4.3

* Re: [PATCH 6/9] ser_gigaset: checkpatch cleanup
From: Tilman Schmidt @ 2009-10-26 23:59 UTC (permalink / raw)
  To: Joe Perches
  Cc: David Miller, Karsten Keil, Hansjoerg Lipp, netdev, linux-kernel,
	isdn4linux, i4ldeveloper
In-Reply-To: <1256518486.14711.13.camel@Joe-Laptop.home>

On 26.10.2009 01:54, Joe Perches wrote:
> On Sun, 2009-10-25 at 20:30 +0100, Tilman Schmidt wrote:
>> Duly uglified as demanded by checkpatch.pl.
>> diff --git a/drivers/isdn/gigaset/ser-gigaset.c b/drivers/isdn/gigaset/ser-gigaset.c
>> index 3071a52..ac3409e 100644
>> --- a/drivers/isdn/gigaset/ser-gigaset.c
>> +++ b/drivers/isdn/gigaset/ser-gigaset.c
>> @@ -164,9 +164,15 @@ static void gigaset_modem_fill(unsigned long data)
>>  {
>>  	struct cardstate *cs = (struct cardstate *) data;
>>  	struct bc_state *bcs;
>> +	struct sk_buff *nextskb;
>>  	int sent = 0;
>>  
>> -	if (!cs || !(bcs = cs->bcs)) {
>> +	if (!cs) {
>> +		gig_dbg(DEBUG_OUTPUT, "%s: no cardstate", __func__);
>> +		return;
>> +	}
>> +	bcs = cs->bcs;
>> +	if (!bcs) {
>>  		gig_dbg(DEBUG_OUTPUT, "%s: no cardstate", __func__);
>>  		return;
>> 	}
> 
> perhaps:
> 	if (!cs || !cs->bcs) {
> 		gig_dbg(DEBUG_OUTPUT, "%s: no cardstate", __func__);
> 		return;
> 	}
> 	bcs = cs->bcs;

That would evaluate cs->bcs twice, and is also, in my experience,
significantly more prone to easily overlooked typos which result in
checking a different pointer in the if statement than the one that's
actually used in the subsequent assignment.

>> @@ -404,16 +412,20 @@ static void gigaset_device_release(struct device *dev)
>>  static int gigaset_initcshw(struct cardstate *cs)
>>  {
>>  	int rc;
>> +	struct ser_cardstate *scs;
>>  
>> -	if (!(cs->hw.ser = kzalloc(sizeof(struct ser_cardstate), GFP_KERNEL))) {
>> +	scs = kzalloc(sizeof(struct ser_cardstate), GFP_KERNEL);
>> +	if (!scs) {
>>  		pr_err("out of memory\n");
>>  		return 0;
>>  	}
>> +	cs->hw.ser = scs;
> 
> Why not drop the temporary and just:
> 
> 	cs->hw.ser = kzalloc...
> 	if (!cs->hw.ser)

For the same reasons as above.

Thanks,
Tilman

-- 
Tilman Schmidt                    E-Mail: tilman@imap.cc
Bonn, Germany
This message consists of 100% recycled bits.
Unopened, best before: (see reverse side)


* [RFC PATCH] fib_hash: improve route deletion scaling on interface drop with lots of interfaces
From: Benjamin LaHaise @ 2009-10-27  0:03 UTC (permalink / raw)
  To: netdev

Hi folks,

Below is a patch to improve the scaling of interface destruction in 
fib_hash.  The general idea is to tie the fib_alias structure into a 
list off of net_device and walk that list during a fib_flush() caused 
by an interface drop.  This makes the resulting flush only have to walk 
the number of routes attached to an interface rather than the number of 
routes attached to all interfaces at the expense of a couple of additional 
pointers in struct fib_alias.

This patch is against Linus' tree.  I'll post against net-next after a 
bit more testing and feedback.  With 20,000 interfaces & routes, interface 
deletion time improves from 53s to 40s.  Note that this is with other changes 
applied to improve sysfs and procfs scaling, as otherwise those are the 
bottleneck.  Next up in the network code is rt_cache_flush().  Comments?

		-ben

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 812a5f3..982045b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -856,6 +856,7 @@ struct net_device
 
 	/* delayed register/unregister */
 	struct list_head	todo_list;
+	struct list_head	fib_list;
 	/* device index hash chain */
 	struct hlist_node	index_hlist;
 
diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index ef91fe9..0c32193 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -149,7 +149,7 @@ struct fib_table {
 	int		(*tb_delete)(struct fib_table *, struct fib_config *);
 	int		(*tb_dump)(struct fib_table *table, struct sk_buff *skb,
 				     struct netlink_callback *cb);
-	int		(*tb_flush)(struct fib_table *table);
+	int		(*tb_flush)(struct fib_table *table, struct net_device *dev);
 	void		(*tb_select_default)(struct fib_table *table,
 					     const struct flowi *flp, struct fib_result *res);
 
diff --git a/net/core/dev.c b/net/core/dev.c
index b8f74cf..9f6f736 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5173,6 +5173,7 @@ struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 	netdev_init_queues(dev);
 
 	INIT_LIST_HEAD(&dev->napi_list);
+	INIT_LIST_HEAD(&dev->fib_list);
 	dev->priv_flags = IFF_XMIT_DST_RELEASE;
 	setup(dev);
 	strcpy(dev->name, name);
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index e2f9505..0283b1f 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -128,18 +128,19 @@ void fib_select_default(struct net *net,
 		tb->tb_select_default(tb, flp, res);
 }
 
-static void fib_flush(struct net *net)
+static void fib_flush(struct net_device *dev)
 {
 	int flushed = 0;
 	struct fib_table *tb;
 	struct hlist_node *node;
 	struct hlist_head *head;
 	unsigned int h;
+	struct net *net = dev_net(dev);
 
 	for (h = 0; h < FIB_TABLE_HASHSZ; h++) {
 		head = &net->ipv4.fib_table_hash[h];
 		hlist_for_each_entry(tb, node, head, tb_hlist)
-			flushed += tb->tb_flush(tb);
+			flushed += tb->tb_flush(tb, dev);
 	}
 
 	if (flushed)
@@ -805,7 +806,7 @@ static void fib_del_ifaddr(struct in_ifaddr *ifa)
 			   for stray nexthop entries, then ignite fib_flush.
 			*/
 			if (fib_sync_down_addr(dev_net(dev), ifa->ifa_local))
-				fib_flush(dev_net(dev));
+				fib_flush(dev);
 		}
 	}
 #undef LOCAL_OK
@@ -895,7 +896,7 @@ static void nl_fib_lookup_exit(struct net *net)
 static void fib_disable_ip(struct net_device *dev, int force)
 {
 	if (fib_sync_down_dev(dev, force))
-		fib_flush(dev_net(dev));
+		fib_flush(dev);
 	rt_cache_flush(dev_net(dev), 0);
 	arp_ifdown(dev);
 }
@@ -1009,7 +1010,7 @@ static void __net_exit ip_fib_net_exit(struct net *net)
 		head = &net->ipv4.fib_table_hash[i];
 		hlist_for_each_entry_safe(tb, node, tmp, head, tb_hlist) {
 			hlist_del(node);
-			tb->tb_flush(tb);
+			tb->tb_flush(tb, NULL);
 			kfree(tb);
 		}
 	}
diff --git a/net/ipv4/fib_hash.c b/net/ipv4/fib_hash.c
index ecd3945..d08ba2f 100644
--- a/net/ipv4/fib_hash.c
+++ b/net/ipv4/fib_hash.c
@@ -377,6 +377,7 @@ static int fn_hash_insert(struct fib_table *tb, struct fib_config *cfg)
 	u8 tos = cfg->fc_tos;
 	__be32 key;
 	int err;
+	struct net_device *dev;
 
 	if (cfg->fc_dst_len > 32)
 		return -EINVAL;
@@ -516,6 +517,10 @@ static int fn_hash_insert(struct fib_table *tb, struct fib_config *cfg)
 	new_fa->fa_type = cfg->fc_type;
 	new_fa->fa_scope = cfg->fc_scope;
 	new_fa->fa_state = 0;
+	new_fa->fa_fib_node = f;
+	new_fa->fa_fz = fz;
+
+	dev = fi->fib_dev;
 
 	/*
 	 * Insert new entry to the list.
@@ -527,6 +532,7 @@ static int fn_hash_insert(struct fib_table *tb, struct fib_config *cfg)
 	list_add_tail(&new_fa->fa_list,
 		 (fa ? &fa->fa_list : &f->fn_alias));
 	fib_hash_genid++;
+	list_add_tail(&new_fa->fa_dev_list, &dev->fib_list);
 	write_unlock_bh(&fib_hash_lock);
 
 	if (new_f)
@@ -605,6 +611,7 @@ static int fn_hash_delete(struct fib_table *tb, struct fib_config *cfg)
 		kill_fn = 0;
 		write_lock_bh(&fib_hash_lock);
 		list_del(&fa->fa_list);
+		list_del(&fa->fa_dev_list);
 		if (list_empty(&f->fn_alias)) {
 			hlist_del(&f->fn_hash);
 			kill_fn = 1;
@@ -643,6 +650,7 @@ static int fn_flush_list(struct fn_zone *fz, int idx)
 			if (fi && (fi->fib_flags&RTNH_F_DEAD)) {
 				write_lock_bh(&fib_hash_lock);
 				list_del(&fa->fa_list);
+				list_del(&fa->fa_dev_list);
 				if (list_empty(&f->fn_alias)) {
 					hlist_del(&f->fn_hash);
 					kill_f = 1;
@@ -662,17 +670,69 @@ static int fn_flush_list(struct fn_zone *fz, int idx)
 	return found;
 }
 
-static int fn_hash_flush(struct fib_table *tb)
+static int fn_flush_alias(struct fn_hash *table, struct fib_alias *fa)
+{
+	int kill_f = 0;
+	struct fib_info *fi = fa->fa_info;
+	int found = 0;
+
+	if (!fi)
+		BUG();
+
+	if (fi && (fi->fib_flags & RTNH_F_DEAD)) {
+		struct fib_node *f = fa->fa_fib_node;
+		struct fn_zone *fz = fa->fa_fz;
+
+		write_lock_bh(&fib_hash_lock);
+		list_del(&fa->fa_list);
+		list_del(&fa->fa_dev_list);
+		if (list_empty(&f->fn_alias)) {
+			hlist_del(&f->fn_hash);
+			kill_f = 1;
+		}
+		fib_hash_genid++;
+		write_unlock_bh(&fib_hash_lock);
+
+		fn_free_alias(fa, f);
+		found++;
+
+		if (kill_f)
+			fn_free_node(f);
+		fz->fz_nent--;
+	}
+
+	return found;
+}
+
+static int fn_flush_dev(struct fn_hash *table, struct net_device *dev)
+{
+	int found = 0;
+	struct list_head *pos, *next;
+
+	list_for_each_safe(pos, next, &dev->fib_list) {
+		struct fib_alias *fa =
+			container_of(pos, struct fib_alias, fa_dev_list);
+		found += fn_flush_alias(table, fa);
+	}
+
+	return found;
+}
+
+static int fn_hash_flush(struct fib_table *tb, struct net_device *dev)
 {
 	struct fn_hash *table = (struct fn_hash *) tb->tb_data;
 	struct fn_zone *fz;
 	int found = 0;
 
-	for (fz = table->fn_zone_list; fz; fz = fz->fz_next) {
-		int i;
+	if (dev) {
+		found = fn_flush_dev(table, dev);
+	} else {
+		for (fz = table->fn_zone_list; fz; fz = fz->fz_next) {
+			int i;
 
-		for (i = fz->fz_divisor - 1; i >= 0; i--)
-			found += fn_flush_list(fz, i);
+			for (i = fz->fz_divisor - 1; i >= 0; i--)
+				found += fn_flush_list(fz, i);
+		}
 	}
 	return found;
 }
diff --git a/net/ipv4/fib_lookup.h b/net/ipv4/fib_lookup.h
index 637b133..9f2fad1 100644
--- a/net/ipv4/fib_lookup.h
+++ b/net/ipv4/fib_lookup.h
@@ -5,9 +5,17 @@
 #include <linux/list.h>
 #include <net/ip_fib.h>
 
+struct fib_node;
+struct fn_zone;
+
 struct fib_alias {
 	struct list_head	fa_list;
+	struct list_head	fa_dev_list;
 	struct fib_info		*fa_info;
+#ifdef CONFIG_IP_FIB_HASH
+	struct fib_node		*fa_fib_node;
+	struct fn_zone		*fa_fz;
+#endif
 	u8			fa_tos;
 	u8			fa_type;
 	u8			fa_scope;
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 291bdf5..4805772 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -1786,7 +1786,7 @@ static struct leaf *trie_leafindex(struct trie *t, int index)
 /*
  * Caller must hold RTNL.
  */
-static int fn_trie_flush(struct fib_table *tb)
+static int fn_trie_flush(struct fib_table *tb, struct net_device *dev)
 {
 	struct trie *t = (struct trie *) tb->tb_data;
 	struct leaf *l, *ll = NULL;
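
The per-device list idea can be illustrated with a user-space toy model.
This is a sketch under stated assumptions, not the patch itself: all toy_*
names are hypothetical, there is no locking, and unlike the patch (which
only removes aliases whose fib_info is flagged RTNH_F_DEAD) the toy
flush removes every route on the device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal circular doubly linked list, in the spirit of list_head. */
struct toy_list { struct toy_list *prev, *next; };

void toy_list_init(struct toy_list *head)
{
	head->prev = head->next = head;
}

static void toy_list_add_tail(struct toy_list *node, struct toy_list *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static void toy_list_del(struct toy_list *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
}

/* Hypothetical device: head of the list of routes that use it. */
struct toy_dev { struct toy_list routes; };

/* Hypothetical route, linked into the table and into its device. */
struct toy_route {
	struct toy_list table_link;	/* every route in the table */
	struct toy_list dev_link;	/* only routes on the same device */
	struct toy_dev *dev;
};

struct toy_table {
	struct toy_list routes;
	unsigned long visited;		/* routes examined while flushing */
};

#define toy_entry(ptr, member) \
	((struct toy_route *)((char *)(ptr) - offsetof(struct toy_route, member)))

/* Adding a route now costs one extra list insertion. */
void toy_route_add(struct toy_table *t, struct toy_dev *d)
{
	struct toy_route *r = malloc(sizeof(*r));

	assert(r != NULL);
	r->dev = d;
	toy_list_add_tail(&r->table_link, &t->routes);
	toy_list_add_tail(&r->dev_link, &d->routes);
}

/* Deleting a route now costs one extra list removal. */
static void toy_route_del(struct toy_route *r)
{
	toy_list_del(&r->table_link);
	toy_list_del(&r->dev_link);
	free(r);
}

/* Old shape: examine every route in the table, drop those on dev. */
int toy_flush_scan(struct toy_table *t, struct toy_dev *d)
{
	struct toy_list *pos, *next;
	int found = 0;

	for (pos = t->routes.next; pos != &t->routes; pos = next) {
		next = pos->next;	/* saved first: pos may be freed */
		t->visited++;
		if (toy_entry(pos, table_link)->dev == d) {
			toy_route_del(toy_entry(pos, table_link));
			found++;
		}
	}
	return found;
}

/* New shape: examine only the routes linked to dev. */
int toy_flush_dev(struct toy_table *t, struct toy_dev *d)
{
	struct toy_list *pos, *next;
	int found = 0;

	for (pos = d->routes.next; pos != &d->routes; pos = next) {
		next = pos->next;
		t->visited++;
		toy_route_del(toy_entry(pos, dev_link));
		found++;
	}
	return found;
}
```

With 100 routes on one device and 3 on another, flushing the second
device visits 3 routes through the per-device list and 103 through the
full scan.  The price is the second list operation in every add and
delete.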

* Re: [PATCH 6/9] ser_gigaset: checkpatch cleanup
From: Joe Perches @ 2009-10-27  0:14 UTC (permalink / raw)
  To: Tilman Schmidt
  Cc: David Miller, Karsten Keil, Hansjoerg Lipp, netdev, linux-kernel,
	isdn4linux, i4ldeveloper
In-Reply-To: <4AE637D8.60809@imap.cc>

On Tue, 2009-10-27 at 00:59 +0100, Tilman Schmidt wrote:
> On 26.10.2009 01:54, Joe Perches wrote:
> > On Sun, 2009-10-25 at 20:30 +0100, Tilman Schmidt wrote:
> >> Duly uglified as demanded by checkpatch.pl.
> >> diff --git a/drivers/isdn/gigaset/ser-gigaset.c b/drivers/isdn/gigaset/ser-gigaset.c
> >> index 3071a52..ac3409e 100644
> >> --- a/drivers/isdn/gigaset/ser-gigaset.c
> >> +++ b/drivers/isdn/gigaset/ser-gigaset.c
> >> @@ -164,9 +164,15 @@ static void gigaset_modem_fill(unsigned long data)
> >>  {
> >>  	struct cardstate *cs = (struct cardstate *) data;
> >>  	struct bc_state *bcs;
> >> +	struct sk_buff *nextskb;
> >>  	int sent = 0;
> >>  
> >> -	if (!cs || !(bcs = cs->bcs)) {
> >> +	if (!cs) {
> >> +		gig_dbg(DEBUG_OUTPUT, "%s: no cardstate", __func__);
> >> +		return;
> >> +	}
> >> +	bcs = cs->bcs;
> >> +	if (!bcs) {
> >>  		gig_dbg(DEBUG_OUTPUT, "%s: no cardstate", __func__);
> >>  		return;
> >> 	}
> > 
> > perhaps:
> > 	if (!cs || !cs->bcs) {
> > 		gig_dbg(DEBUG_OUTPUT, "%s: no cardstate", __func__);
> > 		return;
> > 	}
> > 	bcs = cs->bcs;
> 
> That would evaluate cs->bcs twice, and is also, in my experience,
> significantly more prone to easily overlooked typos which result in
> checking a different pointer in the if statement than the one that's
> actually used in the subsequent assignment.

The alternative is to duplicate the gig_dbg call, as you've done.
That is also prone to typos, and it is more code as well.

> >> @@ -404,16 +412,20 @@ static void gigaset_device_release(struct device *dev)
> >>  static int gigaset_initcshw(struct cardstate *cs)
> >>  {
> >>  	int rc;
> >> +	struct ser_cardstate *scs;
> >>  
> >> -	if (!(cs->hw.ser = kzalloc(sizeof(struct ser_cardstate), GFP_KERNEL))) {
> >> +	scs = kzalloc(sizeof(struct ser_cardstate), GFP_KERNEL);
> >> +	if (!scs) {
> >>  		pr_err("out of memory\n");
> >>  		return 0;
> >>  	}
> >> +	cs->hw.ser = scs;
> > 
> > Why not drop the temporary and just:
> > 
> > 	cs->hw.ser = kzalloc...
> > 	if (!cs->hw.ser)
> 
> For the same reasons as above.

I believe the checkpatch recommended form is:

	foo = func();
	if ([!]foo) {
		handle_error()...
	}

as you've used in all the other conversions.

No big deal or difference, but I think what I
suggested is closer to normal kernel style.
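
For reference, the two forms discussed for gigaset_modem_fill() can be
set side by side in a small user-space sketch.  The struct and function
names (toy_cs, toy_bcs, toy_channel_*) and the -1 error return are
hypothetical stand-ins, not the driver's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's structures. */
struct toy_bcs { int channel; };
struct toy_cs  { struct toy_bcs *bcs; };

/* Assign first, then test: each pointer is read once and checked once. */
int toy_channel_assign_then_test(struct toy_cs *cs)
{
	struct toy_bcs *bcs;

	if (!cs)
		return -1;
	bcs = cs->bcs;
	if (!bcs)
		return -1;
	return bcs->channel;
}

/* Test the member in the condition, assign afterwards: one error path. */
int toy_channel_test_then_assign(struct toy_cs *cs)
{
	struct toy_bcs *bcs;

	if (!cs || !cs->bcs)
		return -1;
	bcs = cs->bcs;
	return bcs->channel;
}

/* Both forms return the same result for any input. */
void toy_compare_forms(struct toy_cs *cs)
{
	assert(toy_channel_assign_then_test(cs) ==
	       toy_channel_test_then_assign(cs));
}
```

The first form duplicates the error path; the second reads cs->bcs twice.
Behaviour is the same, so the choice is a matter of style.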

cheers, Joe


* Re: [RFC PATCH] fib_hash: improve route deletion scaling on interface drop with lots of interfaces
From: David Miller @ 2009-10-27  0:17 UTC (permalink / raw)
  To: bcrl; +Cc: netdev
In-Reply-To: <20091027000302.GA3141@kvack.org>

From: Benjamin LaHaise <bcrl@lhnet.ca>
Date: Mon, 26 Oct 2009 20:03:02 -0400

> Below is a patch to improve the scaling of interface destruction in 
> fib_hash.  The general idea is to tie the fib_alias structure into a 
> list off of net_device and walk that list during a fib_flush() caused 
> by an interface drop.  This makes the resulting flush only have to walk 
> the number of routes attached to an interface rather than the number of 
> routes attached to all interfaces at the expense of a couple of additional 
> pointers in struct fib_alias.
> 
> This patch is against Linus' tree.  I'll post against net-next after a 
> bit more testing and feedback.  With 20,000 interfaces & routes, interface 
> deletion time improves from 53s to 40s.  Note that this is with other changes 
> applied to improve sysfs and procfs scaling, as otherwise those are the 
> bottleneck.  Next up in the network code is rt_cache_flush().  Comments?

On a real router, adding and removing routes happens a lot,
whereas interface changes are rare.  You're making a more common
operation more expensive for the sake of a less common one.

> @@ -128,18 +128,19 @@ void fib_select_default(struct net *net,
>  		tb->tb_select_default(tb, flp, res);
>  }
>  
> -static void fib_flush(struct net *net)
> +static void fib_flush(struct net_device *dev)
>  {
>  	int flushed = 0;
>  	struct fib_table *tb;
>  	struct hlist_node *node;
>  	struct hlist_head *head;
>  	unsigned int h;
> +	struct net *net = dev_net(dev);
>  

Please put the longer local variable lines at the beginning of the
list of declarations at the top of a function, not the other way
around, which stands out like a sore thumb and looks ugly.
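
As an illustration of that ordering, here is a minimal sketch that
compiles outside the kernel. The struct definitions and dev_net() below
are hypothetical stand-ins, and fib_flush_sketch() is not the real
function body; only the order of the declarations matters.

```c
#include <stddef.h>

/* Hypothetical stand-ins so the sketch builds in userspace. */
struct net { int id; };
struct net_device { struct net *nd_net; };
struct fib_table { int tb_id; };
struct hlist_node { struct hlist_node *next; };
struct hlist_head { struct hlist_node *first; };

static struct net *dev_net(const struct net_device *dev)
{
	return dev->nd_net;
}

/* Declarations run from the longest line down to the shortest. */
static int fib_flush_sketch(struct net_device *dev)
{
	struct net *net = dev_net(dev);
	struct hlist_node *node = NULL;
	struct hlist_head *head = NULL;
	struct fib_table *tb = NULL;
	unsigned int h = 0;
	int flushed = 0;

	/* The real loop is omitted; silence unused-variable warnings. */
	(void)node;
	(void)head;
	(void)tb;
	(void)h;

	return net ? flushed : -1;
}
```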


* Re: [PATCH] sh_eth: Add asm/cacheflush.h
From: David Miller @ 2009-10-27  0:19 UTC (permalink / raw)
  To: iwamatsu; +Cc: netdev, linux-sh, lethal
In-Reply-To: <29ab51dc0910261649p2fb22799o5cc70a71dff0f1dd@mail.gmail.com>

From: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>
Date: Tue, 27 Oct 2009 08:49:50 +0900

> Add include asm/cacheflush.h,  because declaration of __flush_purge_region
> moved to asm/cacheflush.h.
> 
> Signed-off-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>

Applied, thanks.


* Re: [RFC PATCH] fib_hash: improve route deletion scaling on interface drop with lots of interfaces
From: Stephen Hemminger @ 2009-10-27  0:24 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: netdev
In-Reply-To: <20091027000302.GA3141@kvack.org>

On Mon, 26 Oct 2009 20:03:02 -0400
Benjamin LaHaise <bcrl@lhnet.ca> wrote:

> Hi folks,
> 
> Below is a patch to improve the scaling of interface destruction in 
> fib_hash.  The general idea is to tie the fib_alias structure into a 
> list off of net_device and walk that list during a fib_flush() caused 
> by an interface drop.  This makes the resulting flush only have to walk 
> the number of routes attached to an interface rather than the number of 
> routes attached to all interfaces at the expense of a couple of additional 
> pointers in struct fib_alias.
> 
> This patch is against Linus' tree.  I'll post against net-next after a 
> bit more testing and feedback.  With 20,000 interfaces & routes, interface 
> deletion time improves from 53s to 40s.  Note that this is with other changes 
> applied to improve sysfs and procfs scaling, as otherwise those are the 
> bottleneck.  Next up in the network code is rt_cache_flush().  Comments?
> 
> 		-ben
> 

Shouldn't anyone with a large number of interfaces be using FIB_TRIE?





* Re: [PATCH] sh_eth: Add asm/cacheflush.h
From: Paul Mundt @ 2009-10-27  0:25 UTC (permalink / raw)
  To: David Miller; +Cc: iwamatsu, netdev, linux-sh
In-Reply-To: <20091026.171956.110203653.davem@davemloft.net>

On Mon, Oct 26, 2009 at 05:19:56PM -0700, David Miller wrote:
> From: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>
> Date: Tue, 27 Oct 2009 08:49:50 +0900
> 
> > Add include asm/cacheflush.h,  because declaration of __flush_purge_region
> > moved to asm/cacheflush.h.
> > 
> > Signed-off-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>
> 
> Applied, thanks.

Even though Iwamatsu-san didn't specify the kernel version, this is
relevant for 2.6.32 (as that's where we made the change).


* Re: [PATCH] sh_eth: Add asm/cacheflush.h
From: David Miller @ 2009-10-27  0:28 UTC (permalink / raw)
  To: lethal; +Cc: iwamatsu, netdev, linux-sh
In-Reply-To: <20091027002525.GA17085@linux-sh.org>

From: Paul Mundt <lethal@linux-sh.org>
Date: Tue, 27 Oct 2009 09:25:25 +0900

> On Mon, Oct 26, 2009 at 05:19:56PM -0700, David Miller wrote:
>> From: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>
>> Date: Tue, 27 Oct 2009 08:49:50 +0900
>> 
>> > Add include asm/cacheflush.h,  because declaration of __flush_purge_region
>> > moved to asm/cacheflush.h.
>> > 
>> > Signed-off-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>
>> 
>> Applied, thanks.
> 
> Even though Iwamatsu-san didn't specify the kernel version, this is
> relevant for 2.6.32 (as that's where we made the change).

Ok.


* Re: [PATCH] vlan: allow VLAN ID 0 to be used
From: David Miller @ 2009-10-27  0:32 UTC (permalink / raw)
  To: eric.dumazet; +Cc: benny+usenet, gertjan_hofman, mcarlson, netdev, kaber
In-Reply-To: <4AE5CAC6.4000604@gmail.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 26 Oct 2009 17:13:58 +0100

> [PATCH] vlan: allow VLAN ID 0 to be used
> 
> We currently use a 16 bit field (vlan_tci) to store VLAN ID on a skb.
> 
> 0 value is used as a special value, meaning VLAN ID not set.
> This forbids use of VLAN ID 0
> 
> As VLAN ID is 12 bits, we can use high order bit as a flag, and
> allow VLAN ID 0
> 
> Reported-by: Gertjan Hofman <gertjan_hofman@yahoo.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

This is going to need some more work.

IXGBE is already using the higher bits of ->vlan_tci internally,
your change breaks that.

QLGE explicitly initializes skb->vlan_tci to zero, you'll need to make
sure that's OK.

There is an explicit "if (skb->vlan_tci" (i.e. zero vs. non-zero) test
in net/core/dev.c:netif_receive_skb().

net/core/skbuff.c:__copy_skb_header() does a straight copy, you'll
need to make sure that's still OK.

net/packet/af_packet.c:tpacket_rcv() and packet_recvmsg() report the
skb->vlan_tci value to userspace, that's broken now as userspace
doesn't expect that new bit to be there.
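
To make the encoding concrete, here is a minimal userspace sketch. It
is not the posted patch: the macro and function names are hypothetical,
and the choice of bit 15 as the flag is an assumption made only for
illustration, since the patch itself is not quoted here.

```c
#include <stdint.h>

#define SKETCH_VID_MASK    0x0fffu /* 12-bit VLAN ID */
#define SKETCH_TAG_PRESENT 0x8000u /* assumed flag bit, for illustration */

/* Store a VLAN ID and mark the tag as present. */
static uint16_t sketch_tag_set(uint16_t vid)
{
	return (uint16_t)(SKETCH_TAG_PRESENT | (vid & SKETCH_VID_MASK));
}

/* "Is a tag set?" tests the flag instead of comparing against zero. */
static int sketch_tag_present(uint16_t tci)
{
	return (tci & SKETCH_TAG_PRESENT) != 0;
}

/* Recover the VLAN ID with the flag bit masked off. */
static uint16_t sketch_tag_vid(uint16_t tci)
{
	return tci & SKETCH_VID_MASK;
}
```

With this layout VLAN ID 0 is stored as a non-zero value, so it is
distinguishable from "not set", but any path that hands the raw field
to userspace would carry the flag bit along unless it is masked off
first.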

