Netdev List

* [PATCH 2/7]: [NET]: Add NAPI_STATE_DISABLE.
From: David Miller @ 2008-01-08  5:38 UTC (permalink / raw)
  To: netdev

[NET]: Add NAPI_STATE_DISABLE.

Create a bit to signal that a napi_disable() is in progress.

This sets up infrastructure such that net_rx_action() can generically
break out of the ->poll() loop on a NAPI context that has a pending
napi_disable() yet is being bombed with packets (and thus would
otherwise poll endlessly and not allow the napi_disable() to finish).

Now, what napi_disable() does is first set the NAPI_STATE_DISABLE bit
(to indicate that a disable is pending), then it polls for the
NAPI_STATE_SCHED bit, and once the NAPI_STATE_SCHED bit is acquired
the NAPI_STATE_DISABLE bit is cleared.  Here, the test_and_set_bit()
provides the necessary memory barrier between the various bitops.

napi_schedule_prep() now tests for a pending disable as its first
action and won't try to obtain the NAPI_STATE_SCHED bit if a disable
is pending.

As a result, we can remove the netif_running() check in
netif_rx_schedule_prep() because the NAPI disable pending state serves
this purpose.  And, it does so in a NAPI centric manner which is what
we really want.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/netdevice.h |   16 +++++++++++++---
 1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e393995..b0813c3 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -319,21 +319,29 @@ struct napi_struct {
 enum
 {
 	NAPI_STATE_SCHED,	/* Poll is scheduled */
+	NAPI_STATE_DISABLE,	/* Disable pending */
 };

 extern void FASTCALL(__napi_schedule(struct napi_struct *n));

+static inline int napi_disable_pending(struct napi_struct *n)
+{
+	return test_bit(NAPI_STATE_DISABLE, &n->state);
+}
+
 /**
  *	napi_schedule_prep - check if napi can be scheduled
  *	@n: napi context
  *
  * Test if NAPI routine is already running, and if not mark
  * it as running.  This is used as a condition variable
- * insure only one NAPI poll instance runs
+ * insure only one NAPI poll instance runs.  We also make
+ * sure there is no pending NAPI disable.
  */
 static inline int napi_schedule_prep(struct napi_struct *n)
 {
-	return !test_and_set_bit(NAPI_STATE_SCHED, &n->state);
+	return !napi_disable_pending(n) &&
+		!test_and_set_bit(NAPI_STATE_SCHED, &n->state);
 }

 /**
@@ -389,8 +397,10 @@ static inline void napi_complete(struct napi_struct *n)
  */
 static inline void napi_disable(struct napi_struct *n)
 {
+	set_bit(NAPI_STATE_DISABLE, &n->state);
 	while (test_and_set_bit(NAPI_STATE_SCHED, &n->state))
 		msleep(1);
+	clear_bit(NAPI_STATE_DISABLE, &n->state);
 }

 /**
@@ -1268,7 +1278,7 @@ static inline u32 netif_msg_init(int debug_value, int default_msg_enable_bits)
 static inline int netif_rx_schedule_prep(struct net_device *dev,
 					 struct napi_struct *napi)
 {
-	return netif_running(dev) && napi_schedule_prep(napi);
+	return napi_schedule_prep(napi);
 }

 /* Add interface to tail of rx poll list. This assumes that _prep has
-- 
1.5.4.rc2.38.gd6da3
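
For readers who want to see the handshake in isolation, here is a rough
user-space model of the bit protocol described in the changelog above.  It is
only a sketch: it uses C11 atomics in place of the kernel bitops, the model_*
names are made up for illustration, and the msleep() loop in napi_disable()
is reduced to a single retry step so it can be exercised without threads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Bit numbers mirror the enum in the patch; everything else is a model. */
enum { NAPI_STATE_SCHED, NAPI_STATE_DISABLE };

struct napi_model {
	atomic_ulong state;
};

/* Model of test_and_set_bit(): sets the bit, returns its previous value. */
static inline bool model_test_and_set_bit(int nr, struct napi_model *n)
{
	unsigned long mask = 1UL << nr;

	assert(nr == NAPI_STATE_SCHED || nr == NAPI_STATE_DISABLE);
	return (atomic_fetch_or(&n->state, mask) & mask) != 0;
}

static inline void model_set_bit(int nr, struct napi_model *n)
{
	atomic_fetch_or(&n->state, 1UL << nr);
}

static inline void model_clear_bit(int nr, struct napi_model *n)
{
	atomic_fetch_and(&n->state, ~(1UL << nr));
}

static inline bool model_disable_pending(struct napi_model *n)
{
	return (atomic_load(&n->state) & (1UL << NAPI_STATE_DISABLE)) != 0;
}

/* Refuse to schedule while a disable is pending, else try to take SCHED. */
static inline bool model_schedule_prep(struct napi_model *n)
{
	return !model_disable_pending(n) &&
	       !model_test_and_set_bit(NAPI_STATE_SCHED, n);
}

/* The poll side gives up ownership of SCHED. */
static inline void model_complete(struct napi_model *n)
{
	model_clear_bit(NAPI_STATE_SCHED, n);
}

/* First half of napi_disable(): announce that a disable is pending. */
static inline void model_disable_begin(struct napi_model *n)
{
	model_set_bit(NAPI_STATE_DISABLE, n);
}

/*
 * One iteration of the napi_disable() wait loop (the kernel sleeps between
 * tries).  Returns true once SCHED was acquired and DISABLE was cleared.
 */
static inline bool model_disable_try_finish(struct napi_model *n)
{
	if (model_test_and_set_bit(NAPI_STATE_SCHED, n))
		return false;
	model_clear_bit(NAPI_STATE_DISABLE, n);
	return true;
}
```

The point of the model is the ordering: once model_disable_begin() has run,
model_schedule_prep() fails even after the poller completes, so the disabling
side is the next one to win the SCHED bit.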

^ permalink raw reply related

* [PATCH 1/7]: [NET]: Do not grab device reference when scheduling a NAPI poll.
From: David Miller @ 2008-01-08  5:38 UTC (permalink / raw)
  To: netdev

[NET]: Do not grab device reference when scheduling a NAPI poll.

It is pointless, because everything that can make a device go away
will do a napi_disable() first.

The main impetus behind this is that now we can legally do a NAPI
completion in generic code like net_rx_action() which a following
changeset needs to do.  net_rx_action() can only perform actions
in NAPI centric ways, because there may be a one to many mapping
between NAPI contexts and network devices (SKY2 is one example).

We also want to get rid of this because it's an extra atomic in the
NAPI paths, and also because it is one of the last instances where the
NAPI interfaces care about net devices.

The one remaining netdev detail the NAPI stuff cares about is the
netif_running() check which will be killed off in a subsequent
changeset.

Signed-off-by: David S. Miller <davem@davemloft.net>
---
 include/linux/netdevice.h |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 1e6af4f..e393995 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1277,7 +1277,6 @@ static inline int netif_rx_schedule_prep(struct net_device *dev,
 static inline void __netif_rx_schedule(struct net_device *dev,
 				       struct napi_struct *napi)
 {
-	dev_hold(dev);
 	__napi_schedule(napi);
 }

@@ -1308,7 +1307,6 @@ static inline void __netif_rx_complete(struct net_device *dev,
 				       struct napi_struct *napi)
 {
 	__napi_complete(napi);
-	dev_put(dev);
 }

 /* Remove interface from poll list: it must be in the poll list
-- 
1.5.4.rc2.38.gd6da3

^ permalink raw reply related

* [PATCH 0/7]: Fix napi_disable() wedge during packet flood.
From: David Miller @ 2008-01-08  5:37 UTC (permalink / raw)
  To: netdev

After going back and forth several times on how to fix this bug I
finally think I have a clean solution.

It is possible to clean up a ton of stuff, such as getting rid of
netif_rx_schedule() et al. (none of these interfaces care about the
netdev arg any longer) but we should do that post 2.6.24 and just do
what is necessary to fix this bug first.

To the Intel driver maintainers, I know the last patch is contentious.
But at this time it is more important to have all of these drivers
behaving consistently.  If you want to have the TX work factor into
the ->poll() logic that is fine but please discuss it with the list
and apply that change consistently over all of the 5 drivers.  The
current situation was a bit of a mess.

^ permalink raw reply

* Re: [PATCH] via-velocity big-endian support
From: linux @ 2008-01-08  5:34 UTC (permalink / raw)
  To: romieu, viro; +Cc: jgarzik, linux, netdev
In-Reply-To: <20080107231859.GA3450@electric-eye.fr.zoreil.com>

It doesn't look like you need a test report, but here's one anyway...
I grabbed the patch series from git and am running it successfully
right now.

^ permalink raw reply

* Re: Please pull 'fixes-davem' branch of wireless-2.6
From: David Miller @ 2008-01-08  5:21 UTC (permalink / raw)
  To: linville-2XuSBdqkA4R54TAoqtyWWQ
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20080108051425.GA3125-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>

From: "John W. Linville" <linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>
Date: Tue, 8 Jan 2008 00:14:25 -0500

> Two more fixes for 2.6.24.  I think they are self-explanatory --
> let me know if I'm too optimistic! :-)

Thanks John, I'll pull these in shortly.

^ permalink raw reply

* Please pull 'fixes-davem' branch of wireless-2.6
From: John W. Linville @ 2008-01-08  5:14 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA

Dave,

Two more fixes for 2.6.24.  I think they are self-explanatory --
let me know if I'm too optimistic! :-)

Thanks,

John

---

Individual patches are available here:

	http://www.kernel.org/pub/linux/kernel/people/linville/wireless-2.6/fixes-davem

---

The following changes since commit 3ce54450461bad18bbe1f9f5aa3ecd2f8e8d1235:
  Linus Torvalds (1):
        Linux 2.6.24-rc7

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git fixes-davem

Andrew Lutomirski (1):
      mac80211: return an error when SIWRATE doesn't match any rate

Michael Buesch (1):
      ssb: Fix probing of PCI cores if PCI and PCIE core is available

 drivers/ssb/scan.c             |   11 +++++++++++
 net/mac80211/ieee80211_ioctl.c |    6 +++---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/ssb/scan.c b/drivers/ssb/scan.c
index 96258c6..63ee5cf 100644
--- a/drivers/ssb/scan.c
+++ b/drivers/ssb/scan.c
@@ -388,6 +388,17 @@ int ssb_bus_scan(struct ssb_bus *bus,
 		case SSB_DEV_PCI:
 		case SSB_DEV_PCIE:
 #ifdef CONFIG_SSB_DRIVER_PCICORE
+			if (bus->bustype == SSB_BUSTYPE_PCI) {
+				/* Ignore PCI cores on PCI-E cards.
+				 * Ignore PCI-E cores on PCI cards. */
+				if (dev->id.coreid == SSB_DEV_PCI) {
+					if (bus->host_pci->is_pcie)
+						continue;
+				} else {
+					if (!bus->host_pci->is_pcie)
+						continue;
+				}
+			}
 			if (bus->pcicore.dev) {
 				ssb_printk(KERN_WARNING PFX
 					   "WARNING: Multiple PCI(E) cores found\n");
diff --git a/net/mac80211/ieee80211_ioctl.c b/net/mac80211/ieee80211_ioctl.c
index 7027eed..308bbe4 100644
--- a/net/mac80211/ieee80211_ioctl.c
+++ b/net/mac80211/ieee80211_ioctl.c
@@ -591,7 +591,7 @@ static int ieee80211_ioctl_siwrate(struct net_device *dev,
 	sdata->bss->force_unicast_rateidx = -1;
 	if (rate->value < 0)
 		return 0;
-	for (i=0; i< mode->num_rates; i++) {
+	for (i=0; i < mode->num_rates; i++) {
 		struct ieee80211_rate *rates = &mode->rates[i];
 		int this_rate = rates->rate;
 
@@ -599,10 +599,10 @@ static int ieee80211_ioctl_siwrate(struct net_device *dev,
 			sdata->bss->max_ratectrl_rateidx = i;
 			if (rate->fixed)
 				sdata->bss->force_unicast_rateidx = i;
-			break;
+			return 0;
 		}
 	}
-	return 0;
+	return -EINVAL;
 }
 
 static int ieee80211_ioctl_giwrate(struct net_device *dev,
-- 
John W. Linville
linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org
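
The behavioural change in the mac80211 hunk can be sketched outside the
kernel as follows.  This is a minimal illustration, not the driver code:
struct rate_entry and select_rate() are hypothetical names, and the exact
comparison the real ioctl performs is not visible in the diff, so plain
equality is assumed here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an entry in the driver's rate table. */
struct rate_entry {
	int rate;
};

/*
 * Look for 'wanted' in the table.  On a match, store its index in *rateidx
 * and return 0 (the patched code returns from inside the loop).  If the loop
 * finishes without a match, return -EINVAL instead of silently succeeding.
 */
int select_rate(const struct rate_entry *rates, int num_rates,
		int wanted, int *rateidx)
{
	int i;

	assert(rateidx != NULL);
	assert(rates != NULL || num_rates == 0);

	for (i = 0; i < num_rates; i++) {
		if (rates[i].rate == wanted) {
			*rateidx = i;
			return 0;
		}
	}
	return -EINVAL;
}
```

With the old "break, then return 0" shape, a caller could not tell a matched
rate from an unknown one; here the unknown case is reported and *rateidx is
left untouched.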

^ permalink raw reply related

* Re: [PATCH 3/4] [XFRM]: Kill some bloat
From: David Miller @ 2008-01-08  5:10 UTC (permalink / raw)
  To: andi; +Cc: herbert, ilpo.jarvinen, netdev, acme, paul.moore, latten
In-Reply-To: <20080108050007.GA25338@one.firstfloor.org>

From: Andi Kleen <andi@firstfloor.org>
Date: Tue, 8 Jan 2008 06:00:07 +0100

> On Mon, Jan 07, 2008 at 07:37:00PM -0800, David Miller wrote:
> > The vast majority of them are one, two, and three liners.
> 
> % awk '  { line++ } ; /^{/ { total++; start = line } ; /^}/ { len=line-start-3; if (len > 4) l++; if (len >= 10) k++; } ; END { print total, l, l/total, k, k/total }' < include/net/tcp.h
> 68 28 0.411765 20 0.294118
> 
> 41% are over 4 lines, 29% are >= 10 lines.

Take out the comments and whitespace lines, your script is
too simplistic.

^ permalink raw reply

* Re: [PATCH 3/4] [XFRM]: Kill some bloat
From: Andi Kleen @ 2008-01-08  5:00 UTC (permalink / raw)
  To: David Miller
  Cc: andi, herbert, ilpo.jarvinen, netdev, acme, paul.moore, latten
In-Reply-To: <20080107.193700.159646842.davem@davemloft.net>

On Mon, Jan 07, 2008 at 07:37:00PM -0800, David Miller wrote:
> From: Andi Kleen <andi@firstfloor.org>
> Date: Tue, 8 Jan 2008 03:05:29 +0100
> 
> > On Mon, Jan 07, 2008 at 05:54:58PM -0800, David Miller wrote:
> > > I explicitly left them out.
> > > 
> > > Most of them are abstractions of common 2 or 3 instruction
> > > calculations, and thus should stay inline.
> > 
> > Definitely not in tcp.h. It has quite a lot of very long functions, of
> > which very few really need to be inline: (AFAIK the only one where 
> > it makes really sense is tcp_set_state due to constant evaluation; 
> > although I never quite understood why the callers just didn't 
> > call explicit functions to do these actions) 
> > 
> > % awk '  { line++ } ; /^{/ { start = line } ; /^}/ { n++; r += line-start-2; } ; END { print r/n }' < include/net/tcp.h 
> > 9.48889
> > 
> > The average function length is 9 lines.
> 
> The vast majority of them are one, two, and three liners.

% awk '  { line++ } ; /^{/ { total++; start = line } ; /^}/ { len=line-start-3; if (len > 4) l++; if (len >= 10) k++; } ; END { print total, l, l/total, k, k/total }' < include/net/tcp.h
68 28 0.411765 20 0.294118

41% are over 4 lines, 29% are >= 10 lines.

-Andi


^ permalink raw reply

* Re: [PATCH][XFRM] Statistics: Add outbound-dropping error.
From: Herbert Xu @ 2008-01-08  4:33 UTC (permalink / raw)
  To: Masahide NAKAMURA; +Cc: davem, netdev
In-Reply-To: <11997665683729-git-send-email-nakam@linux-ipv6.org>

On Tue, Jan 08, 2008 at 01:29:28PM +0900, Masahide NAKAMURA wrote:
>
> P.S.
> I don't touch XFRM_LOOKUP_ICMP related error at __xfrm_lookup()
> since it may not drop the packet.
> Correct me if it is wrong or comments are welcomed.

Right, whether the packet is dropped would be decided by the caller.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* [PATCH][XFRM] Statistics: Add outbound-dropping error.
From: Masahide NAKAMURA @ 2008-01-08  4:29 UTC (permalink / raw)
  To: davem; +Cc: herbert, netdev, Masahide NAKAMURA

Hello,

I found two more points where the XFRM packet-drop counters should be
incremented. Please apply.

P.S.
I don't touch XFRM_LOOKUP_ICMP related error at __xfrm_lookup()
since it may not drop the packet.
Correct me if that is wrong; comments are welcome.

[PATCH][XFRM] Statistics: Add outbound-dropping error.

o Increment PolError counter when flow_cache_lookup() returns
  an error pointer.

o Increment NoStates counter at larval-drop.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
---
 net/xfrm/xfrm_policy.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 280f8de..d83227b 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1510,8 +1510,10 @@ restart:
 		policy = flow_cache_lookup(fl, dst_orig->ops->family,
 					   dir, xfrm_policy_lookup);
 		err = PTR_ERR(policy);
-		if (IS_ERR(policy))
+		if (IS_ERR(policy)) {
+			XFRM_INC_STATS(LINUX_MIB_XFRMOUTPOLERROR);
 			goto dropdst;
+		}
 	}

 	if (!policy)
@@ -1603,6 +1605,7 @@ restart:
 				/* EREMOTE tells the caller to generate
 				 * a one-shot blackhole route.
 				 */
+				XFRM_INC_STATS(LINUX_MIB_XFRMOUTNOSTATES);
 				xfrm_pol_put(policy);
 				return -EREMOTE;
 			}
-- 
1.4.4.2

^ permalink raw reply related

* Re: [PATCH 3/4] [XFRM]: Kill some bloat
From: David Miller @ 2008-01-08  3:37 UTC (permalink / raw)
  To: andi; +Cc: herbert, ilpo.jarvinen, netdev, acme, paul.moore, latten
In-Reply-To: <20080108020529.GC16156@one.firstfloor.org>

From: Andi Kleen <andi@firstfloor.org>
Date: Tue, 8 Jan 2008 03:05:29 +0100

> On Mon, Jan 07, 2008 at 05:54:58PM -0800, David Miller wrote:
> > I explicitly left them out.
> > 
> > Most of them are abstractions of common 2 or 3 instruction
> > calculations, and thus should stay inline.
> 
> Definitely not in tcp.h. It has quite a lot of very long functions, of
> which very few really need to be inline: (AFAIK the only one where 
> it makes really sense is tcp_set_state due to constant evaluation; 
> although I never quite understood why the callers just didn't 
> call explicit functions to do these actions) 
> 
> % awk '  { line++ } ; /^{/ { start = line } ; /^}/ { n++; r += line-start-2; } ; END { print r/n }' < include/net/tcp.h 
> 9.48889
> 
> The average function length is 9 lines.

The vast majority of them are one, two, and three liners.

There are about 4 or 5 inlines in there that are in fact large and perhaps
should be removed, and these puff up your average.

^ permalink raw reply

* Re: Top 10 kernel oopses for the week ending January 5th, 2008
From: Linus Torvalds @ 2008-01-08  3:26 UTC (permalink / raw)
  To: Kevin Winchester
  Cc: J. Bruce Fields, Al Viro, Arjan van de Ven,
	Linux Kernel Mailing List, Andrew Morton, NetDev
In-Reply-To: <4782CF9C.6000508@gmail.com>

On Mon, 7 Jan 2008, Kevin Winchester wrote:

> J. Bruce Fields wrote:
> > 
> > Is there any good basic documentation on this to point people at?
> 
> I would second this question.  I see people "decode" oops on lkml often 
> enough, but I've never been entirely sure how it's done.  Is it somewhere 
> in Documentation?

It's actually not necessarily at all that trivial, unless you have a deep 
understanding of the code generated for the architecture in question (and 
even then, some oopses take more time to figure out than others, thanks 
to inlining and tailcalls etc).

If the oops happened with a kernel you generated yourself, it's usually 
rather easy. Especially if you said "y" to the "generate debugging info" 
question at configuration time. Because, in that case, you really just do 
a simple

	gdb vmlinux

and then you can do (for example) something like setting a breakpoint at 
the EIP that was reported for the oops, and it will tell you what line it 
came from.

However, if you don't have the exact binary - which is the common case for 
random oopses reported on lkml - you will generally have to disassemble 
the hex sequence given in the oops (the "Code:" line), and try to match it 
up against the source code to try to figure out what is going on.

Even just the disassembly is not entirely trivial, since the oops will 
give you the eip that it happened at, but you often want to also 
disassemble *backwards* in order to get more of a context (the "Code:" 
line will mark the particular EIP that starts the oopsing instruction by 
enclosing it in <xx>), but with non-constant instruction lengths, you need 
to use a bit of trial-and-error to figure it out.

I usually just compile a small program like

	#include <stdio.h>

	const char array[]="\xnn\xnn\xnn...";

	int main(int argc, char **argv)
	{
		printf("%p\n", array);
		*(int *)0=0;
	}

and run it under gdb, and then when it gets the SIGSEGV (due to the 
obvious NULL pointer dereference), I can just ask gdb to disassemble 
around the array that contains the code[] stuff. Try a few offsets, to see 
when the disassembly makes sense (and gives the reported EIP as the 
beginning of one of the disassembled instructions).

(You can do it other and smarter ways too, I'm not claiming that's a 
particularly good way to do it, and the old "ksymoops" program used to do 
a pretty good job of this, but I'm used to that particular idiotic way 
myself, since it's how I've basically always done it)
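
Typing the \xnn string by hand gets old quickly.  As a rough sketch
(parse_oops_code() is a made-up helper, not an existing tool, and the input
format is assumed to be the space-separated hex bytes shown in an i386 oops),
something like the following turns a "Code:" line into a byte array and
reports the offset of the byte marked with <xx>:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse the hex bytes of an oops "Code:" line into out[].
 * Returns the number of bytes stored.  *fault_off receives the index of
 * the byte that was wrapped in <..>, or -1 if no byte was marked.
 */
size_t parse_oops_code(const char *line, unsigned char *out, size_t max,
		       long *fault_off)
{
	const char *p;
	size_t n = 0;
	int marked = 0;

	assert(line != NULL && out != NULL && fault_off != NULL);
	*fault_off = -1;

	/* Skip the "Code:" label: 'C', 'd' and 'e' are hex digits too. */
	p = strstr(line, "Code:");
	if (p)
		line = p + 5;

	while (*line && n < max) {
		if (*line == '<') {
			marked = 1;	/* next byte is the faulting one */
			line++;
		} else if (isxdigit((unsigned char)line[0]) &&
			   isxdigit((unsigned char)line[1])) {
			char hex[3] = { line[0], line[1], '\0' };

			if (marked) {
				*fault_off = (long)n;
				marked = 0;
			}
			out[n++] = (unsigned char)strtoul(hex, NULL, 16);
			line += 2;
		} else {
			line++;	/* spaces, '>', newlines, anything else */
		}
	}
	return n;
}
```

Fed the "Code:" line from the oops discussed later in this mail, it should
store 64 bytes and report offset 43 for the marked 8b byte, which is where
the disassembly has to line up.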

After that, you still need to try to match up the assembly code with the 
source code and figure out what variables the register contents actually 
are all about. You can often try to do a

	make the/affected/file.s

to generate the asm file in your own tree - the register allocation can be 
totally different due to different compilers and different options (and 
things like the fact that maybe the source tree you do this on doesn't 
match the oops report exactly), but it's usually a good starting point to 
compare the disassembly from gdb with the *.s file output from the 
compiler.

Quite often, it's all very obvious (you see some constant or other simple 
pattern). But if you're not used to the assembly format, you'll spend a 
lot of brainpower just trying to figure that part out even for the obvious 
stuff, which is why it's a good thing if you are very comfortable indeed 
with the assembly language of that particular platform.

It's not really all that hard. But the first few times you see those 
oopses, it all looks mostly like just line noise. So it definitely takes 
some practice to do it well.

Anyway, let's take an example, from

	http://lkml.org/lkml/2008/1/1/189

where the most obviously relevant parts are:

	BUG: unable to handle kernel paging request at virtual address 00100100
	EIP:    0060:[<f8819668>] 
	EIP is at evdev_disconnect+0x65/0x9e

	eax: 00000000   ebx: 000ffcf0   ecx: c1926760   edx: 00000033
	esi: f7415600   edi: f741564c   ebp: f7415654   esp: c1967e68
	Call Trace:
		[<c03454b2>] input_unregister_device+0x6f/0xff
		[<c03c6eb6>] klist_release+0x27/0x30
		[<c029178a>] kref_put+0x5f/0x6c
	..
	Code: 5e 4c 81 eb 10 04 00 00 eb 21 8d 83 08 04 00 00 b9 06 00 02 
	      00 ba 1d 00 00 00 e8 6a 93 95 c7 8b 9b 10 04 00 00 81 eb 10 
	      04 00 00 <8b> 83 10 04 00 00 0f 18 00 90 8d 83 10 04 00 00 
	      39 f8 75 cb 8d

so here let's do the above silly C program:

	const char array[]="\x5e\x4c\x81\xeb\x10\x04\x00\x00\xeb\x21..

and running it under gdb gives:

	0x8048500

	Program received signal SIGSEGV, Segmentation fault.
	0x080483f7 in main () at test.c:14
	14              *(int*)0=0;

and now I can just try

	x/20i 0x8048500

and it turns out that already gives a reasonable disassembly. The first 
few instructions are bogus: they're really part of the previous 
instruction, but it looks pretty sane around the actual problem spot, 
which is "array+43" (there are 43 bytes of code before the EIP one, and 20 
bytes after):

	0x8048500 <array>:      pop    %esi
	0x8048501 <array+1>:    dec    %esp
	0x8048502 <array+2>:    sub    $0x410,%ebx
	0x8048508 <array+8>:    jmp    0x804852b <array+43>
	0x804850a <array+10>:   lea    0x408(%ebx),%eax
	0x8048510 <array+16>:   mov    $0x20006,%ecx
	0x8048515 <array+21>:   mov    $0x1d,%edx
	0x804851a <array+26>:   call   0xcf9a1889
	0x804851f <array+31>:   mov    0x410(%ebx),%ebx
	0x8048525 <array+37>:   sub    $0x410,%ebx
	0x804852b <array+43>:   mov    0x410(%ebx),%eax
	0x8048531 <array+49>:   prefetchnta (%eax)
	0x8048534 <array+52>:   nop
	0x8048535 <array+53>:   lea    0x410(%ebx),%eax
	0x804853b <array+59>:   cmp    %edi,%eax
	0x804853d <array+61>:   jne    0x804850a <array+10>
	0x804853f <array+63>:   lea    (%eax),%eax
	.. 

so now we know that the faulting instruction was that

	mov    0x410(%ebx),%eax

and we can also see that this also matches the address that caused the 
oops (ebx=000ffcf0, so 0x410(%ebx) is 00100100, which matches the "unable 
to handle kernel paging request" message).

(Now, people used to kernel oopses will also recognize 00100100 as the 
LIST_POISON1, so this is all about dereferencing the ->next pointer of a 
list entry that has been removed from the list, but that's a whole 
separate level of kernel knowledge).
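
The address arithmetic in that check is simple enough to write down.  A small
sketch, with made-up helper names (effective_addr(), entry_from_next()) and
with LIST_POISON1_32 standing in for the 0x00100100 constant mentioned above;
the 0x410 offset is taken from the disassembly:

```c
#include <assert.h>
#include <stdint.h>

/* The 32-bit list poison value recognised in the oops above. */
#define LIST_POISON1_32 0x00100100u

/* Address touched by "mov disp(%reg),%dst": register contents plus disp. */
static inline uint32_t effective_addr(uint32_t base, uint32_t disp)
{
	return base + disp;
}

/*
 * The loop loads a list pointer and subtracts the member offset to get back
 * to the enclosing structure ("sub $0x410,%ebx" in the disassembly).
 */
static inline uint32_t entry_from_next(uint32_t next, uint32_t member_off)
{
	assert(next >= member_off);	/* no wrap-around in this sketch */
	return next - member_off;
}
```

So a poisoned pointer of 0x00100100 minus the 0x410 member offset gives the
000ffcf0 seen in ebx, and adding 0x410 back for the load gives the faulting
address 00100100.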

Anyway, you can now do

	make drivers/input/evdev.s

and see if you can find that kind of code sequence in there. You can use 
the "EIP: evdev_disconnect+0x65/0x9e" thing as a hint: if your compiler 
setup isn't too different, it's likely to be roughly two thirds into that 
evdev_disconnect function (but inlining really can mean that it's 
somewhere else entirely in the source tree!)

The rest left as an exercise for the reader.

		Linus

^ permalink raw reply

* Re: [PATCH 3/4] [XFRM]: Kill some bloat II
From: Andi Kleen @ 2008-01-08  2:08 UTC (permalink / raw)
  To: Andi Kleen
  Cc: David Miller, herbert, ilpo.jarvinen, netdev, acme, paul.moore,
	latten
In-Reply-To: <20080108020529.GC16156@one.firstfloor.org>

> % awk '  { line++ } ; /^{/ { start = line } ; /^}/ { n++; r += line-start-2; } ; END { print r/n }' < include/net/tcp.h 
> 9.48889
> 
> The average function length is 9 lines.

Actually 8 -- the awk hack had an off-by-one. Still too long.

-Andi

^ permalink raw reply

* Re: [PATCH 3/4] [XFRM]: Kill some bloat
From: Andi Kleen @ 2008-01-08  2:05 UTC (permalink / raw)
  To: David Miller
  Cc: andi, herbert, ilpo.jarvinen, netdev, acme, paul.moore, latten
In-Reply-To: <20080107.175458.127194310.davem@davemloft.net>

On Mon, Jan 07, 2008 at 05:54:58PM -0800, David Miller wrote:
> From: Andi Kleen <andi@firstfloor.org>
> Date: Tue, 08 Jan 2008 01:23:11 +0100
> 
> > David Miller <davem@davemloft.net> writes:
> > 
> > > Similarly I question just about any inline usage at all in *.c files
> > 
> > Don't forget the .h files. Especially a lot of stuff in tcp.h should
> > be probably in some .c file and not be inline.
> 
> I explicitly left them out.
> 
> Most of them are abstractions of common 2 or 3 instruction
> calculations, and thus should stay inline.

Definitely not in tcp.h. It has quite a lot of very long functions, of
which very few really need to be inline: (AFAIK the only one where 
it makes really sense is tcp_set_state due to constant evaluation; 
although I never quite understood why the callers just didn't 
call explicit functions to do these actions) 

% awk '  { line++ } ; /^{/ { start = line } ; /^}/ { n++; r += line-start-2; } ; END { print r/n }' < include/net/tcp.h 
9.48889

The average function length is 9 lines.

-Andi


^ permalink raw reply

* Re: TCP cache performance
From: David Miller @ 2008-01-08  1:57 UTC (permalink / raw)
  To: virtualphtn; +Cc: netdev, lachlan.andrew
In-Reply-To: <4782D06B.30706@gmail.com>

From: Tom Quetchenbach <virtualphtn@gmail.com>
Date: Mon, 07 Jan 2008 17:22:51 -0800

> This suggests that efforts to improve TCP performance should focus
> on cache usage rather than just processing time.

Thanks for reporting your data, but we very well know what the exact
problem is.

When we recover from loss, we touch thousands of packets freeing up
basically an entire window's worth.

^ permalink raw reply

* [PATCH 3/3] bonding: fix locking during alb failover and slave removal
From: Jay Vosburgh @ 2008-01-08  1:57 UTC (permalink / raw)
  To: netdev
  Cc: Jeff Garzik, David Miller, Andy Gospodarek, Krzysztof Oledzki,
	Jay Vosburgh
In-Reply-To: <1199757423867-git-send-email-fubar@us.ibm.com>

	alb_fasten_mac_swap (actually rlb_teach_disabled_mac_on_primary)
requires RTNL and no other locks.  This could cause dev_set_promiscuity
and/or dev_set_mac_address to be called with improper locking.

	Changed callers to hold only RTNL during calls to alb_fasten_mac_swap
or functions calling it.  Updated header comments in affected functions to
reflect proper reality of locking requirements.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
---
 drivers/net/bonding/bond_alb.c  |   18 ++++++++++++------
 drivers/net/bonding/bond_main.c |   14 ++++++++------
 2 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 9b55a12..b57bc94 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -979,7 +979,7 @@ static void alb_swap_mac_addr(struct bonding *bond, struct slave *slave1, struct
 /*
  * Send learning packets after MAC address swap.
  *
- * Called with RTNL and bond->lock held for read.
+ * Called with RTNL and no other locks
  */
 static void alb_fasten_mac_swap(struct bonding *bond, struct slave *slave1,
 				struct slave *slave2)
@@ -987,6 +987,8 @@ static void alb_fasten_mac_swap(struct bonding *bond, struct slave *slave1,
 	int slaves_state_differ = (SLAVE_IS_OK(slave1) != SLAVE_IS_OK(slave2));
 	struct slave *disabled_slave = NULL;
 
+	ASSERT_RTNL();
+
 	/* fasten the change in the switch */
 	if (SLAVE_IS_OK(slave1)) {
 		alb_send_learning_packets(slave1, slave1->dev->dev_addr);
@@ -1031,7 +1033,7 @@ static void alb_fasten_mac_swap(struct bonding *bond, struct slave *slave1,
  * a slave that has @slave's permanet address as its current address.
  * We'll make sure that that slave no longer uses @slave's permanent address.
  *
- * Caller must hold bond lock
+ * Caller must hold RTNL and no other locks
  */
 static void alb_change_hw_addr_on_detach(struct bonding *bond, struct slave *slave)
 {
@@ -1542,7 +1544,12 @@ int bond_alb_init_slave(struct bonding *bond, struct slave *slave)
 	return 0;
 }
 
-/* Caller must hold bond lock for write */
+/*
+ * Remove slave from tlb and rlb hash tables, and fix up MAC addresses
+ * if necessary.
+ *
+ * Caller must hold RTNL and no other locks
+ */
 void bond_alb_deinit_slave(struct bonding *bond, struct slave *slave)
 {
 	if (bond->slave_cnt > 1) {
@@ -1658,12 +1665,11 @@ void bond_alb_handle_active_change(struct bonding *bond, struct slave *new_slave
 				       bond->alb_info.rlb_enabled);
 	}
 
-	read_lock(&bond->lock);
-
 	if (swap_slave) {
 		alb_fasten_mac_swap(bond, swap_slave, new_slave);
+		read_lock(&bond->lock);
 	} else {
-		/* fasten bond mac on new current slave */
+		read_lock(&bond->lock);
 		alb_send_learning_packets(new_slave, bond->dev->dev_addr);
 	}
 
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index b0b2603..77d004d 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1746,7 +1746,9 @@ int bond_release(struct net_device *bond_dev, struct net_device *slave_dev)
 		 * has been cleared (if our_slave == old_current),
 		 * but before a new active slave is selected.
 		 */
+		write_unlock_bh(&bond->lock);
 		bond_alb_deinit_slave(bond, slave);
+		write_lock_bh(&bond->lock);
 	}
 
 	if (oldcurrent == slave) {
@@ -1905,6 +1907,12 @@ static int bond_release_all(struct net_device *bond_dev)
 		slave_dev = slave->dev;
 		bond_detach_slave(bond, slave);
 
+		/* now that the slave is detached, unlock and perform
+		 * all the undo steps that should not be called from
+		 * within a lock.
+		 */
+		write_unlock_bh(&bond->lock);
+
 		if ((bond->params.mode == BOND_MODE_TLB) ||
 		    (bond->params.mode == BOND_MODE_ALB)) {
 			/* must be called only after the slave
@@ -1915,12 +1923,6 @@ static int bond_release_all(struct net_device *bond_dev)
 
 		bond_compute_features(bond);
 
-		/* now that the slave is detached, unlock and perform
-		 * all the undo steps that should not be called from
-		 * within a lock.
-		 */
-		write_unlock_bh(&bond->lock);
-
 		bond_destroy_slave_symlinks(bond_dev, slave_dev);
 		bond_del_vlans_from_slave(bond, slave_dev);
 
-- 
1.5.3.4.206.g58ba4-dirty


^ permalink raw reply related

* [PATCH 2/3] bonding: fix ASSERT_RTNL that produces spurious warnings
From: Jay Vosburgh @ 2008-01-08  1:56 UTC (permalink / raw)
  To: netdev
  Cc: Jeff Garzik, David Miller, Andy Gospodarek, Krzysztof Oledzki,
	Jay Vosburgh
In-Reply-To: <11997574222492-git-send-email-fubar@us.ibm.com>

	Move an ASSERT_RTNL down to where we should hold only RTNL;
the existing check produces spurious warnings because we hold additional
locks at _bh, tripping a debug warning in spin_lock_mutex().

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
---
 drivers/net/bonding/bond_alb.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 25b8dbf..9b55a12 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -1601,9 +1601,6 @@ void bond_alb_handle_active_change(struct bonding *bond, struct slave *new_slave
 	struct slave *swap_slave;
 	int i;
 
-	if (new_slave)
-		ASSERT_RTNL();
-
 	if (bond->curr_active_slave == new_slave) {
 		return;
 	}
@@ -1649,6 +1646,8 @@ void bond_alb_handle_active_change(struct bonding *bond, struct slave *new_slave
 	write_unlock_bh(&bond->curr_slave_lock);
 	read_unlock(&bond->lock);
 
+	ASSERT_RTNL();
+
 	/* curr_active_slave must be set before calling alb_swap_mac_addr */
 	if (swap_slave) {
 		/* swap mac address */
-- 
1.5.3.4.206.g58ba4-dirty


^ permalink raw reply related

* [PATCH 1/3] bonding: fix locking in sysfs primary/active selection
From: Jay Vosburgh @ 2008-01-08  1:56 UTC (permalink / raw)
  To: netdev
  Cc: Jeff Garzik, David Miller, Andy Gospodarek, Krzysztof Oledzki,
	Jay Vosburgh
In-Reply-To: <11997574203125-git-send-email-fubar@us.ibm.com>

	Fix the functions that store the primary and active slave
options via sysfs to hold the correct locks in the correct order.

	The bond_change_active_slave and bond_select_active_slave
functions both require rtnl, bond->lock for read and curr_slave_lock for
write_bh, and no other locks.  This is so that the lower level
mode-specific functions (notably for balance-alb mode) can release locks
down to just rtnl in order to call, e.g., dev_set_mac_address with the
locks it expects (rtnl only).

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
---
 drivers/net/bonding/bond_sysfs.c |   15 ++++++++++-----
 1 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 11b76b3..28a2d80 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -1075,7 +1075,10 @@ static ssize_t bonding_store_primary(struct device *d,
 	struct slave *slave;
 	struct bonding *bond = to_bond(d);
 
-	write_lock_bh(&bond->lock);
+	rtnl_lock();
+	read_lock(&bond->lock);
+	write_lock_bh(&bond->curr_slave_lock);
+
 	if (!USES_PRIMARY(bond->params.mode)) {
 		printk(KERN_INFO DRV_NAME
 		       ": %s: Unable to set primary slave; %s is in mode %d\n",
@@ -1109,8 +1112,8 @@ static ssize_t bonding_store_primary(struct device *d,
 		}
 	}
 out:
-	write_unlock_bh(&bond->lock);
-
+	write_unlock_bh(&bond->curr_slave_lock);
+	read_unlock(&bond->lock);
 	rtnl_unlock();
 
 	return count;
@@ -1190,7 +1193,8 @@ static ssize_t bonding_store_active_slave(struct device *d,
 	struct bonding *bond = to_bond(d);
 
 	rtnl_lock();
-	write_lock_bh(&bond->lock);
+	read_lock(&bond->lock);
+	write_lock_bh(&bond->curr_slave_lock);
 
 	if (!USES_PRIMARY(bond->params.mode)) {
 		printk(KERN_INFO DRV_NAME
@@ -1247,7 +1251,8 @@ static ssize_t bonding_store_active_slave(struct device *d,
 		}
 	}
 out:
-	write_unlock_bh(&bond->lock);
+	write_unlock_bh(&bond->curr_slave_lock);
+	read_unlock(&bond->lock);
 	rtnl_unlock();
 
 	return count;
-- 
1.5.3.4.206.g58ba4-dirty
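
The lock order described in the commit message above (rtnl first, then
bond->lock for read, then curr_slave_lock for write_bh, released in
reverse) can be sketched as a small user-space model.  This is only an
illustration under stated assumptions: the rank names, struct
lock_tracker, and both functions are hypothetical and are not kernel or
bonding driver code.  The model simply refuses an acquisition that
would violate the stated order.

```c
#include <stdbool.h>

/* Ranks mirror the order stated in the commit message.  The names are
 * hypothetical labels for this sketch, not kernel identifiers. */
enum lock_rank {
	RANK_NONE = 0,
	RANK_RTNL = 1,            /* stands in for rtnl_lock() */
	RANK_BOND_LOCK = 2,       /* stands in for read_lock(&bond->lock) */
	RANK_CURR_SLAVE_LOCK = 3, /* stands in for write_lock_bh(&bond->curr_slave_lock) */
};

/* Stack of currently held ranks, innermost last. */
struct lock_tracker {
	enum lock_rank held[3];
	int depth;
};

/* Succeeds only if 'rank' is strictly greater than the innermost held
 * rank, i.e. locks are taken in the documented order. */
bool tracker_acquire(struct lock_tracker *t, enum lock_rank rank)
{
	enum lock_rank top = t->depth ? t->held[t->depth - 1] : RANK_NONE;

	if (rank <= top || t->depth == 3)
		return false;
	t->held[t->depth++] = rank;
	return true;
}

/* Succeeds only for the innermost held rank, matching the reverse-order
 * unlock sequence used at the "out:" labels in the patch. */
bool tracker_release(struct lock_tracker *t, enum lock_rank rank)
{
	if (t->depth == 0 || t->held[t->depth - 1] != rank)
		return false;
	t->depth--;
	return true;
}
```

Acquiring RANK_RTNL, RANK_BOND_LOCK and RANK_CURR_SLAVE_LOCK in that
order succeeds in this model, while taking RANK_BOND_LOCK before
RANK_RTNL is rejected.  The strict reverse-order release is a choice
made for the sketch because it matches the unlock sequence in the diff.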



* [PATCH 0/3] bonding: 3 fixes for 2.6.24
From: Jay Vosburgh @ 2008-01-08  1:56 UTC (permalink / raw)
  To: netdev; +Cc: Jeff Garzik, David Miller, Andy Gospodarek, Krzysztof Oledzki

	Following are three patches that fix locking problems and
silence locking-related warnings in the current 2.6.24-rc.

	patch 1: fix locking in sysfs primary/active selection

	Call core network functions with expected locks to
eliminate potential deadlock and silence warnings.

	patch 2: fix ASSERT_RTNL that produces spurious warnings

	Relocate ASSERT_RTNL to remove a false warning; after the patch,
the ASSERT is located in code that holds only RTNL (additional locks
were causing the ASSERT to trip).

	patch 3: fix locking during alb failover and slave removal

	Fix all call paths into alb_fasten_mac_swap to hold only RTNL.
Eliminates deadlock and silences warnings.

	Patches are against the current netdev-2.6#upstream branch.

	Please apply for 2.6.24.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


* Re: [PATCH 3/4] [XFRM]: Kill some bloat
From: David Miller @ 2008-01-08  1:54 UTC (permalink / raw)
  To: andi; +Cc: herbert, ilpo.jarvinen, netdev, acme, paul.moore, latten
In-Reply-To: <p73sl19qi6o.fsf@bingen.suse.de>

From: Andi Kleen <andi@firstfloor.org>
Date: Tue, 08 Jan 2008 01:23:11 +0100

> David Miller <davem@davemloft.net> writes:
> 
> > Similarly I question just about any inline usage at all in *.c files
> 
> Don't forget the .h files. Especially a lot of stuff in tcp.h should
> probably be in some .c file and not be inline.

I explicitly left them out.

Most of them are abstractions of common 2 or 3 instruction
calculations, and thus should stay inline.


* TCP cache performance
From: Tom Quetchenbach @ 2008-01-08  1:22 UTC (permalink / raw)
  To: netdev; +Cc: Lachlan Andrew

I've been continuing off and on to investigate TCP performance issues.
As has been noted before on this list, loss and subsequent processing
can lead to spikes in the measured RTT which confuse delay-based
congestion control algorithms.

I've done some experiments that indicate that cache size is a
significant limiting factor here. My desktop machine with a 2.4 GHz Core
Duo and 4 MB cache quite noticeably outperforms our experiment servers,
which have two dual-core Xeons at 2.66 GHz but only 512 KB cache. At 400
Mbps with 40ms round-trip delay and 1024-packet buffer the desktop
behaves fairly normally, although there is still a large RTT spike at
the start of the flow due to slow-start. The servers show large RTT
spikes at each loss event, as well as some timeouts.

This suggests that efforts to improve TCP performance should focus on
cache usage rather than just processing time.

Plots of cwnd, RTT, and CPU load are available here:

512K cache:
http://wil-ns.cs.caltech.edu/benchmark.tmp/265/2flow--ALG=illinois-BUF=1024-BUF_tgt=1333,1.0-BW=400M-GAP=150-LEN=600-RTT=40--1/

4M cache:
http://wil-ns.cs.caltech.edu/benchmark.tmp/266/2flow--ALG=illinois-BUF=1024-BUF_tgt=1333,1.0-BW=400M-GAP=150-LEN=600-RTT=40--1/

Tests were done with net-2.6 (2.6.23.1 gives similar results though)
using tcp_probe to capture data.

-Tom


* Re: Top 10 kernel oopses for the week ending January 5th, 2008
From: Kevin Winchester @ 2008-01-08  1:19 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Al Viro, Arjan van de Ven, Linux Kernel Mailing List,
	Linus Torvalds, Andrew Morton, NetDev
In-Reply-To: <20080107174431.GC27741@fieldses.org>

J. Bruce Fields wrote:
> On Sat, Jan 05, 2008 at 09:39:35PM +0000, Al Viro wrote:
>> On Sat, Jan 05, 2008 at 01:06:17PM -0800, Arjan van de Ven wrote:
>>> The http://www.kerneloops.org website collects kernel oops and 
>>> warning reports from various mailing lists and bugzillas as well 
>>> as with a client users can install to auto-submit oopses. Below 
>>> is a top 10 list of the oopses collected in the last 7 days. 
>>> (Reports prior to 2.6.23 have been omitted in collecting the top 
>>> 10)
>>> 
>>> This week, a total of 49 oopses and warnings have been reported,
>>>  compared to 53 reports in the previous week.
>> FWIW, people moaning about the lack of entry-level kernel work 
>> would do well by decoding those to the level of "this place in this
>>  function, called from <here>, with so-and-so variable being
>> <this>" and posting the results.  As skills go, it's far more
>> useful than "how to trim the trailing whitespace" and the rest of 
>> checkpatch.pl-inspired crap that got so popular lately...
> 
> Is there any good basic documentation on this to point people at?
> 

I would second this question.  I see people "decode" oopses on lkml
often enough, but I've never been entirely sure how it's done.  Is it
described somewhere in Documentation?

-- 
Kevin Winchester


* Re: [PATCH] [IPv6]: IPV6_MULTICAST_IF setting is ignored on link-local connect()
From: David Stevens @ 2008-01-08  1:18 UTC (permalink / raw)
  To: Brian Haley
  Cc: David Miller, netdev@vger.kernel.org, netdev-owner,
	YOSHIFUJI Hideaki
In-Reply-To: <47825B50.2060200@hp.com>

Brian,
        Looks good to me.

                        +-DLS


Acked-by: David L Stevens <dlstevens@us.ibm.com>

> How about the simple patch below?  I just removed the EINVAL check from
> my original patch, but it accomplishes the same thing.
...
> 
> Signed-off-by: Brian Haley <brian.haley@hp.com>
> ---
> diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
> index 2ed689a..5d4245a 100644
> --- a/net/ipv6/datagram.c
> +++ b/net/ipv6/datagram.c
> @@ -123,11 +123,11 @@ ipv4_connected:
>              goto out;
>           }
>           sk->sk_bound_dev_if = usin->sin6_scope_id;
> -         if (!sk->sk_bound_dev_if &&
> -             (addr_type & IPV6_ADDR_MULTICAST))
> -            fl.oif = np->mcast_oif;
>        }
> 
> +      if (!sk->sk_bound_dev_if && (addr_type & IPV6_ADDR_MULTICAST))
> +         sk->sk_bound_dev_if = np->mcast_oif;
> +
>        /* Connect to link-local address requires an interface */
>        if (!sk->sk_bound_dev_if) {
>           err = -EINVAL;
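
The interface selection that the quoted hunk arrives at can be modelled
in user space roughly as follows.  This is a sketch under assumptions,
not the kernel code: struct sock_model and connect_linklocal() are
hypothetical names, and the scope-id mismatch check is an assumption
about the surrounding code, which the quoted diff does not show.  The
point it illustrates is that the IPV6_MULTICAST_IF value (np->mcast_oif)
is now used as the fallback bound device for any multicast destination
that arrives without a scope id, before the "requires an interface"
test.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the socket state in the patch. */
struct sock_model {
	int bound_dev_if; /* models sk->sk_bound_dev_if */
	int mcast_oif;    /* models np->mcast_oif, set via IPV6_MULTICAST_IF */
};

/* Models connect() to a link-local destination.  Returns 0 on success
 * or -EINVAL when no usable interface can be determined. */
int connect_linklocal(struct sock_model *sk, int scope_id, bool is_multicast)
{
	if (scope_id) {
		/* Assumed from the surrounding code: a scope id that
		 * conflicts with an already bound device is rejected. */
		if (sk->bound_dev_if && sk->bound_dev_if != scope_id)
			return -EINVAL;
		sk->bound_dev_if = scope_id;
	}

	/* The relocated check: fall back to the multicast interface. */
	if (!sk->bound_dev_if && is_multicast)
		sk->bound_dev_if = sk->mcast_oif;

	/* Connect to link-local address requires an interface. */
	if (!sk->bound_dev_if)
		return -EINVAL;

	return 0;
}
```

With mcast_oif set to 4 and no scope id, a multicast destination ends up
bound to interface 4 in this model, while a unicast link-local
destination still fails with -EINVAL.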



* [PATCH 001/001] ipv4: enable use of 240/4 address space
From: Vince Fuller @ 2008-01-08  1:10 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, vaf

from Vince Fuller <vaf@vaf.net>

This set of diffs modifies the 2.6.20 kernel to enable use of the 240/4
(aka "class-E") address space, consistent with the Internet Draft
draft-fuller-240space-00.txt.

Signed-off-by: Vince Fuller <vaf@vaf.net>

---

--- include/linux/in.h.orig	2007-04-12 10:16:20.000000000 -0700
+++ include/linux/in.h	2008-01-07 16:54:38.000000000 -0800
@@ -215,8 +215,16 @@ struct sockaddr_in {
 #define	IN_MULTICAST(a)		IN_CLASSD(a)
 #define IN_MULTICAST_NET	0xF0000000
 
+#define IN_CLASSE(a)		((((long int) (a)) & 0xf0000000) == 0xf0000000)
+#define	IN_CLASSE_NET		0xffffff00
+#define	IN_CLASSE_NSHIFT	8
+#define	IN_CLASSE_HOST		(0xffffffff & ~IN_CLASSE_NET)
+
+/* 
+ * these are no longer used
 #define	IN_EXPERIMENTAL(a)	((((long int) (a)) & 0xf0000000) == 0xf0000000)
 #define	IN_BADCLASS(a)		IN_EXPERIMENTAL((a))
+*/
 
 /* Address to accept any incoming messages. */
 #define	INADDR_ANY		((unsigned long int) 0x00000000)
--- net/ipv4/devinet.c.orig	2007-04-12 10:16:23.000000000 -0700
+++ net/ipv4/devinet.c	2008-01-07 16:55:59.000000000 -0800
@@ -594,6 +594,8 @@ static __inline__ int inet_abc_len(__be3
 			rc = 16;
 		else if (IN_CLASSC(haddr))
 			rc = 24;
+		else if (IN_CLASSE(haddr))
+			rc = 24;
 	}
 
   	return rc;
--- net/ipv4/fib_frontend.c.orig	2007-06-07 10:47:08.000000000 -0700
+++ net/ipv4/fib_frontend.c	2008-01-07 16:55:59.000000000 -0800
@@ -152,7 +152,7 @@ unsigned inet_addr_type(__be32 addr)
 	struct fib_result	res;
 	unsigned ret = RTN_BROADCAST;
 
-	if (ZERONET(addr) || BADCLASS(addr))
+	if (ZERONET(addr) || addr == INADDR_BROADCAST)
 		return RTN_BROADCAST;
 	if (MULTICAST(addr))
 		return RTN_MULTICAST;
--- net/ipv4/ipconfig.c.orig	2007-04-12 10:16:23.000000000 -0700
+++ net/ipv4/ipconfig.c	2008-01-07 16:55:59.000000000 -0800
@@ -379,6 +379,8 @@ static int __init ic_defaults(void)
 			ic_netmask = htonl(IN_CLASSB_NET);
 		else if (IN_CLASSC(ntohl(ic_myaddr)))
 			ic_netmask = htonl(IN_CLASSC_NET);
+		else if (IN_CLASSE(ntohl(ic_myaddr)))
+			ic_netmask = htonl(IN_CLASSE_NET);
 		else {
 			printk(KERN_ERR "IP-Config: Unable to guess netmask for address %u.%u.%u.%u\n",
 				NIPQUAD(ic_myaddr));
--- net/ipv4/route.c.orig	2007-04-12 10:16:24.000000000 -0700
+++ net/ipv4/route.c	2008-01-07 16:55:59.000000000 -0800
@@ -1140,7 +1140,7 @@ void ip_rt_redirect(__be32 old_gw, __be3
 		return;
 
 	if (new_gw == old_gw || !IN_DEV_RX_REDIRECTS(in_dev)
-	    || MULTICAST(new_gw) || BADCLASS(new_gw) || ZERONET(new_gw))
+	    || MULTICAST(new_gw) || new_gw == INADDR_BROADCAST || ZERONET(new_gw))
 		goto reject_redirect;
 
 	if (!IN_DEV_SHARED_MEDIA(in_dev)) {
@@ -1617,7 +1617,7 @@ static int ip_route_input_mc(struct sk_b
 	if (in_dev == NULL)
 		return -EINVAL;
 
-	if (MULTICAST(saddr) || BADCLASS(saddr) || LOOPBACK(saddr) ||
+	if (MULTICAST(saddr) || saddr == INADDR_BROADCAST || LOOPBACK(saddr) ||
 	    skb->protocol != htons(ETH_P_IP))
 		goto e_inval;
 
@@ -1935,7 +1935,7 @@ static int ip_route_input_slow(struct sk
 	   by fib_lookup.
 	 */
 
-	if (MULTICAST(saddr) || BADCLASS(saddr) || LOOPBACK(saddr))
+	if (MULTICAST(saddr) || saddr == INADDR_BROADCAST || LOOPBACK(saddr))
 		goto martian_source;
 
 	if (daddr == htonl(0xFFFFFFFF) || (saddr == 0 && daddr == 0))
@@ -1947,7 +1947,7 @@ static int ip_route_input_slow(struct sk
 	if (ZERONET(saddr))
 		goto martian_source;
 
-	if (BADCLASS(daddr) || ZERONET(daddr) || LOOPBACK(daddr))
+	if (ZERONET(daddr) || LOOPBACK(daddr))
 		goto martian_destination;
 
 	/*
@@ -2171,7 +2171,7 @@ static inline int __mkroute_output(struc
 		res->type = RTN_BROADCAST;
 	else if (MULTICAST(fl->fl4_dst))
 		res->type = RTN_MULTICAST;
-	else if (BADCLASS(fl->fl4_dst) || ZERONET(fl->fl4_dst))
+	else if (ZERONET(fl->fl4_dst))
 		return -EINVAL;
 
 	if (dev_out->flags & IFF_LOOPBACK)
@@ -2391,7 +2391,7 @@ static int ip_route_output_slow(struct r
 	if (oldflp->fl4_src) {
 		err = -EINVAL;
 		if (MULTICAST(oldflp->fl4_src) ||
-		    BADCLASS(oldflp->fl4_src) ||
+		    oldflp->fl4_src == INADDR_BROADCAST ||
 		    ZERONET(oldflp->fl4_src))
 			goto out;
 

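The address tests that the diff above relies on can be exercised in
user space with a small sketch.  The helper names below are
hypothetical; only the constants (the 0xf0000000 prefix test and the
0xffffff00 default mask) are taken from the patch.  Addresses are in
host byte order, as with the IN_CLASSx() macros.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same test as the patch's IN_CLASSE(a): the top four bits are all set,
 * i.e. the address lies in 240.0.0.0/4. */
bool in_class_e(uint32_t haddr)
{
	return (haddr & 0xf0000000u) == 0xf0000000u;
}

/* 255.255.255.255 stays reserved; the patch replaces the BADCLASS()
 * tests with explicit comparisons against INADDR_BROADCAST. */
bool is_limited_broadcast(uint32_t haddr)
{
	return haddr == 0xffffffffu;
}

/* A 240/4 address that would no longer be rejected under the patch's
 * approach. */
bool class_e_usable(uint32_t haddr)
{
	return in_class_e(haddr) && !is_limited_broadcast(haddr);
}

/* Default netmask guessed for a class-E address: the patch's
 * IN_CLASSE_NET, i.e. a /24. */
uint32_t class_e_default_mask(void)
{
	return 0xffffff00u;
}
```

For example, 240.10.11.12 (0xf00a0b0c) is classified as usable with a
default network of 240.10.11.0, while 224.0.0.1 is not class E and
255.255.255.255 remains excluded.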

* Re: WARNING: at kernel/softirq.c:139 local_bh_enable()
From: Kok, Auke @ 2008-01-08  1:09 UTC (permalink / raw)
  To: Jayakrishnan.Chathu; +Cc: linux-kernel, NetDev
In-Reply-To: <2198383E1141814486F0B881B3260CF501D6BB66@daebe103.NOE.Nokia.com>

Jayakrishnan.Chathu@nokia.com wrote:
> I am running a 2.6.23 kernel on dual-core and quad-core i386 boxes, and
> after every boot, when the ethernet traffic starts, I get this warning.
> 
> All the ports in the system are e1000 and I am using the kernel e1000
> driver.

[added netdev to the Cc:]

Can you reproduce this with 2.6.24-rc7? What distro are you using? Is your
distro running a link-monitoring tool of some sort?

Auke



> 
> Jan  7 22:31:00 localhost [warning] WARNING: at kernel/softirq.c:139
> local_bh_enable() 
> Jan  7 22:31:00 localhost [warning] [<c012bd0f>]
> local_bh_enable+0x49/0xa9 
> Jan  7 22:31:00 localhost [warning] [<c039ba1a>]
> dev_queue_xmit+0x26c/0x275 
> Jan  7 22:31:00 localhost [warning] [<c03cdf6c>] arp_xmit+0x4d/0x51 
> Jan  7 22:31:00 localhost [warning] [<c03cd9f6>] arp_solicit+0x156/0x174
> 
> Jan  7 22:31:00 localhost [warning] [<c03a047f>]
> neigh_timer_handler+0x1e0/0x224 
> Jan  7 22:31:00 localhost [warning] [<c012f820>]
> run_timer_softirq+0x113/0x172 
> Jan  7 22:31:00 localhost [warning] [<c013b042>] WARNING: at
> kernel/softirq.c:139 local_bh_enable() 
> Jan  7 22:31:00 localhost [warning] hrtimer_interrupt+0x19c/0x1c4 
> Jan  7 22:31:00 localhost [warning] [<c014002a>]  [<c012bd0f>]
> local_bh_enable+0x49/0xa9 
> Jan  7 22:31:00 localhost [warning] [<c039ba1a>]
> dev_queue_xmit+0x26c/0x275 
> Jan  7 22:31:00 localhost [warning] [<c03a0c05>]
> neigh_resolve_output+0x12c/0x15e 
> Jan  7 22:31:00 localhost [warning] [<c03a0881>]
> neigh_update+0x246/0x2cb 
> Jan  7 22:31:00 localhost [warning] [<c039fb21>] neigh_lookup+0xa9/0xb3 
> Jan  7 22:31:00 localhost [warning] [<c03ce410>] arp_process+0x43c/0x477
> 
> Jan  7 22:31:00 localhost [warning] [<c0120b73>]
> enqueue_task_fair+0x2d/0x30 
> Jan  7 22:31:00 localhost [warning] tick_sched_timer+0x0/0xba 
> Jan  7 22:31:00 localhost [warning] [<c03ce554>] arp_rcv+0x104/0x119 
> Jan  7 22:31:00 localhost [warning] [<c03a029f>]  [<c039bda6>]
> netif_receive_skb+0x1c5/0x1de 
> Jan  7 22:31:00 localhost [warning] [<f897a61d>]
> e1000_clean_rx_irq+0x40e/0x4ca [e1000] 
> Jan  7 22:31:00 localhost [warning] [<c013bdc6>]
> getnstimeofday+0x36/0x10c 
> Jan  7 22:31:00 localhost [warning] neigh_timer_handler+0x0/0x224 
> Jan  7 22:31:00 localhost [warning] [<c012be12>] __do_softirq+0x60/0xc1 
> Jan  7 22:31:00 localhost [warning] [<f8979e34>] e1000_clean+0x74/0x119
> [e1000] 
> Jan  7 22:31:00 localhost [warning] [<c039bf03>]  [<c012bea4>]
> net_rx_action+0x5a/0xd3 
> Jan  7 22:31:00 localhost [warning] [<c012be12>] __do_softirq+0x60/0xc1 
> Jan  7 22:31:00 localhost [warning] do_softirq+0x31/0x35 
> Jan  7 22:31:00 localhost [warning] [<c012bea4>] do_softirq+0x31/0x35 
> Jan  7 22:31:00 localhost [warning] [<c012bf03>] irq_exit+0x38/0x6b 
> Jan  7 22:31:00 localhost [warning] [<c0106a1e>]  [<c012bf03>]
> do_IRQ+0x80/0x93 
> Jan  7 22:31:00 localhost [warning] irq_exit+0x38/0x6b 
> Jan  7 22:31:00 localhost [warning] [<c01057b7>]
> common_interrupt+0x23/0x28 
> Jan  7 22:31:00 localhost [warning] [<c01600d8>]  [<c011a34d>]
> get_swap_page+0xe7/0x215 
> Jan  7 22:31:00 localhost [warning] [<c0103232>]
> mwait_idle_with_hints+0x34/0x38 Jan  7 22:31:00 localhost [warning]
> [<c0103236>] mwait_idle+0x0/0xa 
> Jan  7 22:31:00 localhost [warning] [<c01030f2>] cpu_idle+0x98/0xb9 
> Jan  7 22:31:00 localhost [warning] smp_apic_timer_interrupt+0x2c/0x35
> Jan  7 22:31:00 localhost [warning] ======================= 
> Jan  7 22:31:00 localhost [warning] [<c0105874>]
> apic_timer_interrupt+0x28/0x30 
> Jan  7 22:31:00 localhost [warning] [<c01600d8>]
> get_swap_page+0xe7/0x215 
> Jan  7 22:31:00 localhost [warning] [<c0103232>]
> mwait_idle_with_hints+0x34/0x38 
> Jan  7 22:31:00 localhost [warning] [<c0103236>] mwait_idle+0x0/0xa 
> Jan  7 22:31:00 localhost [warning] [<c01030f2>] cpu_idle+0x98/0xb9 
> Jan  7 22:31:00 localhost [warning] =======================
> 
> 
> Thanks
> Jayakrishnan Chathu


