Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Chris Snook @ 2007-08-21 13:50 UTC (permalink / raw)
  To: David Miller
  Cc: torvalds, piggin, satyam, herbert, paulus, clameter,
	ilpo.jarvinen, paulmck, stefanr, linux-kernel, linux-arch, netdev,
	akpm, ak, heiko.carstens, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher
In-Reply-To: <20070821.000404.39159401.davem@davemloft.net>

David Miller wrote:
> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Mon, 20 Aug 2007 22:46:47 -0700 (PDT)
> 
>> Ie a "barrier()" is likely _cheaper_ than the code generation downside 
>> from using "volatile".
> 
> Assuming GCC were ever better about the code generation badness
> with volatile that has been discussed here, I much prefer
> we tell GCC "this memory piece changed" rather than "every
> piece of memory has changed" which is what the barrier() does.
> 
> I happened to have been scanning a lot of assembler lately to
> track down a gcc-4.2 miscompilation on sparc64, and the barriers
> do hurt quite a bit in some places.  Instead of keeping unrelated
> variables around cached in local registers, it reloads everything.

Moore's law is definitely working against us here.  Register counts, 
pipeline depths, core counts, and clock multipliers are all increasing 
in the long run.  At some point in the future, barrier() will be 
universally regarded as a hammer too big for most purposes.  Whether or 
not removing it now constitutes premature optimization is arguable, but 
I think we should allow such optimization to happen (or not happen) in 
architecture-dependent code, and provide a consistent API that doesn't 
require the use of such things in arch-independent code where it might 
turn into a totally superfluous performance killer depending on what 
hardware it gets compiled for.

	-- Chris

^ permalink raw reply

* 2.6.22 dev_base changes info request
From: Fortier,Vincent [Montreal] @ 2007-08-21 14:22 UTC (permalink / raw)
  To: netdev
In-Reply-To: <11877040854129-git-send-email-ron.mercer@qlogic.com>


Hi all,

Sadly I have to use the Apani/Nortel cvc contivity linux client.  The
latest version (3.5) stopped being compatible with the latest stable
2.6.22 kernel.

Since there will most probably not be a new release from apani until
year 2013 we have hack the linux_wrapper.c file to try to keep the
module compatible.

Based on the horrors that showed up while compiling against the 2.6.22
kernel (see below), I was wondering if somebody would be kind enough to
give hints/examples/doc-url/etc on how to convert dev_base calls so we
ca find a proper way to make the linux_wrapper.c file compatible with
the latest stable kernel.

Here are the make against 2.6.22 kernel horrors:
  CC [M]
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.o
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c: In function 'init_misc':
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c:310: error: 'dev_base' undeclared (first use in this function)
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c:310: error: (Each undeclared identifier is reported only once
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c:310: error: for each function it appears in.)
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c: In function 'nl_ip_rcv':
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c:434: error: 'struct sk_buff' has no member named 'nh'
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c: In function 'dev_next':
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c:497: error: 'net_device_t' has no member named 'next'
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c: In function 'nl_skb_dup':
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c:549: error: 'struct sk_buff' has no member named 'h'
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c:549: error: 'struct sk_buff' has no member named 'h'
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c:550: error: 'struct sk_buff' has no member named 'nh'
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c:550: error: 'struct sk_buff' has no member named 'nh'
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c:551: error: 'struct sk_buff' has no member named 'mac'
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c:551: error: 'struct sk_buff' has no member named 'mac'
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c: In function 'nl_skb_hdr_copy':
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c:576: error: 'struct sk_buff' has no member named 'h'
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c:576: error: 'struct sk_buff' has no member named 'h'
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c:577: error: 'struct sk_buff' has no member named 'nh'
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c:577: error: 'struct sk_buff' has no member named 'nh'
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c:578: error: 'struct sk_buff' has no member named 'mac'
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c:578: error: 'struct sk_buff' has no member named 'mac'
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c: In function 'nl_skb_iph':
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c:595: error: 'struct sk_buff' has no member named 'nh'
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c: In function 'nl_send_skb':
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c:616: error: 'struct sk_buff' has no member named 'nh'
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c:619: error: 'struct sk_buff' has no member named 'nh'
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c: In function 'nl_data_len':
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c:658: error: invalid operands to binary -
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c: In function 'do_checksum_offload':
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c:752: error: 'struct sk_buff' has no member named 'nh'
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c:768: error: 'struct sk_buff' has no member named 'nh'
/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wrap
per.c:772: error: 'struct sk_buff' has no member named 'nh'
make[3]: ***
[/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6/linux_wra
pper.o] Error 1
make[2]: ***
[_module_/usr/src/Nortel/Apani_VPN_v3.5/cvc_linux-rh-gcc3-3.5/src/k2.6]
Error 2

Note, here is the line 310:
void init_misc(void)
{

        nl_dev_base = dev_base;
}


Help very much appreciated.

- vin

^ permalink raw reply

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Segher Boessenkool @ 2007-08-21 14:39 UTC (permalink / raw)
  To: Russell King
  Cc: Christoph Lameter, Paul Mackerras, heiko.carstens, horms,
	linux-kernel, Paul E. McKenney, ak, netdev, cfriesen, akpm,
	rpjday, Nick Piggin, linux-arch, jesper.juhl, satyam, zlynx,
	schwidefsky, Chris Snook, Herbert Xu, davem, Linus Torvalds,
	wensong, wjiang
In-Reply-To: <20070821070555.GA32036@flint.arm.linux.org.uk>

>>>> And no, RMW on MMIO isn't "problematic" at all, either.
>>>>
>>>> An RMW op is a read op, a modify op, and a write op, all rolled
>>>> into one opcode.  But three actual operations.
>>>
>>> Maybe for some CPUs, but not all.  ARM for instance can't use the
>>> load exclusive and store exclusive instructions to MMIO space.
>>
>> Sure, your CPU doesn't have RMW instructions -- how to emulate
>> those if you don't have them is a totally different thing.
>
> Let me say it more clearly: On ARM, it is impossible to perform atomic
> operations on MMIO space.

It's all completely beside the point, see the other subthread, but...

Yeah, you can't do LL/SC to MMIO space; ARM isn't alone in that.
You could still implement atomic operations on MMIO space by taking
a lock elsewhere, in normal cacheable memory space.  Why you would
do this is a separate question, you probably don't want it :-)


Segher


^ permalink raw reply

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Segher Boessenkool @ 2007-08-21 14:48 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Russell King, Christoph Lameter, heiko.carstens, horms,
	linux-kernel, Paul E. McKenney, ak, netdev, cfriesen, akpm,
	rpjday, Nick Piggin, linux-arch, jesper.juhl, satyam, zlynx,
	schwidefsky, Chris Snook, Herbert Xu, davem, Linus Torvalds,
	wensong, wjiang
In-Reply-To: <18122.45437.807239.395869@cargo.ozlabs.ibm.com>

>> Let me say it more clearly: On ARM, it is impossible to perform atomic
>> operations on MMIO space.
>
> Actually, no one is suggesting that we try to do that at all.
>
> The discussion about RMW ops on MMIO space started with a comment
> attributed to the gcc developers that one reason why gcc on x86
> doesn't use instructions that do RMW ops on volatile variables is that
> volatile is used to mark MMIO addresses, and there was some
> uncertainty about whether (non-atomic) RMW ops on x86 could be used on
> MMIO.  This is in regard to the question about why gcc on x86 always
> moves a volatile variable into a register before doing anything to it.

This question is GCC PR33102, which was incorrectly closed as a 
duplicate
of PR3506 -- and *that* PR was closed because its reporter seemed to
claim the GCC generated code for an increment on a volatile (namely, 
three
machine instructions: load, modify, store) was incorrect, and it has to
be one machine instruction.

> So the whole discussion is irrelevant to ARM, PowerPC and any other
> architecture except x86[-64].

And even there, it's not something the kernel can take advantage of
before GCC 4.4 is in widespread use, if then.  Let's move on.


Segher


^ permalink raw reply

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Segher Boessenkool @ 2007-08-21 14:59 UTC (permalink / raw)
  To: Chris Snook
  Cc: paulmck, heiko.carstens, ilpo.jarvinen, horms, linux-kernel,
	David Miller, rpjday, netdev, ak, piggin, akpm, torvalds,
	cfriesen, jesper.juhl, linux-arch, paulus, herbert, satyam,
	clameter, stefanr, schwidefsky, zlynx, wensong, wjiang
In-Reply-To: <46CAED8B.9030006@redhat.com>

> At some point in the future, barrier() will be universally regarded as 
> a hammer too big for most purposes.  Whether or not removing it now

You can't just remove it, it is needed in some places; you want to
replace it in most places with a more fine-grained "compiler barrier",
I presume?

> constitutes premature optimization is arguable, but I think we should 
> allow such optimization to happen (or not happen) in 
> architecture-dependent code, and provide a consistent API that doesn't 
> require the use of such things in arch-independent code where it might 
> turn into a totally superfluous performance killer depending on what 
> hardware it gets compiled for.

Explicit barrier()s won't be too hard to replace -- but what to do
about the implicit barrier()s in rmb() etc. etc. -- *those* will be
hard to get rid of, if only because it is hard enough to teach driver
authors about how to use those primitives *already*.  It is far from
clear what a good interface like that would look like, anyway.

Probably we should first start experimenting with a forget()-style
micro-barrier (but please, find a better name), and see if a nice
usage pattern shows up that can be turned into an API.

Segher

^ permalink raw reply

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Paul E. McKenney @ 2007-08-21 16:16 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Paul Mackerras, Russell King, Christoph Lameter, heiko.carstens,
	horms, linux-kernel, ak, netdev, cfriesen, akpm, rpjday,
	Nick Piggin, linux-arch, jesper.juhl, satyam, zlynx, schwidefsky,
	Chris Snook, Herbert Xu, davem, Linus Torvalds, wensong, wjiang
In-Reply-To: <33412cad5894a9cbdc85482db5e9a0a0@kernel.crashing.org>

On Tue, Aug 21, 2007 at 04:48:51PM +0200, Segher Boessenkool wrote:
> >>Let me say it more clearly: On ARM, it is impossible to perform atomic
> >>operations on MMIO space.
> >
> >Actually, no one is suggesting that we try to do that at all.
> >
> >The discussion about RMW ops on MMIO space started with a comment
> >attributed to the gcc developers that one reason why gcc on x86
> >doesn't use instructions that do RMW ops on volatile variables is that
> >volatile is used to mark MMIO addresses, and there was some
> >uncertainty about whether (non-atomic) RMW ops on x86 could be used on
> >MMIO.  This is in regard to the question about why gcc on x86 always
> >moves a volatile variable into a register before doing anything to it.
> 
> This question is GCC PR33102, which was incorrectly closed as a 
> duplicate
> of PR3506 -- and *that* PR was closed because its reporter seemed to
> claim the GCC generated code for an increment on a volatile (namely, 
> three
> machine instructions: load, modify, store) was incorrect, and it has to
> be one machine instruction.
> 
> >So the whole discussion is irrelevant to ARM, PowerPC and any other
> >architecture except x86[-64].
> 
> And even there, it's not something the kernel can take advantage of
> before GCC 4.4 is in widespread use, if then.  Let's move on.

I agree that instant gratification is hard to come by when synching
up compiler and kernel versions.  Nonetheless, it should be possible
to create APIs that are are conditioned on the compiler version.

						Thanx, Paul

^ permalink raw reply

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Satyam Sharma @ 2007-08-21 16:31 UTC (permalink / raw)
  To: Chris Snook
  Cc: David Miller, Linus Torvalds, piggin, herbert, paulus, clameter,
	ilpo.jarvinen, paulmck, stefanr, Linux Kernel Mailing List,
	linux-arch, netdev, Andrew Morton, ak, heiko.carstens,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher
In-Reply-To: <46CAED8B.9030006@redhat.com>

On Tue, 21 Aug 2007, Chris Snook wrote:

> David Miller wrote:
> > From: Linus Torvalds <torvalds@linux-foundation.org>
> > Date: Mon, 20 Aug 2007 22:46:47 -0700 (PDT)
> > 
> > > Ie a "barrier()" is likely _cheaper_ than the code generation downside
> > > from using "volatile".
> > 
> > Assuming GCC were ever better about the code generation badness
> > with volatile that has been discussed here, I much prefer
> > we tell GCC "this memory piece changed" rather than "every
> > piece of memory has changed" which is what the barrier() does.
> > 
> > I happened to have been scanning a lot of assembler lately to
> > track down a gcc-4.2 miscompilation on sparc64, and the barriers
> > do hurt quite a bit in some places.  Instead of keeping unrelated
> > variables around cached in local registers, it reloads everything.
> 
> Moore's law is definitely working against us here.  Register counts, pipeline
> depths, core counts, and clock multipliers are all increasing in the long run.
> At some point in the future, barrier() will be universally regarded as a
> hammer too big for most purposes.

I do agree, and the important point to note is that the benefits of a
/lighter/ compiler barrier, such as what David referred to above, _can_
be had without having to do anything with the "volatile" keyword at all.
And such a primitive has already been mentioned/proposed on this thread.

But this is all tangential to the core question at hand -- whether to have
implicit (compiler, possibly "light-weight" of the kind referred above)
barrier semantics in atomic ops that do not have them, or not.

I was lately looking in the kernel for _actual_ code that uses atomic_t
and benefits from the lack of any implicit barrier, with the compiler
being free to cache the atomic_t in a register. Now that often does _not_
happen, because all other ops (implemented in asm with LOCK prefix on x86)
_must_ therefore constrain the atomic_t to memory anyway. So typically all
atomic ops code sequences end up operating on memory.

Then I did locate sched.c:select_nohz_load_balancer() -- it repeatedly
references the same atomic_t object, and the code that I saw generated
(with CC_OPTIMIZE_FOR_SIZE=y) did cache it in a register for a sequence of
instructions. It uses atomic_cmpxchg, thereby not requiring explicit
memory barriers anywhere in the code, and is an example of an atomic_t
user that is safe, and yet benefits from its memory loads/stores being
elided/coalesced by the compiler.

# at this point, %%eax holds num_online_cpus() and
# %%ebx holds cpus_weight(nohz.cpu_mask)
# the variable "cpu" is in %esi

0xc1018e1d:      cmp    %eax,%ebx		# if No.A.
0xc1018e1f:      mov    0xc134d900,%eax		# first atomic_read()
0xc1018e24:      jne    0xc1018e36
0xc1018e26:      cmp    %esi,%eax		# if No.B.
0xc1018e28:      jne    0xc1018e80		# returns with 0
0xc1018e2a:      movl   $0xffffffff,0xc134d900	# atomic_set(-1), and ...
0xc1018e34:      jmp    0xc1018e80		# ... returns with 0
0xc1018e36:      cmp    $0xffffffff,%eax	# if No.C. (NOTE!)
0xc1018e39:      jne    0xc1018e46
0xc1018e3b:      lock cmpxchg %esi,0xc134d900	# atomic_cmpxchg()
0xc1018e43:      inc    %eax
0xc1018e44:      jmp    0xc1018e48
0xc1018e46:      cmp    %esi,%eax		# if No.D. (NOTE!)
0xc1018e48:      jne    0xc1018e80		# if !=, default return 0 (if No.E.)
0xc1018e4a:      jmp    0xc1018e84		# otherwise (==) returns with 1

The above is:

	if (cpus_weight(nohz.cpu_mask) == num_online_cpus()) {	/* if No.A. */
		if (atomic_read(&nohz.load_balancer) == cpu)	/* if No.B. */
			atomic_set(&nohz.load_balancer, -1);	/* XXX */
		return 0;
	}
	if (atomic_read(&nohz.load_balancer) == -1) {		/* if No.C. */
		/* make me the ilb owner */
		if (atomic_cmpxchg(&nohz.load_balancer, -1, cpu) == -1)	/* if No.E. */
			return 1;
	} else if (atomic_read(&nohz.load_balancer) == cpu)	/* if No.D. */
		return 1;
	...
	...
	return 0; /* default return from function */

As you can see, the atomic_read()'s of "if"s Nos. B, C, and D, were _all_
coalesced into a single memory reference "mov    0xc134d900,%eax" at the
top of the function, and then "if"s Nos. C and D simply used the value
from %%eax itself. But that's perfectly safe, such is the logic of this
function. It uses cmpxchg _whenever_ updating the value in the memory
atomic_t and then returns appropriately. The _only_ point that a casual
reader may find racy is that marked /* XXX */ above -- atomic_read()
followed by atomic_set() with no barrier in between. But even that is ok,
because if one thread ever finds that condition to succeed, it is 100%
guaranteed no other thread on any other CPU will find _any_ condition
to be true, thereby avoiding any race in the modification of that value.

BTW it does sound reasonable that a lot of atomic_t users that want a
compiler barrier probably also want a memory barrier. Do we make _that_
implicit too? Quite clearly, making _either_ one of those implicit in
atomic_{read,set} (in any form of implementation -- a forget() macro
based, *(volatile int *)& based, or inline asm based) would end up
harming code such as that cited above.

Lastly, the most obvious reason that should be considered against implicit
barriers in atomic ops is that it isn't "required" -- atomicity does not
imply any barrier after all, and making such a distinction would actually
be a healthy separation that helps people think more clearly when writing
lockless code.

[ But the "authors' expectations" / heisenbugs argument also holds some
  water ... for that, we can have a _variant_ in the API for atomic ops
  that has implicit compiler/memory barriers, to make it easier on those
  who want that behaviour. But let us not penalize code that knows what
  it is doing by changing the default to that, please. ]

Satyam

^ permalink raw reply

* [PATCH 0/1] net/core: Crash in dev_mc_sync() when putting macvlan interface up
From: Benjamin Thery @ 2007-08-21 16:29 UTC (permalink / raw)
  To: netdev; +Cc: Patrick McHardy

Hi,

My kernel crashed while testing macvlan interfaces on 2.6.23-rc2.
(See kernel panic below)

The culprit is dev_mc_sync(). In this routine, we delete 
elements from 'from->mc_list' unsafely. 
While going through the list, we may delete one of the element 
(__dev_addr_delete(from->mc_list,...)), and then try to continue
from that same element that have just been freed: for(..., da = da->next).

It took me some time to understand why only one of my test machines
was crashing. After a while I discovered my crashing victim has 
CONFIG_DEBUG_SLAB=y set, which poisons the freed 'struct dev_addr_list'.
(Now I love poison!)

The crash can be reproduced by setting the option CONFIG_DEBUG_SLAB=y.
Then, add a macvlan interface and set it up.

$ ip link add link eth0 type macvlan

$ ip link macvlan0 up

BUG: unable to handle kernel paging request at virtual address 6b6b6b6b
 printing eip:
c025e9b4
*pde = 00000000
Oops: 0000 [#1]
Modules linked in:
CPU:    0
EIP:    0060:[<c025e9b4>]    Not tainted VLI
EFLAGS: 00000282   (2.6.23-rc2-eb-netns #6)
EIP is at dev_mc_sync+0x5f/0x197
eax: 00000025   ebx: c11e5dec   ecx: ffffffff   edx: 00000046
esi: 6b6b6b6b   edi: c1134060   ebp: c742fe6c   esp: c742fe48
ds: 007b   es: 007b   fs: 0000  gs: 0033  ss: 0068
Process ifconfig (pid: 937, ti=c742e000 task=c1128000 task.ti=c742e000)
Stack: c034c6dc 6b6b6b6b c1134060 c7bd2180 00000000 c1134218 c7bd2180 c7bd2338 
       00001002 c742fe74 c02238a4 c742fe80 c025a9d8 c7bd2180 c742fe90 c025ab78 
       c7bd2180 00001043 c742fe9c c025ce66 c7bd2180 c742fec0 c025b034 c7bd2180 
Call Trace:
 [<c0102c66>] show_trace_log_lvl+0x1a/0x2f
 [<c0102d18>] show_stack_log_lvl+0x9d/0xa5
 [<c0102ede>] show_registers+0x1be/0x28f
 [<c0103097>] die+0xe8/0x208
 [<c010d555>] do_page_fault+0x4ba/0x595
 [<c02e3e62>] error_code+0x6a/0x70
 [<c02238a4>] macvlan_set_multicast_list+0x15/0x17
 [<c025a9d8>] __dev_set_rx_mode+0x7e/0x81
 [<c025ab78>] dev_set_rx_mode+0x25/0x3a
 [<c025ce66>] dev_open+0x4b/0x6a
 [<c025b034>] dev_change_flags+0xa4/0x159
 [<c028da20>] devinet_ioctl+0x204/0x506
 [<c028e082>] inet_ioctl+0x86/0xa4
 [<c02538f6>] sock_ioctl+0x159/0x177
 [<c0152ac4>] do_ioctl+0x1c/0x51
 [<c0152ce5>] vfs_ioctl+0x1ec/0x203
 [<c0152d2d>] sys_ioctl+0x31/0x48
 [<c01025ea>] syscall_call+0x7/0xb
 =======================
Code: 87 c8 01 00 00 00 00 00 00 8b b0 f8 00 00 00 c7 45 ec 00 00 00 00 e9 0a 01 00 00 89 74 24 04 c7 04 24 dc c6 34 c0 e8 57 44 eb ff <8b> 06 c7 04 24 f9 c6 34 c0 89 44 24 04 e8 45 44 eb ff 80 7e 25 
EIP: [<c025e9b4>] dev_mc_sync+0x5f/0x197 SS:ESP 0068:c742fe48
Kernel panic - not syncing: Fatal exception in interrupt


I think the problem may also exist in dev_mc_unsync().

I have a patch that seems to fix the issue for me.

Hope this helps.

Regards,
Benjamin
-- 

^ permalink raw reply

* [PATCH 1/1] net/core: Fix crash in dev_mc_sync()/dev_mc_unsync()
From: Benjamin Thery @ 2007-08-21 16:29 UTC (permalink / raw)
  To: netdev; +Cc: Patrick McHardy, Benjamin Thery
In-Reply-To: <20070821162922.149199466@frecb000701.frec.bull.fr>

[-- Attachment #1: Fix-crash-in-dev_mc_sync.patch --]
[-- Type: text/plain, Size: 1994 bytes --]

This patch fixes a crash that may occur when the routine dev_mc_sync()
deletes an address from the list it is currently going through. It 
saves the pointer to the next element before deleting the current one.
The problem may also exist in dev_mc_unsync().

Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
---
 net/core/dev_mcast.c |   14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

Index: linux-2.6.23-rc2/net/core/dev_mcast.c
===================================================================
--- linux-2.6.23-rc2.orig/net/core/dev_mcast.c
+++ linux-2.6.23-rc2/net/core/dev_mcast.c
@@ -116,11 +116,13 @@ int dev_mc_add(struct net_device *dev, v
  */
 int dev_mc_sync(struct net_device *to, struct net_device *from)
 {
-	struct dev_addr_list *da;
+	struct dev_addr_list *da, *next;
 	int err = 0;
 
 	netif_tx_lock_bh(to);
-	for (da = from->mc_list; da != NULL; da = da->next) {
+	da = from->mc_list;
+	while (da != NULL) {
+		next = da->next;
 		if (!da->da_synced) {
 			err = __dev_addr_add(&to->mc_list, &to->mc_count,
 					     da->da_addr, da->da_addrlen, 0);
@@ -134,6 +136,7 @@ int dev_mc_sync(struct net_device *to, s
 			__dev_addr_delete(&from->mc_list, &from->mc_count,
 					  da->da_addr, da->da_addrlen, 0);
 		}
+		da = next;
 	}
 	if (!err)
 		__dev_set_rx_mode(to);
@@ -156,12 +159,14 @@ EXPORT_SYMBOL(dev_mc_sync);
  */
 void dev_mc_unsync(struct net_device *to, struct net_device *from)
 {
-	struct dev_addr_list *da;
+	struct dev_addr_list *da, next;
 
 	netif_tx_lock_bh(from);
 	netif_tx_lock_bh(to);
 
-	for (da = from->mc_list; da != NULL; da = da->next) {
+	da = from->mc_list;
+	while (da != NULL) {
+		next = da->next;
 		if (!da->da_synced)
 			continue;
 		__dev_addr_delete(&to->mc_list, &to->mc_count,
@@ -169,6 +174,7 @@ void dev_mc_unsync(struct net_device *to
 		da->da_synced = 0;
 		__dev_addr_delete(&from->mc_list, &from->mc_count,
 				  da->da_addr, da->da_addrlen, 0);
+		da = next;
 	}
 	__dev_set_rx_mode(to);
 

-- 

^ permalink raw reply

* Re: [PATCH 6/7 v2] fs_enet: Be an of_platform device when CONFIG_PPC_CPM_NEW_BINDING is set.
From: Scott Wood @ 2007-08-21 16:47 UTC (permalink / raw)
  To: Vitaly Bordug; +Cc: jgarzik, netdev, linuxppc-dev
In-Reply-To: <20070821130155.43898b68@localhost.localdomain>

Vitaly Bordug wrote:
> On Fri, 17 Aug 2007 13:17:18 -0500
> Scott Wood wrote:
> 
> 
>>The existing OF glue code was crufty and broken.  Rather than fix it,
>>it will be removed, and the ethernet driver now talks to the device
>>tree directly.
>>
> 
> A bit short description, I'd rather expect some specific improvements list,
> that are now up and running using device tree. Or if it is just move to new
> infrastucture, let's state that, too.

Some of specific binding changes (there are too many to exhaustively 
list) are enumerated in the "new CPM binding" patch, which I'll send 
after Kumar's include/asm-ppc patch goes in (since it modifies one of 
those files).

>>+#ifdef CONFIG_PPC_CPM_NEW_BINDING
>>+static int __devinit find_phy(struct device_node *np,
>>+                              struct fs_platform_info *fpi)
>>+{
>>+	struct device_node *phynode, *mdionode;
>>+	struct resource res;
>>+	int ret = 0, len;
>>+
>>+	const u32 *data = of_get_property(np, "phy-handle", &len);
>>+	if (!data || len != 4)
>>+		return -EINVAL;
>>+
>>+	phynode = of_find_node_by_phandle(*data);
>>+	if (!phynode)
>>+		return -EINVAL;
>>+
>>+	mdionode = of_get_parent(phynode);
>>+	if (!phynode)
>>+		goto out_put_phy;
>>+
>>+	ret = of_address_to_resource(mdionode, 0, &res);
>>+	if (ret)
>>+		goto out_put_mdio;
>>+
>>+	data = of_get_property(phynode, "reg", &len);
>>+	if (!data || len != 4)
>>+		goto out_put_mdio;
>>+
>>+	snprintf(fpi->bus_id, 16, PHY_ID_FMT, res.start, *data);
>>+
>>+out_put_mdio:
>>+	of_node_put(mdionode);
>>+out_put_phy:
>>+	of_node_put(phynode);
>>+	return ret;
>>+}
> 
> And without phy node? 

It returns -EINVAL. :-)

>>+#ifdef CONFIG_FS_ENET_HAS_FEC
>>+#define IS_FEC(match) ((match)->data == &fs_fec_ops)
>>+#else
>>+#define IS_FEC(match) 0
>>+#endif
>>+
> 
> Since we're talking directly with device tree, why bother with CONFIG_ stuff? We are able to figure it out from dts..

We are figuring it out from the DTS (that's what match->data is).  I 
just didn't want boards without a FEC to have to build in support for it 
and waste memory.

>>+	fpi->rx_ring = 32;
>>+	fpi->tx_ring = 32;
>>+	fpi->rx_copybreak = 240;
>>+	fpi->use_napi = 0;
>>+	fpi->napi_weight = 17;
>>+
> 
> 
> move params over to  dts?

No.  These aren't attributes of the hardware, they're choices the driver 
makes about how much memory to use and how to interact with the rest of 
the kernel.

>>+	ret = find_phy(ofdev->node, fpi);
>>+	if (ret)
>>+		goto out_free_fpi;
>>+
> 
> so we're hosed without phy node.

How is that different from the old code, where you're hosed without 
fep->fpi->bus_id?

>>+static struct of_device_id fs_enet_match[] = {
>>+#ifdef CONFIG_FS_ENET_HAS_SCC
> 
> 
> same nagging. Are we able to get rid of Kconfig arcane defining which SoC currently plays the game for fs_enet?

No, it's still needed for mpc885ads to determine pin setup and 
conflicting device tree node removal (though that could go away if the 
device tree is changed to reflect the jumper settings).

It's also useful for excluding unwanted code.  I don't like using 
8xx/CPM2 as the decision point for that -- why should I build in 
mac-scc.c if I have no intention of using an SCC ethernet (either 
because my board doesn't have one, or because it's slow and conflicts 
with other devices)?

-Scott

^ permalink raw reply

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Linus Torvalds @ 2007-08-21 16:43 UTC (permalink / raw)
  To: Chris Snook
  Cc: David Miller, piggin, satyam, herbert, paulus, clameter,
	ilpo.jarvinen, paulmck, stefanr, linux-kernel, linux-arch, netdev,
	akpm, ak, heiko.carstens, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher
In-Reply-To: <46CAED8B.9030006@redhat.com>

On Tue, 21 Aug 2007, Chris Snook wrote:
> 
> Moore's law is definitely working against us here.  Register counts, pipeline
> depths, core counts, and clock multipliers are all increasing in the long run.
> At some point in the future, barrier() will be universally regarded as a
> hammer too big for most purposes.

Note that "barrier()" is purely a compiler barrier. It has zero impact on 
the CPU pipeline itself, and also has zero impact on anything that gcc 
knows isn't visible in memory (ie local variables that don't have their 
address taken), so barrier() really is pretty cheap.

Now, it's possible that gcc messes up in some circumstances, and that the 
memory clobber will cause gcc to also do things like flush local registers 
unnecessarily to their stack slots, but quite frankly, if that happens, 
it's a gcc problem, and I also have to say that I've not seen that myself.

So in a very real sense, "barrier()" will just make sure that there is a 
stronger sequence point for the compiler where things are stable. In most 
cases it has absolutely zero performance impact - apart from the 
-intended- impact of making sure that the compiler doesn't re-order or 
cache stuff around it.

And sure, we could make it more finegrained, and also introduce a 
per-variable barrier, but the fact is, people _already_ have problems with 
thinking about these kinds of things, and adding new abstraction issues 
with subtle semantics is the last thing we want.

So I really think you'd want to show a real example of real code that 
actually gets noticeably slower or bigger.

In removing "volatile", we have shown that. It may not have made a big 
difference on powerpc, but it makes a real difference on x86 - and more 
importantly, it removes something that people clearly don't know how it 
works, and incorrectly expect to just fix bugs.

[ There are *other* barriers - the ones that actually add memory barriers 
  to the CPU - that really can be quite expensive. The good news is that 
  the expense is going down rather than up: both Intel and AMD are not 
  only removing the need for some of them (ie "smp_rmb()" will become a 
  compiler-only barrier), but we're _also_ seeing the whole "pipeline 
  flush" approach go away, and be replaced by the CPU itself actually 
  being better - so even the actual CPU pipeline barriers are getting
  cheaper, not more expensive. ]

For example, did anybody even _test_ how expensive "barrier()" is? Just 
as a lark, I did

	#undef barrier
	#define barrier() do { } while (0)

in kernel/sched.c (which only has three of them in it, but hey, that's 
more than most files), and there were _zero_ code generation downsides. 
One instruction was moved (and a few line numbers changed), so it wasn't 
like the assembly language was identical, but the point is, barrier() 
simply doesn't have the same kinds of downsides that "volatile" has.

(That may not be true on other architectures or in other source files, of 
course. This *does* depend on code generation details. But anybody who 
thinks that "barrier()" is fundamentally expensive is simply incorrect. It 
is *fundamnetally* a no-op).

		Linus

^ permalink raw reply

* [RFT] sky2: yukon-ec-u phy power problems
From: Stephen Hemminger @ 2007-08-21 17:07 UTC (permalink / raw)
  To: Kevin E; +Cc: Willy Tarreau, linux-kernel, netdev
In-Reply-To: <623107.14278.qm@web38905.mail.mud.yahoo.com>

The sky2 driver clears some bits in the PHY control register, that cause
the PHY interface to get changed.  Some of these deal with voltage and power
savings as well. This may explain some of the failures on Gigabyte DS-3 motherboard.

The clue to possible problem was looking at why loading/unloading sky2 would
break vendor sk98lin driver (ie what registers changed).

--- a/drivers/net/sky2.c	2007-08-21 09:53:41.000000000 -0700
+++ b/drivers/net/sky2.c	2007-08-21 09:53:43.000000000 -0700
@@ -696,8 +696,8 @@ static void sky2_mac_init(struct sky2_hw
 	int i;
 	const u8 *addr = hw->dev[port]->dev_addr;

-	sky2_write32(hw, SK_REG(port, GPHY_CTRL), GPC_RST_SET);
-	sky2_write32(hw, SK_REG(port, GPHY_CTRL), GPC_RST_CLR);
+	sky2_write8(hw, SK_REG(port, GPHY_CTRL), GPC_RST_SET);
+	sky2_write8(hw, SK_REG(port, GPHY_CTRL), GPC_RST_CLR);

 	sky2_write8(hw, SK_REG(port, GMAC_CTRL), GMC_RST_CLR);

^ permalink raw reply

* [PATCH] sky2: don't clear phy power bits
From: Stephen Hemminger @ 2007-08-21 18:11 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: netdev

There are special PHY settings available on Yukon EC-U chip that
should not get cleared. The reset bits are in the least significant
bits, so that is all that needs to be set.

This should solve mysterious errors on some motherboards (like Gigabyte DS-3).

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>


--- a/drivers/net/sky2.c	2007-08-21 10:17:54.000000000 -0700
+++ b/drivers/net/sky2.c	2007-08-21 10:17:59.000000000 -0700
@@ -696,8 +696,8 @@ static void sky2_mac_init(struct sky2_hw
 	int i;
 	const u8 *addr = hw->dev[port]->dev_addr;
 
-	sky2_write32(hw, SK_REG(port, GPHY_CTRL), GPC_RST_SET);
-	sky2_write32(hw, SK_REG(port, GPHY_CTRL), GPC_RST_CLR);
+	sky2_write8(hw, SK_REG(port, GPHY_CTRL), GPC_RST_SET);
+	sky2_write8(hw, SK_REG(port, GPHY_CTRL), GPC_RST_CLR);
 
 	sky2_write8(hw, SK_REG(port, GMAC_CTRL), GMC_RST_CLR);
 

^ permalink raw reply

* [PATCH] sky2: don't clear phy power bits
From: Stephen Hemminger @ 2007-08-21 18:10 UTC (permalink / raw)
  To: Greg KH
  Cc: stable, stable, Justin Forbes, Zwane Mwaikambo, Theodore Ts'o,
	Randy Dunlap, Dave Jones, Chuck Wolber, Chris Wedgwood,
	Michael Krufky, Chuck Ebbert, Domenico Andreoli, torvalds, akpm,
	alan, netdev, Greg Kroah-Hartman
In-Reply-To: <20070821065421.GG5275@kroah.com>

There are special PHY settings available on Yukon EC-U chip that
should not get cleared. This should solve mysterious errors on some
motherboards (like Gigabyte DS-3).

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>


--- a/drivers/net/sky2.c	2007-08-21 11:08:27.000000000 -0700
+++ b/drivers/net/sky2.c	2007-08-21 11:09:03.000000000 -0700
@@ -657,8 +657,8 @@ static void sky2_mac_init(struct sky2_hw
 	int i;
 	const u8 *addr = hw->dev[port]->dev_addr;
 
-	sky2_write32(hw, SK_REG(port, GPHY_CTRL), GPC_RST_SET);
-	sky2_write32(hw, SK_REG(port, GPHY_CTRL), GPC_RST_CLR);
+	sky2_write8(hw, SK_REG(port, GPHY_CTRL), GPC_RST_SET);
+	sky2_write8(hw, SK_REG(port, GPHY_CTRL), GPC_RST_CLR);
 
 	sky2_write8(hw, SK_REG(port, GMAC_CTRL), GMC_RST_CLR);
 

^ permalink raw reply

* Re: [PATCH] sky2: don't clear phy power bits
From: Linus Torvalds @ 2007-08-21 18:26 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Greg KH, stable, stable, Justin Forbes, Zwane Mwaikambo,
	Theodore Ts'o, Randy Dunlap, Dave Jones, Chuck Wolber,
	Chris Wedgwood, Michael Krufky, Chuck Ebbert, Domenico Andreoli,
	akpm, alan, netdev
In-Reply-To: <20070821111022.046eec14@freepuppy.rosehill.hemminger.net>



On Tue, 21 Aug 2007, Stephen Hemminger wrote:
>
> There are special PHY settings available on Yukon EC-U chip that
> should not get cleared. This should solve mysterious errors on some
> motherboards (like Gigabyte DS-3).

Ok, applied.

However:

 - now all accesses to GPHY_CTRL are 8-bit (it used to mix them). Good.

 - but sky2.h still says "32 bit" about the register

Should the comments in sky2.h perhaps be fixed?

		Linus

^ permalink raw reply

* Re: [Bugme-new] [Bug 8914] New: filter attached to prio qdisc breaks priomap handling of packets it does _not_ match
From: Andrew Morton @ 2007-08-21 18:29 UTC (permalink / raw)
  To: netdev; +Cc: bugme-daemon, lionel
In-Reply-To: <bug-8914-10286@http.bugzilla.kernel.org/>

On Tue, 21 Aug 2007 05:32:57 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=8914
> 
>            Summary: filter attached to prio qdisc breaks priomap handling of
>                     packets it does _not_ match
>            Product: Networking
>            Version: 2.5
>      KernelVersion: 2.6.22.4
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: IPV4
>         AssignedTo: shemminger@osdl.org
>         ReportedBy: lionel@mamane.lu
> 
> 
> Most recent kernel where this bug did not occur: none known to me
> Distribution: Debian (sid / unstable / distro development bleeding edge)
> Hardware Environment: reproduced on i386 and x64-64 machines
> Software Environment: Debian, iproute 20070313-1
> Problem Description:
> 
> When I attach a filter to a prio qdisc, the packets that it does _not_
> match are not correctly handled (enqueued) according to the priomap
> anymore, that is the same way than when there is no filter attached to
> the qdisc. They seem to always fall in the map for best effort traffic.
> 
> Problem first noticed on Debian precompiled kernels (2.6.18-4-amd64, based on
> 2.6.18.7 and 2.6.22-1-686, based on 2.6.22.1), reproduced with self-compiled
> (with .config copied from Debian precompiled one) straight-from-kernel.org
> 2.6.22.4 .
> 
> Steps to reproduce:
> 
> Run attached script (as root), trying to ensure no other traffic happens over
> the interface: it installs qdiscs on interface ${TIF} (defaults to eth0), pings
> host ${PHOST} (defaults to master.debian.org - numerical IP hardcoded) with
> various IP TOS bits set, installs a filter that matches TCP (_not_ ICMP) and
> does the pings again. You have to ensure that pings to ${PHOST} leaver over
> ${TIF}. You may have to "modprobe em_cmp" before running the script.
> 
> Notice how the first pings (before filters get installed) get in the
> right queue according to their TOS bits, but after the filter gets
> installed, they all end up in the "best effort" queue.
> 
> 
> The output I get (non-relevant bits snipped out) is:
> 
>  Running test on interface eth0
>  by pinging host 70.103.162.29
> 
>  Pinging with normal service 1 times
>  Pinging with minimise delay 2 times
>  Pinging with minimise cost 4 times
>  qdisc pfifo 22: parent 20:2 limit 1000p
>   Sent 196 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
>  qdisc pfifo 23: parent 20:3 limit 1000p
>   Sent 98 bytes 1 pkt (dropped 0, overlimits 0 requeues 0)
>  qdisc pfifo 24: parent 20:4 limit 1000p
>   Sent 392 bytes 4 pkt (dropped 0, overlimits 0 requeues 0)
> 
>  Adding a filter that does _not_ match ICMP
> 
>  Pinging with normal service 8 times
>  qdisc pfifo 22: parent 20:2 limit 1000p
>   Sent 196 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
>  qdisc pfifo 23: parent 20:3 limit 1000p
>   Sent 924 bytes 10 pkt (dropped 0, overlimits 0 requeues 0)
>  qdisc pfifo 24: parent 20:4 limit 1000p
>   Sent 392 bytes 4 pkt (dropped 0, overlimits 0 requeues 0)
> 
>  Pinging with minimise delay 16 times
>  qdisc pfifo 22: parent 20:2 limit 1000p
>   Sent 196 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
>  qdisc pfifo 23: parent 20:3 limit 1000p
>   Sent 2492 bytes 26 pkt (dropped 0, overlimits 0 requeues 0)
>  qdisc pfifo 24: parent 20:4 limit 1000p
>   Sent 392 bytes 4 pkt (dropped 0, overlimits 0 requeues 0)
> 
>  Pinging with minimise cost 32 times
>  qdisc pfifo 22: parent 20:2 limit 1000p
>   Sent 196 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
>  qdisc pfifo 23: parent 20:3 limit 1000p
>   Sent 5628 bytes 58 pkt (dropped 0, overlimits 0 requeues 0)
>  qdisc pfifo 24: parent 20:4 limit 1000p
>   Sent 392 bytes 4 pkt (dropped 0, overlimits 0 requeues 0)
> 
> 
> The output I would expect is:
> 
>  (... snip ...)
> 
>  Adding a filter that does _not_ match ICMP
> 
>  Pinging with normal service 8 times
>  qdisc pfifo 22: parent 20:2 limit 1000p
>   Sent XXX bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
>  qdisc pfifo 23: parent 20:3 limit 1000p
>   Sent XXX bytes 10 pkt (dropped 0, overlimits 0 requeues 0)
>  qdisc pfifo 24: parent 20:4 limit 1000p
>   Sent XXX bytes 4 pkt (dropped 0, overlimits 0 requeues 0)
> 
>  Pinging with minimise delay 16 times
>  qdisc pfifo 22: parent 20:2 limit 1000p
>   Sent XXX bytes 18 pkt (dropped 0, overlimits 0 requeues 0)
>  qdisc pfifo 23: parent 20:3 limit 1000p
>   Sent XXX bytes 10 pkt (dropped 0, overlimits 0 requeues 0)
>  qdisc pfifo 24: parent 20:4 limit 1000p
>   Sent XXX bytes 4 pkt (dropped 0, overlimits 0 requeues 0)
> 
>  Pinging with minimise cost 32 times
>  qdisc pfifo 22: parent 20:2 limit 1000p
>   Sent XXX bytes 18 pkt (dropped 0, overlimits 0 requeues 0)
>  qdisc pfifo 23: parent 20:3 limit 1000p
>   Sent XXX bytes 10 pkt (dropped 0, overlimits 0 requeues 0)
>  qdisc pfifo 24: parent 20:4 limit 1000p
>   Sent XXX bytes 36 pkt (dropped 0, overlimits 0 requeues 0)

^ permalink raw reply

* [ofa-general] Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB
From: David Miller @ 2007-08-21 18:51 UTC (permalink / raw)
  To: hadi
  Cc: jagana, gaagaan, jeff, Robert.Olsson, kumarkr, rdreier,
	peter.p.waskiewicz.jr, mcarlson, netdev, sri, general, mchan,
	tgraf, johnpol, shemminger, kaber, herbert
In-Reply-To: <1187699422.5324.41.camel@localhost>

From: jamal <hadi@cyberus.ca>
Date: Tue, 21 Aug 2007 08:30:22 -0400

> On Tue, 2007-21-08 at 00:18 -0700, David Miller wrote:
> 
> > Using 16K buffer size really isn't going to keep the pipe full enough
> > for TSO. 
> 
> Why the comparison with TSO (or GSO for that matter)?

Because TSO does batching already, so it's a very good
"tit for tat" comparison of the new batching scheme
vs. an existing one.

^ permalink raw reply

* Re: [PATCH] sky2: don't clear phy power bits
From: Stephen Hemminger @ 2007-08-21 18:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Greg KH, stable, stable, Justin Forbes, Zwane Mwaikambo,
	Theodore Ts'o, Randy Dunlap, Dave Jones, Chuck Wolber,
	Chris Wedgwood, Michael Krufky, Chuck Ebbert, Domenico Andreoli,
	akpm, alan, netdev
In-Reply-To: <alpine.LFD.0.999.0708211123510.30176@woody.linux-foundation.org>

On Tue, 21 Aug 2007 11:26:23 -0700 (PDT)
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> 
> 
> On Tue, 21 Aug 2007, Stephen Hemminger wrote:
> >
> > There are special PHY settings available on Yukon EC-U chip that
> > should not get cleared. This should solve mysterious errors on some
> > motherboards (like Gigabyte DS-3).
> 
> Ok, applied.
> 
> However:
> 
>  - now all accesses to GPHY_CTRL are 8-bit (it used to mix them). Good.
> 
>  - but sky2.h still says "32 bit" about the register
> 
> Should the comments in sky2.h perhaps be fixed?
> 

Register is 32 bit, I'll add documentation about other bits.

-- 
Stephen Hemminger <shemminger@linux-foundation.org>

^ permalink raw reply

* [RFT] r8169 changes against 2.6.23-rc3
From: Bruce Cole @ 2007-08-21 19:32 UTC (permalink / raw)
  To: Francois Romieu, netdev; +Cc: bacole

On 8/20/07, Dirk wrote:
 >> So it seems that when the driver tries to queue a packet while the
 >> controller is busy processing the queue, the newly queued packet does
 >> not get noticed by the controller (until further packet activity 
occurs).
 >> Perhaps there is a problem with the memory barriers when adding to the
 >> TX queue, but I'm a newbie on linux kernel memory barriers.
 >
 >One thing I noticed a while ago (march) is that floodpinging (ping -f)
 >the r8169 host from an external system also increases performance
 >without changing code.
Yes, I just tried this and saw the same result.  Makes perfect sense - 
if the TX queue is normally getting stuck until TCP retransmits, then 
keeping the TX queue busy keeps the queue from remaining stuck.
I think this is a good demonstration that the underlying problem is a 
stuck TX queue as suggested.

 >I ended up (until now perhaps :-) with disabling the onboard nic and
 >adding an e1000 card.

Yes, ditching the realtek interface and going with an ad-on nic seems to 
be what everyone has been doing to get around this problem.  Perhaps 
you'd like to try the busy-wait workaround with ndelay(10)?  It has 
saved me from buying an e1000 card as well.

Speaking of the e1000, I notice that its TX queue processing code for 
that driver includes spin_lock_irqsave()/spin_unlock_irqrestore() 
protection on access to the queue.  The r8169 driver seems to be missing 
equivalent code.  Last time I dealt with kernel locking bugs was in the 
old days of splnet()/splx(), so I could use some help here, but I 
suspect this could be fixed with more careful locking.

^ permalink raw reply

* Oops in 2.6.22.1: skb_copy_and_csum_datagram_iovec()
From: Chuck Ebbert @ 2007-08-21 20:04 UTC (permalink / raw)
  To: Netdev

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=253290

18:57:54 osama kernel: BUG: unable to handle kernel NULL pointer dereference at virtual address 00000004
18:57:54 osama kernel:  printing eip:
18:57:54 osama kernel: c05c4026
18:57:54 osama kernel: *pde = 1d860067
18:57:54 osama kernel: *pte = 00000000
18:57:54 osama kernel: Oops: 0000 [#1]
18:57:54 osama kernel: SMP
18:57:54 osama kernel: last sysfs file: /power/state
18:57:54 osama kernel: Modules linked in: nfsd exportfs lockd nfs_acl autofs4 sunrpc dm_mirror dm_multipath dm_mod video sbs button dock battery ac ipv6 lp snd_via82xx snd_ac97_codec ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_pcm snd_timer i2c_viapro snd_page_alloc i2c_core 8139cp snd_mpu401_uart floppy snd_rawmidi via_ircc snd_seq_device via_rhine 8139too irda snd mii crc_ccitt soundcore ns558 parport_pc gameport ide_cd rtc_cmos serio_raw parport cdrom ext2 mbcache ehci_hcd ohci_hcd uhci_hcd
18:57:54 osama kernel: CPU:    0
18:57:54 osama kernel: EIP:    0060:[<c05c4026>]    Not tainted VLI
18:57:54 osama kernel: EFLAGS: 00010246   (2.6.22.1-32.fc6 #1)
18:57:54 osama kernel: EIP is at skb_copy_and_csum_datagram_iovec+0x17/0xca
18:57:54 osama kernel: eax: d4341180   ebx: 00000000   ecx: 00000000   edx: 00000008
18:57:54 osama kernel: esi: d4341180   edi: 00000000   ebp: 00000008   esp: d488fd7c
18:57:54 osama kernel: ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0068
18:57:54 osama kernel: Process lockd (pid: 2567, ti=d488f000 task=d4876000 task.ti=d488f000)
18:57:54 osama kernel: Stack: 00000000 00000000 00000246 00000292 d4341180 d58bf660 d4879014 d488ff18
18:57:54 osama kernel:        c05ffaf7 d488fdb0 00000000 00000000 00000000 d4c30980 00000040 c07374c0
18:57:54 osama kernel:        d488ff18 d488ff18 c05be8a5 00000000 00000040 00000002 d488fdd8 00000010
18:57:54 osama kernel: Call Trace:
18:57:54 osama kernel:  [<c05ffaf7>] udp_recvmsg+0xdd/0x1cd
18:57:54 osama kernel:  [<c05be8a5>] sock_common_recvmsg+0x3e/0x54
18:57:54 osama kernel:  [<c05bd0fa>] sock_recvmsg+0xec/0x107
18:57:54 osama kernel:  [<c041d0c2>] update_curr+0x23b/0x25c
18:57:54 osama kernel:  [<c0433e31>] autoremove_wake_function+0x0/0x35
18:57:54 osama kernel:  [<c041ce20>] update_stats_wait_end+0x84/0xad
18:57:54 osama kernel:  [<c06278fe>] __reacquire_kernel_lock+0x2f/0x4b
18:57:54 osama kernel:  [<c041d87a>] enqueue_entity+0x276/0x294
18:57:54 osama kernel:  [<c05be67f>] kernel_recvmsg+0x31/0x40
18:57:54 osama kernel:  [<e0bc52d4>] svc_udp_recvfrom+0x114/0x368 [sunrpc]
18:57:54 osama kernel:  [<c0626888>] schedule_timeout+0x13/0x8f
18:57:54 osama kernel:  [<e0bc620e>] svc_recv+0x2e5/0x393 [sunrpc]
18:57:54 osama kernel:  [<c0430e7d>] create_workqueue_thread+0x38/0x49
18:57:54 osama kernel:  [<c041f109>] default_wake_function+0x0/0xc
18:57:54 osama kernel:  [<e0ac31fe>] lockd+0x108/0x222 [lockd]
18:57:54 osama kernel:  [<c0404e76>] ret_from_fork+0x6/0x20
18:57:54 osama kernel:  [<e0ac30f6>] lockd+0x0/0x222 [lockd]
18:57:54 osama kernel:  [<e0ac30f6>] lockd+0x0/0x222 [lockd]
18:57:54 osama kernel:  [<c0406177>] kernel_thread_helper+0x7/0x10
18:57:54 osama kernel:  =======================
18:57:54 osama kernel: Code: f6 75 04 31 c0 eb 05 b8 f2 ff ff ff 83 c4 30 5b 5e 5f 5d c3 55 89 d5 57 56 89 c6 53 89 cb 83 ec 10 8b 78 54 29 d7 eb 03 83 c3 08 <8b> 43 04 85 c0 74 f6 39 f8 73 26 89 f0 e8 fa fd ff ff 66 85 c0


Oops is here:

int skb_copy_and_csum_datagram_iovec(struct sk_buff *skb,
                                     int hlen, struct iovec *iov)
{
        __wsum csum;
        int chunk = skb->len - hlen;

        /* Skip filled elements.
         * Pretty silly, look at memcpy_toiovec, though 8)
         */
====>   while (!iov->iov_len)
                iov++;


udp_recvmsg() passed a NULL iov to this function.


^ permalink raw reply

* Re: Linksys Gigabit USB2.0 adapter (asix) regression
From: David Hollis @ 2007-08-21 20:48 UTC (permalink / raw)
  To: Erik Slagter; +Cc: netdev
In-Reply-To: <46C9BFF2.8030206@slagter.name>

On Mon, 2007-08-20 at 18:23 +0200, Erik Slagter wrote:

> > /*
> >         if ((eeprom >> 8) != 1) {
> >                 asix_write_gpio(dev, 0x003c, 30);
> >                 asix_write_gpio(dev, 0x001c, 300);
> >                 asix_write_gpio(dev, 0x003c, 30);
> >         } else {
> > */
> >                 dbg("gpio phymode == 1 path");
> >                 asix_write_gpio(dev, AX_GPIO_GPO1EN, 30);
> >                 asix_write_gpio(dev, AX_GPIO_GPO1EN | AX_GPIO_GPO_1,
> > 30);
> > //      }
> 
> Tried, but now it doesn't work at all, no LEDs and no traffic.

Ok, that's what I expected.  It's very interesting that this was working
with older kernels, and now it isn't.  Very little has changed in the
last few releases.  I would think that it could be some type of changes
in usbnet, but that would seem to affect far more devices.  My AX88178
devices (not the Linksys one, they are manufacturing samples) still work
fine.  I may have to pickup on the Linksys ones to figure out what is up
with it.

-- 
David Hollis <dhollis@davehollis.com>

^ permalink raw reply

* [ofa-general] Re: [PATCH 0/9 Rev3] Implement batching skb API and support in IPoIB
From: jamal @ 2007-08-21 21:09 UTC (permalink / raw)
  To: David Miller
  Cc: jagana, gaagaan, jeff, Robert.Olsson, kumarkr, rdreier,
	peter.p.waskiewicz.jr, mcarlson, netdev, sri, general, mchan,
	tgraf, johnpol, shemminger, kaber, herbert
In-Reply-To: <20070821.115143.23013721.davem@davemloft.net>

On Tue, 2007-21-08 at 11:51 -0700, David Miller wrote:

> Because TSO does batching already, so it's a very good
> "tit for tat" comparison of the new batching scheme
> vs. an existing one.

Fair enough - I may have read too much into your email then;->

For bulk type of apps (where TSO will make a difference) this a fair
test. Hence i agree the 16KB buffer size is not sensible if the goal is
to simulate such an app. 
However (and this is where i read too much into what you were saying)
that the test by itself is insufficient comparison. You gotta look at
the other side of the coin i.e. at apps where TSO wont buy much.
Examples, a busy ssh or irc server and you could go as far as looking at
the most pre-dominant app on the wild west, http (average page size from
a few years back was in the range of 10-20K and can be simulated with
good ole netperf/iperf). 

cheers,
jamal

^ permalink raw reply

* [PATCH 3/3] sky2 1.17
From: Stephen Hemminger @ 2007-08-21 21:34 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: netdev
In-Reply-To: <20070821213401.543412708@linux-foundation.org>

[-- Attachment #1: sky2-1.17 --]
[-- Type: text/plain, Size: 461 bytes --]

Mark new version to track if current driver is in use.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>

--- a/drivers/net/sky2.c	2007-08-21 10:36:58.000000000 -0700
+++ b/drivers/net/sky2.c	2007-08-21 10:44:22.000000000 -0700
@@ -51,7 +51,7 @@
 #include "sky2.h"
 
 #define DRV_NAME		"sky2"
-#define DRV_VERSION		"1.16"
+#define DRV_VERSION		"1.17"
 #define PFX			DRV_NAME " "
 
 /*

-- 
Stephen Hemminger <shemminger@linux-foundation.org>


^ permalink raw reply

* [PATCH 1/3] sky2: clear PCI power control reg at startup
From: Stephen Hemminger @ 2007-08-21 21:34 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: netdev
In-Reply-To: <20070821213401.543412708@linux-foundation.org>

[-- Attachment #1: sky2-phy-power.patch --]
[-- Type: text/plain, Size: 1022 bytes --]

Make sure PCI register for PHY power gets cleared on boot, and make
sure to avoid any PCI posting problems.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>


--- a/drivers/net/sky2.c	2007-08-21 10:17:59.000000000 -0700
+++ b/drivers/net/sky2.c	2007-08-21 10:18:02.000000000 -0700
@@ -219,9 +219,12 @@ static void sky2_power_on(struct sky2_hw
 	else
 		sky2_write8(hw, B2_Y2_CLK_GATE, 0);
 
-	if (hw->chip_id == CHIP_ID_YUKON_EC_U || hw->chip_id == CHIP_ID_YUKON_EX) {
+	if (hw->chip_id == CHIP_ID_YUKON_EC_U ||
+	    hw->chip_id == CHIP_ID_YUKON_EX) {
 		u32 reg;
 
+		sky2_pci_write32(hw, PCI_DEV_REG3, 0);
+
 		reg = sky2_pci_read32(hw, PCI_DEV_REG4);
 		/* set all bits to 0 except bits 15..12 and 8 */
 		reg &= P_ASPM_CONTROL_MSK;
@@ -238,6 +241,8 @@ static void sky2_power_on(struct sky2_hw
 		reg = sky2_read32(hw, B2_GP_IO);
 		reg |= GLB_GPIO_STAT_RACE_DIS;
 		sky2_write32(hw, B2_GP_IO, reg);
+
+		sky2_read32(hw, B2_GP_IO);
 	}
 }
 

-- 
Stephen Hemminger <shemminger@linux-foundation.org>


^ permalink raw reply

* [PATCH 0/3] sky2 update for 2.6.23
From: Stephen Hemminger @ 2007-08-21 21:34 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: netdev

These patches address some issues that got listed as regressions
in 2.6.23.

-- 
Stephen Hemminger <shemminger@linux-foundation.org>


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox