Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
From: Paul E. McKenney @ 2007-08-15 20:15 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Herbert Xu, csnook, dhowells, linux-kernel, linux-arch, torvalds,
	netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl,
	Arnd Bergmann
In-Reply-To: <46C2FFDD.5000600@yahoo.com.au>

On Wed, Aug 15, 2007 at 11:30:05PM +1000, Nick Piggin wrote:
> Paul E. McKenney wrote:
> >On Tue, Aug 14, 2007 at 03:34:25PM +1000, Nick Piggin wrote:
> 
> >>Maybe it is the safe way to go, but it does obscure cases where there
> >>is a real need for barriers.
> >
> >
> >I prefer burying barriers into other primitives.
> 
> When they should naturally be there, eg. locking or the RCU primitives,
> I agree.
> 
> I don't like having them scattered in various "just in case" places,
> because it makes both the users and the APIs hard to understand and
> change.

I certainly agree that we shouldn't do volatile just for the fun of it,
and also that use of volatile should be quite rare.

> Especially since several big architectures don't have volatile in their
> atomic_get and _set, I think it would be a step backwards to add them in
> as a "just in case" thin now (unless there is a better reason).

Good point, except that I would expect gcc's optimization to continue
to improve.  I would like the kernel to be able to take advantage of
improved optimization, which means that we are going to have to make
a few changes.  Adding volatile to atomic_get() and atomic_set() is
IMHO one of those changes.

> >>Many atomic operations are allowed to be reordered between CPUs, so
> >>I don't have a good idea for the rationale to order them within the
> >>CPU (also loads and stores to long and ptr types are not ordered like
> >>this, although we do consider those to be atomic operations too).
> >>
> >>barrier() in a way is like enforcing sequential memory ordering
> >>between process and interrupt context, wheras volatile is just
> >>enforcing coherency of a single memory location (and as such is
> >>cheaper).
> >
> >
> >barrier() is useful, but it has the very painful side-effect of forcing
> >the compiler to dump temporaries.  So we do need something that is
> >not quite so global in effect.
> 
> Yep.
> 
> >>What do you think of this crazy idea?
> >>
> >>/* Enforce a compiler barrier for only operations to location X.
> >>* Call multiple times to provide an ordering between multiple
> >>* memory locations. Other memory operations can be assumed by
> >>* the compiler to remain unchanged and may be reordered
> >>*/
> >>#define order(x) asm volatile("" : "+m" (x))
> >
> >There was something very similar discussed earlier in this thread,
> >with quite a bit of debate as to exactly what the "m" flag should
> >look like.  I suggested something similar named ACCESS_ONCE in the
> >context of RCU (http://lkml.org/lkml/2007/7/11/664):
> 
> Oh, I missed that earlier debate. Will go have a look.
> 
> >	#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))
> >
> >The nice thing about this is that it works for both loads and stores.
> >Not clear that order() above does this -- I get compiler errors when
> >I try something like "b = order(a)" or "order(a) = 1" using gcc 4.1.2.
> 
> As Arnd ponted out, order() is not supposed to be an lvalue, but a
> statement like the rest of our memory ordering API.
> 
> As to your followup question of why to use it over ACCESS_ONCE. I
> guess, aside from consistency with the rest of the barrier APIs, you
> can use it in other primitives when you don't actually know what the
> caller is going to do or if it even will make an access. You could
> also use it between calls to _other_ primitives, etc... it just
> seems more flexible to me, but I haven't actually used such a thing
> in real code...
> 
> ACCESS_ONCE doesn't seem as descriptive. What it results in is the
> memory location being loaded or stored (presumably once exactly),
> but I think the more general underlying idea is a barrier point.

OK, first, I am not arguing that ACCESS_ONCE() can replace all current
uses of barrier().  Similarly, rcu_dereference(), rcu_assign_pointer(),
and the various RCU versions of the list-manipulation API in no way
replaced all uses of explicit memory barriers.  However, I do believe that
these API are much easier to use (where they apply, of course) than were
the earlier idioms involving explicit memory barriers.

But we of course need to keep the explicit memory and compiler barriers
for other situations.

							Thanx, Paul

^ permalink raw reply

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
From: Paul E. McKenney @ 2007-08-15 19:59 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: heiko.carstens, horms, linux-kernel, rpjday, ak, netdev, cfriesen,
	akpm, torvalds, Nick Piggin, linux-arch, jesper.juhl, zlynx,
	schwidefsky, Chris Snook, davem, wensong, wjiang, David Howells
In-Reply-To: <369924c4b3132a4b06258b7ac81b1006@kernel.crashing.org>

On Wed, Aug 15, 2007 at 09:46:55PM +0200, Segher Boessenkool wrote:
> >>>Well if there is only one memory location involved, then smp_rmb()
> >>>isn't
> >>>going to really do anything anyway, so it would be incorrect to use 
> >>>it.
> >>
> >>rmb() orders *any* two reads; that includes two reads from the same
> >>location.
> >
> >If the two reads are to the same location, all CPUs I am aware of
> >will maintain the ordering without need for a memory barrier.
> 
> That's true of course, although there is no real guarantee for that.

A CPU that did not provide this property ("cache coherence") would be
most emphatically reviled.  So we are pretty safe assuming that CPUs
will provide it.

						Thanx, Paul

^ permalink raw reply

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Christoph Lameter @ 2007-08-15 19:59 UTC (permalink / raw)
  To: Stefan Richter
  Cc: Satyam Sharma, Chris Snook, Linux Kernel Mailing List, linux-arch,
	torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher, Herbert Xu, Paul E. McKenney
In-Reply-To: <46C2D6F3.3070707@s5r6.in-berlin.de>

On Wed, 15 Aug 2007, Stefan Richter wrote:

> LDD3 says on page 125:  "The following operations are defined for the
> type [atomic_t] and are guaranteed to be atomic with respect to all
> processors of an SMP computer."
> 
> Doesn't "atomic WRT all processors" require volatility?

Atomic operations only require exclusive access to the cacheline while the 
value is modified.

^ permalink raw reply

* Re: [Bugme-new] [Bug 8891] New: in-kernel rpc generates broken RPCBPROC_GETVERSADDR v4 requests
From: Chuck Lever @ 2007-08-15 19:57 UTC (permalink / raw)
  To: Andrew Morton; +Cc: netdev, nfs, bugme-daemon@bugzilla.kernel.org, speidel
In-Reply-To: <20070815122909.c67c0db6.akpm@linux-foundation.org>

[-- Attachment #1: Type: text/plain, Size: 3941 bytes --]

Andrew Morton wrote:
> On Wed, 15 Aug 2007 12:22:51 -0700 (PDT)
> bugme-daemon@bugzilla.kernel.org wrote:
> 
>> http://bugzilla.kernel.org/show_bug.cgi?id=8891
>>
>>            Summary: in-kernel rpc generates broken RPCBPROC_GETVERSADDR v4
>>                     requests
>>            Product: Networking
>>            Version: 2.5
>>      KernelVersion: 2.6.22.1
>>           Platform: All
>>         OS/Version: Linux
>>               Tree: Mainline
>>             Status: NEW
>>           Severity: normal
>>           Priority: P1
>>          Component: Other
>>         AssignedTo: acme@ghostprotocols.net
>>         ReportedBy: speidel@ueberschuss.de
>>
>>
>> Most recent kernel where this bug did not occur: 2.6.20
> 
> Apparently a regression.

RPCBIND v4 requests are an experimental feature.  They can be disabled 
via a kernel build option (CONFIG_SUNRPC_BIND34).  I think these are 
left enabled in Fedora to find just the type of problem before v3 and v4 
becomes the default.

>> Distribution: Fedora Core 6
>> Hardware Environment: x86_64
>> Software Environment: NFS
>> Problem Description: When locking a file, an invalid RPCBPROC_GETVERSADDR
>> procedure call is sent out
>>
>> Steps to reproduce:
>> on the NFS server, run: tcpdump -xX -pni eth0 -s0 port 111 and src host
>> theNFSclient
>> on a client running 2.6.22.1, run: flock -x /nfsmount/somefile ls
>> tcpdump will show:
>>
>> 17:10:01.290655 IP 141.2.15.141.34572 > 141.2.1.1.sunrpc: P 0:168(168) ack 1
>> win 183 <nop,nop,timestamp 966695 115079499>
>>         0x0000:  4500 00dc 53ad 4000 3f06 bcdc 8d02 0f8d  E...S.@.?.......
>>                 0x0010:  8d02 0101 870c 006f cdcd 938d db00 c8a7 
>> .......o........
>>         0x0020:  8018 00b7 e59d 0000 0101 080a 000e c027  ...............'
>>         0x0030:  06db f94b 8000 00a4 fd48 2a07 0000 0000  ...K.....H*.....
>>         0x0040:  0000 0002 0001 86a0 0000 0004 0000 0009  ................
>>         0x0050:  0000 0001 0000 0054 0000 03c6 0000 0007  .......T........
>>         0x0060:  6572 6964 796b 6500 0000 0000 0000 0000  eridyke.........
>>         0x0070:  0000 000e 0000 0000 0000 0001 0000 0002  ................
>>         0x0080:  0000 0003 0000 0004 0000 0005 0000 0006  ................
>>         0x0090:  0000 0007 0000 0007 0000 0009 0000 000a  ................
>>         0x00a0:  0000 000c 0000 0014 0000 001c 0000 0000  ................
>>         0x00b0:  0000 0000 0001 86b5 0000 0004 0000 0003  ................
>>         0x00c0:  7463 7000 0000 0009 3134 312e 322e 312e  tcp.....141.2.1.
>>         0x00d0:  3100 0000 0000 0004 7270 6362            1.......rpcb
>>
>> Note the "141.2.1.1" in the output.
>>
>> According to RFC 1833, you can read here the following fields:
>>
>> RPCBPROC_GETVERSADDR version 4 procedure is being called
>> r_prog == 0x000186b5 == 100021 == nfs.lockd
>> r_vers == 4
>> r_netid == (length 3) "tcp"
>> r_addr == (length 9) "141.2.1.1"
>> r_owner == (length 4) "rpcb"
>>
>> This r_addr member is supposed to contain an universal address. Although I have
>> no source for that, the RFC clearly says a service can listen to an address,
>> and RPCBPROC_GETVERSADDR is supposed to return an universal address too. From
>> this I can conclude that an universal address is supposed to contain the port
>> number, and other operating systems are using the format
>> a.b.c.d.PortHighByte.PortLowByte (like in FTP, but with . instead of ,). I
>> interpret the RFC 1833 so that the port number of the rpcbind, that is, 111, is
>> to be appended, so the correct value of r_addr would be "141.2.1.1.0.111". This
>> matches other implementations.

That RFC is unclear on exactly what a universal address should look 
like.  However speidel's interpretation is not unreasonable, and I'm 
glad he was able to check this against other implementations that I 
don't have.  My unit testing against Solaris did not reveal this issue.

I'll look into the problem.

[-- Attachment #2: chuck.lever.vcf --]
[-- Type: text/x-vcard, Size: 290 bytes --]

begin:vcard
fn:Chuck Lever
n:Lever;Chuck
org:Oracle Corporation;Corporate Architecture: Linux Projects Group
adr:;;1015 Granger Avenue;Ann Arbor;MI;48104;USA
title:Principal Member of Staff
tel;work:+1 248 614 5091
x-mozilla-html:FALSE
url:http://oss.oracle.com/~cel
version:2.1
end:vcard


[-- Attachment #3: Type: text/plain, Size: 315 bytes --]

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/

[-- Attachment #4: Type: text/plain, Size: 140 bytes --]

_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
From: Segher Boessenkool @ 2007-08-15 19:46 UTC (permalink / raw)
  To: paulmck
  Cc: heiko.carstens, horms, linux-kernel, rpjday, ak, netdev, cfriesen,
	akpm, torvalds, Nick Piggin, linux-arch, jesper.juhl, zlynx,
	schwidefsky, Chris Snook, davem, wensong, wjiang, David Howells
In-Reply-To: <20070815191829.GJ9645@linux.vnet.ibm.com>

>>> Well if there is only one memory location involved, then smp_rmb()
>>> isn't
>>> going to really do anything anyway, so it would be incorrect to use 
>>> it.
>>
>> rmb() orders *any* two reads; that includes two reads from the same
>> location.
>
> If the two reads are to the same location, all CPUs I am aware of
> will maintain the ordering without need for a memory barrier.

That's true of course, although there is no real guarantee for that.


Segher


^ permalink raw reply

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Satyam Sharma @ 2007-08-15 19:54 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Segher Boessenkool, horms, Stefan Richter,
	Linux Kernel Mailing List, rpjday, netdev, ak, cfriesen,
	Heiko Carstens, jesper.juhl, linux-arch, Andrew Morton, zlynx,
	clameter, schwidefsky, Chris Snook, Herbert Xu, davem,
	Linus Torvalds, wensong, wjiang
In-Reply-To: <20070815185724.GH9645@linux.vnet.ibm.com>

[ The Cc: list scares me. Should probably trim it. ]


On Wed, 15 Aug 2007, Paul E. McKenney wrote:

> On Wed, Aug 15, 2007 at 08:31:25PM +0200, Segher Boessenkool wrote:
> > >>How does the compiler know that msleep() has got barrier()s?
> > >
> > >Because msleep_interruptible() is in a separate compilation unit,
> > >the compiler has to assume that it might modify any arbitrary global.
> > 
> > No; compilation units have nothing to do with it, GCC can optimise
> > across compilation unit boundaries just fine, if you tell it to
> > compile more than one compilation unit at once.
> 
> Last I checked, the Linux kernel build system did compile each .c file
> as a separate compilation unit.
> 
> > What you probably mean is that the compiler has to assume any code
> > it cannot currently see can do anything (insofar as allowed by the
> > relevant standards etc.)

I think this was just terminology confusion here again. Isn't "any code
that it cannot currently see" the same as "another compilation unit",
and wouldn't the "compilation unit" itself expand if we ask gcc to
compile more than one unit at once? Or is there some more specific
"definition" for "compilation unit" (in gcc lingo, possibly?)

^ permalink raw reply

* Re: [Bugme-new] [Bug 8891] New: in-kernel rpc generates broken RPCBPROC_GETVERSADDR v4 requests
From: Andrew Morton @ 2007-08-15 19:29 UTC (permalink / raw)
  To: netdev, nfs; +Cc: bugme-daemon@bugzilla.kernel.org, speidel
In-Reply-To: <bug-8891-10286@http.bugzilla.kernel.org/>

On Wed, 15 Aug 2007 12:22:51 -0700 (PDT)
bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=8891
> 
>            Summary: in-kernel rpc generates broken RPCBPROC_GETVERSADDR v4
>                     requests
>            Product: Networking
>            Version: 2.5
>      KernelVersion: 2.6.22.1
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>         AssignedTo: acme@ghostprotocols.net
>         ReportedBy: speidel@ueberschuss.de
> 
> 
> Most recent kernel where this bug did not occur: 2.6.20

Apparently a regression.

> Distribution: Fedora Core 6
> Hardware Environment: x86_64
> Software Environment: NFS
> Problem Description: When locking a file, an invalid RPCBPROC_GETVERSADDR
> procedure call is sent out
> 
> Steps to reproduce:
> on the NFS server, run: tcpdump -xX -pni eth0 -s0 port 111 and src host
> theNFSclient
> on a client running 2.6.22.1, run: flock -x /nfsmount/somefile ls
> tcpdump will show:
> 
> 17:10:01.290655 IP 141.2.15.141.34572 > 141.2.1.1.sunrpc: P 0:168(168) ack 1
> win 183 <nop,nop,timestamp 966695 115079499>
>         0x0000:  4500 00dc 53ad 4000 3f06 bcdc 8d02 0f8d  E...S.@.?.......
>                 0x0010:  8d02 0101 870c 006f cdcd 938d db00 c8a7 
> .......o........
>         0x0020:  8018 00b7 e59d 0000 0101 080a 000e c027  ...............'
>         0x0030:  06db f94b 8000 00a4 fd48 2a07 0000 0000  ...K.....H*.....
>         0x0040:  0000 0002 0001 86a0 0000 0004 0000 0009  ................
>         0x0050:  0000 0001 0000 0054 0000 03c6 0000 0007  .......T........
>         0x0060:  6572 6964 796b 6500 0000 0000 0000 0000  eridyke.........
>         0x0070:  0000 000e 0000 0000 0000 0001 0000 0002  ................
>         0x0080:  0000 0003 0000 0004 0000 0005 0000 0006  ................
>         0x0090:  0000 0007 0000 0007 0000 0009 0000 000a  ................
>         0x00a0:  0000 000c 0000 0014 0000 001c 0000 0000  ................
>         0x00b0:  0000 0000 0001 86b5 0000 0004 0000 0003  ................
>         0x00c0:  7463 7000 0000 0009 3134 312e 322e 312e  tcp.....141.2.1.
>         0x00d0:  3100 0000 0000 0004 7270 6362            1.......rpcb
> 
> Note the "141.2.1.1" in the output.
> 
> According to RFC 1833, you can read here the following fields:
> 
> RPCBPROC_GETVERSADDR version 4 procedure is being called
> r_prog == 0x000186b5 == 100021 == nfs.lockd
> r_vers == 4
> r_netid == (length 3) "tcp"
> r_addr == (length 9) "141.2.1.1"
> r_owner == (length 4) "rpcb"
> 
> This r_addr member is supposed to contain an universal address. Although I have
> no source for that, the RFC clearly says a service can listen to an address,
> and RPCBPROC_GETVERSADDR is supposed to return an universal address too. From
> this I can conclude that an universal address is supposed to contain the port
> number, and other operating systems are using the format
> a.b.c.d.PortHighByte.PortLowByte (like in FTP, but with . instead of ,). I
> interpret the RFC 1833 so that the port number of the rpcbind, that is, 111, is
> to be appended, so the correct value of r_addr would be "141.2.1.1.0.111". This
> matches other implementations.
> 
> Of course, all this does sound like nitpicking, and as the callee is not even
> supposed to look at the port number (this argument is only there so the callee
> can decide correctly which interface address to use for the response). However,
> this omission triggers an apparently unpatched denial of service vulnerability
> against HP-UX 11.11's rpcbind and causes it to quit with a core dump and a bus
> error. When using a correctly formed universal address, or the empty string,
> there is no crash of HP-UX's rpcbind.
> 
> And of course it is HP who has to fix the DoS bug - but Linux 2.6.22 triggers
> it by a RFC violation, which is a minor bug to be fixed on Linux's side. By the
> way, does anyone have the right contacts to HP to report this bug? I do not
> have a HP service contract any more, and the only other support place I found
> is their public forum, where I of course can only provide limited information
> about the bug (basically, the same I am posting here).
> 
> Maybe the bug is fixed in a more current kernel, I was only using Fedora's
> highly patched distribution kernel and compared it to the sources of the base
> version in the main tree, where I see the very same problem in the source code.
> So I do not think it was Fedora who caused the problem, and am assigning it to
> mainline.
> 


-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

^ permalink raw reply

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Satyam Sharma @ 2007-08-15 19:40 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Christoph Lameter, heiko.carstens, horms, Stefan Richter,
	Linux Kernel Mailing List, Paul E. McKenney, netdev, ak, cfriesen,
	rpjday, jesper.juhl, linux-arch, Andrew Morton, zlynx,
	schwidefsky, Chris Snook, Herbert Xu, davem, Linus Torvalds,
	wensong, wjiang
In-Reply-To: <9350e9fab505c92af8d5e1f3441d6ad2@kernel.crashing.org>



On Wed, 15 Aug 2007, Segher Boessenkool wrote:

> > "Volatile behaviour" itself isn't consistently defined (at least
> > definitely not consistently implemented in various gcc versions across
> > platforms),
> 
> It should be consistent across platforms; if not, file a bug please.
> 
> > but it is /expected/ to mean something like: "ensure that
> > every such access actually goes all the way to memory, and is not
> > re-ordered w.r.t. to other accesses, as far as the compiler can take
                              ^
                              (volatile)

(or, alternatively, "other accesses to the same volatile object" ...)

> > care of these". The last "as far as compiler can take care" disclaimer
> > comes about due to CPUs doing their own re-ordering nowadays.
> 
> You can *expect* whatever you want, but this isn't in line with
> reality at all.
> 
> volatile _does not_ prevent reordering wrt other accesses.
> [...]
> What volatile does are a) never optimise away a read (or write)
> to the object, since the data can change in ways the compiler
> cannot see; and b) never move stores to the object across a
> sequence point.  This does not mean other accesses cannot be
> reordered wrt the volatile access.
> 
> If the abstract machine would do an access to a volatile-
> qualified object, the generated machine code will do that
> access too.  But, for example, it can still be optimised
> away by the compiler, if it can prove it is allowed to.

As (now) indicated above, I had meant multiple volatile accesses to
the same object, obviously.

BTW:

#define atomic_read(a)	(*(volatile int *)&(a))
#define atomic_set(a,i)	(*(volatile int *)&(a) = (i))

int a;

void func(void)
{
	int b;

	b = atomic_read(a);
	atomic_set(a, 20);
	b = atomic_read(a);
}

gives:

func:
	pushl	%ebp
	movl	a, %eax
	movl	%esp, %ebp
	movl	$20, a
	movl	a, %eax
	popl	%ebp
	ret

so the first atomic_read() wasn't optimized away.


> volatile _does not_ make accesses go all the way to memory.
> [...]
> If you want stuff to go all the way to memory, you need some
> architecture-specific flush sequence; to make a store globally
> visible before another store, you need mb(); before some following
> read, you need mb(); to prevent reordering you need a barrier.

Sure, which explains the "as far as the compiler can take care" bit.
Poor phrase / choice of words, probably.

^ permalink raw reply

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
From: Paul E. McKenney @ 2007-08-15 19:18 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Nick Piggin, heiko.carstens, horms, linux-kernel, rpjday, ak,
	netdev, cfriesen, akpm, torvalds, jesper.juhl, linux-arch, zlynx,
	schwidefsky, Chris Snook, davem, wensong, wjiang, David Howells
In-Reply-To: <2cbda24e96a49c3ab7cf7039c515f9fc@kernel.crashing.org>

On Wed, Aug 15, 2007 at 08:51:58PM +0200, Segher Boessenkool wrote:
> >Well if there is only one memory location involved, then smp_rmb() 
> >isn't
> >going to really do anything anyway, so it would be incorrect to use it.
> 
> rmb() orders *any* two reads; that includes two reads from the same
> location.

If the two reads are to the same location, all CPUs I am aware of
will maintain the ordering without need for a memory barrier.

						Thanx, Paul

> >Consider that smp_rmb basically will do anything from flushing the
> >pipeline to invalidating loads speculatively executed out of order. 
> >AFAIK
> >it will not control the visibility of stores coming from other CPUs 
> >(that
> >is up to the cache coherency).
> 
> The writer side should typically use wmb() whenever the reader side
> uses rmb(), sure.
> 
> 
> Segher
> 
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 1/1] bonding: eliminate RTNL assertion spew
From: Jay Vosburgh @ 2007-08-15 19:15 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Andy Gospodarek, Stephen Hemminger, Jeff Garzik, netdev
In-Reply-To: <170fa0d20708151136q1535672dvbc90271afe80ec6f@mail.gmail.com>

Mike Snitzer <snitzer@gmail.com> wrote:

>I'd very much like to help out.  The "rtnl assertion spew" isn't
>instilling confidence in customers I've been working with.  If you'd
>like to send me patches in private I'd help test them ASAP.

	I'll send you some stuff off-list in a bit.

>Could you elaborate on the associated risk of _not_ fixing these
>issues?  balance-alb _seems_ to be working even though these traces
>occur on initialization.  But these rtnl traces are clearly more
>generic than balance-alb.

	There are really a couple of things going on.

	One danger is that some network device drivers may sleep in
certain critical sections (set MAC address, for example) while bonding
holds some lock.  Most drivers don't have potential sleeps here, but a
few do.  The most notable as I recall are a subset of the tg3 devices,

	The other danger is that some callback in the notifier call when
the MAC address changes may sleep.

	These are both separate from the RTNL warnings, which are a
notification that the interface is being messed with, but RTNL isn't
held.  The danger here is that a concurrent, independent, operation
could acquire RTNL and simultaneously fiddle with the interface.

	The ultimate problem with fixing it is that the locking in
bonding was implemented before these locking constraints existed, and
untangling the locking to conform to the new rules is fairly invovled.
Andy and I have been through several iterations of a "final" patch, and
we keep finding regressions.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Paul E. McKenney @ 2007-08-15 19:07 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Stefan Richter, Christoph Lameter, Chris Snook,
	Linux Kernel Mailing List, linux-arch, Linus Torvalds, netdev,
	Andrew Morton, ak, heiko.carstens, davem, schwidefsky, wensong,
	horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl, segher,
	Herbert Xu
In-Reply-To: <alpine.LFD.0.999.0708152048270.16414@enigma.security.iitk.ac.in>

On Wed, Aug 15, 2007 at 11:25:05PM +0530, Satyam Sharma wrote:
> Hi Paul,
> On Wed, 15 Aug 2007, Paul E. McKenney wrote:
> 
> > On Wed, Aug 15, 2007 at 07:17:29PM +0530, Satyam Sharma wrote:
> > > [...]
> > > No, I'd actually prefer something like what Christoph Lameter suggested,
> > > i.e. users (such as above) who want "volatile"-like behaviour from atomic
> > > ops can use alternative functions. How about something like:
> > > 
> > > #define atomic_read_volatile(v)			\
> > > 	({					\
> > > 		forget(&(v)->counter);		\
> > > 		((v)->counter);			\
> > > 	})
> > 
> > Wouldn't the above "forget" the value, throw it away, then forget
> > that it forgot it, giving non-volatile semantics?
> 
> Nope, I don't think so. I wrote the following trivial testcases:
> [ See especially tp4.c and tp4.s (last example). ]

Right.  I should have said "wouldn't the compiler be within its rights
to forget the value, throw it away, then forget that it forgot it".
The value coming out of the #define above is an unadorned ((v)->counter),
which has no volatile semantics.

> ==============================================================================
> $ cat tp1.c # Using volatile access casts
> 
> #define atomic_read(a)	(*(volatile int *)&a)
> 
> int a;
> 
> void func(void)
> {
> 	a = 0;
> 	while (atomic_read(a))
> 		;
> }
> ==============================================================================
> $ gcc -Os -S tp1.c; cat tp1.s
> 
> func:
> 	pushl	%ebp
> 	movl	%esp, %ebp
> 	movl	$0, a
> .L2:
> 	movl	a, %eax
> 	testl	%eax, %eax
> 	jne	.L2
> 	popl	%ebp
> 	ret
> ==============================================================================
> $ cat tp2.c # Using nothing; gcc will optimize the whole loop away
> 
> #define forget(x)
> 
> #define atomic_read(a)		\
> 	({			\
> 		forget(&(a));	\
> 		(a);		\
> 	})
> 
> int a;
> 
> void func(void)
> {
> 	a = 0;
> 	while (atomic_read(a))
> 		;
> }
> ==============================================================================
> $ gcc -Os -S tp2.c; cat tp2.s
> 
> func:
> 	pushl	%ebp
> 	movl	%esp, %ebp
> 	popl	%ebp
> 	movl	$0, a
> 	ret
> ==============================================================================
> $ cat tp3.c # Using a full memory clobber barrier
> 
> #define forget(x)	asm volatile ("":::"memory")
> 
> #define atomic_read(a)		\
> 	({			\
> 		forget(&(a));	\
> 		(a);		\
> 	})
> 
> int a;
> 
> void func(void)
> {
> 	a = 0;
> 	while (atomic_read(a))
> 		;
> }
> ==============================================================================
> $ gcc -Os -S tp3.c; cat tp3.s
> 
> func:
> 	pushl	%ebp
> 	movl	%esp, %ebp
> 	movl	$0, a
> .L2:
> 	cmpl	$0, a
> 	jne	.L2
> 	popl	%ebp
> 	ret
> ==============================================================================
> $ cat tp4.c # Using a forget(var) macro
> 
> #define forget(a)	__asm__ __volatile__ ("" :"=m" (a) :"m" (a))
> 
> #define atomic_read(a)		\
> 	({			\
> 		forget(a);	\
> 		(a);		\
> 	})
> 
> int a;
> 
> void func(void)
> {
> 	a = 0;
> 	while (atomic_read(a))
> 		;
> }
> ==============================================================================
> $ gcc -Os -S tp4.c; cat tp4.s
> 
> func:
> 	pushl	%ebp
> 	movl	%esp, %ebp
> 	movl	$0, a
> .L2:
> 	cmpl	$0, a
> 	jne	.L2
> 	popl	%ebp
> 	ret
> ==============================================================================
> 
> Possibly these were too trivial to expose any potential problems that you
> may have been referring to, so would be helpful if you could write a more
> concrete example / sample code.

The trick is to have a sufficiently complicated expression to force
the compiler to run out of registers.  If the value is non-volatile,
it will refetch it (and expect it not to have changed, possibly being
disappointed by an interrupt handler running on that same CPU).

> > > Or possibly, implement these "volatile" atomic ops variants in inline asm
> > > like the patch that Sebastian Siewior has submitted on another thread just
> > > a while back.
> > 
> > Given that you are advocating a change (please keep in mind that
> > atomic_read() and atomic_set() had volatile semantics on almost all
> > platforms), care to give some example where these historical volatile
> > semantics are causing a problem?
> > [...]
> > Can you give even one example
> > where the pre-existing volatile semantics are causing enough of a problem
> > to justify adding yet more atomic_*() APIs?
> 
> Will take this to the other sub-thread ...

OK.

> > > Of course, if we find there are more callers in the kernel who want the
> > > volatility behaviour than those who don't care, we can re-define the
> > > existing ops to such variants, and re-name the existing definitions to
> > > somethine else, say "atomic_read_nonvolatile" for all I care.
> > 
> > Do we really need another set of APIs?
> 
> Well, if there's one set of users who do care about volatile behaviour,
> and another set that doesn't, it only sounds correct to provide both
> those APIs, instead of forcing one behaviour on all users.

Well, if the second set doesn't care, they should be OK with the volatile
behavior in this case.

							Thanx, Paul

^ permalink raw reply

* [patch 4/4] Update e1000 driver to use devres.
From: Brandon Philips @ 2007-08-15 19:00 UTC (permalink / raw)
  To: netdev, e1000-devel; +Cc: teheo, Brandon Philips

[-- Attachment #1: e1000-devres.patch --]
[-- Type: text/plain, Size: 6880 bytes --]

Conversion of e1000 probe() and remove() to devres.

Signed-off-by: Brandon Philips <bphilips@suse.de>
---
 drivers/net/e1000/e1000.h      |    1 
 drivers/net/e1000/e1000_main.c |   92 ++++++++++++-----------------------------
 2 files changed, 28 insertions(+), 65 deletions(-)

Index: linux-2.6/drivers/net/e1000/e1000_main.c
===================================================================
--- linux-2.6.orig/drivers/net/e1000/e1000_main.c
+++ linux-2.6/drivers/net/e1000/e1000_main.c
@@ -860,15 +860,14 @@ e1000_probe(struct pci_dev *pdev,
 {
 	struct net_device *netdev;
 	struct e1000_adapter *adapter;
-	unsigned long mmio_start, mmio_len;
-	unsigned long flash_start, flash_len;
+	unsigned long mmio_len, flash_len;
 
 	static int cards_found = 0;
 	static int global_quad_port_a = 0; /* global ksp3 port a indication */
 	int i, err, pci_using_dac;
 	uint16_t eeprom_data = 0;
 	uint16_t eeprom_apme_mask = E1000_EEPROM_APME;
-	if ((err = pci_enable_device(pdev)))
+	if ((err = pcim_enable_device(pdev)))
 		return err;
 
 	if (!(err = pci_set_dma_mask(pdev, DMA_64BIT_MASK)) &&
@@ -878,20 +877,19 @@ e1000_probe(struct pci_dev *pdev,
 		if ((err = pci_set_dma_mask(pdev, DMA_32BIT_MASK)) &&
 		    (err = pci_set_consistent_dma_mask(pdev, DMA_32BIT_MASK))) {
 			E1000_ERR("No usable DMA configuration, aborting\n");
-			goto err_dma;
+			return err;
 		}
 		pci_using_dac = 0;
 	}
 
 	if ((err = pci_request_regions(pdev, e1000_driver_name)))
-		goto err_pci_reg;
+		return err;
 
 	pci_set_master(pdev);
 
-	err = -ENOMEM;
-	netdev = alloc_etherdev(sizeof(struct e1000_adapter));
+	netdev = devm_alloc_etherdev(&pdev->dev, sizeof(struct e1000_adapter));
 	if (!netdev)
-		goto err_alloc_etherdev;
+		return -ENOMEM;
 
 	SET_MODULE_OWNER(netdev);
 	SET_NETDEV_DEV(netdev, &pdev->dev);
@@ -903,13 +901,11 @@ e1000_probe(struct pci_dev *pdev,
 	adapter->hw.back = adapter;
 	adapter->msg_enable = (1 << debug) - 1;
 
-	mmio_start = pci_resource_start(pdev, BAR_0);
 	mmio_len = pci_resource_len(pdev, BAR_0);
 
-	err = -EIO;
-	adapter->hw.hw_addr = ioremap(mmio_start, mmio_len);
+	adapter->hw.hw_addr = pcim_iomap(pdev, BAR_0, mmio_len);
 	if (!adapter->hw.hw_addr)
-		goto err_ioremap;
+		return -EIO;
 
 	for (i = BAR_1; i <= BAR_5; i++) {
 		if (pci_resource_len(pdev, i) == 0)
@@ -943,8 +939,8 @@ e1000_probe(struct pci_dev *pdev,
 #endif
 	strncpy(netdev->name, pci_name(pdev), sizeof(netdev->name) - 1);
 
-	netdev->mem_start = mmio_start;
-	netdev->mem_end = mmio_start + mmio_len;
+	netdev->mem_start = pci_resource_start(pdev, BAR_0);
+	netdev->mem_end = netdev->mem_start + mmio_len;
 	netdev->base_addr = adapter->hw.io_base;
 
 	adapter->bd_number = cards_found;
@@ -952,16 +948,15 @@ e1000_probe(struct pci_dev *pdev,
 	/* setup the private structure */
 
 	if ((err = e1000_sw_init(adapter)))
-		goto err_sw_init;
+		return err;
 
 	err = -EIO;
 	/* Flash BAR mapping must happen after e1000_sw_init
 	 * because it depends on mac_type */
 	if ((adapter->hw.mac_type == e1000_ich8lan) &&
 	   (pci_resource_flags(pdev, 1) & IORESOURCE_MEM)) {
-		flash_start = pci_resource_start(pdev, 1);
 		flash_len = pci_resource_len(pdev, 1);
-		adapter->hw.flash_address = ioremap(flash_start, flash_len);
+		adapter->hw.flash_address = pcim_iomap(pdev, 1, flash_len);
 		if (!adapter->hw.flash_address)
 			goto err_flashmap;
 	}
@@ -1163,29 +1158,11 @@ err_register:
 err_eeprom:
 	if (!e1000_check_phy_reset_block(&adapter->hw))
 		e1000_phy_hw_reset(&adapter->hw);
-
-	if (adapter->hw.flash_address)
-		iounmap(adapter->hw.flash_address);
 err_flashmap:
 #ifdef CONFIG_E1000_NAPI
 	for (i = 0; i < adapter->num_rx_queues; i++)
 		dev_put(&adapter->polling_netdev[i]);
 #endif
-
-	kfree(adapter->tx_ring);
-	kfree(adapter->rx_ring);
-#ifdef CONFIG_E1000_NAPI
-	kfree(adapter->polling_netdev);
-#endif
-err_sw_init:
-	iounmap(adapter->hw.hw_addr);
-err_ioremap:
-	free_netdev(netdev);
-err_alloc_etherdev:
-	pci_release_regions(pdev);
-err_pci_reg:
-err_dma:
-	pci_disable_device(pdev);
 	return err;
 }
 
@@ -1224,21 +1201,6 @@ e1000_remove(struct pci_dev *pdev)
 
 	if (!e1000_check_phy_reset_block(&adapter->hw))
 		e1000_phy_hw_reset(&adapter->hw);
-
-	kfree(adapter->tx_ring);
-	kfree(adapter->rx_ring);
-#ifdef CONFIG_E1000_NAPI
-	kfree(adapter->polling_netdev);
-#endif
-
-	iounmap(adapter->hw.hw_addr);
-	if (adapter->hw.flash_address)
-		iounmap(adapter->hw.flash_address);
-	pci_release_regions(pdev);
-
-	free_netdev(netdev);
-
-	pci_disable_device(pdev);
 }
 
 /**
@@ -1350,27 +1312,27 @@ e1000_sw_init(struct e1000_adapter *adap
 static int __devinit
 e1000_alloc_queues(struct e1000_adapter *adapter)
 {
-	adapter->tx_ring = kcalloc(adapter->num_tx_queues,
-	                           sizeof(struct e1000_tx_ring), GFP_KERNEL);
+	adapter->tx_ring = devm_kcalloc(&adapter->pdev->dev,
+					adapter->num_tx_queues,
+					sizeof(struct e1000_tx_ring),
+					GFP_KERNEL);
 	if (!adapter->tx_ring)
 		return -ENOMEM;
 
-	adapter->rx_ring = kcalloc(adapter->num_rx_queues,
-	                           sizeof(struct e1000_rx_ring), GFP_KERNEL);
-	if (!adapter->rx_ring) {
-		kfree(adapter->tx_ring);
+	adapter->rx_ring = devm_kcalloc(&adapter->pdev->dev,
+					adapter->num_rx_queues,
+					sizeof(struct e1000_rx_ring),
+					GFP_KERNEL);
+	if (!adapter->rx_ring)
 		return -ENOMEM;
-	}
 
 #ifdef CONFIG_E1000_NAPI
-	adapter->polling_netdev = kcalloc(adapter->num_rx_queues,
-	                                  sizeof(struct net_device),
-	                                  GFP_KERNEL);
-	if (!adapter->polling_netdev) {
-		kfree(adapter->tx_ring);
-		kfree(adapter->rx_ring);
+	adapter->polling_netdev = devm_kcalloc(&adapter->pdev->dev,
+					       adapter->num_rx_queues,
+					       sizeof(struct net_device),
+					       GFP_KERNEL);
+	if (!adapter->polling_netdev)
 		return -ENOMEM;
-	}
 #endif
 
 	return E1000_SUCCESS;
@@ -5174,7 +5136,7 @@ e1000_resume(struct pci_dev *pdev)
 
 	pci_set_power_state(pdev, PCI_D0);
 	pci_restore_state(pdev);
-	if ((err = pci_enable_device(pdev))) {
+	if ((err = pcim_enable_device(pdev))) {
 		printk(KERN_ERR "e1000: Cannot enable PCI device from suspend\n");
 		return err;
 	}
Index: linux-2.6/drivers/net/e1000/e1000.h
===================================================================
--- linux-2.6.orig/drivers/net/e1000/e1000.h
+++ linux-2.6/drivers/net/e1000/e1000.h
@@ -41,6 +41,7 @@
 #include <linux/errno.h>
 #include <linux/ioport.h>
 #include <linux/pci.h>
+#include <linux/io.h>
 #include <linux/kernel.h>
 #include <linux/netdevice.h>
 #include <linux/etherdevice.h>

-- 

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/

^ permalink raw reply

* [patch 3/4] Implement devm_kcalloc
From: Brandon Philips @ 2007-08-15 19:00 UTC (permalink / raw)
  To: netdev, e1000-devel; +Cc: teheo, Brandon Philips

[-- Attachment #1: kcalloc-devres.patch --]
[-- Type: text/plain, Size: 1998 bytes --]

devm_kcalloc is a simple wrapper around devm_kzalloc for arrays.  This is
needed because kcalloc is often used in network devices. 

Signed-off-by: Brandon Philips <bphilips@suse.de>

---
 drivers/base/devres.c  |   16 ++++++++++++++++
 include/linux/device.h |    2 ++
 2 files changed, 18 insertions(+)

Index: linux-2.6/drivers/base/devres.c
===================================================================
--- linux-2.6.orig/drivers/base/devres.c
+++ linux-2.6/drivers/base/devres.c
@@ -630,6 +630,22 @@ void * devm_kzalloc(struct device *dev, 
 EXPORT_SYMBOL_GPL(devm_kzalloc);
 
 /**
+ * devm_kcalloc - resource-managed kcalloc
+ * @dev: Device to allocate memory for
+ * @n: number of elements.
+ * @size: element size.
+ * @flags: the type of memory to allocate.
+ */
+void *devm_kcalloc(struct device *dev, size_t n, size_t size,
+			   gfp_t flags)
+{
+	if (n != 0 && size > ULONG_MAX / n)
+		return NULL;
+	return devm_kzalloc(dev, n * size, flags);
+}
+EXPORT_SYMBOL_GPL(devm_kcalloc);
+
+/**
  * devm_kfree - Resource-managed kfree
  * @dev: Device this memory belongs to
  * @p: Memory to free
Index: linux-2.6/include/linux/device.h
===================================================================
--- linux-2.6.orig/include/linux/device.h
+++ linux-2.6/include/linux/device.h
@@ -402,6 +402,8 @@ extern int devres_release_group(struct d
 
 /* managed kzalloc/kfree for device drivers, no kmalloc, always use kzalloc */
 extern void *devm_kzalloc(struct device *dev, size_t size, gfp_t gfp);
+extern void *devm_kcalloc(struct device *dev, size_t n, size_t size,
+			  gfp_t flags);
 extern void devm_kfree(struct device *dev, void *p);
 
 struct device {

-- 

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/

^ permalink raw reply

* [patch 2/4] Update e100 driver to use devres.
From: Brandon Philips @ 2007-08-15 18:59 UTC (permalink / raw)
  To: netdev, e1000-devel; +Cc: teheo, Brandon Philips

[-- Attachment #1: e100-devres.patch --]
[-- Type: text/plain, Size: 5613 bytes --]

devres manages device resources and is currently used by all libata low level
drivers.   When devres is used in a driver the complexity of error handling in
the device probe and remove functions can be reduced.

In this case e100_free(), e100_remove() and all of the gotos in e100_probe have
been removed.  

Signed-off-by: Brandon Philips <bphilips@suse.de>

---
 drivers/net/e100.c |   78 +++++++++++++----------------------------------------
 1 file changed, 20 insertions(+), 58 deletions(-)

Index: linux-2.6/drivers/net/e100.c
===================================================================
--- linux-2.6.orig/drivers/net/e100.c
+++ linux-2.6/drivers/net/e100.c
@@ -2517,18 +2517,11 @@ static int e100_do_ioctl(struct net_devi
 
 static int e100_alloc(struct nic *nic)
 {
-	nic->mem = pci_alloc_consistent(nic->pdev, sizeof(struct mem),
-		&nic->dma_addr);
-	return nic->mem ? 0 : -ENOMEM;
-}
+	struct device *dev = &nic->pdev->dev;
 
-static void e100_free(struct nic *nic)
-{
-	if(nic->mem) {
-		pci_free_consistent(nic->pdev, sizeof(struct mem),
-			nic->mem, nic->dma_addr);
-		nic->mem = NULL;
-	}
+	nic->mem = dmam_alloc_coherent(dev, sizeof(struct mem),
+		&nic->dma_addr, GFP_ATOMIC);
+	return nic->mem ? 0 : -ENOMEM;
 }
 
 static int e100_open(struct net_device *netdev)
@@ -2554,8 +2547,10 @@ static int __devinit e100_probe(struct p
 	struct net_device *netdev;
 	struct nic *nic;
 	int err;
+	void __iomem **iomap;
+	int bar;
 
-	if(!(netdev = alloc_etherdev(sizeof(struct nic)))) {
+	if (!(netdev = devm_alloc_etherdev(&pdev->dev, sizeof(struct nic)))) {
 		if(((1 << debug) - 1) & NETIF_MSG_PROBE)
 			printk(KERN_ERR PFX "Etherdev alloc failed, abort.\n");
 		return -ENOMEM;
@@ -2585,26 +2580,28 @@ static int __devinit e100_probe(struct p
 	nic->msg_enable = (1 << debug) - 1;
 	pci_set_drvdata(pdev, netdev);
 
-	if((err = pci_enable_device(pdev))) {
+	if ((err = pcim_enable_device(pdev))) {
 		DPRINTK(PROBE, ERR, "Cannot enable PCI device, aborting.\n");
-		goto err_out_free_dev;
+		return err;
 	}
 
 	if(!(pci_resource_flags(pdev, 0) & IORESOURCE_MEM)) {
 		DPRINTK(PROBE, ERR, "Cannot find proper PCI device "
 			"base address, aborting.\n");
 		err = -ENODEV;
-		goto err_out_disable_pdev;
+		return err;
 	}
 
-	if((err = pci_request_regions(pdev, DRV_NAME))) {
+	bar = use_io ? 1 : 0;
+	if((err = pcim_iomap_regions(pdev, 1 << bar, DRV_NAME))) {
 		DPRINTK(PROBE, ERR, "Cannot obtain PCI resources, aborting.\n");
-		goto err_out_disable_pdev;
+		return err;
 	}
+	iomap = pcim_iomap_table(pdev)[bar];
 
 	if((err = pci_set_dma_mask(pdev, DMA_32BIT_MASK))) {
 		DPRINTK(PROBE, ERR, "No usable DMA configuration, aborting.\n");
-		goto err_out_free_res;
+		return err;
 	}
 
 	SET_MODULE_OWNER(netdev);
@@ -2613,12 +2610,6 @@ static int __devinit e100_probe(struct p
 	if (use_io)
 		DPRINTK(PROBE, INFO, "using i/o access mode\n");
 
-	nic->csr = pci_iomap(pdev, (use_io ? 1 : 0), sizeof(struct csr));
-	if(!nic->csr) {
-		DPRINTK(PROBE, ERR, "Cannot map device registers, aborting.\n");
-		err = -ENOMEM;
-		goto err_out_free_res;
-	}
 
 	if(ent->driver_data)
 		nic->flags |= ich;
@@ -2650,11 +2641,11 @@ static int __devinit e100_probe(struct p
 
 	if((err = e100_alloc(nic))) {
 		DPRINTK(PROBE, ERR, "Cannot alloc driver memory, aborting.\n");
-		goto err_out_iounmap;
+		return err;
 	}
 
 	if((err = e100_eeprom_load(nic)))
-		goto err_out_free;
+		return err;
 
 	e100_phy_init(nic);
 
@@ -2664,8 +2655,7 @@ static int __devinit e100_probe(struct p
 		if (!eeprom_bad_csum_allow) {
 			DPRINTK(PROBE, ERR, "Invalid MAC address from "
 			        "EEPROM, aborting.\n");
-			err = -EAGAIN;
-			goto err_out_free;
+			return -EAGAIN;
 		} else {
 			DPRINTK(PROBE, ERR, "Invalid MAC address from EEPROM, "
 			        "you MUST configure one.\n");
@@ -2685,7 +2675,7 @@ static int __devinit e100_probe(struct p
 	strcpy(netdev->name, "eth%d");
 	if((err = register_netdev(netdev))) {
 		DPRINTK(PROBE, ERR, "Cannot register net device, aborting.\n");
-		goto err_out_free;
+		return err;
 	}
 
 	DPRINTK(PROBE, INFO, "addr 0x%llx, irq %d, "
@@ -2695,36 +2685,8 @@ static int __devinit e100_probe(struct p
 		netdev->dev_addr[3], netdev->dev_addr[4], netdev->dev_addr[5]);
 
 	return 0;
-
-err_out_free:
-	e100_free(nic);
-err_out_iounmap:
-	pci_iounmap(pdev, nic->csr);
-err_out_free_res:
-	pci_release_regions(pdev);
-err_out_disable_pdev:
-	pci_disable_device(pdev);
-err_out_free_dev:
-	pci_set_drvdata(pdev, NULL);
-	free_netdev(netdev);
-	return err;
 }
 
-static void __devexit e100_remove(struct pci_dev *pdev)
-{
-	struct net_device *netdev = pci_get_drvdata(pdev);
-
-	if(netdev) {
-		struct nic *nic = netdev_priv(netdev);
-		unregister_netdev(netdev);
-		e100_free(nic);
-		iounmap(nic->csr);
-		free_netdev(netdev);
-		pci_release_regions(pdev);
-		pci_disable_device(pdev);
-		pci_set_drvdata(pdev, NULL);
-	}
-}
 
 #ifdef CONFIG_PM
 static int e100_suspend(struct pci_dev *pdev, pm_message_t state)
@@ -2875,7 +2837,7 @@ static struct pci_driver e100_driver = {
 	.name =         DRV_NAME,
 	.id_table =     e100_id_table,
 	.probe =        e100_probe,
-	.remove =       __devexit_p(e100_remove),
+	.remove =       __devexit_p(netdev_pci_remove_one),
 #ifdef CONFIG_PM
 	/* Power Management hooks */
 	.suspend =      e100_suspend,

-- 

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/

^ permalink raw reply

* [patch 1/4] Update net core to use devres.
From: Brandon Philips @ 2007-08-15 18:59 UTC (permalink / raw)
  To: netdev, e1000-devel; +Cc: teheo, Brandon Philips

[-- Attachment #1: ether-core-devres.patch --]
[-- Type: text/plain, Size: 7581 bytes --]

* netdev_pci_remove_one() can replace simple pci device remove
  functions

* devm_alloc_netdev() is like alloc_netdev but allocates memory using devres.

Signed-off-by: Brandon Philips <bphilips@suse.de>

---
 include/linux/etherdevice.h |    5 ++
 include/linux/netdevice.h   |    7 ++
 net/core/dev.c              |  105 +++++++++++++++++++++++++++++++++++++++-----
 net/ethernet/eth.c          |    8 +++
 4 files changed, 115 insertions(+), 10 deletions(-)

Index: linux-2.6/include/linux/netdevice.h
===================================================================
--- linux-2.6.orig/include/linux/netdevice.h
+++ linux-2.6/include/linux/netdevice.h
@@ -656,6 +656,7 @@ extern int		dev_queue_xmit(struct sk_buf
 extern int		register_netdevice(struct net_device *dev);
 extern void		unregister_netdevice(struct net_device *dev);
 extern void		free_netdev(struct net_device *dev);
+extern void		netdev_pci_remove_one(struct pci_dev *pdev);
 extern void		synchronize_net(void);
 extern int 		register_netdevice_notifier(struct notifier_block *nb);
 extern int		unregister_netdevice_notifier(struct notifier_block *nb);
@@ -1085,8 +1086,14 @@ extern void		ether_setup(struct net_devi
 extern struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
 				       void (*setup)(struct net_device *),
 				       unsigned int queue_count);
+extern struct net_device *devm_alloc_netdev_mq(struct device *dev,
+				       int sizeof_priv, const char *name,
+				       void (*setup)(struct net_device *),
+				       unsigned int queue_count);
 #define alloc_netdev(sizeof_priv, name, setup) \
 	alloc_netdev_mq(sizeof_priv, name, setup, 1)
+#define devm_alloc_netdev(dev, sizeof_priv, name, setup) \
+	devm_alloc_netdev_mq(dev, sizeof_priv, name, setup, 1)
 extern int		register_netdev(struct net_device *dev);
 extern void		unregister_netdev(struct net_device *dev);
 /* Functions used for secondary unicast and multicast support */
Index: linux-2.6/net/core/dev.c
===================================================================
--- linux-2.6.orig/net/core/dev.c
+++ linux-2.6/net/core/dev.c
@@ -89,6 +89,7 @@
 #include <linux/interrupt.h>
 #include <linux/if_ether.h>
 #include <linux/netdevice.h>
+#include <linux/pci.h>
 #include <linux/etherdevice.h>
 #include <linux/notifier.h>
 #include <linux/skbuff.h>
@@ -3658,18 +3659,46 @@ static struct net_device_stats *internal
 }
 
 /**
- *	alloc_netdev_mq - allocate network device
- *	@sizeof_priv:	size of private data to allocate space for
- *	@name:		device name format string
- *	@setup:		callback to initialize device
- *	@queue_count:	the number of subqueues to allocate
+ * devm_free_netdev - wrapper around free_netdev for devres
+ */
+static void devm_free_netdev(struct device *gendev, void *res)
+{
+	struct net_device **dev = (struct net_device **)res;
+	free_netdev(*dev);
+}
+
+/**
+ * register_netdev_devres - register netdev with a managed device
+ * @dev:	devres managed device responsible for the memory
+ * @netdev:		pointer to netdev to be managed
  *
- *	Allocates a struct net_device with private data area for driver use
- *	and performs basic initialization.  Also allocates subquue structs
- *	for each queue on the device at the end of the netdevice.
+ * Registers @netdev to the device @dev and calls free_netdev automatically
+ * when the device disappears
  */
-struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
-		void (*setup)(struct net_device *), unsigned int queue_count)
+static void *register_netdev_devres(struct device *gendev,
+					struct net_device *dev)
+{
+	struct net_device **p;
+
+	p = devres_alloc(devm_free_netdev, sizeof(*p), GFP_KERNEL);
+
+	if (unlikely(!p))
+		return NULL;
+
+	*p = dev;
+	devres_add(gendev, p);
+
+	return dev;
+}
+
+/**
+ *	__alloc_netdev_mq - does the work to allocate a network device
+ *	@dev:		devres managed device responsible for mem.
+ *			NULL if unmanaged
+ */
+struct net_device *__alloc_netdev_mq(struct device *gendev, int sizeof_priv,
+		const char *name, void (*setup)(struct net_device *),
+		unsigned int queue_count)
 {
 	void *p;
 	struct net_device *dev;
@@ -3706,8 +3735,44 @@ struct net_device *alloc_netdev_mq(int s
 	dev->get_stats = internal_stats;
 	setup(dev);
 	strcpy(dev->name, name);
+
+	/* If we are given a device then manage this netdev with devres */
+	if (gendev != NULL)
+		return register_netdev_devres(gendev, dev);
+
 	return dev;
 }
+
+/**
+ *	alloc_netdev_mq - alloc_netdev_mq for devres managed devices
+ *	@dev:		devres managed device responsible for mem.
+ */
+struct net_device *devm_alloc_netdev_mq(struct device *dev, int sizeof_priv,
+				const char *name,
+				void (*setup)(struct net_device *),
+				unsigned int queue_count)
+{
+	return __alloc_netdev_mq(dev, sizeof_priv, name, setup, queue_count);
+}
+EXPORT_SYMBOL(devm_alloc_netdev_mq);
+
+/**
+ *	alloc_netdev_mq - allocate network device
+ *	@sizeof_priv:	size of private data to allocate space for
+ *	@name:		device name format string
+ *	@setup:		callback to initialize device
+ *	@queue_count:	the number of subqueues to allocate
+ *
+ *	Allocates a struct net_device with private data area for driver use
+ *	and performs basic initialization.  Also allocates subquue structs
+ *	for each queue on the device at the end of the netdevice.
+ */
+struct net_device *alloc_netdev_mq(int sizeof_priv, const char *name,
+				void (*setup)(struct net_device *),
+				unsigned int queue_count)
+{
+	return __alloc_netdev_mq(NULL, sizeof_priv, name, setup, queue_count);
+}
 EXPORT_SYMBOL(alloc_netdev_mq);
 
 /**
@@ -3737,6 +3802,26 @@ void free_netdev(struct net_device *dev)
 #endif
 }
 
+#ifdef CONFIG_PCI
+/**
+ * netdev_pci_remove_one - free network device
+ * @pdev: pci_dev of the device to remove
+ *
+ * Simple remove function for pci network devices with no teardown besides
+ * resource deallocation.
+ */
+void netdev_pci_remove_one(struct pci_dev *pdev)
+{
+	struct net_device *netdev = pci_get_drvdata(pdev);
+	if (netdev) {
+		unregister_netdev(netdev);
+		pci_set_drvdata(pdev, NULL);
+	}
+}
+EXPORT_SYMBOL(netdev_pci_remove_one);
+#endif
+
+
 /* Synchronize with packet receive processing. */
 void synchronize_net(void)
 {
Index: linux-2.6/include/linux/etherdevice.h
===================================================================
--- linux-2.6.orig/include/linux/etherdevice.h
+++ linux-2.6/include/linux/etherdevice.h
@@ -40,7 +40,12 @@ extern int		eth_header_cache(struct neig
 					 struct hh_cache *hh);
 
 extern struct net_device *alloc_etherdev_mq(int sizeof_priv, unsigned int queue_count);
+extern struct net_device *devm_alloc_etherdev_mq(struct device *dev,
+					      int sizeof_priv,
+					      unsigned int queue_count);
 #define alloc_etherdev(sizeof_priv) alloc_etherdev_mq(sizeof_priv, 1)
+#define devm_alloc_etherdev(dev, sizeof_priv) \
+	devm_alloc_etherdev_mq(dev, sizeof_priv, 1)
 
 /**
  * is_zero_ether_addr - Determine if give Ethernet address is all zeros.
Index: linux-2.6/net/ethernet/eth.c
===================================================================
--- linux-2.6.orig/net/ethernet/eth.c
+++ linux-2.6/net/ethernet/eth.c
@@ -337,3 +337,11 @@ struct net_device *alloc_etherdev_mq(int
 	return alloc_netdev_mq(sizeof_priv, "eth%d", ether_setup, queue_count);
 }
 EXPORT_SYMBOL(alloc_etherdev_mq);
+
+struct net_device *devm_alloc_etherdev_mq(struct device *dev, int sizeof_priv,
+						unsigned int queue_count)
+{
+	return devm_alloc_netdev_mq(dev, sizeof_priv, "eth%d", ether_setup,
+				    queue_count);
+}
+EXPORT_SYMBOL(devm_alloc_etherdev_mq);

-- 

^ permalink raw reply

* [patch 0/4] Update network drivers to use devres
From: Brandon Philips @ 2007-08-15 18:59 UTC (permalink / raw)
  To: netdev, e1000-devel; +Cc: teheo

These patches update the network driver core, e100 and e1000 driver to use
devres.  Devres is a simple resource manager for device drivers, see
Documentation/driver-model/devres.txt for more information.

The use of devres will continue to be optional with this patch set and can be
applied to drivers where it makes sense.

This series includes all of the feedback from the final RFC.  Thanks :)

^ permalink raw reply

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Paul E. McKenney @ 2007-08-15 18:57 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: horms, Stefan Richter, Satyam Sharma, Linux Kernel Mailing List,
	rpjday, netdev, ak, cfriesen, Heiko Carstens, jesper.juhl,
	linux-arch, Andrew Morton, zlynx, clameter, schwidefsky,
	Chris Snook, Herbert Xu, davem, Linus Torvalds, wensong, wjiang
In-Reply-To: <d4406222873856f472b9173a14a19aa2@kernel.crashing.org>

On Wed, Aug 15, 2007 at 08:31:25PM +0200, Segher Boessenkool wrote:
> >>How does the compiler know that msleep() has got barrier()s?
> >
> >Because msleep_interruptible() is in a separate compilation unit,
> >the compiler has to assume that it might modify any arbitrary global.
> 
> No; compilation units have nothing to do with it, GCC can optimise
> across compilation unit boundaries just fine, if you tell it to
> compile more than one compilation unit at once.

Last I checked, the Linux kernel build system did compile each .c file
as a separate compilation unit.

> What you probably mean is that the compiler has to assume any code
> it cannot currently see can do anything (insofar as allowed by the
> relevant standards etc.)

Indeed.

> >In many cases, the compiler also has to assume that 
> >msleep_interruptible()
> >might call back into a function in the current compilation unit, thus
> >possibly modifying global static variables.
> 
> It most often is smart enough to see what compilation-unit-local
> variables might be modified that way, though :-)

Yep.  For example, if it knows the current value of a given such local
variable, and if all code paths that would change some other variable
cannot be reached given that current value of the first variable.
At least given that gcc doesn't know about multiple threads of execution!

							Thanx, Paul

^ permalink raw reply

* Re: [PATCH 6/24] make atomic_read() behave consistently on frv
From: Segher Boessenkool @ 2007-08-15 18:51 UTC (permalink / raw)
  To: Nick Piggin
  Cc: heiko.carstens, horms, linux-kernel, rpjday, ak, netdev, cfriesen,
	akpm, torvalds, jesper.juhl, linux-arch, zlynx, schwidefsky,
	Chris Snook, davem, wensong, wjiang, David Howells
In-Reply-To: <46C140DD.3060509@yahoo.com.au>

> Well if there is only one memory location involved, then smp_rmb() 
> isn't
> going to really do anything anyway, so it would be incorrect to use it.

rmb() orders *any* two reads; that includes two reads from the same
location.

> Consider that smp_rmb basically will do anything from flushing the
> pipeline to invalidating loads speculatively executed out of order. 
> AFAIK
> it will not control the visibility of stores coming from other CPUs 
> (that
> is up to the cache coherency).

The writer side should typically use wmb() whenever the reader side
uses rmb(), sure.


Segher


^ permalink raw reply

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Paul E. McKenney @ 2007-08-15 18:45 UTC (permalink / raw)
  To: David Howells
  Cc: Herbert Xu, Satyam Sharma, Stefan Richter, Christoph Lameter,
	Chris Snook, Linux Kernel Mailing List, linux-arch,
	Linus Torvalds, netdev, Andrew Morton, ak, heiko.carstens, davem,
	schwidefsky, wensong, horms, wjiang, cfriesen, zlynx, rpjday,
	jesper.juhl, segher
In-Reply-To: <4413.1187201997@redhat.com>

On Wed, Aug 15, 2007 at 07:19:57PM +0100, David Howells wrote:
> Herbert Xu <herbert@gondor.apana.org.au> wrote:
> 
> > Let's turn this around.  Can you give a single example where
> > the volatile semantics is needed in a legitimate way?
> 
> Accessing H/W registers?  But apart from that...

Communicating between process context and interrupt/NMI handlers using
per-CPU variables.

							Thanx, Paul

^ permalink raw reply

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Segher Boessenkool @ 2007-08-15 18:31 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Christoph Lameter, heiko.carstens, horms, Stefan Richter,
	Linux Kernel Mailing List, Paul E. McKenney, netdev, ak, cfriesen,
	rpjday, jesper.juhl, linux-arch, Andrew Morton, zlynx,
	schwidefsky, Chris Snook, Herbert Xu, davem, Linus Torvalds,
	wensong, wjiang
In-Reply-To: <alpine.LFD.0.999.0708151713080.16414@enigma.security.iitk.ac.in>

> "Volatile behaviour" itself isn't consistently defined (at least
> definitely not consistently implemented in various gcc versions across
> platforms),

It should be consistent across platforms; if not, file a bug please.

> but it is /expected/ to mean something like: "ensure that
> every such access actually goes all the way to memory, and is not
> re-ordered w.r.t. to other accesses, as far as the compiler can take
> care of these". The last "as far as compiler can take care" disclaimer
> comes about due to CPUs doing their own re-ordering nowadays.

You can *expect* whatever you want, but this isn't in line with
reality at all.

volatile _does not_ make accesses go all the way to memory.
volatile _does not_ prevent reordering wrt other accesses.

What volatile does are a) never optimise away a read (or write)
to the object, since the data can change in ways the compiler
cannot see; and b) never move stores to the object across a
sequence point.  This does not mean other accesses cannot be
reordered wrt the volatile access.

If the abstract machine would do an access to a volatile-
qualified object, the generated machine code will do that
access too.  But, for example, it can still be optimised
away by the compiler, if it can prove it is allowed to.

If you want stuff to go all the way to memory, you need some
architecture-specific flush sequence; to make a store globally
visible before another store, you need mb(); before some following
read, you need mb(); to prevent reordering you need a barrier.


Segher


^ permalink raw reply

* Re: [PATCH 1/1] bonding: eliminate RTNL assertion spew
From: Mike Snitzer @ 2007-08-15 18:36 UTC (permalink / raw)
  To: Andy Gospodarek; +Cc: Stephen Hemminger, Jeff Garzik, fubar, netdev
In-Reply-To: <bdfc5d6e0708142027i669347aasa935bb277b2bbb48@mail.gmail.com>

I'd very much like to help out.  The "rtnl assertion spew" isn't
instilling confidence in customers I've been working with.  If you'd
like to send me patches in private I'd help test them ASAP.

Could you elaborate on the associated risk of _not_ fixing these
issues?  balance-alb _seems_ to be working even though these traces
occur on initialization.  But these rtnl traces are clearly more
generic than balance-alb.

regards,
Mike

On 8/14/07, Andy Gospodarek <andy@greyhouse.net> wrote:
> On 8/14/07, Mike Snitzer <snitzer@gmail.com> wrote:
> > Andy,
> >
> > Is there an updated version of this patch?
> >
> > Please advise, thanks.
> >
> >
>
> Mike,
>
> There is a version that Jay and I have been testing and if you would
> like to help out, we could probably send you some patches.
>
> Jay has split the entire patch into smaller chunks, and we hope to
> post the entire thing quite soon (days not months).
>
> -andy
>

^ permalink raw reply

* Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: David Howells @ 2007-08-15 18:19 UTC (permalink / raw)
  To: Herbert Xu
  Cc: dhowells, Paul E. McKenney, Satyam Sharma, Stefan Richter,
	Christoph Lameter, Chris Snook, Linux Kernel Mailing List,
	linux-arch, Linus Torvalds, netdev, Andrew Morton, ak,
	heiko.carstens, davem, schwidefsky, wensong, horms, wjiang,
	cfriesen, zlynx, rpjday, jesper.juhl, segher
In-Reply-To: <20070815153335.GA23593@gondor.apana.org.au>

Herbert Xu <herbert@gondor.apana.org.au> wrote:

> Let's turn this around.  Can you give a single example where
> the volatile semantics is needed in a legitimate way?

Accessing H/W registers?  But apart from that...

David

^ permalink raw reply

* RE: Potential u32 classifier bug.
From: Waskiewicz Jr, Peter P @ 2007-08-15 18:16 UTC (permalink / raw)
  To: Thomas Graf; +Cc: netdev
In-Reply-To: <20070815181139.GD32236@postel.suug.ch>

> * Waskiewicz Jr, Peter P <peter.p.waskiewicz.jr@intel.com> 
> 2007-08-15 11:02
> > > There is this very horrible way of using the u32 
> classifier with a 
> > > negative offset to look into the ethernet header.
> > 
> > Based on this, it sounds like u32 using protocol 802_3 is broken?
> 
> You might be expecting too much from u32. The protocol given 
> to u32 is just a filter, it doesn't imply anything beyond that.
> u32 has its usage the way it is, that's way we've added an 
> ematch rather than extending u32 itself.

Ok, that clarifies it a bit.  I've just found a few examples on the net,
one of which is in a TC filter manual
(http://tcn.hypert.net/tcmanual.pdf, section 2.2.1.3 at the bottom of
the section), that was using u32 to simply filter on dest MAC address
without anything elaborate.  Either it worked way back when, or it was a
bogus example.

Thanks again Thomas,

-PJ

^ permalink raw reply

* Re: Potential u32 classifier bug.
From: Thomas Graf @ 2007-08-15 18:11 UTC (permalink / raw)
  To: Waskiewicz Jr, Peter P; +Cc: netdev
In-Reply-To: <D5C1322C3E673F459512FB59E0DDC329036D1BA7@orsmsx414.amr.corp.intel.com>

* Waskiewicz Jr, Peter P <peter.p.waskiewicz.jr@intel.com> 2007-08-15 11:02
> > There is this very horrible way of using the u32 classifier 
> > with a negative offset to look into the ethernet header.
> 
> Based on this, it sounds like u32 using protocol 802_3 is broken?

You might be expecting too much from u32. The protocol given
to u32 is just a filter, it doesn't imply anything beyond that.
u32 has its usage the way it is, that's way we've added an ematch
rather than extending u32 itself.

^ permalink raw reply

* RE: Potential u32 classifier bug.
From: Waskiewicz Jr, Peter P @ 2007-08-15 18:02 UTC (permalink / raw)
  To: Thomas Graf; +Cc: netdev
In-Reply-To: <20070815175758.GC32236@postel.suug.ch>

> * Waskiewicz Jr, Peter P <peter.p.waskiewicz.jr@intel.com> 
> 2007-08-09 18:07
> > My big question is: Has anyone recently used the 802_3 
> protocol in tc 
> > with u32 and actually gotten it to work?  I can't see how the
> > u32_classify() code can look at the mac header, since it is 
> using the 
> > network header accessor to start looking.  I think this is an issue 
> > with the classification code, but I'm looking to see if I'm doing 
> > something stupid before I really start digging into this mess.
> 
> There is this very horrible way of using the u32 classifier 
> with a negative offset to look into the ethernet header.

Based on this, it sounds like u32 using protocol 802_3 is broken?

> You might want to look into the cmp ematch which can be 
> attached to almost any classifier. It allows basing offsets 
> on any layer thus making ethernet header filtering trivial.

I'll look at this.  Thanks Thomas for the response!

-PJ

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox