* [PATCH 0/24] make atomic_read() behave consistently across all architectures
From: Chris Snook @ 2007-08-09 13:14 UTC (permalink / raw)
To: linux-kernel, linux-arch, torvalds
Cc: netdev, akpm, ak, heiko.carstens, davem, schwidefsky, wensong,
horms, wjiang, cfriesen, zlynx, rpjday, jesper.juhl
As recent discussions[1], and bugs[2] have shown, there is a great deal of
confusion about the expected behavior of atomic_read(), compounded by the
fact that it is not the same on all architectures. Since users expect calls
to atomic_read() to actually perform a read, it is not desirable to allow
the compiler to optimize this away. Requiring the use of barrier() in this
case is inefficient, since we only want to re-load the atomic_t variable,
not everything else in scope.
This patchset makes the behavior of atomic_read uniform by removing the
volatile keyword from all atomic_t and atomic64_t definitions that currently
have it, and instead explicitly casts the variable as volatile in
atomic_read(). This leaves little room for creative optimization by the
compiler, and is in keeping with the principles behind "volatile considered
harmful".
Busy-waiters should still use cpu_relax(), but fast paths may be able to
reduce their use of barrier() between some atomic_read() calls.
-- Chris
1) http://lkml.org/lkml/2007/7/1/52
2) http://lkml.org/lkml/2007/8/8/122
^ permalink raw reply
* [PATCH] Fix the SSB dependency hell
From: Michael Buesch @ 2007-08-09 13:04 UTC (permalink / raw)
To: John Linville; +Cc: netdev, linux-wireless, Andrew Morton
This fixes the SSB dependency hell and introduces some
fake-options that only give some advice on what to select.
We live with these fake options only until menuconfig is able
to tell more about needed dependencies and how to resolve them.
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Index: wireless-dev/drivers/net/Kconfig
===================================================================
--- wireless-dev.orig/drivers/net/Kconfig 2007-08-09 14:47:01.000000000 +0200
+++ wireless-dev/drivers/net/Kconfig 2007-08-09 14:47:40.000000000 +0200
@@ -1452,10 +1452,31 @@ config APRICOT
<file:Documentation/networking/net-modules.txt>. The module will be
called apricot.
+config B44_DEP_HACK
+ bool
+ depends on SSB && SSB_PCIHOST && SSB_DRIVER_PCICORE
+ default y
+
+config B44_ADVICE_HACK
+ bool "B44 for PCI not available. Read the help text of this option!"
+ depends on !B44_DEP_HACK
+ ---help---
+ The Broadcom 440x/47xx driver for PCI devices can not be enabled,
+ because the required dependencies are not selected.
+
+ In order to be able to select the Broadcom 440x/47xx PCI driver, you
+ need to enable the following options first:
+
+ CONFIG_SSB found in menu:
+ Device Drivers/Sonics Silicon Backplane/Sonics Silicon Backplane support
+ CONFIG_SSB_PCIHOST found in menu:
+ Device Drivers/Sonics Silicon Backplane/Support for SSB on PCI-bus host
+ CONFIG_SSB_DRIVER_PCICORE found in menu:
+ Device Drivers/Sonics Silicon Backplane/SSB PCI core driver
+
config B44
tristate "Broadcom 440x/47xx ethernet support"
- depends on HAS_IOMEM
- select SSB
+ depends on SSB
select MII
help
If you have a network (Ethernet) controller of this type, say Y
@@ -1473,9 +1494,7 @@ config B44
config B44_PCI
bool "Broadcom 440x PCI device support"
- depends on B44 && NET_PCI
- select SSB_PCIHOST
- select SSB_DRIVER_PCICORE
+ depends on B44 && SSB_PCIHOST && SSB_DRIVER_PCICORE && NET_PCI
default y
help
Support for Broadcom 440x PCI devices.
Index: wireless-dev/drivers/net/wireless/bcm43xx-mac80211/Kconfig
===================================================================
--- wireless-dev.orig/drivers/net/wireless/bcm43xx-mac80211/Kconfig 2007-08-09 14:47:01.000000000 +0200
+++ wireless-dev/drivers/net/wireless/bcm43xx-mac80211/Kconfig 2007-08-09 14:59:39.000000000 +0200
@@ -1,8 +1,29 @@
+config BCM43XX_DEP_HACK
+ bool
+ depends on SSB && SSB_PCIHOST && SSB_DRIVER_PCICORE
+ default y
+
+config BCM43XX_ADVICE_HACK
+ bool "BCM43xx PCI (mac80211) not available. Read the help text of this option!"
+ depends on !BCM43XX_DEP_HACK
+ ---help---
+ The BCM43xx driver for BCM43xx PCI devices can not be enabled,
+ because the required dependencies are not selected.
+
+ In order to be able to select the BCM43xx-mac80211 driver, you
+ need to enable the following options first:
+
+ CONFIG_SSB found in menu:
+ Device Drivers/Sonics Silicon Backplane/Sonics Silicon Backplane support
+ CONFIG_SSB_PCIHOST found in menu:
+ Device Drivers/Sonics Silicon Backplane/Support for SSB on PCI-bus host
+ CONFIG_SSB_DRIVER_PCICORE found in menu:
+ Device Drivers/Sonics Silicon Backplane/SSB PCI core driver
+
config BCM43XX_MAC80211
tristate "Broadcom BCM43xx wireless support (mac80211 stack)"
- depends on MAC80211 && WLAN_80211 && EXPERIMENTAL
+ depends on SSB && MAC80211 && WLAN_80211 && EXPERIMENTAL
select FW_LOADER
- select SSB
select HW_RANDOM
---help---
This is an experimental driver for the Broadcom 43xx wireless chip,
@@ -10,9 +31,7 @@ config BCM43XX_MAC80211
config BCM43XX_MAC80211_PCI
bool "BCM43xx PCI device support"
- depends on BCM43XX_MAC80211 && PCI
- select SSB_PCIHOST
- select SSB_DRIVER_PCICORE
+ depends on BCM43XX_MAC80211 && SSB_PCIHOST && SSB_DRIVER_PCICORE
default y
---help---
Broadcom 43xx PCI device support.
@@ -24,8 +43,7 @@ config BCM43XX_MAC80211_PCI
config BCM43XX_MAC80211_PCMCIA
bool "BCM43xx PCMCIA device support"
- depends on BCM43XX_MAC80211 && PCMCIA
- select SSB_PCMCIAHOST
+ depends on BCM43XX_MAC80211 && SSB_PCMCIAHOST
---help---
Broadcom 43xx PCMCIA device support.
@@ -43,14 +61,13 @@ config BCM43XX_MAC80211_PCMCIA
If unsure, say N.
config BCM43XX_MAC80211_DEBUG
- bool "Broadcom BCM43xx debugging (RECOMMENDED)"
+ bool "Broadcom BCM43xx debugging"
depends on BCM43XX_MAC80211
- select SSB_DEBUG if !SSB_SILENT
- default y
---help---
Broadcom 43xx debugging messages.
- Say Y, because the driver is still very experimental and
- this will help you get it running.
+
+ Say Y, if you want to find out why the driver does not
+ work for you.
config BCM43XX_MAC80211_DMA
bool
Index: wireless-dev/drivers/ssb/Kconfig
===================================================================
--- wireless-dev.orig/drivers/ssb/Kconfig 2007-08-09 14:47:01.000000000 +0200
+++ wireless-dev/drivers/ssb/Kconfig 2007-08-09 14:51:09.000000000 +0200
@@ -2,7 +2,7 @@ menu "Sonics Silicon Backplane"
config SSB
tristate "Sonics Silicon Backplane support"
- depends on EXPERIMENTAL && HAS_IOMEM
+ depends on HAS_IOMEM
help
Support for the Sonics Silicon Backplane bus
@@ -21,8 +21,8 @@ config SSB_PCIHOST
If unsure, say Y
config SSB_PCMCIAHOST
- bool "Support for SSB on PCMCIA-bus host"
- depends on SSB && PCMCIA
+ bool "Support for SSB on PCMCIA-bus host (EXPERIMENTAL)"
+ depends on SSB && PCMCIA && EXPERIMENTAL
help
Support for a Sonics Silicon Backplane on top
of a PCMCIA device.
@@ -65,14 +65,14 @@ config SSB_DRIVER_PCICORE
If unsure, say Y
config SSB_PCICORE_HOSTMODE
- bool "Hostmode support for SSB PCI core"
- depends on SSB_DRIVER_PCICORE && SSB_DRIVER_MIPS
+ bool "Hostmode support for SSB PCI core (EXPERIMENTAL)"
+ depends on SSB_DRIVER_PCICORE && SSB_DRIVER_MIPS && EXPERIMENTAL
help
PCIcore hostmode operation (external PCI bus).
config SSB_DRIVER_MIPS
- bool "SSB Broadcom MIPS core driver"
- depends on SSB && MIPS
+ bool "SSB Broadcom MIPS core driver (EXPERIMENTAL)"
+ depends on SSB && MIPS && EXPERIMENTAL
select SSB_SERIAL
help
Driver for the Sonics Silicon Backplane attached
@@ -81,8 +81,8 @@ config SSB_DRIVER_MIPS
If unsure, say N
config SSB_DRIVER_EXTIF
- bool "SSB Broadcom EXTIF core driver"
- depends on SSB_DRIVER_MIPS
+ bool "SSB Broadcom EXTIF core driver (EXPERIMENTAL)"
+ depends on SSB_DRIVER_MIPS && EXPERIMENTAL
help
Driver for the Sonics Silicon Backplane attached
Broadcom EXTIF core.
^ permalink raw reply
* Re: [patch] ipvs: force read of atomic_t in while loop
From: Martin Schwidefsky @ 2007-08-09 12:49 UTC (permalink / raw)
To: Chris Snook
Cc: Michael Buesch, Andi Kleen, Heiko Carstens, David Miller, akpm,
linux-kernel, netdev, wensong, horms, torvalds
In-Reply-To: <46BB0B4B.4070300@redhat.com>
On Thu, 2007-08-09 at 08:40 -0400, Chris Snook wrote:
> > #define reload_var(x) __asm__ __volatile__ (whatever, x)
> >
> > I don't know inline assembly that much, but isn't it possible
> > with that to kind of "fake-touch" the variable, so the compiler
> > must reload it (and only it) to make sure it's up to date?
>
> We can do it in C, like this:
>
> -#define atomic_read(v) ((v)->counter)
> +#define atomic_read(v) (*(volatile int *)&(v)->counter)
>
> By casting it volatile at the precise piece of code where we want to
> guarantee a read from memory, there's little risk of the compiler
> getting creative in its interpretation of the code.
To answer the inline assembler question:
asm volatile ("" : "=m" (counter)) : "m" (counter) )
will force the compiler to reload the value from memory. But the cast to
(volatile int *) is even better.
--
blue skies,
Martin.
"Reality continues to ruin my life." - Calvin.
^ permalink raw reply
* Re: [patch] ipvs: force read of atomic_t in while loop
From: Chris Snook @ 2007-08-09 12:40 UTC (permalink / raw)
To: Michael Buesch
Cc: Andi Kleen, Heiko Carstens, David Miller, akpm, linux-kernel,
netdev, schwidefsky, wensong, horms, torvalds
In-Reply-To: <200708091435.18595.mb@bu3sch.de>
Michael Buesch wrote:
> On Thursday 09 August 2007 02:15:33 Andi Kleen wrote:
>> On Wed, Aug 08, 2007 at 05:08:44PM -0400, Chris Snook wrote:
>>> Heiko Carstens wrote:
>>>> On Wed, Aug 08, 2007 at 03:21:31AM -0700, David Miller wrote:
>>>>> From: Heiko Carstens <heiko.carstens@de.ibm.com>
>>>>> Date: Wed, 8 Aug 2007 11:33:00 +0200
>>>>>
>>>>>> Just saw this while grepping for atomic_reads in a while loops.
>>>>>> Maybe we should re-add the volatile to atomic_t. Not sure.
>>>>> I think whatever the choice, it should be done consistently
>>>>> on every architecture.
>>>>>
>>>>> It's just asking for trouble if your arch does it differently from
>>>>> every other.
>>>> Well..currently it's i386/x86_64 and s390 which have no volatile
>>>> in atomic_t. And yes, of course I agree it should be consistent
>>>> across all architectures. But it isn't.
>>> Based on recent discussion, it's pretty clear that there's a lot of
>>> confusion about this. A lot of people (myself included, until I thought
>>> about it long and hard) will reasonably assume that calling
>>> atomic_read() will actually read the value from memory. Leaving out the
>>> volatile declaration seems like a pessimization to me. If you force
>>> people to use barrier() everywhere they're working with atomic_t, it
>>> will force re-reads of all the non-atomic data in use as well, which
>>> will cause more memory fetches of things that generally don't need
>>> barrier(). That and it's a bug waiting to happen.
>>>
>>> Andi -- your thoughts on the matter?
>> I also think readding volatile makes sense. An alternative would be
>> to stick an rmb() into atomic_read() -- that would also stop speculative reads.
>> Disadvantage is that it clobbers all memory, not just the specific value.
>>
>> But you really have to complain to Linus (cc'ed). He came up
>> with the volatile removale change iirc.
>
> Isn't it possible through some inline assembly trick
> that only a certain variable has to be reloaded?
> So we could define something like that:
>
> #define reload_var(x) __asm__ __volatile__ (whatever, x)
>
> I don't know inline assembly that much, but isn't it possible
> with that to kind of "fake-touch" the variable, so the compiler
> must reload it (and only it) to make sure it's up to date?
We can do it in C, like this:
-#define atomic_read(v) ((v)->counter)
+#define atomic_read(v) (*(volatile int *)&(v)->counter)
By casting it volatile at the precise piece of code where we want to guarantee a
read from memory, there's little risk of the compiler getting creative in its
interpretation of the code.
Stay tuned for the patch set...
-- Chris
^ permalink raw reply
* Re: [patch] ipvs: force read of atomic_t in while loop
From: Michael Buesch @ 2007-08-09 12:35 UTC (permalink / raw)
To: Andi Kleen
Cc: Chris Snook, Heiko Carstens, David Miller, akpm, linux-kernel,
netdev, schwidefsky, wensong, horms, torvalds
In-Reply-To: <20070809001533.GA17798@one.firstfloor.org>
On Thursday 09 August 2007 02:15:33 Andi Kleen wrote:
> On Wed, Aug 08, 2007 at 05:08:44PM -0400, Chris Snook wrote:
> > Heiko Carstens wrote:
> > >On Wed, Aug 08, 2007 at 03:21:31AM -0700, David Miller wrote:
> > >>From: Heiko Carstens <heiko.carstens@de.ibm.com>
> > >>Date: Wed, 8 Aug 2007 11:33:00 +0200
> > >>
> > >>>Just saw this while grepping for atomic_reads in a while loops.
> > >>>Maybe we should re-add the volatile to atomic_t. Not sure.
> > >>I think whatever the choice, it should be done consistently
> > >>on every architecture.
> > >>
> > >>It's just asking for trouble if your arch does it differently from
> > >>every other.
> > >
> > >Well..currently it's i386/x86_64 and s390 which have no volatile
> > >in atomic_t. And yes, of course I agree it should be consistent
> > >across all architectures. But it isn't.
> >
> > Based on recent discussion, it's pretty clear that there's a lot of
> > confusion about this. A lot of people (myself included, until I thought
> > about it long and hard) will reasonably assume that calling
> > atomic_read() will actually read the value from memory. Leaving out the
> > volatile declaration seems like a pessimization to me. If you force
> > people to use barrier() everywhere they're working with atomic_t, it
> > will force re-reads of all the non-atomic data in use as well, which
> > will cause more memory fetches of things that generally don't need
> > barrier(). That and it's a bug waiting to happen.
> >
> > Andi -- your thoughts on the matter?
>
> I also think readding volatile makes sense. An alternative would be
> to stick an rmb() into atomic_read() -- that would also stop speculative reads.
> Disadvantage is that it clobbers all memory, not just the specific value.
>
> But you really have to complain to Linus (cc'ed). He came up
> with the volatile removale change iirc.
Isn't it possible through some inline assembly trick
that only a certain variable has to be reloaded?
So we could define something like that:
#define reload_var(x) __asm__ __volatile__ (whatever, x)
I don't know inline assembly that much, but isn't it possible
with that to kind of "fake-touch" the variable, so the compiler
must reload it (and only it) to make sure it's up to date?
--
Greetings Michael.
^ permalink raw reply
* Re: [PATCH 1/1] NFS: change the ip_map cache code to handle IPv6 addresses
From: Chuck Lever @ 2007-08-09 12:16 UTC (permalink / raw)
To: Aurélien Charbon; +Cc: Mailing list NFSv4, netdev ML, Neil Brown
In-Reply-To: <46BAC0B9.1020207@ext.bull.net>
[-- Attachment #1: Type: text/plain, Size: 6312 bytes --]
Aurélien Charbon wrote:
> Here is a small part of missing pieces of IPv6 support for the server.
> It deals with the ip_map caching code part.
>
> It changes the ip_map structure to be able to store INET6 addresses.
> It adds also the changes in address hashing, and mapping to test it with
> INET addresses.
Thanks for posting your patch.
Your strategy is to convert all incoming IPv4 addresses in the ip_map
cache to IPv6 addresses, and use only IPv6 internally (often suggested
by IPv6 books I've encountered). For NFS, that is problematic because
these addresses are used as the target of access control rules for
exports; thus sys admins will expect to see IPv4 addresses in the output
of NFS utilities if they specified IPv4 addresses in their /etc/exports
file, for example.
Some naive questions:
1. If IPv6 support is not configured into the kernel, how does an
IPv6-only cache work?
2. I seem to recall (only quite vaguely) that at some point the server
might need to use one of the stored addresses to, say, open a socket to
the client? In that case, on a system with NICs configured only with
IPv4, is the cached IPv6 address properly converted back to IPv4
somehow? Can the cache code tell the difference between a cached IPv6
address that was converted from IPv4 and one that was added to the cache
as IPv6? Sorry I can't remember more clearly.
3. Would it be better to make the m_addr field a struct sockaddr, store
a whole address (with address family), and switch on the sa_family field?
> diff -u -r -N linux-2.6.23-rc1/fs/nfsd/export.c
> linux-2.6.23-rc1-IPv6-ip_map/fs/nfsd/export.c
> --- linux-2.6.23-rc1/fs/nfsd/export.c 2007-08-08 17:52:58.000000000 +0200
> +++ linux-2.6.23-rc1-IPv6-ip_map/fs/nfsd/export.c 2007-08-08
> 17:49:09.000000000 +0200
> @@ -1558,6 +1558,7 @@
> {
> struct auth_domain *dom;
> int i, err;
> + struct in6_addr addr6;
>
> /* First, consistency check. */
> err = -EINVAL;
> @@ -1576,9 +1577,14 @@
> goto out_unlock;
>
> /* Insert client into hashtable. */
> - for (i = 0; i < ncp->cl_naddr; i++)
> - auth_unix_add_addr(ncp->cl_addrlist[i], dom);
> -
> + for (i = 0; i < ncp->cl_naddr; i++) {
> + /* Mapping address */
> + addr6.s6_addr32[0] = 0;
> + addr6.s6_addr32[1] = 0;
> + addr6.s6_addr32[2] = htonl(0xffff);
> + addr6.s6_addr32[3] = (uint32_t)ncp->cl_addrlist[i].s_addr;
> + auth_unix_add_addr(addr6, dom);
> + }
> auth_unix_forget_old(dom);
> auth_domain_put(dom);
This converts IPv4 addresses to canonical IPv6 as it stores them. What
happens if a full-blown IPv6 address is encountered? Likewise, in nfsctl.c?
> @@ -112,12 +112,16 @@
> return (hash ^ (hash>>8)) & 0xff;
> }
> #endif
> +static inline int hash_ip6(struct in6_addr ip)
> +{
> + return (hash_ip(ip.s6_addr32[0]) ^ hash_ip(ip.s6_addr32[1]) ^
> hash_ip(ip.s6_addr32[2]) ^ hash_ip(ip.s6_addr32[3])) ;
> +}
How have you tested the effectiveness of the new hash function?
> @@ -151,20 +155,28 @@
> {
> char text_addr[20];
> struct ip_map *im = container_of(h, struct ip_map, h);
> - __be32 addr = im->m_addr.s_addr;
> +
> + __be32 addr[4];
> + int i;
> + for (i=0;i<4;i++)
> + addr[i] = im->m_addr.s6_addr[i];
>
> - snprintf(text_addr, 20, "%u.%u.%u.%u",
> - ntohl(addr) >> 24 & 0xff,
> - ntohl(addr) >> 16 & 0xff,
> - ntohl(addr) >> 8 & 0xff,
> - ntohl(addr) >> 0 & 0xff);
> + snprintf(text_addr, 20, "%04x:%04x:%04x:%04x:%04x:%04x:%04x:%04x",
> + ntohl(addr[3]) >> 16 & 0xff,
> + ntohl(addr[3]) >> 0 & 0xff,
> + ntohl(addr[2]) >> 16 & 0xff,
> + ntohl(addr[2]) >> 0 & 0xff,
> + ntohl(addr[1]) >> 16 & 0xff,
> + ntohl(addr[1]) >> 0 & 0xff,
> + ntohl(addr[0]) >> 16 & 0xff,
> + ntohl(addr[0]) >> 0 & 0xff);
The snprintf() format strings should use NIP6_FMT.
> @@ -197,8 +209,21 @@
> len = qword_get(&mesg, buf, mlen);
> if (len <= 0) return -EINVAL;
>
> - if (sscanf(buf, "%u.%u.%u.%u%c", &b1, &b2, &b3, &b4, &c) != 4)
> - return -EINVAL;
> + if (sscanf(buf, "%d.%d.%d.%d%c", &b1, &b2, &b3, &b4, &c) == 4) {
> + addr.s6_addr32[0] = 0;
> + addr.s6_addr32[1] = 0;
> + addr.s6_addr32[2] = htonl(0xffff);
> + addr.s6_addr32[3] =
> + htonl((((((b1<<8)|b2)<<8)|b3)<<8)|b4);
> + } else if (sscanf(buf, "%04x:%04x:%04x:%04x:%04x:%04x:%04x:%04x%c",
> + &b1, &b2, &b3, &b4, &b5, &b6, &b7, &b8, &c) ==
> 8) {
> + addr.s6_addr32[0] = htonl((b1<<16)|b2);
> + addr.s6_addr32[1] = htonl((b3<<16)|b4);
> + addr.s6_addr32[2] = htonl((b5<<16)|b6);
> + addr.s6_addr32[3] = htonl((b7<<16)|b8);
> + } else
> + return -EINVAL;
> +
Likewise, the sscanf() format strings should use NIP6_FMT.
> @@ -247,18 +269,22 @@
> }
> im = container_of(h, struct ip_map, h);
> /* class addr domain */
> - addr = im->m_addr;
> + memcpy(&addr, &im->m_addr, sizeof(struct in6_addr));
>
> if (test_bit(CACHE_VALID, &h->flags) &&
> !test_bit(CACHE_NEGATIVE, &h->flags))
> dom = im->m_client->h.name;
>
> - seq_printf(m, "%s %d.%d.%d.%d %s\n",
> + seq_printf(m, "%s %04x.%04x.%04x.%04x.%04x.%04x.%04x.%04x %s\n",
> im->m_class,
> - ntohl(addr.s_addr) >> 24 & 0xff,
> - ntohl(addr.s_addr) >> 16 & 0xff,
> - ntohl(addr.s_addr) >> 8 & 0xff,
> - ntohl(addr.s_addr) >> 0 & 0xff,
> + ntohl(addr.s6_addr32[3]) >> 16 & 0xffff,
> + ntohl(addr.s6_addr32[3]) & 0xffff,
> + ntohl(addr.s6_addr32[2]) >> 16 & 0xffff,
> + ntohl(addr.s6_addr32[2]) & 0xffff,
> + ntohl(addr.s6_addr32[1]) >> 16 & 0xffff,
> + ntohl(addr.s6_addr32[1]) & 0xffff,
> + ntohl(addr.s6_addr32[0]) >> 16 & 0xffff,
> + ntohl(addr.s6_addr32[0]) & 0xffff,
> dom
> );
> return 0;
And I think here NIP6_FMT should be used, but you're not using colons
between the hex digits. Was that intentional?
[-- Attachment #2: chuck.lever.vcf --]
[-- Type: text/x-vcard, Size: 302 bytes --]
begin:vcard
fn:Chuck Lever
n:Lever;Chuck
org:Oracle Corporation;Corporate Architecture: Linux Projects Group
adr:;;1015 Granger Avenue;Ann Arbor;MI;48104;USA
title:Principal Member of Staff
tel;work:+1 248 614 5091
x-mozilla-html:FALSE
url:http://oss.oracle.com/~cel
version:2.1
end:vcard
^ permalink raw reply
* Re: [PATCH] make atomic_t volatile on all architectures
From: Chris Snook @ 2007-08-09 11:44 UTC (permalink / raw)
To: Herbert Xu
Cc: akpm, torvalds, ak, heiko.carstens, davem, linux-kernel, netdev,
schwidefsky, wensong, horms, wjiang, cfriesen, zlynx
In-Reply-To: <20070809083055.GA19048@gondor.apana.org.au>
Herbert Xu wrote:
> On Thu, Aug 09, 2007 at 03:47:57AM -0400, Chris Snook wrote:
>> If they're not doing anything, sure. Plenty of loops actually do some sort
>> of real work while waiting for their halt condition, possibly even work
>> which is necessary for their halt condition to occur, and you definitely
>> don't want to be doing cpu_relax() in this case. On register-rich
>> architectures you can do quite a lot of work without needing to reuse the
>> register containing the result of the atomic_read(). Those are precisely
>> the architectures where barrier() hurts the most.
>
> I have a problem with this argument. The same loop could be
> using a non-atomic as long as the updaters are serialised. Would
> you suggest that we turn such non-atomics into volatiles too?
No. I'm simply saying that when people call atomic_read, they expect a read to
occur. When this doesn't happen, people get confused. Merely referencing a
variable doesn't carry the same connotation.
Anyway, I'm taking Linus's advice and putting the cast in atomic_read(), rather
than the counter declaration itself. Everything else uses __asm__ __volatile__,
or calls atomic_read() with interrupts disabled. This ensures that
atomic_read() works as expected across all architectures, without the cruft the
compiler generates when you declare the variable itself volatile.
> Any loop that's waiting for an external halt condition either
> has to schedule away (which is a barrier) or you'd be busy
> waiting in which case you should use cpu_relax.
Not necessarily. Some code uses atomic_t for a sort of lightweight semaphore.
If your loop is actually doing real work, perhaps in a softirq handler
negotiating shared resources with a hard irq handler, you're not busy-waiting.
> Do you have an example where this isn't the case?
a) No, and that's sort of the point. We shouldn't have to audit the whole
kernel to find cases where a misleadingly-named function is doing what its users
are not expecting. If we can make it always do the right thing without any
substantial drawbacks, we should.
b) Loops are just one case, which came to mind because of the IPVS bug recently
discussed. I recall seeing some scheduler code recently which does an
atomic_read() twice on the same variable, with a barrier in between. It's in a
very hot path, so if we can remove that barrier, we save a bunch of register
loads. When you're context switching every microsecond in SCHED_RR, that really
matters.
-- Chris
^ permalink raw reply
* Re: net-2.6.24 GIT state
From: David Miller @ 2007-08-09 11:10 UTC (permalink / raw)
To: ilpo.jarvinen; +Cc: netdev
In-Reply-To: <Pine.LNX.4.64.0708091233050.24369@kivilampi-30.cs.helsinki.fi>
From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Thu, 9 Aug 2007 12:50:51 +0300 (EEST)
> At least these are trivial to take (after rebase, some come nicely even
> from the before tree):
>
> 40564051bdb237231e625ba674fdedf6e8373027 [TCP]: Remove num_acked>0 checks from cong.ctrl mods pkts_acked
> 4c035cd78a6b60710b0dda4f62339df070f761c8 [TCP]: Add tcp_dec_pcount_approx int variant
> ed2753b0b463df41c10121ce494d51428047fcbc [TCP]: Move code from tcp_ecn.h to tcp*.c and tcp.h & remove it
> c215314c6ea0261865d3df09f375016f3dcadeba [TCP]: Access to highest_sack obsoletes forward_cnt_hint
> 9e5f432fb247af2f0062c3c57d7f45d511692f26 [TCP] FRTO: remove unnecessary fackets/sacked_out recounting
> 96001b306c60cb969d456ac70e376220db97e54a [TCP]: Move Reno SACKed_out counter functions earlier
> 318cf753170367504cfac07ac89e97befb1ca501 [TCP]: Extract DSACK detection code from tcp_sacktag_write_queue().
> a2539974b098065ebe02ad11a6411df4f56ba0ed [TCP]: Rexmit hint must be cleared instead of setting it
> - Could be combined with the extrator as it's really a bug fix, once
> rxmit cnt hint gets dropped, we can add this back as optimization :-).
> 2fea67411f0c89642fa4abc0490b65c7852a1202 [TCP]: Extracted rexmit hint clearing from the LOST marking code
> 1fba6548b2ecc2d513981898247472d1183329c5 [TCP]: Add highest_sack seqno, points to globally highest SACK
Ok, in order to get this started I merged the above patches
into net-2.6.24
I'm exhausted and done for today. :-)
We can work on the other bits next. Note that patches like the
"Create tcp_sacktag_foo" ones are just mechanical transformations
and thus if the conflicts are huge they can just be reimplemented
by hand.
> ...Taking IsReno/IsFack removal is probably going to be hardest one, as it
> will conflict a lot. ...Maybe some sed work in patches would reduce # of
> conflicts instead of trying cherry-pick+rebase.
Yes, such patch hand editing techniques can help in these cases.
^ permalink raw reply
* Re: [PATCH] make atomic_t volatile on all architectures
From: Herbert Xu @ 2007-08-09 8:30 UTC (permalink / raw)
To: Chris Snook
Cc: akpm, torvalds, ak, heiko.carstens, davem, linux-kernel, netdev,
schwidefsky, wensong, horms, wjiang, cfriesen, zlynx
In-Reply-To: <46BAC6AD.7050603@redhat.com>
On Thu, Aug 09, 2007 at 03:47:57AM -0400, Chris Snook wrote:
>
> If they're not doing anything, sure. Plenty of loops actually do some sort
> of real work while waiting for their halt condition, possibly even work
> which is necessary for their halt condition to occur, and you definitely
> don't want to be doing cpu_relax() in this case. On register-rich
> architectures you can do quite a lot of work without needing to reuse the
> register containing the result of the atomic_read(). Those are precisely
> the architectures where barrier() hurts the most.
I have a problem with this argument. The same loop could be
using a non-atomic as long as the updaters are serialised. Would
you suggest that we turn such non-atomics into volatiles too?
Any loop that's waiting for an external halt condition either
has to schedule away (which is a barrier) or you'd be busy
waiting in which case you should use cpu_relax.
Do you have an example where this isn't the case?
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* Re: [PATCH 6/14] nes: hardware init
From: Stephen Hemminger @ 2007-08-09 10:56 UTC (permalink / raw)
To: Glenn Grundstrom; +Cc: Jeff Garzik, rdreier, ewg, netdev
In-Reply-To: <5E701717F2B2ED4EA60F87C8AA57B7CC07443E4D@venom2>
More comments
>---
>diff -Nurp NULL ofa_kernel-1.2/drivers/infiniband/hw/nes/nes_hw.c
>--- NULL 1969-12-31 18:00:00.000000000 -0600
>+++ ofa_kernel-1.2/drivers/infiniband/hw/nes/nes_hw.c 2007-08-06 20:09:04.000000000 -0500
>+
>+#include "nes.h"
>+u32 crit_err_count = 0;
Possible global namespace conflict. You should make sure all global
symbols start with something specfic to driver like "nes_".
>+#include "nes_cm.h"
>+
>+
>+#ifdef NES_DEBUG
>+static unsigned char *nes_iwarp_state_str[] = {
Why use unsigned char? This should be "static const char *"
Maybe use gcc array initilaizers.
^ permalink raw reply
* Re: ATA over ethernet swapping
From: Pavel Machek @ 2007-08-09 10:11 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Ed L. Cashin, kernel list, ak, Netdev list
In-Reply-To: <1185959909.12034.38.camel@twins>
Hi!
> I've been working on this for quite some time. And should post again
> soon. Please see the patches:
>
> http://programming.kicks-ass.net/kernel-patches/vm_deadlock/current/
>
> For now it requires one uses SLUB, I hope that SLAB will go away (will
> save me the trouble of adding support) and I guess I ought to do SLOB
> some time (if that does stay).
>
> You'd need the first 22 patches of that series, and then call
> sk_set_memalloc(sk) on the proper socket, and do some fiddling with the
> reconnect logic. See nfs-swapfile.patch for examples.
What do you use for testing? I set up ata over ethernet... swapping
over that should deadlock w/o your patches.
But I'm able to compile kernel (-j 10) on 128MB machine, and I tried
cat /dev/zero | grep foo to exhaust memory... and could not reproduce
the deadlock. Should I pingflood? Tweak down ammount of atomic memory
avaialable to make deadlocks easier to reproduce?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
^ permalink raw reply
* [PATCH] ssb: pci core driver fixes
From: Michael Buesch @ 2007-08-09 10:10 UTC (permalink / raw)
To: John Linville; +Cc: Aurelien Jarno, netdev
From: Aurelien Jarno <aurelien@aurel32.net>
Fixes various things on the SSB
PCI core driver:
- Correctly write the configuration register value in
ssb_extpci_write_config() for len = 1 or len = 2.
- Set the PCI_LATENCY_TIMER to handle devices on the PCI bus.
- Set the PCI arbiter control to internal.
- Add some delay between the configuration of the PCI controller
and its registration.
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Michael Buesch <mb@bu3sch.de>
--- a/drivers/ssb/driver_pcicore.c
+++ b/drivers/ssb/driver_pcicore.c
@@ -99,6 +99,9 @@
/* Enable PCI bridge BAR1 prefetch and burst */
pci_write_config_dword(dev, SSB_BAR1_CONTROL, 3);
+
+ /* Make sure our latency is high enough to handle the devices behind us */
+ pci_write_config_byte(dev, PCI_LATENCY_TIMER, 0xa8);
}
DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, ssb_fixup_pcibridge);
@@ -230,7 +233,7 @@
val = *((const u32 *)buf);
break;
}
- writel(*((const u32 *)buf), mmio);
+ writel(val, mmio);
err = 0;
unmap:
@@ -311,6 +314,8 @@
udelay(150); /* Assertion time demanded by the PCI standard */
val |= SSB_PCICORE_CTL_RST; /* Deassert RST# */
pcicore_write32(pc, SSB_PCICORE_CTL, val);
+ val = SSB_PCICORE_ARBCTL_INTERN;
+ pcicore_write32(pc, SSB_PCICORE_ARBCTL, val);
udelay(1); /* Assertion time demanded by the PCI standard */
/*TODO cardbus mode */
@@ -340,6 +345,9 @@
* The following needs change, if we want to port hostmode
* to non-MIPS platform. */
set_io_port_base((unsigned long)ioremap_nocache(SSB_PCI_MEM, 0x04000000));
+ /* Give some time to the PCI controller to configure itself with the new
+ * values. Not waiting at this point causes crashes of the machine. */
+ mdelay(10);
register_pci_controller(&ssb_pcicore_controller);
}
^ permalink raw reply
* Re: net-2.6.24 GIT state
From: Ilpo Järvinen @ 2007-08-09 9:50 UTC (permalink / raw)
To: David Miller; +Cc: netdev
In-Reply-To: <20070809.022823.31781972.davem@davemloft.net>
On Thu, 9 Aug 2007, David Miller wrote:
> Ok, as is clear for some the net-2.6.24 tree has been cut
> and is at:
>
> kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.24.git
>
> I'll be going over the TCP bits Ilpo and I have been discussing
...In case I didn't make it clear previously, you'll notice that there are
some non-submitted changes rebased as well on the top of it.
> then expect me to disappear back to driver hacking :-) As is,
> there should be enough in there for people to play with,
> regression test, and improve.
At least these are trivial to take (after rebase, some come nicely even
from the before tree):
40564051bdb237231e625ba674fdedf6e8373027 [TCP]: Remove num_acked>0 checks from cong.ctrl mods pkts_acked
4c035cd78a6b60710b0dda4f62339df070f761c8 [TCP]: Add tcp_dec_pcount_approx int variant
ed2753b0b463df41c10121ce494d51428047fcbc [TCP]: Move code from tcp_ecn.h to tcp*.c and tcp.h & remove it
c215314c6ea0261865d3df09f375016f3dcadeba [TCP]: Access to highest_sack obsoletes forward_cnt_hint
9e5f432fb247af2f0062c3c57d7f45d511692f26 [TCP] FRTO: remove unnecessary fackets/sacked_out recounting
96001b306c60cb969d456ac70e376220db97e54a [TCP]: Move Reno SACKed_out counter functions earlier
318cf753170367504cfac07ac89e97befb1ca501 [TCP]: Extract DSACK detection code from tcp_sacktag_write_queue().
a2539974b098065ebe02ad11a6411df4f56ba0ed [TCP]: Rexmit hint must be cleared instead of setting it
- Could be combined with the extrator as it's really a bug fix, once
rxmit cnt hint gets dropped, we can add this back as optimization :-).
2fea67411f0c89642fa4abc0490b65c7852a1202 [TCP]: Extracted rexmit hint clearing from the LOST marking code
1fba6548b2ecc2d513981898247472d1183329c5 [TCP]: Add highest_sack seqno, points to globally highest SACK
...I was claiming incorrectly last time that highest_sack stuff needs
SACK-block validation, it does not as it takes seq from skb->seq, not from
the receiver like I thought :-). ...Redoing the validator (the improved
version) on top of net-2.6.24 is on my "high priority list" though, so
maybe I'll have it already on today.
Left_out removal should be safe too: after my analysis, it seems safe as
most of the functions are if-modified-then-sync type, some sync always.
Clean_rtx_queue leaves left_out to old state which is the hardest thing to
track. However, TCP should always execute fastretrans_alert (that syncs
near entry) if sacked_out+lost_out is (or was) non-zero, so it shouldn't
leak stale state anywhere. Not going to fastretrans_alert would be IMHO a
bug that should be fixed. I found one such case which is fixed in [TCP]:
Keep state in Disorder also if only lost_out > 0). Also FRTO that is
called before fastretrans_alert sync at entry, so it shouldn't be a
problem to it either.
...Taking IsReno/IsFack removal is probably going to be hardest one, as it
will conflict a lot. ...Maybe some sed work in patches would reduce # of
conflicts instead of trying cherry-pick+rebase.
--
i.
^ permalink raw reply
* [PATCH] 3c59x: fix duplex configuration
From: Steffen Klassert @ 2007-08-09 8:57 UTC (permalink / raw)
To: Andrew Morton; +Cc: mb-tmp-ohtmvyyn.xreary.bet, netdev
A special sequence of ifconfig up/down and plug/unplug the cable
can break the duplex configuration of the driver.
Setting
vp->mii.full_duplex = vp->full_duplex
in vortex_up should fix this.
Addresses Bug 8575 3c59x duplex configuration broken
http://bugzilla.kernel.org/show_bug.cgi?id=8575
Cc: Martin Buck <mb-tmp-ohtmvyyn.xreary.bet@gromit.dyndns.org>
Signed-off-by: Steffen Klassert <klassert@mathematik.tu-chemnitz.de>
---
drivers/net/3c59x.c | 1 +
1 file changed, 1 insertion(+)
--- linux-2.6.23-rc2.orig/drivers/net/3c59x.c
+++ linux-2.6.23-rc2/drivers/net/3c59x.c
@@ -1555,6 +1555,7 @@ vortex_up(struct net_device *dev)
mii_reg1 = mdio_read(dev, vp->phys[0], MII_BMSR);
mii_reg5 = mdio_read(dev, vp->phys[0], MII_LPA);
vp->partner_flow_ctrl = ((mii_reg5 & 0x0400) != 0);
+ vp->mii.full_duplex = vp->full_duplex;
vortex_check_media(dev, 1);
}
^ permalink raw reply
* net-2.6.24 GIT state
From: David Miller @ 2007-08-09 9:28 UTC (permalink / raw)
To: netdev
Ok, as is clear for some the net-2.6.24 tree has been cut
and is at:
kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6.24.git
Contents:
1) napi_struct V6 :-)
2) LRO with EHEA and Myri10g ports
3) mac80211 merge with John Linville
4) Some netlink bits from Thomas Graf
5) VETH driver
For shits and grins I build a powerpc cross environment and used
this to fix up the EHEA build fallout. That means nobody with
these chips did any testing at all, please do so if you can.
SIS190 didn't build because it had a half-done NAPI implementation
(actually closer to not done) and it did netif_poll_{enable,disable}()
(and wrongly) in it's open and close routines. Doing this
unconditionally, even when non-NAPI, is just wrong and therefore since
the NAPI related bits were miniscule I just ripped out all of it to
fix that.
Natsemi and pasemi_mac were missing a few netif_poll_{disable,enable}()
conversions to napi_{enable,disable}().
I found one driver whose conversion got missed, ibm_emac, but I can't
enable that in the powerpc64 build because it's guarded against when
PPC_MERGE is enabled which ppc64 sets unconditionally.
I might take a stab at converting that guy (it's similar to
EHEA and can use multi-queue NAPI quite directly although
the access back up to the parent netdev is a bit tricky)
but if someone can beat me to it... Hint hint!
I'm now doing a powerpc allmodconfig build to try and shake out
any other build regressions added by the net-2.6.24 tree changes.
I'll be going over the TCP bits Ilpo and I have been discussing
then expect me to disappear back to driver hacking :-) As is,
there should be enough in there for people to play with,
regression test, and improve.
Thanks.
^ permalink raw reply
* Re: [PATCH] make atomic_t volatile on all architectures
From: Jerry Jiang @ 2007-08-09 9:18 UTC (permalink / raw)
To: 7eggert
Cc: Linus Torvalds, Chris Snook, akpm, ak, heiko.carstens, davem,
linux-kernel, netdev, schwidefsky, wensong, horms, cfriesen,
zlynx
In-Reply-To: <E1IJ424-0000d7-HK@be1.lrz>
On Thu, 09 Aug 2007 11:10:16 +0200
Bodo Eggert <7eggert@gmx.de> wrote:
> >
> > Why the *volatile-accesses-in-code* is acceptable, does C standard make it
> > clear?
>
> http://lwn.net/Articles/233482/
I have read this article before, but What Linus said only focusing on
the conclusion-- The semantics of it are so unclear as
to be totally useless.
and still not to said "Why the *volatile-accesses-in-code* is
acceptable"
-- Jerry
> --
> Fun things to slip into your budget
> Heisenberg Compensator upgrade kit
>
> Friß, Spammer: uWfuXeviZ@x.7eggert.dyndns.org
^ permalink raw reply
* [patch (testing)] Re: 2.6.20->2.6.21 - networking dies after random time
From: Jarek Poplawski @ 2007-08-09 9:19 UTC (permalink / raw)
To: Marcin Ślusarz
Cc: Ingo Molnar, Thomas Gleixner, Linus Torvalds,
Jean-Baptiste Vignaud, linux-kernel, shemminger, linux-net,
netdev, Andrew Morton, Alan Cox
In-Reply-To: <20070808114243.GC2426@ff.dom.local>
On Wed, Aug 08, 2007 at 01:42:43PM +0200, Jarek Poplawski wrote:
> Read below please:
>
> On Wed, Aug 08, 2007 at 01:09:36PM +0200, Marcin Ślusarz wrote:
> > 2007/8/7, Jarek Poplawski <jarkao2@o2.pl>:
> > > So, the let's try this idea yet: modified Ingo's "x86: activate
> > > HARDIRQS_SW_RESEND" patch.
> > > (Don't forget about make oldconfig before make.)
> > > For testing only.
...
> > > diff -Nurp 2.6.22.1-/arch/i386/Kconfig 2.6.22.1/arch/i386/Kconfig
> > > --- 2.6.22.1-/arch/i386/Kconfig 2007-07-09 01:32:17.000000000 +0200
> > > +++ 2.6.22.1/arch/i386/Kconfig 2007-08-07 13:13:03.000000000 +0200
> > > @@ -1252,6 +1252,10 @@ config GENERIC_PENDING_IRQ
> > > depends on GENERIC_HARDIRQS && SMP
> > > default y
> > >
> > > +config HARDIRQS_SW_RESEND
...
> > Works fine with:
>
> Very nice! It would be about time this kernel should start behave...
>
> > WARNING: at kernel/irq/resend.c:79 check_irq_resend()
> >
> > Call Trace:
...
> So, it looks like x86_64 io_apic's IPI code was unused too long...
> I hope it's a piece of cake for Ingo now...
So, we know now it's almost definitely something about lapic and IPIs
but, maybe it's not this code to blame...
Here is one more patch to check the possibility it's about the way
the resend edge type irqs are handled by level type handlers:
so, let's check if acking isn't too late...
Marcin and Jean-Baptiste: I would be very glad, as usual! And no need
to hurry; I think we know enough to fix this for you, but maybe this
test could explain if there are errors in lapics or only bad handling.
Many thanks,
Jarek P.
PS: this patch is very experimental, and only intended for testing.
It should be applied to clean 2.6.23-rc1 or a bit older (eg. 2.6.22)
(so 2.6.23-rc2 or any patches from this thread shouldn't be around)
---
diff -Nurp 2.6.23-rc1-/kernel/irq/chip.c 2.6.23-rc1/kernel/irq/chip.c
--- 2.6.23-rc1-/kernel/irq/chip.c 2007-07-09 01:32:17.000000000 +0200
+++ 2.6.23-rc1/kernel/irq/chip.c 2007-08-08 20:49:07.000000000 +0200
@@ -389,12 +389,19 @@ handle_fasteoi_irq(unsigned int irq, str
unsigned int cpu = smp_processor_id();
struct irqaction *action;
irqreturn_t action_ret;
+ int edge = 0;
spin_lock(&desc->lock);
if (unlikely(desc->status & IRQ_INPROGRESS))
goto out;
+ if ((desc->status & (IRQ_PENDING | IRQ_REPLAY)) ==
+ IRQ_REPLAY) {
+ desc->chip->ack(irq);
+ edge = 1;
+ }
+
desc->status &= ~(IRQ_REPLAY | IRQ_WAITING);
kstat_cpu(cpu).irqs[irq]++;
@@ -421,7 +428,8 @@ handle_fasteoi_irq(unsigned int irq, str
spin_lock(&desc->lock);
desc->status &= ~IRQ_INPROGRESS;
out:
- desc->chip->eoi(irq);
+ if (!edge)
+ desc->chip->eoi(irq);
spin_unlock(&desc->lock);
}
diff -Nurp 2.6.23-rc1-/kernel/irq/resend.c 2.6.23-rc1/kernel/irq/resend.c
--- 2.6.23-rc1-/kernel/irq/resend.c 2007-07-09 01:32:17.000000000 +0200
+++ 2.6.23-rc1/kernel/irq/resend.c 2007-08-08 20:44:14.000000000 +0200
@@ -57,14 +57,10 @@ void check_irq_resend(struct irq_desc *d
{
unsigned int status = desc->status;
- /*
- * Make sure the interrupt is enabled, before resending it:
- */
- desc->chip->enable(irq);
-
if ((status & (IRQ_PENDING | IRQ_REPLAY)) == IRQ_PENDING) {
desc->status = (status & ~IRQ_PENDING) | IRQ_REPLAY;
+ WARN_ON_ONCE(1);
if (!desc->chip || !desc->chip->retrigger ||
!desc->chip->retrigger(irq)) {
#ifdef CONFIG_HARDIRQS_SW_RESEND
@@ -74,4 +70,5 @@ void check_irq_resend(struct irq_desc *d
#endif
}
}
+ desc->chip->enable(irq);
}
^ permalink raw reply
* Re: [PATCH] make atomic_t volatile on all architectures
From: Bodo Eggert @ 2007-08-09 9:10 UTC (permalink / raw)
To: Jerry Jiang, Linus Torvalds, Chris Snook, akpm, ak,
heiko.carstens
In-Reply-To: <8Q8rD-hh-7@gated-at.bofh.it>
Jerry Jiang <wjiang@resilience.com> wrote:
> On Wed, 8 Aug 2007 21:18:25 -0700 (PDT)
>> On Wed, 8 Aug 2007, Chris Snook wrote:
>> > Some architectures currently do not declare the contents of an atomic_t to
>> > be
>> > volatile. This causes confusion since atomic_read() might not actually
>> > read anything if an optimizing compiler re-uses a value stored in a
>> > register, which can break code that loops until something external changes
>> > the value of an atomic_t.
>>
>> I'd be *much* happier with "atomic_read()" doing the "volatile" instead.
>>
>> The fact is, volatile on data structures is a bug. It's a wart in the C
>> language. It shouldn't be used.
>
> Why? It's a wart! Is it due to unclear C standard on volatile related point?
>
> Why the *volatile-accesses-in-code* is acceptable, does C standard make it
> clear?
http://lwn.net/Articles/233482/
--
Fun things to slip into your budget
Heisenberg Compensator upgrade kit
Friß, Spammer: uWfuXeviZ@x.7eggert.dyndns.org
^ permalink raw reply
* Re: Please pull 'upstream-davem' branch of wireless-2.6
From: David Miller @ 2007-08-09 9:00 UTC (permalink / raw)
To: linville; +Cc: linux-wireless, netdev
In-Reply-To: <20070806210131.GM6442@tuxdriver.com>
From: "John W. Linville" <linville@tuxdriver.com>
Date: Mon, 6 Aug 2007 17:01:31 -0400
> git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6.git upstream-davem
Pulled, thanks John.
^ permalink raw reply
* Re: [PATCH] make atomic_t volatile on all architectures
From: Heiko Carstens @ 2007-08-09 8:14 UTC (permalink / raw)
To: Chris Snook
Cc: Linus Torvalds, akpm, ak, davem, linux-kernel, netdev,
schwidefsky, wensong, horms, wjiang, cfriesen, zlynx
In-Reply-To: <46BAC2BE.1090106@redhat.com>
On Thu, Aug 09, 2007 at 03:31:10AM -0400, Chris Snook wrote:
> Linus Torvalds wrote:
> > I'd be *much* happier with "atomic_read()" doing the "volatile" instead.
> > The fact is, volatile on data structures is a bug. It's a wart in the C
> > language. It shouldn't be used. Volatile accesses in *code* can be ok, and
> > if we have "atomic_read()" expand to a "*(volatile int *)&(x)->value", then
> > I'd be ok with that.
> > But marking data structures volatile just makes the compiler screw up
> > totally, and makes code for initialization sequences etc much worse.
> > Linus
>
> Fair enough. Casting to (volatile int *) will give us the behavior people
> expect when using atomic_t without needing to use inefficient barriers.
>
> While we have the hood up, should we convert all the atomic_t's to
> non-volatile and put volatile casts in all the atomic_reads? I don't know
> enough about the various arches to say with confidence that those changes
> alone will preserve existing behavior. We might need some arch-specific
> tweaking of the atomic operations.
If you write that patch could you include the atomic64 variants as well,
please? Besides that just post the patch to linux-arch and maintainers
should speak up.
^ permalink raw reply
* Re:[patch] genirq: temporary fix for level-triggered IRQ resend
From: Jean-Baptiste Vignaud @ 2007-08-09 8:12 UTC (permalink / raw)
To: jarkao2
Cc: mingo, torvalds, tglx, linux-kernel, shemminger, linux-net,
netdev, akpm, alan, marcin.slusarz
> Hi,
>
> I see there is a bit of complaining on this original resend temporary
> patch. But, since it seems to do a good job for some people, here is
> my proposal to limit the 'range of fire' a little bit.
>
> Marcin and Jean-Baptiste: try to test this with 2.6.23-rc2, please.
> (Unless Ingo or Thomas have other plans with this problem?)
2.6.23-rc2 + this patch, and the box's cards are still networking after 20 hours.
RX bytes:452423991847 (421.3 GiB) TX bytes:13464471620 (12.5 GiB)
still testing.
Jb
^ permalink raw reply
* Re: [PATCH] make atomic_t volatile on all architectures
From: Chris Snook @ 2007-08-09 7:47 UTC (permalink / raw)
To: Herbert Xu
Cc: akpm, torvalds, ak, heiko.carstens, davem, linux-kernel, netdev,
schwidefsky, wensong, horms, wjiang, cfriesen, zlynx
In-Reply-To: <E1IIwQx-000468-00@gondolin.me.apana.org.au>
Herbert Xu wrote:
> Chris Snook <csnook@redhat.com> wrote:
>> Some architectures currently do not declare the contents of an atomic_t to be
>> volatile. This causes confusion since atomic_read() might not actually read
>> anything if an optimizing compiler re-uses a value stored in a register, which
>> can break code that loops until something external changes the value of an
>> atomic_t. Avoiding such bugs requires using barrier(), which causes re-loads
>
> Such loops should always use something like cpu_relax() which comes
> with a barrier.
If they're not doing anything, sure. Plenty of loops actually do some sort of
real work while waiting for their halt condition, possibly even work which is
necessary for their halt condition to occur, and you definitely don't want to be
doing cpu_relax() in this case. On register-rich architectures you can do quite
a lot of work without needing to reuse the register containing the result of the
atomic_read(). Those are precisely the architectures where barrier() hurts the
most.
>> of all registers used in the loop, thus hurting performance instead of helping
>> it, particularly on architectures where it's unnecessary. Since we generally
>
> Do you have an example of such a loop where performance is hurt by this?
Not handy. Perhaps more interesting are cases where we access the same atomic_t
twice in a hot path. If we can remove some of those barriers, those hot paths
will get faster.
Performance was only part of the motivation. The IPVS bug was an example of how
atomic_t is assumed (not always correctly) to work, and other recent discussions
on this list have made it clear that most people assume atomic_read() actually
reads something every time, and don't even think to consult the documentation
until they find out the hard way that it does not. I'm not saying we should
encourage lazy programming, but in this case the assumption is reasonable
because that's how people actually use atomic_t, and making this behavior
uniform across all architectures makes it more convenient to do things the right
way, which we should encourage.
> The IPVS code that led to this patch was simply broken and has been
> fixed to use cpu_relax().
I agree, busy-waiting should be done with cpu_relax(), if at all. I'm more
concerned about cases that are not busy-waiting, but could still get compiled
with the same optimization.
-- Chris
^ permalink raw reply
* Re: [PATCH] make atomic_t volatile on all architectures
From: Chris Snook @ 2007-08-09 7:31 UTC (permalink / raw)
To: Linus Torvalds
Cc: akpm, ak, heiko.carstens, davem, linux-kernel, netdev,
schwidefsky, wensong, horms, wjiang, cfriesen, zlynx
In-Reply-To: <alpine.LFD.0.999.0708082116060.25146@woody.linux-foundation.org>
Linus Torvalds wrote:
>
> On Wed, 8 Aug 2007, Chris Snook wrote:
>> Some architectures currently do not declare the contents of an atomic_t to be
>> volatile. This causes confusion since atomic_read() might not actually read
>> anything if an optimizing compiler re-uses a value stored in a register, which
>> can break code that loops until something external changes the value of an
>> atomic_t.
>
> I'd be *much* happier with "atomic_read()" doing the "volatile" instead.
>
> The fact is, volatile on data structures is a bug. It's a wart in the C
> language. It shouldn't be used.
>
> Volatile accesses in *code* can be ok, and if we have "atomic_read()"
> expand to a "*(volatile int *)&(x)->value", then I'd be ok with that.
>
> But marking data structures volatile just makes the compiler screw up
> totally, and makes code for initialization sequences etc much worse.
>
> Linus
Fair enough. Casting to (volatile int *) will give us the behavior people
expect when using atomic_t without needing to use inefficient barriers.
While we have the hood up, should we convert all the atomic_t's to non-volatile
and put volatile casts in all the atomic_reads? I don't know enough about the
various arches to say with confidence that those changes alone will preserve
existing behavior. We might need some arch-specific tweaking of the atomic
operations.
-- Chris
^ permalink raw reply
* [PATCH 1/1] NFS: change the ip_map cache code to handle IPv6 addresses
From: Aurélien Charbon @ 2007-08-09 7:22 UTC (permalink / raw)
To: Mailing list NFSv4; +Cc: netdev ML
Here is a small part of missing pieces of IPv6 support for the server.
It deals with the ip_map caching code part.
It changes the ip_map structure to be able to store INET6 addresses.
It adds also the changes in address hashing, and mapping to test it with
INET addresses.
Signed-off-by: Aurelien Charbon <aurelien.charbon@ext.bull.net>
---
fs/nfsd/export.c | 12 ++++-
fs/nfsd/nfsctl.c | 21 ++++++++-
include/linux/sunrpc/svcauth.h | 4 -
net/sunrpc/svcauth_unix.c | 90
++++++++++++++++++++++++++---------------
4 files changed, 87 insertions(+), 40 deletions(-)
diff -u -r -N linux-2.6.23-rc1/fs/nfsd/export.c
linux-2.6.23-rc1-IPv6-ip_map/fs/nfsd/export.c
--- linux-2.6.23-rc1/fs/nfsd/export.c 2007-08-08 17:52:58.000000000 +0200
+++ linux-2.6.23-rc1-IPv6-ip_map/fs/nfsd/export.c 2007-08-08
17:49:09.000000000 +0200
@@ -1558,6 +1558,7 @@
{
struct auth_domain *dom;
int i, err;
+ struct in6_addr addr6;
/* First, consistency check. */
err = -EINVAL;
@@ -1576,9 +1577,14 @@
goto out_unlock;
/* Insert client into hashtable. */
- for (i = 0; i < ncp->cl_naddr; i++)
- auth_unix_add_addr(ncp->cl_addrlist[i], dom);
-
+ for (i = 0; i < ncp->cl_naddr; i++) {
+ /* Mapping address */
+ addr6.s6_addr32[0] = 0;
+ addr6.s6_addr32[1] = 0;
+ addr6.s6_addr32[2] = htonl(0xffff);
+ addr6.s6_addr32[3] = (uint32_t)ncp->cl_addrlist[i].s_addr;
+ auth_unix_add_addr(addr6, dom);
+ }
auth_unix_forget_old(dom);
auth_domain_put(dom);
diff -u -r -N linux-2.6.23-rc1/fs/nfsd/nfsctl.c
linux-2.6.23-rc1-IPv6-ip_map/fs/nfsd/nfsctl.c
--- linux-2.6.23-rc1/fs/nfsd/nfsctl.c 2007-08-08 17:52:58.000000000 +0200
+++ linux-2.6.23-rc1-IPv6-ip_map/fs/nfsd/nfsctl.c 2007-08-08
17:49:09.000000000 +0200
@@ -222,7 +222,7 @@
struct auth_domain *clp;
int err = 0;
struct knfsd_fh *res;
-
+ struct in6_addr in6;
if (size < sizeof(*data))
return -EINVAL;
data = (struct nfsctl_fsparm*)buf;
@@ -236,7 +236,14 @@
res = (struct knfsd_fh*)buf;
exp_readlock();
- if (!(clp = auth_unix_lookup(sin->sin_addr)))
+
+ /* IPv6 address mapping */
+ in6.s6_addr32[0] = 0;
+ in6.s6_addr32[1] = 0;
+ in6.s6_addr32[2] = htonl(0xffff);
+ in6.s6_addr32[3] = (uint32_t)sin->sin_addr.s_addr;
+
+ if (!(clp = auth_unix_lookup(in6)))
err = -EPERM;
else {
err = exp_rootfh(clp, data->gd_path, res, data->gd_maxlen);
@@ -253,6 +260,7 @@
{
struct nfsctl_fdparm *data;
struct sockaddr_in *sin;
+ struct in6_addr in6;
struct auth_domain *clp;
int err = 0;
struct knfsd_fh fh;
@@ -271,7 +279,14 @@
res = buf;
sin = (struct sockaddr_in *)&data->gd_addr;
exp_readlock();
- if (!(clp = auth_unix_lookup(sin->sin_addr)))
+
+ /* IPv6 address mapping */
+ in6.s6_addr32[0] = 0;
+ in6.s6_addr32[1] = 0;
+ in6.s6_addr32[2] = htonl(0xffff);
+ in6.s6_addr32[3] = (uint32_t)sin->sin_addr.s_addr;
+
+ if (!(clp = auth_unix_lookup(in6)))
err = -EPERM;
else {
err = exp_rootfh(clp, data->gd_path, &fh, NFS_FHSIZE);
diff -u -r -N linux-2.6.23-rc1/include/linux/sunrpc/svcauth.h
linux-2.6.23-rc1-IPv6-ip_map/include/linux/sunrpc/svcauth.h
--- linux-2.6.23-rc1/include/linux/sunrpc/svcauth.h 2007-08-08
17:52:59.000000000 +0200
+++ linux-2.6.23-rc1-IPv6-ip_map/include/linux/sunrpc/svcauth.h
2007-08-08 17:48:54.000000000 +0200
@@ -120,10 +120,10 @@
extern struct auth_domain *unix_domain_find(char *name);
extern void auth_domain_put(struct auth_domain *item);
-extern int auth_unix_add_addr(struct in_addr addr, struct auth_domain
*dom);
+extern int auth_unix_add_addr(struct in6_addr addr, struct auth_domain
*dom);
extern struct auth_domain *auth_domain_lookup(char *name, struct
auth_domain *new);
extern struct auth_domain *auth_domain_find(char *name);
-extern struct auth_domain *auth_unix_lookup(struct in_addr addr);
+extern struct auth_domain *auth_unix_lookup(struct in6_addr addr);
extern int auth_unix_forget_old(struct auth_domain *dom);
extern void svcauth_unix_purge(void);
extern void svcauth_unix_info_release(void *);
diff -u -r -N linux-2.6.23-rc1/net/sunrpc/svcauth_unix.c
linux-2.6.23-rc1-IPv6-ip_map/net/sunrpc/svcauth_unix.c
--- linux-2.6.23-rc1/net/sunrpc/svcauth_unix.c 2007-08-08
17:53:01.000000000 +0200
+++ linux-2.6.23-rc1-IPv6-ip_map/net/sunrpc/svcauth_unix.c 2007-08-09
08:29:27.000000000 +0200
@@ -84,7 +84,7 @@
struct ip_map {
struct cache_head h;
char m_class[8]; /* e.g. "nfsd" */
- struct in_addr m_addr;
+ struct in6_addr m_addr;
struct unix_domain *m_client;
int m_add_change;
};
@@ -112,12 +112,16 @@
return (hash ^ (hash>>8)) & 0xff;
}
#endif
+static inline int hash_ip6(struct in6_addr ip)
+{
+ return (hash_ip(ip.s6_addr32[0]) ^ hash_ip(ip.s6_addr32[1]) ^
hash_ip(ip.s6_addr32[2]) ^ hash_ip(ip.s6_addr32[3])) ;
+}
static int ip_map_match(struct cache_head *corig, struct cache_head *cnew)
{
struct ip_map *orig = container_of(corig, struct ip_map, h);
struct ip_map *new = container_of(cnew, struct ip_map, h);
return strcmp(orig->m_class, new->m_class) == 0
- && orig->m_addr.s_addr == new->m_addr.s_addr;
+ && memcmp(orig->m_addr.s6_addr,
new->m_addr.s6_addr,sizeof(struct in6_addr));
}
static void ip_map_init(struct cache_head *cnew, struct cache_head *citem)
{
@@ -125,7 +129,7 @@
struct ip_map *item = container_of(citem, struct ip_map, h);
strcpy(new->m_class, item->m_class);
- new->m_addr.s_addr = item->m_addr.s_addr;
+ memcpy(&new->m_addr.s6_addr,
&item->m_addr.s6_addr,sizeof(item->m_addr.s6_addr));
}
static void update(struct cache_head *cnew, struct cache_head *citem)
{
@@ -151,20 +155,28 @@
{
char text_addr[20];
struct ip_map *im = container_of(h, struct ip_map, h);
- __be32 addr = im->m_addr.s_addr;
+
+ __be32 addr[4];
+ int i;
+ for (i=0;i<4;i++)
+ addr[i] = im->m_addr.s6_addr[i];
- snprintf(text_addr, 20, "%u.%u.%u.%u",
- ntohl(addr) >> 24 & 0xff,
- ntohl(addr) >> 16 & 0xff,
- ntohl(addr) >> 8 & 0xff,
- ntohl(addr) >> 0 & 0xff);
+ snprintf(text_addr, 20, "%04x:%04x:%04x:%04x:%04x:%04x:%04x:%04x",
+ ntohl(addr[3]) >> 16 & 0xff,
+ ntohl(addr[3]) >> 0 & 0xff,
+ ntohl(addr[2]) >> 16 & 0xff,
+ ntohl(addr[2]) >> 0 & 0xff,
+ ntohl(addr[1]) >> 16 & 0xff,
+ ntohl(addr[1]) >> 0 & 0xff,
+ ntohl(addr[0]) >> 16 & 0xff,
+ ntohl(addr[0]) >> 0 & 0xff);
qword_add(bpp, blen, im->m_class);
qword_add(bpp, blen, text_addr);
(*bpp)[-1] = '\n';
}
-static struct ip_map *ip_map_lookup(char *class, struct in_addr addr);
+static struct ip_map *ip_map_lookup(char *class, struct in6_addr addr);
static int ip_map_update(struct ip_map *ipm, struct unix_domain *udom,
time_t expiry);
static int ip_map_parse(struct cache_detail *cd,
@@ -175,10 +187,10 @@
* for scratch: */
char *buf = mesg;
int len;
- int b1,b2,b3,b4;
+ int b1,b2,b3,b4,b5,b6,b7,b8;
char c;
char class[8];
- struct in_addr addr;
+ struct in6_addr addr;
int err;
struct ip_map *ipmp;
@@ -197,8 +209,21 @@
len = qword_get(&mesg, buf, mlen);
if (len <= 0) return -EINVAL;
- if (sscanf(buf, "%u.%u.%u.%u%c", &b1, &b2, &b3, &b4, &c) != 4)
- return -EINVAL;
+ if (sscanf(buf, "%d.%d.%d.%d%c", &b1, &b2, &b3, &b4, &c) == 4) {
+ addr.s6_addr32[0] = 0;
+ addr.s6_addr32[1] = 0;
+ addr.s6_addr32[2] = htonl(0xffff);
+ addr.s6_addr32[3] =
+ htonl((((((b1<<8)|b2)<<8)|b3)<<8)|b4);
+ } else if (sscanf(buf, "%04x:%04x:%04x:%04x:%04x:%04x:%04x:%04x%c",
+ &b1, &b2, &b3, &b4, &b5, &b6, &b7, &b8, &c) ==
8) {
+ addr.s6_addr32[0] = htonl((b1<<16)|b2);
+ addr.s6_addr32[1] = htonl((b3<<16)|b4);
+ addr.s6_addr32[2] = htonl((b5<<16)|b6);
+ addr.s6_addr32[3] = htonl((b7<<16)|b8);
+ } else
+ return -EINVAL;
+
expiry = get_expiry(&mesg);
if (expiry ==0)
@@ -215,9 +240,6 @@
} else
dom = NULL;
- addr.s_addr =
- htonl((((((b1<<8)|b2)<<8)|b3)<<8)|b4);
-
ipmp = ip_map_lookup(class,addr);
if (ipmp) {
err = ip_map_update(ipmp,
@@ -238,7 +260,7 @@
struct cache_head *h)
{
struct ip_map *im;
- struct in_addr addr;
+ struct in6_addr addr;
char *dom = "-no-domain-";
if (h == NULL) {
@@ -247,18 +269,22 @@
}
im = container_of(h, struct ip_map, h);
/* class addr domain */
- addr = im->m_addr;
+ memcpy(&addr, &im->m_addr, sizeof(struct in6_addr));
if (test_bit(CACHE_VALID, &h->flags) &&
!test_bit(CACHE_NEGATIVE, &h->flags))
dom = im->m_client->h.name;
- seq_printf(m, "%s %d.%d.%d.%d %s\n",
+ seq_printf(m, "%s %04x.%04x.%04x.%04x.%04x.%04x.%04x.%04x %s\n",
im->m_class,
- ntohl(addr.s_addr) >> 24 & 0xff,
- ntohl(addr.s_addr) >> 16 & 0xff,
- ntohl(addr.s_addr) >> 8 & 0xff,
- ntohl(addr.s_addr) >> 0 & 0xff,
+ ntohl(addr.s6_addr32[3]) >> 16 & 0xffff,
+ ntohl(addr.s6_addr32[3]) & 0xffff,
+ ntohl(addr.s6_addr32[2]) >> 16 & 0xffff,
+ ntohl(addr.s6_addr32[2]) & 0xffff,
+ ntohl(addr.s6_addr32[1]) >> 16 & 0xffff,
+ ntohl(addr.s6_addr32[1]) & 0xffff,
+ ntohl(addr.s6_addr32[0]) >> 16 & 0xffff,
+ ntohl(addr.s6_addr32[0]) & 0xffff,
dom
);
return 0;
@@ -280,16 +306,16 @@
.alloc = ip_map_alloc,
};
-static struct ip_map *ip_map_lookup(char *class, struct in_addr addr)
+static struct ip_map *ip_map_lookup(char *class, struct in6_addr addr)
{
struct ip_map ip;
struct cache_head *ch;
strcpy(ip.m_class, class);
- ip.m_addr = addr;
+ memcpy(&ip.m_addr, &addr, sizeof(struct in6_addr));
ch = sunrpc_cache_lookup(&ip_map_cache, &ip.h,
hash_str(class, IP_HASHBITS) ^
- hash_ip(addr.s_addr));
+ hash_ip6(addr));
if (ch)
return container_of(ch, struct ip_map, h);
@@ -318,14 +344,14 @@
ch = sunrpc_cache_update(&ip_map_cache,
&ip.h, &ipm->h,
hash_str(ipm->m_class, IP_HASHBITS) ^
- hash_ip(ipm->m_addr.s_addr));
+ hash_ip6(ipm->m_addr));
if (!ch)
return -ENOMEM;
cache_put(ch, &ip_map_cache);
return 0;
}
-int auth_unix_add_addr(struct in_addr addr, struct auth_domain *dom)
+int auth_unix_add_addr(struct in6_addr addr, struct auth_domain *dom)
{
struct unix_domain *udom;
struct ip_map *ipmp;
@@ -352,7 +378,7 @@
return 0;
}
-struct auth_domain *auth_unix_lookup(struct in_addr addr)
+struct auth_domain *auth_unix_lookup(struct in6_addr addr)
{
struct ip_map *ipm;
struct auth_domain *rv;
@@ -641,7 +667,7 @@
int
svcauth_unix_set_client(struct svc_rqst *rqstp)
{
- struct sockaddr_in *sin = svc_addr_in(rqstp);
+ struct sockaddr_in6 *sin = svc_addr_in6(rqstp);
struct ip_map *ipm;
rqstp->rq_client = NULL;
@@ -651,7 +677,7 @@
ipm = ip_map_cached_get(rqstp);
if (ipm == NULL)
ipm = ip_map_lookup(rqstp->rq_server->sv_program->pg_class,
- sin->sin_addr);
+ sin->sin6_addr);
if (ipm == NULL)
return SVC_DENIED;
--
********************************
Aurelien Charbon
Linux NFSv4 team
Bull SAS
Echirolles - France
http://nfsv4.bullopensource.org/
********************************
^ permalink raw reply
* myri10ge net-2.6.24 build fix
From: David Miller @ 2007-08-09 7:11 UTC (permalink / raw)
To: netdev; +Cc: gallatin
I had to add the following patch to fix the build after
the LRO changes, I have no idea how you could have compile
tested that patch let alone done any real testing on it :-/
diff --git a/drivers/net/myri10ge/myri10ge.c b/drivers/net/myri10ge/myri10ge.c
index 4cc5e6f..0ac0610 100644
--- a/drivers/net/myri10ge/myri10ge.c
+++ b/drivers/net/myri10ge/myri10ge.c
@@ -93,6 +93,7 @@ MODULE_LICENSE("Dual BSD/GPL");
#define MYRI10GE_EEPROM_STRINGS_SIZE 256
#define MYRI10GE_MAX_SEND_DESC_TSO ((65536 / 2048) * 2)
#define MYRI10GE_MAX_LRO_DESCRIPTORS 8
+#define MYRI10GE_LRO_MAX_PKTS 64
#define MYRI10GE_NO_CONFIRM_DATA htonl(0xffffffff)
#define MYRI10GE_NO_RESPONSE_RESULT 0xffffffff
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox