* Breakage caused by unreviewed patch in x86 tree
@ 2008-04-27 20:51 James Bottomley
2008-04-27 20:53 ` David Miller
` (3 more replies)
0 siblings, 4 replies; 51+ messages in thread
From: James Bottomley @ 2008-04-27 20:51 UTC (permalink / raw)
To: Ingo Molnar, Thomas Gleixner; +Cc: linux-kernel
This patch:
commit 6371b495991debfd1417b17c2bc4f7d7bae05739
Author: Ingo Molnar <mingo@elte.hu>
Date: Wed Jan 30 13:33:40 2008 +0100
x86: change ioremap() to default to uncached
As far as I can tell went blindly into the x86 tree without being shared
on any mailing list at all. How can something that completely alters
the semantics of ioremap on x86 platforms go in without any review.
I'm pissed off because it broke a class of voyager machines: those which
rely on the quad interrupt controller (QIC). The precis of why they
broke is because the QIC does IPIs (or CPIs in its terminology) via
cache line interference: you interrupt a processor by moving a
designated memory area to write exclusive in the cache (by simply
writing to the line) and the CPU acks the interrupt by moving it back to
read shared (by reading from it). That area, is, of course, mapped by
ioremap, so reversing the ioremap semantics and adding the uncached bit
completely breaks the QIC. I might add that the intel SAPIC functions
in roughly the same manner, so this might break more than just voyager.
This patch was originally proposed my Matthew Wilcox in March 2006:
http://lkml.org/lkml/2006/3/30/251
And after a reasonable discussion, it was decided that there was too
much risk of breakage to take it. How come you two decided that an
identical patch with different authorship could go in to your tree
without discussion this time?
James
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Breakage caused by unreviewed patch in x86 tree
2008-04-27 20:51 Breakage caused by unreviewed patch in x86 tree James Bottomley
@ 2008-04-27 20:53 ` David Miller
2008-04-27 21:48 ` [patch] x86, voyager: fix ioremap_nocache() Ingo Molnar
` (2 subsequent siblings)
3 siblings, 0 replies; 51+ messages in thread
From: David Miller @ 2008-04-27 20:53 UTC (permalink / raw)
To: James.Bottomley; +Cc: mingo, tglx, linux-kernel
From: James Bottomley <James.Bottomley@HansenPartnership.com>
Date: Sun, 27 Apr 2008 16:51:25 -0400
> I'm pissed off because it broke a class of voyager machines: those which
Join the very large and constantly growing club, Ingo is upsetting a
lot of people lately.
^ permalink raw reply [flat|nested] 51+ messages in thread
* [patch] x86, voyager: fix ioremap_nocache()
2008-04-27 20:51 Breakage caused by unreviewed patch in x86 tree James Bottomley
2008-04-27 20:53 ` David Miller
@ 2008-04-27 21:48 ` Ingo Molnar
2008-04-27 22:05 ` James Bottomley
2008-04-27 22:34 ` James Bottomley
2008-04-27 22:00 ` Breakage caused by unreviewed patch in x86 tree H. Peter Anvin
2008-04-27 22:58 ` Arjan van de Ven
3 siblings, 2 replies; 51+ messages in thread
From: Ingo Molnar @ 2008-04-27 21:48 UTC (permalink / raw)
To: James Bottomley
Cc: Thomas Gleixner, linux-kernel, H. Peter Anvin, David S. Miller
* James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> This patch:
>
> commit 6371b495991debfd1417b17c2bc4f7d7bae05739
> Author: Ingo Molnar <mingo@elte.hu>
> Date: Wed Jan 30 13:33:40 2008 +0100
>
> x86: change ioremap() to default to uncached
>
> As far as I can tell went blindly into the x86 tree without being
> shared on any mailing list at all. How can something that completely
> alters the semantics of ioremap on x86 platforms go in without any
> review.
what happened before was that on x86 ioremap() was "lax" about the PTE
cachability and just said "writeback cached". That was utterly false for
most of the real ioremap()s done by drivers, but it happened to work out
fine due to the courtesy of BIOSes setting up UC MTRRs that forced those
areas into uncachable.
with the PAT changes, what used to be this default and careless
ioremap() behavior by x86 turned into a hard cache aliasing conflict. So
we pretty much _have to_ default to uncached - like all other sane
architectures already did forever. This is a direct consequence of the
PAT changes which were discussed on lkml.
But with PAT we take over from the MTRRs on x86 and using a cacheable
ioremap() would give us aliasing conflicts and trouble all around the
place.
In the Voyager case we should use ioremap_cache(), and thanks for
pointing out that dependency of the QIC. Does the patch below fix it for
you?
Ingo
--------------->
Subject: x86, voyager: fix ioremap_nocache()
From: Ingo Molnar <mingo@elte.hu>
Date: Sun Apr 27 23:21:03 CEST 2008
James Bottomley reported that the following PAT related commit:
| commit 6371b495991debfd1417b17c2bc4f7d7bae05739
| Author: Ingo Molnar <mingo@elte.hu>
| Date: Wed Jan 30 13:33:40 2008 +0100
|
| x86: change ioremap() to default to uncached
broke Voyager.
James says:
" it broke a class of voyager machines: those which
rely on the quad interrupt controller (QIC). The precis of why they
broke is because the QIC does IPIs (or CPIs in its terminology) via
cache line interference: you interrupt a processor by moving a
designated memory area to write exclusive in the cache (by simply
writing to the line) and the CPU acks the interrupt by moving it back to
read shared (by reading from it). That area, is, of course, mapped by
ioremap, so reversing the ioremap semantics and adding the uncached bit
completely breaks the QIC. "
Sorry about that!
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/x86/mach-voyager/voyager_cat.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Index: linux-x86.q/arch/x86/mach-voyager/voyager_cat.c
===================================================================
--- linux-x86.q.orig/arch/x86/mach-voyager/voyager_cat.c
+++ linux-x86.q/arch/x86/mach-voyager/voyager_cat.c
@@ -602,7 +602,7 @@ void __init voyager_cat_init(void)
request_resource(&iomem_resource, &res);
voyager_SUS = (struct voyager_SUS *)
- ioremap(addr, 0x400);
+ ioremap_cache(addr, 0x400);
printk(KERN_NOTICE "Voyager SUS mailbox version 0x%x\n",
voyager_SUS->SUS_version);
voyager_SUS->kernel_version = VOYAGER_MAILBOX_VERSION;
@@ -877,7 +877,7 @@ void __init voyager_cat_init(void)
request_resource(&iomem_resource, res);
}
- qic_addr = (unsigned long)ioremap(qic_addr, 0x400);
+ qic_addr = (unsigned long)ioremap_cache(qic_addr, 0x400);
for (j = 0; j < 4; j++) {
__u8 cpu;
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Breakage caused by unreviewed patch in x86 tree
2008-04-27 20:51 Breakage caused by unreviewed patch in x86 tree James Bottomley
2008-04-27 20:53 ` David Miller
2008-04-27 21:48 ` [patch] x86, voyager: fix ioremap_nocache() Ingo Molnar
@ 2008-04-27 22:00 ` H. Peter Anvin
2008-04-27 22:10 ` James Bottomley
2008-04-27 22:58 ` Arjan van de Ven
3 siblings, 1 reply; 51+ messages in thread
From: H. Peter Anvin @ 2008-04-27 22:00 UTC (permalink / raw)
To: James Bottomley; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel
James Bottomley wrote:
> I might add that the intel SAPIC functions
> in roughly the same manner, so this might break more than just voyager.
Are you referring to the IA64 SAPIC here, or something else? The only
mention of SAPIC in the x86 tree appear to be naming of fields in ACPI
tables.
-hpa
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-27 21:48 ` [patch] x86, voyager: fix ioremap_nocache() Ingo Molnar
@ 2008-04-27 22:05 ` James Bottomley
2008-04-27 22:36 ` Willy Tarreau
` (2 more replies)
2008-04-27 22:34 ` James Bottomley
1 sibling, 3 replies; 51+ messages in thread
From: James Bottomley @ 2008-04-27 22:05 UTC (permalink / raw)
To: Ingo Molnar
Cc: Thomas Gleixner, linux-kernel, H. Peter Anvin, David S. Miller
On Sun, 2008-04-27 at 23:48 +0200, Ingo Molnar wrote:
> * James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>
> > This patch:
> >
> > commit 6371b495991debfd1417b17c2bc4f7d7bae05739
> > Author: Ingo Molnar <mingo@elte.hu>
> > Date: Wed Jan 30 13:33:40 2008 +0100
> >
> > x86: change ioremap() to default to uncached
> >
> > As far as I can tell went blindly into the x86 tree without being
> > shared on any mailing list at all. How can something that completely
> > alters the semantics of ioremap on x86 platforms go in without any
> > review.
>
> what happened before was that on x86 ioremap() was "lax" about the PTE
> cachability and just said "writeback cached". That was utterly false for
> most of the real ioremap()s done by drivers, but it happened to work out
> fine due to the courtesy of BIOSes setting up UC MTRRs that forced those
> areas into uncachable.
>
> with the PAT changes, what used to be this default and careless
> ioremap() behavior by x86 turned into a hard cache aliasing conflict. So
> we pretty much _have to_ default to uncached - like all other sane
> architectures already did forever. This is a direct consequence of the
> PAT changes which were discussed on lkml.
>
> But with PAT we take over from the MTRRs on x86 and using a cacheable
> ioremap() would give us aliasing conflicts and trouble all around the
> place.
I'm not saying the patch is wrong ... or that just because it broke
voyager it shouldn't be done. What I'm saying is that it shouldn't have
been put into the x86 tree without mailing list review.
Running a git tree isn't a private fiefdom, it's a public trust; to keep
the trust of other developers, you have to run the tree in a transparent
fashion ... and making the mailing list the only input to it is one way
of ensuring this. It also helps with review that we're all so worried
about so little being done ...
> In the Voyager case we should use ioremap_cache(), and thanks for
> pointing out that dependency of the QIC. Does the patch below fix it for
> you?
>
> Ingo
>
> --------------->
> Subject: x86, voyager: fix ioremap_nocache()
> From: Ingo Molnar <mingo@elte.hu>
> Date: Sun Apr 27 23:21:03 CEST 2008
>
> James Bottomley reported that the following PAT related commit:
>
> | commit 6371b495991debfd1417b17c2bc4f7d7bae05739
> | Author: Ingo Molnar <mingo@elte.hu>
> | Date: Wed Jan 30 13:33:40 2008 +0100
> |
> | x86: change ioremap() to default to uncached
>
> broke Voyager.
>
> James says:
>
> " it broke a class of voyager machines: those which
> rely on the quad interrupt controller (QIC). The precis of why they
> broke is because the QIC does IPIs (or CPIs in its terminology) via
> cache line interference: you interrupt a processor by moving a
> designated memory area to write exclusive in the cache (by simply
> writing to the line) and the CPU acks the interrupt by moving it back to
> read shared (by reading from it). That area, is, of course, mapped by
> ioremap, so reversing the ioremap semantics and adding the uncached bit
> completely breaks the QIC. "
>
> Sorry about that!
>
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
> arch/x86/mach-voyager/voyager_cat.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> Index: linux-x86.q/arch/x86/mach-voyager/voyager_cat.c
> ===================================================================
> --- linux-x86.q.orig/arch/x86/mach-voyager/voyager_cat.c
> +++ linux-x86.q/arch/x86/mach-voyager/voyager_cat.c
> @@ -602,7 +602,7 @@ void __init voyager_cat_init(void)
>
> request_resource(&iomem_resource, &res);
> voyager_SUS = (struct voyager_SUS *)
> - ioremap(addr, 0x400);
> + ioremap_cache(addr, 0x400);
Actually, no ... wrong ioremap. That one is the SUS clickmap remap,
which is irrelevant whether it's cached or not. The QIC cacheline area
is the next one below it.
I've already got the actual patch (with a nice explanation) in the
voyager tree. I'll send them all out to the mailing list shortly; I'm
just trying to chase down a problem with TSCs before the voyager tree is
ready.
James
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Breakage caused by unreviewed patch in x86 tree
2008-04-27 22:00 ` Breakage caused by unreviewed patch in x86 tree H. Peter Anvin
@ 2008-04-27 22:10 ` James Bottomley
2008-04-27 22:13 ` H. Peter Anvin
0 siblings, 1 reply; 51+ messages in thread
From: James Bottomley @ 2008-04-27 22:10 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel
On Sun, 2008-04-27 at 15:00 -0700, H. Peter Anvin wrote:
> James Bottomley wrote:
> > I might add that the intel SAPIC functions
> > in roughly the same manner, so this might break more than just voyager.
>
> Are you referring to the IA64 SAPIC here, or something else? The only
> mention of SAPIC in the x86 tree appear to be naming of fields in ACPI
> tables.
Yes, that's the one ... but I believe a class of the xAPICs also used a
similar principle.
James
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Breakage caused by unreviewed patch in x86 tree
2008-04-27 22:10 ` James Bottomley
@ 2008-04-27 22:13 ` H. Peter Anvin
2008-04-27 22:18 ` James Bottomley
0 siblings, 1 reply; 51+ messages in thread
From: H. Peter Anvin @ 2008-04-27 22:13 UTC (permalink / raw)
To: James Bottomley; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel
James Bottomley wrote:
> On Sun, 2008-04-27 at 15:00 -0700, H. Peter Anvin wrote:
>> James Bottomley wrote:
>>> I might add that the intel SAPIC functions
>>> in roughly the same manner, so this might break more than just voyager.
>> Are you referring to the IA64 SAPIC here, or something else? The only
>> mention of SAPIC in the x86 tree appear to be naming of fields in ACPI
>> tables.
>
> Yes, that's the one ... but I believe a class of the xAPICs also used a
> similar principle.
I certainly have never seen a system on which the APIC has been mapped
cacheable. I would be very interested in the details, so if you could
elaborate that would be extremely useful.
-hpa
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Breakage caused by unreviewed patch in x86 tree
2008-04-27 22:13 ` H. Peter Anvin
@ 2008-04-27 22:18 ` James Bottomley
2008-04-27 22:31 ` H. Peter Anvin
0 siblings, 1 reply; 51+ messages in thread
From: James Bottomley @ 2008-04-27 22:18 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel
On Sun, 2008-04-27 at 15:13 -0700, H. Peter Anvin wrote:
> James Bottomley wrote:
> > On Sun, 2008-04-27 at 15:00 -0700, H. Peter Anvin wrote:
> >> James Bottomley wrote:
> >>> I might add that the intel SAPIC functions
> >>> in roughly the same manner, so this might break more than just voyager.
> >> Are you referring to the IA64 SAPIC here, or something else? The only
> >> mention of SAPIC in the x86 tree appear to be naming of fields in ACPI
> >> tables.
> >
> > Yes, that's the one ... but I believe a class of the xAPICs also used a
> > similar principle.
>
> I certainly have never seen a system on which the APIC has been mapped
> cacheable. I would be very interested in the details, so if you could
> elaborate that would be extremely useful.
Not really ... I just remember when the SAPIC and later the xAPIC
details were published as novel nearly a decade ago, I remember saying
that some of the voyager interrupt controllers had been using a similar
method for years.
James
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Breakage caused by unreviewed patch in x86 tree
2008-04-27 22:18 ` James Bottomley
@ 2008-04-27 22:31 ` H. Peter Anvin
0 siblings, 0 replies; 51+ messages in thread
From: H. Peter Anvin @ 2008-04-27 22:31 UTC (permalink / raw)
To: James Bottomley; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel
James Bottomley wrote:
>>> Yes, that's the one ... but I believe a class of the xAPICs also used a
>>> similar principle.
>> I certainly have never seen a system on which the APIC has been mapped
>> cacheable. I would be very interested in the details, so if you could
>> elaborate that would be extremely useful.
>
> Not really ... I just remember when the SAPIC and later the xAPIC
> details were published as novel nearly a decade ago, I remember saying
> that some of the voyager interrupt controllers had been using a similar
> method for years.
>
Well, I just looked up the xAPIC spec, and it states very clearly:
APIC registers are memory-mapped to a 4-KByte region of the processor’s
physical address space with an initial starting address of FEE00000H.
For correct APIC operation, this address space must be mapped to an area
of memory that has been designated as strong uncacheable (UC). See
Section 10.3, “Methods of Caching Available.”
So any use of cacheline-related bus cycles is generated by the LAPIC and
doesn't affect the CPU <-> LAPIC interface.
-hpa
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-27 21:48 ` [patch] x86, voyager: fix ioremap_nocache() Ingo Molnar
2008-04-27 22:05 ` James Bottomley
@ 2008-04-27 22:34 ` James Bottomley
2008-04-27 22:39 ` Jeff Garzik
1 sibling, 1 reply; 51+ messages in thread
From: James Bottomley @ 2008-04-27 22:34 UTC (permalink / raw)
To: Ingo Molnar
Cc: Thomas Gleixner, linux-kernel, H. Peter Anvin, David S. Miller
Here's another piece of the x86 API that's designed to be cached. The
dma_declare_coherent_memory() usually represents behind bridge memory
that's fully participatory in the coherence model.
Making it uncached damages the utility of this memory because doing
cacheline sized burst cycles when needed to it is far faster than
individual byte/word/quad writes.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
---
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 388b113..df83ffd 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -214,7 +214,7 @@ int dma_declare_coherent_memory(struct device *dev, dma_addr_t bus_addr,
/* FIXME: this routine just ignores DMA_MEMORY_INCLUDES_CHILDREN */
- mem_base = ioremap(bus_addr, size);
+ mem_base = ioremap_cache(bus_addr, size);
if (!mem_base)
goto out;
^ permalink raw reply related [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-27 22:05 ` James Bottomley
@ 2008-04-27 22:36 ` Willy Tarreau
2008-04-27 22:41 ` Ingo Molnar
2008-04-27 23:18 ` Ingo Molnar
2 siblings, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2008-04-27 22:36 UTC (permalink / raw)
To: James Bottomley
Cc: Ingo Molnar, Thomas Gleixner, linux-kernel, H. Peter Anvin,
David S. Miller
On Sun, Apr 27, 2008 at 06:05:59PM -0400, James Bottomley wrote:
> > Index: linux-x86.q/arch/x86/mach-voyager/voyager_cat.c
> > ===================================================================
> > --- linux-x86.q.orig/arch/x86/mach-voyager/voyager_cat.c
> > +++ linux-x86.q/arch/x86/mach-voyager/voyager_cat.c
> > @@ -602,7 +602,7 @@ void __init voyager_cat_init(void)
> >
> > request_resource(&iomem_resource, &res);
> > voyager_SUS = (struct voyager_SUS *)
> > - ioremap(addr, 0x400);
> > + ioremap_cache(addr, 0x400);
>
> Actually, no ... wrong ioremap. That one is the SUS clickmap remap,
> which is irrelevant whether it's cached or not. The QIC cacheline area
> is the next one below it.
>
> I've already got the actual patch (with a nice explanation) in the
> voyager tree. I'll send them all out to the mailing list shortly; I'm
> just trying to chase down a problem with TSCs before the voyager tree is
> ready.
James,
whatever the fix, please add a comment on the sensible line above so
that the same error does not happen again. Code must be readable without
comments, but magic choices must be commented if there's an underlying
reason.
Thanks,
Willy
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-27 22:34 ` James Bottomley
@ 2008-04-27 22:39 ` Jeff Garzik
2008-04-27 22:44 ` H. Peter Anvin
` (3 more replies)
0 siblings, 4 replies; 51+ messages in thread
From: Jeff Garzik @ 2008-04-27 22:39 UTC (permalink / raw)
To: James Bottomley
Cc: Ingo Molnar, Thomas Gleixner, linux-kernel, H. Peter Anvin,
David S. Miller, Linus Torvalds
James Bottomley wrote:
> Here's another piece of the x86 API that's designed to be cached. The
> dma_declare_coherent_memory() usually represents behind bridge memory
> that's fully participatory in the coherence model.
>
> Making it uncached damages the utility of this memory because doing
> cacheline sized burst cycles when needed to it is far faster than
> individual byte/word/quad writes.
>
> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
>
> ---
>
> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> index 388b113..df83ffd 100644
> --- a/arch/x86/kernel/pci-dma.c
> +++ b/arch/x86/kernel/pci-dma.c
> @@ -214,7 +214,7 @@ int dma_declare_coherent_memory(struct device *dev, dma_addr_t bus_addr,
>
> /* FIXME: this routine just ignores DMA_MEMORY_INCLUDES_CHILDREN */
>
> - mem_base = ioremap(bus_addr, size);
> + mem_base = ioremap_cache(bus_addr, size);
> if (!mem_base)
> goto out;
Surely the change in semantics will find other places to creep out like
this?
I would rather change drivers to use ioremap_nocache(), and leave the
API as-is.
Isn't there Yet More Breakage in lib/iomap.c, given these new semantics?
if (flags & IORESOURCE_MEM) {
if (flags & IORESOURCE_CACHEABLE)
return ioremap(start, len);
return ioremap_nocache(start, len);
}
Any driver using pci_iomap() (libata, and others) is affected.
I disagree with this semantics change. A number of code places _and
drivers_ GET IT RIGHT, and these are all broken now?
Jeff
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-27 22:05 ` James Bottomley
2008-04-27 22:36 ` Willy Tarreau
@ 2008-04-27 22:41 ` Ingo Molnar
2008-04-27 23:18 ` Ingo Molnar
2 siblings, 0 replies; 51+ messages in thread
From: Ingo Molnar @ 2008-04-27 22:41 UTC (permalink / raw)
To: James Bottomley
Cc: Thomas Gleixner, linux-kernel, H. Peter Anvin, David S. Miller
* James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> > request_resource(&iomem_resource, &res);
> > voyager_SUS = (struct voyager_SUS *)
> > - ioremap(addr, 0x400);
> > + ioremap_cache(addr, 0x400);
>
> Actually, no ... wrong ioremap. That one is the SUS clickmap remap,
> which is irrelevant whether it's cached or not. The QIC cacheline area
> is the next one below it.
yes, but i assumed that since you had all cached ioremaps before, you
might wanted it for all of the devices in voyager_cat.c (or at least it
wouldnt hurt). Btw., the QIC ioremap i changed in my patch too, just in
case you didnt notice:
> > - qic_addr = (unsigned long)ioremap(qic_addr, 0x400);
> > + qic_addr = (unsigned long)ioremap_cache(qic_addr, 0x400);
so my patch too should do the trick.
Ingo
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-27 22:39 ` Jeff Garzik
@ 2008-04-27 22:44 ` H. Peter Anvin
2008-04-27 22:46 ` David Miller
` (2 subsequent siblings)
3 siblings, 0 replies; 51+ messages in thread
From: H. Peter Anvin @ 2008-04-27 22:44 UTC (permalink / raw)
To: Jeff Garzik
Cc: James Bottomley, Ingo Molnar, Thomas Gleixner, linux-kernel,
David S. Miller, Linus Torvalds
Jeff Garzik wrote:
>
> Isn't there Yet More Breakage in lib/iomap.c, given these new semantics?
>
> if (flags & IORESOURCE_MEM) {
> if (flags & IORESOURCE_CACHEABLE)
> return ioremap(start, len);
> return ioremap_nocache(start, len);
> }
>
> Any driver using pci_iomap() (libata, and others) is affected.
>
> I disagree with this semantics change. A number of code places _and
> drivers_ GET IT RIGHT, and these are all broken now?
>
They don't, though. They rely on the MTRRs to get it right for them,
and the MTRRs will say "uncached" in nearly all cases.
Consider the case above: you would have gotten the same result
*regardless*, because the MTRRs would not have made the memory
cacheable. Yes, it's broken, but it was broken before as well.
-hpa
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-27 22:39 ` Jeff Garzik
2008-04-27 22:44 ` H. Peter Anvin
@ 2008-04-27 22:46 ` David Miller
2008-04-27 22:52 ` H. Peter Anvin
` (2 more replies)
2008-04-27 23:01 ` Thomas Gleixner
2008-04-28 14:10 ` Arjan van de Ven
3 siblings, 3 replies; 51+ messages in thread
From: David Miller @ 2008-04-27 22:46 UTC (permalink / raw)
To: jeff; +Cc: James.Bottomley, mingo, tglx, linux-kernel, hpa, torvalds
From: Jeff Garzik <jeff@garzik.org>
Date: Sun, 27 Apr 2008 18:39:24 -0400
> I disagree with this semantics change. A number of code places _and
> drivers_ GET IT RIGHT, and these are all broken now?
[ Note, James's patch that you quoted is about mapping DMA
memory, in dma_declare_coherent_memory(), rather than devices.
But I know what you are trying to talk about Jeff. :-) ]
Wrt. ioremap() semanics, it is important to realize that if
the implementation of this on x86 has been giving non-cached
I/O mappings out up until recently, you can expect that there
are hundreds of drivers that might now be broken.
That's the sad fact of the ubiquity of x86, and it doesn't matter how
we defined the API is some document.
Anyways, my point is that this angle should be strongly considered in
any discussion about ioremap() behavior.
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-27 22:46 ` David Miller
@ 2008-04-27 22:52 ` H. Peter Anvin
2008-04-27 22:58 ` David Miller
2008-04-27 23:34 ` Jeff Garzik
2008-04-27 22:53 ` Jeff Garzik
2008-04-27 23:01 ` Arjan van de Ven
2 siblings, 2 replies; 51+ messages in thread
From: H. Peter Anvin @ 2008-04-27 22:52 UTC (permalink / raw)
To: David Miller; +Cc: jeff, James.Bottomley, mingo, tglx, linux-kernel, torvalds
David Miller wrote:
> From: Jeff Garzik <jeff@garzik.org>
> Date: Sun, 27 Apr 2008 18:39:24 -0400
>
>> I disagree with this semantics change. A number of code places _and
>> drivers_ GET IT RIGHT, and these are all broken now?
>
> [ Note, James's patch that you quoted is about mapping DMA
> memory, in dma_declare_coherent_memory(), rather than devices.
> But I know what you are trying to talk about Jeff. :-) ]
>
> Wrt. ioremap() semanics, it is important to realize that if
> the implementation of this on x86 has been giving non-cached
> I/O mappings out up until recently, you can expect that there
> are hundreds of drivers that might now be broken.
>
> That's the sad fact of the ubiquity of x86, and it doesn't matter how
> we defined the API is some document.
>
> Anyways, my point is that this angle should be strongly considered in
> any discussion about ioremap() behavior.
>
There are pretty much three reasons to default to uncached, as far as I
understand:
1. De facto historical practice (follow the MTRRs);
2. Conservatism (failure mode less drastic);
3. Compatibility with other architectures.
Arguably, the right thing is to not even have ioremap() anymore, just
ioremap_{cache,nocache,wc} and consider any unconverted ioremap() as a
flag to audit that particular piece of code.
I don't think that can be done "instantly", though, so defaulting
ioremap() to uncached (as it apparently has been on other arches
already?) is the conservative option in the meantime.
-hpa
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-27 22:46 ` David Miller
2008-04-27 22:52 ` H. Peter Anvin
@ 2008-04-27 22:53 ` Jeff Garzik
2008-04-27 22:56 ` H. Peter Anvin
2008-04-27 23:01 ` Arjan van de Ven
2 siblings, 1 reply; 51+ messages in thread
From: Jeff Garzik @ 2008-04-27 22:53 UTC (permalink / raw)
To: David Miller; +Cc: James.Bottomley, mingo, tglx, linux-kernel, hpa, torvalds
David Miller wrote:
> From: Jeff Garzik <jeff@garzik.org>
> Date: Sun, 27 Apr 2008 18:39:24 -0400
>
>> I disagree with this semantics change. A number of code places _and
>> drivers_ GET IT RIGHT, and these are all broken now?
>
> [ Note, James's patch that you quoted is about mapping DMA
> memory, in dma_declare_coherent_memory(), rather than devices.
> But I know what you are trying to talk about Jeff. :-) ]
>
> Wrt. ioremap() semanics, it is important to realize that if
> the implementation of this on x86 has been giving non-cached
> I/O mappings out up until recently, you can expect that there
> are hundreds of drivers that might now be broken.
>
> That's the sad fact of the ubiquity of x86, and it doesn't matter how
> we defined the API is some document.
>
> Anyways, my point is that this angle should be strongly considered in
> any discussion about ioremap() behavior.
Understood.
I guess I am more annoyed that this stealth semantics change appears to
have broken everything that depends on pci_iomap(), including 90%+ of
all libata drivers, unless I am missing something.
That one piece of code (pci_iomap) was correct under the old semantics,
on x86 and elsewhere. It's tested and working nicely, and depended upon
by many drivers.
Jeff
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-27 22:53 ` Jeff Garzik
@ 2008-04-27 22:56 ` H. Peter Anvin
2008-04-27 22:59 ` David Miller
2008-04-27 23:02 ` Jeff Garzik
0 siblings, 2 replies; 51+ messages in thread
From: H. Peter Anvin @ 2008-04-27 22:56 UTC (permalink / raw)
To: Jeff Garzik
Cc: David Miller, James.Bottomley, mingo, tglx, linux-kernel,
torvalds
Jeff Garzik wrote:
>
> Understood.
>
> I guess I am more annoyed that this stealth semantics change appears to
> have broken everything that depends on pci_iomap(), including 90%+ of
> all libata drivers, unless I am missing something.
>
> That one piece of code (pci_iomap) was correct under the old semantics,
> on x86 and elsewhere. It's tested and working nicely, and depended upon
> by many drivers.
>
That one piece of code has had no effective change. Under both the old
and the new code, both branches functionally because ioremap_nocache(),
in one case because of MTRR and in one case because of PAT.
-hpa
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Breakage caused by unreviewed patch in x86 tree
2008-04-27 20:51 Breakage caused by unreviewed patch in x86 tree James Bottomley
` (2 preceding siblings ...)
2008-04-27 22:00 ` Breakage caused by unreviewed patch in x86 tree H. Peter Anvin
@ 2008-04-27 22:58 ` Arjan van de Ven
2008-04-27 23:00 ` David Miller
2008-04-27 23:03 ` James Bottomley
3 siblings, 2 replies; 51+ messages in thread
From: Arjan van de Ven @ 2008-04-27 22:58 UTC (permalink / raw)
To: James Bottomley; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel
On Sun, 27 Apr 2008 16:51:25 -0400
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> This patch:
>
> commit 6371b495991debfd1417b17c2bc4f7d7bae05739
> Author: Ingo Molnar <mingo@elte.hu>
> Date: Wed Jan 30 13:33:40 2008 +0100
>
> x86: change ioremap() to default to uncached
>
> As far as I can tell went blindly into the x86 tree without being
> shared on any mailing list at all. How can something that completely
> alters the semantics of ioremap on x86 platforms go in without any
> review.
it changed from "whatever coinflip you got" to "predictable outcome".
What you got before was uncached (most of the time), or if the bios was
creative, write combining. Or if the bios was broken in how it set up MTRR's, you could suddenly
get "cached".
When you're mapping device memory, uncached is the safe default.
With the switch to PAT (and phasing out of MTRR), the kernel needs to pick one of the three
(cached, writecombining, uncached) since you can no longer really depend on MTRRs saving
your bacon there.
Drivers in general, with VERY few exceptions, want uncached. Any other choice would have been deadly...
I'd like to ask you which one you would pick... you maintain a whole bunch of drivers as scsi maintainer, what would
you have picked?
The answer "whatever the MTRR set up" no longer holds ;(
>
> I might add that the intel
> SAPIC functions in roughly the same manner, so this might break more
> than just voyager.
Apics are uncached...
--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-27 22:52 ` H. Peter Anvin
@ 2008-04-27 22:58 ` David Miller
2008-04-27 23:04 ` H. Peter Anvin
2008-04-27 23:34 ` Jeff Garzik
1 sibling, 1 reply; 51+ messages in thread
From: David Miller @ 2008-04-27 22:58 UTC (permalink / raw)
To: hpa; +Cc: jeff, James.Bottomley, mingo, tglx, linux-kernel, torvalds
From: "H. Peter Anvin" <hpa@zytor.com>
Date: Sun, 27 Apr 2008 15:52:02 -0700
> Arguably, the right thing is to not even have ioremap() anymore, just
> ioremap_{cache,nocache,wc} and consider any unconverted ioremap() as a
> flag to audit that particular piece of code.
>
> I don't think that can be done "instantly", though, so defaulting
> ioremap() to uncached (as it apparently has been on other arches
> already?) is the conservative option in the meantime.
I think this is a reasonable course of action.
That leaves one of Jeff's concerns, what to do with pci_iomap(). That
was designed to give mappings with caching enabled, and as a result we
probably should make it behave that way.
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-27 22:56 ` H. Peter Anvin
@ 2008-04-27 22:59 ` David Miller
2008-04-27 23:02 ` Jeff Garzik
1 sibling, 0 replies; 51+ messages in thread
From: David Miller @ 2008-04-27 22:59 UTC (permalink / raw)
To: hpa; +Cc: jeff, James.Bottomley, mingo, tglx, linux-kernel, torvalds
From: "H. Peter Anvin" <hpa@zytor.com>
Date: Sun, 27 Apr 2008 15:56:58 -0700
> That one piece of code has had no effective change. Under both the old
> and the new code, both branches functionally because ioremap_nocache(),
> in one case because of MTRR and in one case because of PAT.
Ho hum.. forget the suggestion I made in the reply I just sent out
then. :-/
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Breakage caused by unreviewed patch in x86 tree
2008-04-27 22:58 ` Arjan van de Ven
@ 2008-04-27 23:00 ` David Miller
2008-04-27 23:07 ` Arjan van de Ven
2008-04-27 23:03 ` James Bottomley
1 sibling, 1 reply; 51+ messages in thread
From: David Miller @ 2008-04-27 23:00 UTC (permalink / raw)
To: arjan; +Cc: James.Bottomley, mingo, tglx, linux-kernel
From: Arjan van de Ven <arjan@infradead.org>
Date: Sun, 27 Apr 2008 15:58:03 -0700
> On Sun, 27 Apr 2008 16:51:25 -0400
> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>
> > This patch:
> >
> > commit 6371b495991debfd1417b17c2bc4f7d7bae05739
> > Author: Ingo Molnar <mingo@elte.hu>
> > Date: Wed Jan 30 13:33:40 2008 +0100
> >
> > x86: change ioremap() to default to uncached
> >
> > As far as I can tell went blindly into the x86 tree without being
> > shared on any mailing list at all. How can something that completely
> > alters the semantics of ioremap on x86 platforms go in without any
> > review.
>
> it changed from "whatever coinflip you got" to "predictable outcome".
You're making technical responses to a question about process.
The core issue how the change in question was handled.
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-27 22:46 ` David Miller
2008-04-27 22:52 ` H. Peter Anvin
2008-04-27 22:53 ` Jeff Garzik
@ 2008-04-27 23:01 ` Arjan van de Ven
2008-04-30 21:44 ` James Bottomley
2 siblings, 1 reply; 51+ messages in thread
From: Arjan van de Ven @ 2008-04-27 23:01 UTC (permalink / raw)
To: David Miller
Cc: jeff, James.Bottomley, mingo, tglx, linux-kernel, hpa, torvalds
On Sun, 27 Apr 2008 15:46:20 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:
> From: Jeff Garzik <jeff@garzik.org>
> Date: Sun, 27 Apr 2008 18:39:24 -0400
>
> > I disagree with this semantics change. A number of code places
> > _and drivers_ GET IT RIGHT, and these are all broken now?
>
> [ Note, James's patch that you quoted is about mapping DMA
> memory, in dma_declare_coherent_memory(), rather than devices.
> But I know what you are trying to talk about Jeff. :-) ]
>
> Wrt. ioremap() semanics, it is important to realize that if
> the implementation of this on x86 has been giving non-cached
> I/O mappings out up until recently, you can expect that there
> are hundreds of drivers that might now be broken.
it's even worse than that.
99% of the time it gave out uncached memory, but if the bios was iffy,
it would suddenly give out something else.
The changes to ioremap() recently turned that into "100% uncached".
For me, that's the right (because safe) behavior, exactly for the
reason you mentioned: it's what happened in practice, and driver writers
would assume that to happen.
Anything except uncached would just be insanity as default.
(well the old "whatever mood the bios writer was in, usually uncached" is
effectively uncached except for weird behaving boxes)
--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-27 22:39 ` Jeff Garzik
2008-04-27 22:44 ` H. Peter Anvin
2008-04-27 22:46 ` David Miller
@ 2008-04-27 23:01 ` Thomas Gleixner
2008-04-28 14:10 ` Arjan van de Ven
3 siblings, 0 replies; 51+ messages in thread
From: Thomas Gleixner @ 2008-04-27 23:01 UTC (permalink / raw)
To: Jeff Garzik
Cc: James Bottomley, Ingo Molnar, linux-kernel, H. Peter Anvin,
David S. Miller, Linus Torvalds
On Sun, 27 Apr 2008, Jeff Garzik wrote:
>
> Isn't there Yet More Breakage in lib/iomap.c, given these new semantics?
>
> if (flags & IORESOURCE_MEM) {
> if (flags & IORESOURCE_CACHEABLE)
> return ioremap(start, len);
> return ioremap_nocache(start, len);
> }
That's broken anyway as we are inconsistent across
architectures. ioremap defaults to uncached at least on:
POWERPC, ARM, MIPS, SH, FRV, PARISC, H8300, M68K, BLACKFIN, M68KNOMMU
So changing the above to ioremap_cache() would provide the desired
results across architectures.
Thanks,
tglx
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-27 22:56 ` H. Peter Anvin
2008-04-27 22:59 ` David Miller
@ 2008-04-27 23:02 ` Jeff Garzik
2008-04-27 23:14 ` Arjan van de Ven
1 sibling, 1 reply; 51+ messages in thread
From: Jeff Garzik @ 2008-04-27 23:02 UTC (permalink / raw)
To: H. Peter Anvin
Cc: David Miller, James.Bottomley, mingo, tglx, linux-kernel,
torvalds
H. Peter Anvin wrote:
> Jeff Garzik wrote:
>>
>> Understood.
>>
>> I guess I am more annoyed that this stealth semantics change appears
>> to have broken everything that depends on pci_iomap(), including 90%+
>> of all libata drivers, unless I am missing something.
>>
>> That one piece of code (pci_iomap) was correct under the old
>> semantics, on x86 and elsewhere. It's tested and working nicely, and
>> depended upon by many drivers.
>>
>
> That one piece of code has had no effective change. Under both the old
> and the new code, both branches functionally because ioremap_nocache(),
> in one case because of MTRR and in one case because of PAT.
OK good... libata uses it for controller registers exclusively, so that
should be fine from an operational standpoint.
The lack of discussion is still a bit irksome... at the very least I
could have pointed out that lib/iomap.c wanted an update, and the 2.5 yo
discussion could have resurfaced.
Jeff
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Breakage caused by unreviewed patch in x86 tree
2008-04-27 22:58 ` Arjan van de Ven
2008-04-27 23:00 ` David Miller
@ 2008-04-27 23:03 ` James Bottomley
2008-04-27 23:11 ` Arjan van de Ven
2008-04-27 23:17 ` H. Peter Anvin
1 sibling, 2 replies; 51+ messages in thread
From: James Bottomley @ 2008-04-27 23:03 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel
On Sun, 2008-04-27 at 15:58 -0700, Arjan van de Ven wrote:
> On Sun, 27 Apr 2008 16:51:25 -0400
> James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>
> > This patch:
> >
> > commit 6371b495991debfd1417b17c2bc4f7d7bae05739
> > Author: Ingo Molnar <mingo@elte.hu>
> > Date: Wed Jan 30 13:33:40 2008 +0100
> >
> > x86: change ioremap() to default to uncached
> >
> > As far as I can tell went blindly into the x86 tree without being
> > shared on any mailing list at all. How can something that completely
> > alters the semantics of ioremap on x86 platforms go in without any
> > review.
> it changed from "whatever coinflip you got" to "predictable outcome".
> What you got before was uncached (most of the time), or if the bios was
> creative, write combining. Or if the bios was broken in how it set up MTRR's, you could suddenly
> get "cached".
My major complaint is the lack of review and notice, not the actual
patch.
> When you're mapping device memory, uncached is the safe default.
Well, for certain device mailboxes, uncached does mean less performant.
The voyager breakage was exceptional ... I expect other problems just to
result in a loss of performance that caching gave by improving the
bursting. If we're lucky, the PCI bridge cache might hide a lot of
this.
> With the switch to PAT (and phasing out of MTRR), the kernel needs to pick one of the three
> (cached, writecombining, uncached) since you can no longer really depend on MTRRs saving
> your bacon there.
> Drivers in general, with VERY few exceptions, want uncached. Any other choice would have been deadly...
>
> I'd like to ask you which one you would pick... you maintain a whole bunch of drivers as scsi maintainer, what would
> you have picked?
> The answer "whatever the MTRR set up" no longer holds ;(
I wouldn't have picked ... I'd have asked for input.
James
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-27 22:58 ` David Miller
@ 2008-04-27 23:04 ` H. Peter Anvin
2008-04-30 20:35 ` Eric W. Biederman
0 siblings, 1 reply; 51+ messages in thread
From: H. Peter Anvin @ 2008-04-27 23:04 UTC (permalink / raw)
To: David Miller; +Cc: jeff, James.Bottomley, mingo, tglx, linux-kernel, torvalds
David Miller wrote:
> I think this is a reasonable course of action.
>
> That leaves one of Jeff's concerns, what to do with pci_iomap(). That
> was designed to give mappings with caching enabled, and as a result we
> probably should make it behave that way.
Yes, it should. Just be clear that *that* is a semantic change over
what the code currently does; it would appear that that is what the code
is *trying* to do.
I believe on x86 it will still get clamped by the MTRRs, at least
initially (I don't think we have flipped the default MTRR type to WB
yet) but that would immediately change the behaviour on non-x86
architectures.
-hpa
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Breakage caused by unreviewed patch in x86 tree
2008-04-27 23:00 ` David Miller
@ 2008-04-27 23:07 ` Arjan van de Ven
0 siblings, 0 replies; 51+ messages in thread
From: Arjan van de Ven @ 2008-04-27 23:07 UTC (permalink / raw)
To: David Miller; +Cc: James.Bottomley, mingo, tglx, linux-kernel
On Sun, 27 Apr 2008 16:00:41 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:
> From: Arjan van de Ven <arjan@infradead.org>
> Date: Sun, 27 Apr 2008 15:58:03 -0700
>
> > On Sun, 27 Apr 2008 16:51:25 -0400
> > James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> >
> > > This patch:
> > >
> > > commit 6371b495991debfd1417b17c2bc4f7d7bae05739
> > > Author: Ingo Molnar <mingo@elte.hu>
> > > Date: Wed Jan 30 13:33:40 2008 +0100
> > >
> > > x86: change ioremap() to default to uncached
> > >
> > > As far as I can tell went blindly into the x86 tree without being
> > > shared on any mailing list at all. How can something that
> > > completely alters the semantics of ioremap on x86 platforms go in
> > > without any review.
> >
> > it changed from "whatever coinflip you got" to "predictable
> > outcome".
>
> You're making technical responses to a question about process.
ok on a process side it is basically a "go from 99.9% to 100%" bugfix ;(
Anyone who claims that ioremap() on x86 was cached before doesn't understand the issue,
it was NOT cached. It was "almost always uncached". The bugfix here is to take away the
"almost" part.
Should that have been posted to lkml more vocally? Wouldn't have hurt. No argument about that.
Is it as big a deal as this thread makes it sound? Absolutely not! The change in behavior
or even semantics is almost zero.
--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Breakage caused by unreviewed patch in x86 tree
2008-04-27 23:03 ` James Bottomley
@ 2008-04-27 23:11 ` Arjan van de Ven
2008-04-27 23:17 ` H. Peter Anvin
1 sibling, 0 replies; 51+ messages in thread
From: Arjan van de Ven @ 2008-04-27 23:11 UTC (permalink / raw)
To: James Bottomley; +Cc: Ingo Molnar, Thomas Gleixner, linux-kernel
On Sun, 27 Apr 2008 19:03:59 -0400
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> On Sun, 2008-04-27 at 15:58 -0700, Arjan van de Ven wrote:
> > On Sun, 27 Apr 2008 16:51:25 -0400
> > James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> > When you're mapping device memory, uncached is the safe default.
>
> Well, for certain device mailboxes, uncached does mean less
> performant. The voyager breakage was exceptional ... I expect other
> problems just to result in a loss of performance that caching gave by
> improving the bursting. If we're lucky, the PCI bridge cache might
> hide a lot of this.
on a PC, NONE of the ioremap()s of PCI stuff were cached before, with the
exception of prefetchable ranges, that some bioses set up a write-combining MTRR for.
(and write-combining is effectively "uncached but with write buffering")
If you look at at what kind of devices have prefetchable ranges, you get two answers:
1) video cards
2) some IB adapters
So your argument that this might change PCI stuff is just false.
Your argument that this "99.9%->100% uncached" should have been announced/discussed
in public, sure. But to make this big a deal out of it?
--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-27 23:02 ` Jeff Garzik
@ 2008-04-27 23:14 ` Arjan van de Ven
0 siblings, 0 replies; 51+ messages in thread
From: Arjan van de Ven @ 2008-04-27 23:14 UTC (permalink / raw)
To: Jeff Garzik
Cc: H. Peter Anvin, David Miller, James.Bottomley, mingo, tglx,
linux-kernel, torvalds
On Sun, 27 Apr 2008 19:02:05 -0400
Jeff Garzik <jeff@garzik.org> wrote:
> H. Peter Anvin wrote:
> > Jeff Garzik wrote:
> >>
> >> Understood.
> >>
> >> I guess I am more annoyed that this stealth semantics change
> >> appears to have broken everything that depends on pci_iomap(),
> >> including 90%+ of all libata drivers, unless I am missing
> >> something.
> >>
> >> That one piece of code (pci_iomap) was correct under the old
> >> semantics, on x86 and elsewhere. It's tested and working nicely,
> >> and depended upon by many drivers.
> >>
> >
> > That one piece of code has had no effective change. Under both the
> > old and the new code, both branches functionally because
> > ioremap_nocache(), in one case because of MTRR and in one case
> > because of PAT.
>
> OK good... libata uses it for controller registers exclusively, so
> that should be fine from an operational standpoint.
>
> at the very least I
> could have pointed out that lib/iomap.c wanted an update, and the 2.5
> yo discussion could have resurfaced.
>
btw as a technical point: ioremap_cached() isn't guaranteed to give you cached memory.
In fact, on a PC, if you use ioremap_cached() on PCI bars, you will STILL get uncached
access, just because right now, it's not possible to give a cached mapping out!
We might want to try to change that in the future (by rewriting the MTRRs), but
that would need a LOT of auditing of ioremap (ab)users to make sure they actually
mean "cached" when they ask for cached (pci_iomap... I'm not convinced it's users
really can deal with cached) and it also will take a LOT of testing.
--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Breakage caused by unreviewed patch in x86 tree
2008-04-27 23:03 ` James Bottomley
2008-04-27 23:11 ` Arjan van de Ven
@ 2008-04-27 23:17 ` H. Peter Anvin
1 sibling, 0 replies; 51+ messages in thread
From: H. Peter Anvin @ 2008-04-27 23:17 UTC (permalink / raw)
To: James Bottomley
Cc: Arjan van de Ven, Ingo Molnar, Thomas Gleixner, linux-kernel
James Bottomley wrote:
> My major complaint is the lack of review and notice, not the actual
> patch.
[...]
> I wouldn't have picked ... I'd have asked for input.
I can't speak for anyone else, but I definitely have discussed this
issue with a large number of people. I can't tell you in which specific
fora, because it usually became a very brief conversation since there
really is only one plausible answer (*especially* in the light of the
precedents set by other architectures.)
Ingo's mention of it on LKML in this email:
http://lkml.org/lkml/2008/1/25/287
... is pretty typical of how the conversation usually went.
-hpa
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-27 22:05 ` James Bottomley
2008-04-27 22:36 ` Willy Tarreau
2008-04-27 22:41 ` Ingo Molnar
@ 2008-04-27 23:18 ` Ingo Molnar
2008-04-27 23:31 ` David Miller
2008-04-28 6:10 ` Christoph Hellwig
2 siblings, 2 replies; 51+ messages in thread
From: Ingo Molnar @ 2008-04-27 23:18 UTC (permalink / raw)
To: James Bottomley
Cc: Thomas Gleixner, linux-kernel, H. Peter Anvin, David S. Miller
* James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> > with the PAT changes, what used to be this default and careless
> > ioremap() behavior by x86 turned into a hard cache aliasing
> > conflict. So we pretty much _have to_ default to uncached - like all
> > other sane architectures already did forever. This is a direct
> > consequence of the PAT changes which were discussed on lkml.
> >
> > But with PAT we take over from the MTRRs on x86 and using a
> > cacheable ioremap() would give us aliasing conflicts and trouble all
> > around the place.
>
> I'm not saying the patch is wrong ... or that just because it broke
> voyager it shouldn't be done. [...]
ok, but that rather crutial piece of information was not at all clear
from your mail ... ;-)
> [...] What I'm saying is that it shouldn't have been put into the x86
> tree without mailing list review.
... but the PAT patchset was all around lkml. The ioremap change was
trivial, went into v2.6.25 and was discussed on lkml:
http://lkml.org/lkml/2008/1/25/287
" that's why we changed all ioremaps to default to cache-disabled (PCD)
in latest x86.git as well. For years the x86 architecture set the
ioremap pagetable entries to cacheable by default and only the MTRRs
(and the BIOS writers) saved us from trouble. Now we try to be a bit
more defensive and avoid "BIOS bug causes only Linux to crash and burn
while other OSs work fine" type of scenarios. "
i could have included the actual patch in that mail i guess - but this
was hardly an unreviewed change and you never showed interest in the PAT
threads on lkml (or much interest in x86 threads for that matter) so
would it have made any difference in practice?
The commit was mentioned in the pull request to Linus as well, which was
Cc:-ed to lkml. The commit was upstream about 3 months before you
noticed that it broke Voyager.
But, we'd not mind at all posting 1000 x86.git patches to lkml (or
another list) every 3 months (or more frequently), if people request
that.
Ingo
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-27 23:18 ` Ingo Molnar
@ 2008-04-27 23:31 ` David Miller
2008-04-28 0:31 ` Rik van Riel
2008-04-28 9:01 ` Alan Cox
2008-04-28 6:10 ` Christoph Hellwig
1 sibling, 2 replies; 51+ messages in thread
From: David Miller @ 2008-04-27 23:31 UTC (permalink / raw)
To: mingo; +Cc: James.Bottomley, tglx, linux-kernel, hpa
From: Ingo Molnar <mingo@elte.hu>
Date: Mon, 28 Apr 2008 01:18:09 +0200
> But, we'd not mind at all posting 1000 x86.git patches to lkml (or
> another list) every 3 months (or more frequently), if people request
> that.
Keep defending yourself Ingo.
You can post whatever patches you like a million times to lkml.
That's not the problem.
It's that the patches don't get reviewed, posting them more or to a
different place doesn't help that.
And you add patches before they are properly reviewed.
Look at all of the postings people have made this merge window about
x86 merge fallout(s). This is exactly the chief gripe of everyone,
that you merge patches with little or no review into your tree.
People paying attention can see what your goal is. You've expressed
in the past that you think we're not merging patches in fast enough,
which is quite a laugh, and now that you can do x86 maintainence
you're going to ramp things up and show everyone "how it's done."
And at what a great expense this experiment is being performed. Just
ask anyone who has tried to get any work done this past week or so.
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-27 22:52 ` H. Peter Anvin
2008-04-27 22:58 ` David Miller
@ 2008-04-27 23:34 ` Jeff Garzik
2008-04-27 23:39 ` H. Peter Anvin
1 sibling, 1 reply; 51+ messages in thread
From: Jeff Garzik @ 2008-04-27 23:34 UTC (permalink / raw)
To: H. Peter Anvin
Cc: David Miller, James.Bottomley, mingo, tglx, linux-kernel,
torvalds
H. Peter Anvin wrote:
> Arguably, the right thing is to not even have ioremap() anymore, just
> ioremap_{cache,nocache,wc} and consider any unconverted ioremap() as a
> flag to audit that particular piece of code.
Seems like a sane plan... last handful might be tough, though.
Jeff
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-27 23:34 ` Jeff Garzik
@ 2008-04-27 23:39 ` H. Peter Anvin
0 siblings, 0 replies; 51+ messages in thread
From: H. Peter Anvin @ 2008-04-27 23:39 UTC (permalink / raw)
To: Jeff Garzik
Cc: David Miller, James.Bottomley, mingo, tglx, linux-kernel,
torvalds
Jeff Garzik wrote:
> H. Peter Anvin wrote:
>> Arguably, the right thing is to not even have ioremap() anymore, just
>> ioremap_{cache,nocache,wc} and consider any unconverted ioremap() as a
>> flag to audit that particular piece of code.
>
> Seems like a sane plan... last handful might be tough, though.
No doubt, although the vast majority of all call sites can probably be
trivially changed to ioremap_nocache().
-hpa
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-27 23:31 ` David Miller
@ 2008-04-28 0:31 ` Rik van Riel
2008-04-28 0:45 ` Al Viro
2008-04-28 9:01 ` Alan Cox
1 sibling, 1 reply; 51+ messages in thread
From: Rik van Riel @ 2008-04-28 0:31 UTC (permalink / raw)
To: David Miller; +Cc: mingo, James.Bottomley, tglx, linux-kernel, hpa
On Sun, 27 Apr 2008 16:31:06 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:
> You can post whatever patches you like a million times to lkml.
> That's not the problem.
>
> It's that the patches don't get reviewed, posting them more or to a
> different place doesn't help that.
If you really want to enforce this, I bet it could be automated
with scripts around git.
Simply refuse to apply a patch that does not have at least two
Signed-off-by/Reviewed-by/Acked-by lines and refuse to apply
a "git pull" if there is a changeset like that in the tree.
Of course, this could end up screwing rare architectures like
Voyager, so I'm not convinced it is a good idea...
--
All rights reversed.
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-28 0:31 ` Rik van Riel
@ 2008-04-28 0:45 ` Al Viro
2008-04-28 0:52 ` H. Peter Anvin
0 siblings, 1 reply; 51+ messages in thread
From: Al Viro @ 2008-04-28 0:45 UTC (permalink / raw)
To: Rik van Riel
Cc: David Miller, mingo, James.Bottomley, tglx, linux-kernel, hpa
On Sun, Apr 27, 2008 at 08:31:27PM -0400, Rik van Riel wrote:
> On Sun, 27 Apr 2008 16:31:06 -0700 (PDT)
> David Miller <davem@davemloft.net> wrote:
>
> > You can post whatever patches you like a million times to lkml.
> > That's not the problem.
> >
> > It's that the patches don't get reviewed, posting them more or to a
> > different place doesn't help that.
>
> If you really want to enforce this, I bet it could be automated
> with scripts around git.
>
> Simply refuse to apply a patch that does not have at least two
> Signed-off-by/Reviewed-by/Acked-by lines and refuse to apply
> a "git pull" if there is a changeset like that in the tree.
Yeah, right.
commit 138fe4e069798d9aa948a5402ff15e58f483ee4e
Author: Konrad Rzeszutek <ketuzsezr@darnok.org>
Date: Wed Apr 9 19:50:41 2008 -0700
Firmware: add iSCSI iBFT Support
...
[akpm@linux-foundation.org: fix build]
Signed-off-by: Konrad Rzeszutek <konradr@linux.vnet.ibm.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Peter Jones <pjones@redhat.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
...
+static ssize_t sprintf_ipaddr(char *buf, u8 *ip)
+{
...
+ str += sprintf(str, NIP6_FMT, ntohs(ip[0]), ntohs(ip[1]),
+ ntohs(ip[2]), ntohs(ip[3]), ntohs(ip[4]),
+ ntohs(ip[5]), ntohs(ip[6]), ntohs(ip[7]));
are you going to tell me that this had been reviewed? Note these
ntohs() applied to 8bit values in there. Two signed-off-by, including
Greg "two s-o-b are enough to guarantee review"...
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-28 0:45 ` Al Viro
@ 2008-04-28 0:52 ` H. Peter Anvin
0 siblings, 0 replies; 51+ messages in thread
From: H. Peter Anvin @ 2008-04-28 0:52 UTC (permalink / raw)
To: Al Viro
Cc: Rik van Riel, David Miller, mingo, James.Bottomley, tglx,
linux-kernel
Al Viro wrote:
> are you going to tell me that this had been reviewed? Note these
> ntohs() applied to 8bit values in there. Two signed-off-by, including
> Greg "two s-o-b are enough to guarantee review"...
I thought it was well established that Signed-off-by was about copyright
and patch flow, *nothing else*.
-hpa
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-27 23:18 ` Ingo Molnar
2008-04-27 23:31 ` David Miller
@ 2008-04-28 6:10 ` Christoph Hellwig
2008-04-28 16:55 ` H. Peter Anvin
1 sibling, 1 reply; 51+ messages in thread
From: Christoph Hellwig @ 2008-04-28 6:10 UTC (permalink / raw)
To: Ingo Molnar
Cc: James Bottomley, Thomas Gleixner, linux-kernel, H. Peter Anvin,
David S. Miller
On Mon, Apr 28, 2008 at 01:18:09AM +0200, Ingo Molnar wrote:
> ... but the PAT patchset was all around lkml. The ioremap change was
> trivial, went into v2.6.25 and was discussed on lkml:
>
> http://lkml.org/lkml/2008/1/25/287
lkml is a hell-hole with a signal/noise ratio worse than slashdot. Any
chance you could create a linux-x86 list for x86 specific changes so
that a focussed review and discussion can happen?
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-27 23:31 ` David Miller
2008-04-28 0:31 ` Rik van Riel
@ 2008-04-28 9:01 ` Alan Cox
2008-04-28 9:17 ` David Miller
1 sibling, 1 reply; 51+ messages in thread
From: Alan Cox @ 2008-04-28 9:01 UTC (permalink / raw)
To: David Miller; +Cc: mingo, James.Bottomley, tglx, linux-kernel, hpa
> You can post whatever patches you like a million times to lkml.
> That's not the problem.
>
> It's that the patches don't get reviewed, posting them more or to a
> different place doesn't help that.
So review them. Your comments strike me as the pot calling the kettle
black given the way the network people like to live on their own mailing
list.
Sorting x86 arch code is inevitably going to break a few eggs, but I
suspect the time cost has been more in Dave v Ingo (12 rounds, two falls,
two submissions or a knockout) than actually sorting out the fallout of a
couple of problem cases.
Alan
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-28 9:01 ` Alan Cox
@ 2008-04-28 9:17 ` David Miller
2008-04-28 9:48 ` Adrian Bunk
2008-04-28 11:50 ` Ingo Molnar
0 siblings, 2 replies; 51+ messages in thread
From: David Miller @ 2008-04-28 9:17 UTC (permalink / raw)
To: alan; +Cc: mingo, James.Bottomley, tglx, linux-kernel, hpa
From: Alan Cox <alan@lxorguk.ukuu.org.uk>
Date: Mon, 28 Apr 2008 10:01:54 +0100
> > You can post whatever patches you like a million times to lkml.
> > That's not the problem.
> >
> > It's that the patches don't get reviewed, posting them more or to a
> > different place doesn't help that.
>
> So review them. Your comments strike me as the pot calling the kettle
> black given the way the network people like to live on their own mailing
> list.
Oh contraire. Because we networking folks use a seperate mailing list
with a lower signal to noise ratio than lkml, and as a result more
specialization, more patches get more review by more specialists.
It's the point I'm trying to make every time the "post everything to
lkml" argument gets fronted.
Telling me to review all of this crud just ignores the problem. And
the implication is that it's OK for unreviewed patches to go into the
tree, and it's most certainly not.
You might want to know that linux-next mainly exists because of how
much of this has been going on over the past half year or so.
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-28 9:17 ` David Miller
@ 2008-04-28 9:48 ` Adrian Bunk
2008-04-28 11:50 ` Ingo Molnar
1 sibling, 0 replies; 51+ messages in thread
From: Adrian Bunk @ 2008-04-28 9:48 UTC (permalink / raw)
To: David Miller; +Cc: alan, mingo, James.Bottomley, tglx, linux-kernel, hpa
On Mon, Apr 28, 2008 at 02:17:45AM -0700, David Miller wrote:
> From: Alan Cox <alan@lxorguk.ukuu.org.uk>
> Date: Mon, 28 Apr 2008 10:01:54 +0100
>
> > > You can post whatever patches you like a million times to lkml.
> > > That's not the problem.
> > >
> > > It's that the patches don't get reviewed, posting them more or to a
> > > different place doesn't help that.
> >
> > So review them. Your comments strike me as the pot calling the kettle
> > black given the way the network people like to live on their own mailing
> > list.
>
> Oh contraire. Because we networking folks use a seperate mailing list
> with a lower signal to noise ratio than lkml, and as a result more
> specialization, more patches get more review by more specialists.
>
> It's the point I'm trying to make every time the "post everything to
> lkml" argument gets fronted.
>...
I'd say the best solution is to do both.
Many people active only in some area want to read only stuff affecting
their area and not linux-kernel that averaged at 525 emails per day in
February.
But when I have time I skim over patches on linux-kernel, and when I see
in the diffstat that a Makefile or Kconfig file gets touched I check
whether I can spot any bugs there (and other people sometimes also spot
bugs in patches for areas they aren't active in).
And starting regression tracking was only possible since most bug
reports went to linux-kernel, not only to a gazillion different lists.
That's why it might make sense to have everything on both specialized
lists and linux-kernel.
If it is wanted to reduce the volume on linux-kernel, offloading the
sending of patches to some new kernel-patches mailing list might be
an option. OTOH, patches and the discussion of patches seems to be
the signal on linux-kernel, not the noise..
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-28 9:17 ` David Miller
2008-04-28 9:48 ` Adrian Bunk
@ 2008-04-28 11:50 ` Ingo Molnar
1 sibling, 0 replies; 51+ messages in thread
From: Ingo Molnar @ 2008-04-28 11:50 UTC (permalink / raw)
To: David Miller; +Cc: alan, James.Bottomley, tglx, linux-kernel, hpa
* David Miller <davem@davemloft.net> wrote:
> > So review them. Your comments strike me as the pot calling the
> > kettle black given the way the network people like to live on their
> > own mailing list.
>
> Oh contraire. Because we networking folks use a seperate mailing list
> with a lower signal to noise ratio than lkml, and as a result more
> specialization, more patches get more review by more specialists.
well, then lets go back to the very basis of this whole ... box match.
:) It was about a broken networking patch that i stumbled upon
(unwillingly, via testing), which commit was _not_ posted on netdev and
_not_ posted on lkml:
http://lkml.org/lkml/2008/4/19/51
I simply hit a (trivial) regression in the networking code, but i simply
found no existing discussion of the suspect patch (commit 5e8fbe2a).
the development process is an integral part of the source code of this
OS and not a private matter of developers or maintainers. It is not a
religion and it is not taboo to criticise it, it is a crutial part of
our technology. So i will continue to criticise the development process
in the future too when i think it has aspects that hurts us. [and will
try to address all incoming criticism as well.]
> You might want to know that linux-next mainly exists because of how
> much of this has been going on over the past half year or so.
the problem is that linux-next alone would not have helped much in this
specific matter. For example the softlockup warnings annoyance you
reported would have triggered immediately had you booted your Sparc64
box with linux-next or -mm even just _once_ :-)
But the same holds for me: had i ran linux-next i could have reported
some of the networking regressions sooner.
So how about making mutual use of linux-next and 'promise' to each other
to at least minimally build/boot the integrated tree, or at least
promise to not complain too loudly about bugs that could have been found
and reported there via reasonable mutual testing of linux-next? Does
that sound reasonable?
Ingo
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-27 22:39 ` Jeff Garzik
` (2 preceding siblings ...)
2008-04-27 23:01 ` Thomas Gleixner
@ 2008-04-28 14:10 ` Arjan van de Ven
2008-04-28 14:29 ` James Bottomley
3 siblings, 1 reply; 51+ messages in thread
From: Arjan van de Ven @ 2008-04-28 14:10 UTC (permalink / raw)
To: Jeff Garzik
Cc: James Bottomley, Ingo Molnar, Thomas Gleixner, linux-kernel,
H. Peter Anvin, David S. Miller, Linus Torvalds
On Sun, 27 Apr 2008 18:39:24 -0400
Jeff Garzik <jeff@garzik.org> wrote:
> James Bottomley wrote:
> > Here's another piece of the x86 API that's designed to be cached.
> > The dma_declare_coherent_memory() usually represents behind bridge
> > memory that's fully participatory in the coherence model.
> >
> > Making it uncached damages the utility of this memory because doing
> > cacheline sized burst cycles when needed to it is far faster than
> > individual byte/word/quad writes.
> >
> > Signed-off-by: James Bottomley
> > <James.Bottomley@HansenPartnership.com>
> >
> > ---
> >
> > diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> > index 388b113..df83ffd 100644
> > --- a/arch/x86/kernel/pci-dma.c
> > +++ b/arch/x86/kernel/pci-dma.c
> > @@ -214,7 +214,7 @@ int dma_declare_coherent_memory(struct device
> > *dev, dma_addr_t bus_addr,
> > /* FIXME: this routine just ignores
> > DMA_MEMORY_INCLUDES_CHILDREN */
> > - mem_base = ioremap(bus_addr, size);
> > + mem_base = ioremap_cache(bus_addr, size);
> > if (!mem_base)
> > goto out;
this patch patch is likely broken on x86; or rather, anyone who uses it is...
thinking you can find cache coherent memory on a PCI or similar bus that is actually
cachable... keep dreaming. (for now; there's talk about extending PCI)
> I would rather change drivers to use ioremap_nocache(), and leave the
> API as-is.
>
> Isn't there Yet More Breakage in lib/iomap.c, given these new
> semantics?
>
> if (flags & IORESOURCE_MEM) {
> if (flags & IORESOURCE_CACHEABLE)
> return ioremap(start, len);
> return ioremap_nocache(start, len);
> }
>
> Any driver using pci_iomap() (libata, and others) is affected.
only if you map ROM's. Anything but ROMs you cannot set IORESOURCE_CACHEABLE
on... since PCI MMIO memory isn't cache coherent per se. (it's cache coherent on
x86 by virtue of being uncachable ;-)
Thankfully Linux doesn't do that.
> I disagree with this semantics change. A number of code places _and
> drivers_ GET IT RIGHT, and these are all broken now?
Can you list one that gets it actually right ?
(cachable pretty much means: "the cpu is the only one changing it AND there is no side effect of
reading or writing")
--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-28 14:10 ` Arjan van de Ven
@ 2008-04-28 14:29 ` James Bottomley
2008-04-28 15:07 ` Arjan van de Ven
0 siblings, 1 reply; 51+ messages in thread
From: James Bottomley @ 2008-04-28 14:29 UTC (permalink / raw)
To: Arjan van de Ven
Cc: Jeff Garzik, Ingo Molnar, Thomas Gleixner, linux-kernel,
H. Peter Anvin, David S. Miller, Linus Torvalds
On Mon, 2008-04-28 at 07:10 -0700, Arjan van de Ven wrote:
> On Sun, 27 Apr 2008 18:39:24 -0400
> Jeff Garzik <jeff@garzik.org> wrote:
>
> > James Bottomley wrote:
> > > Here's another piece of the x86 API that's designed to be cached.
> > > The dma_declare_coherent_memory() usually represents behind bridge
> > > memory that's fully participatory in the coherence model.
> > >
> > > Making it uncached damages the utility of this memory because doing
> > > cacheline sized burst cycles when needed to it is far faster than
> > > individual byte/word/quad writes.
> > >
> > > Signed-off-by: James Bottomley
> > > <James.Bottomley@HansenPartnership.com>
> > >
> > > ---
> > >
> > > diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> > > index 388b113..df83ffd 100644
> > > --- a/arch/x86/kernel/pci-dma.c
> > > +++ b/arch/x86/kernel/pci-dma.c
> > > @@ -214,7 +214,7 @@ int dma_declare_coherent_memory(struct device
> > > *dev, dma_addr_t bus_addr,
> > > /* FIXME: this routine just ignores
> > > DMA_MEMORY_INCLUDES_CHILDREN */
> > > - mem_base = ioremap(bus_addr, size);
> > > + mem_base = ioremap_cache(bus_addr, size);
> > > if (!mem_base)
> > > goto out;
>
> this patch patch is likely broken on x86; or rather, anyone who uses it is...
> thinking you can find cache coherent memory on a PCI or similar bus that is actually
> cachable... keep dreaming. (for now; there's talk about extending PCI)
No ... it works for me, and caching is a performance advantage for me
too. The only current consumer of this API is the NCR_Q720 SCSI card
which keeps a bunch of cacheable memory remote across the MCA bus.
If you think about it logically, most busses are second citizens in the
caching hierarchy: they really only get to force a flush and invalidate
of the CPU cache line rather than being fully participatory in the
coherence protocol. However, even being second class is enough of a
speed up on slow busses because it allows bursting of the cache line for
the bus transfers.
The other consumers are SoC embedded ... so yes, perhaps I should ask
about this on linux-arch.
James
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-28 14:29 ` James Bottomley
@ 2008-04-28 15:07 ` Arjan van de Ven
2008-04-28 19:59 ` H. Peter Anvin
0 siblings, 1 reply; 51+ messages in thread
From: Arjan van de Ven @ 2008-04-28 15:07 UTC (permalink / raw)
To: James Bottomley
Cc: Jeff Garzik, Ingo Molnar, Thomas Gleixner, linux-kernel,
H. Peter Anvin, David S. Miller, Linus Torvalds
On Mon, 28 Apr 2008 10:29:08 -0400
James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> On Mon, 2008-04-28 at 07:10 -0700, Arjan van de Ven wrote:
> > On Sun, 27 Apr 2008 18:39:24 -0400
> > Jeff Garzik <jeff@garzik.org> wrote:
> >
> > > James Bottomley wrote:
> > > > Here's another piece of the x86 API that's designed to be
> > > > cached. The dma_declare_coherent_memory() usually represents
> > > > behind bridge memory that's fully participatory in the
> > > > coherence model.
> > > >
> > > > Making it uncached damages the utility of this memory because
> > > > doing cacheline sized burst cycles when needed to it is far
> > > > faster than individual byte/word/quad writes.
> > > >
> > > > Signed-off-by: James Bottomley
> > > > <James.Bottomley@HansenPartnership.com>
> > > >
> > > > ---
> > > >
> > > > diff --git a/arch/x86/kernel/pci-dma.c
> > > > b/arch/x86/kernel/pci-dma.c index 388b113..df83ffd 100644
> > > > --- a/arch/x86/kernel/pci-dma.c
> > > > +++ b/arch/x86/kernel/pci-dma.c
> > > > @@ -214,7 +214,7 @@ int dma_declare_coherent_memory(struct
> > > > device *dev, dma_addr_t bus_addr,
> > > > /* FIXME: this routine just ignores
> > > > DMA_MEMORY_INCLUDES_CHILDREN */
> > > > - mem_base = ioremap(bus_addr, size);
> > > > + mem_base = ioremap_cache(bus_addr, size);
> > > > if (!mem_base)
> > > > goto out;
> >
> > this patch patch is likely broken on x86; or rather, anyone who
> > uses it is... thinking you can find cache coherent memory on a PCI
> > or similar bus that is actually cachable... keep dreaming. (for
> > now; there's talk about extending PCI)
>
> No ... it works for me, and caching is a performance advantage for me
> too. The only current consumer of this API is the NCR_Q720 SCSI card
> which keeps a bunch of cacheable memory remote across the MCA bus.
>
> If you think about it logically, most busses are second citizens in
> the caching hierarchy: they really only get to force a flush and
> invalidate of the CPU cache line rather than being fully
> participatory in the coherence protocol.
Cached means that the cpu, at any time, can do a speculative read to the memory.
It also means that the cpu can then write the speculated cacheline back at any time later,
if some speculation was going to write to the cacheline but didn't actually happen.
(before you think this is bogus, at least AMD cpus do this and I can't vouch for Intel
cpus never doing this).
If the on-the-bus hardware *ever* writes to the memory without being part of the full
cache coherence protocol it's in trouble. Big time.
Even if it sends an invalidate first (which PCI and others just don't allow, not sure about MCA though),
it's not enough because the cpu can just read it right back... one needs a "take for ownership" not
an "invalidate" for this to work, and that means being part of the full protocol.
> However, even being second
> class is enough of a speed up on slow busses because it allows
> bursting of the cache line for the bus transfers.
that's write combining not cachability I suspect... at least for writes it is.
>
> The other consumers are SoC embedded ... so yes, perhaps I should ask
> about this on linux-arch.
x86 SoC ? (we're talking about an arch/x86 file here)
Esp in SoC people don't bother doing cache coherency if they can get away
with it.
--
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-28 6:10 ` Christoph Hellwig
@ 2008-04-28 16:55 ` H. Peter Anvin
0 siblings, 0 replies; 51+ messages in thread
From: H. Peter Anvin @ 2008-04-28 16:55 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Ingo Molnar, James Bottomley, Thomas Gleixner, linux-kernel,
David S. Miller
Christoph Hellwig wrote:
> On Mon, Apr 28, 2008 at 01:18:09AM +0200, Ingo Molnar wrote:
>> ... but the PAT patchset was all around lkml. The ioremap change was
>> trivial, went into v2.6.25 and was discussed on lkml:
>>
>> http://lkml.org/lkml/2008/1/25/287
>
> lkml is a hell-hole with a signal/noise ratio worse than slashdot. Any
> chance you could create a linux-x86 list for x86 specific changes so
> that a focussed review and discussion can happen?
We have been talking about it between us. Originally, Ingo was opposed
to it exactly *because* he wanted to keep the development as open as
possible. After the merge window craziness is over, we may want to
revisit this.
I'm not saying we're doing everything right at the moment; heck, we're
still figuring out how to work this on three people. However, I want to
set the record straight - Ingo *especially* has been concerned to keep
development open from the very beginning.
-hpa
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-28 15:07 ` Arjan van de Ven
@ 2008-04-28 19:59 ` H. Peter Anvin
0 siblings, 0 replies; 51+ messages in thread
From: H. Peter Anvin @ 2008-04-28 19:59 UTC (permalink / raw)
To: Arjan van de Ven
Cc: James Bottomley, Jeff Garzik, Ingo Molnar, Thomas Gleixner,
linux-kernel, David S. Miller, Linus Torvalds
Arjan van de Ven wrote:
>
> Cached means that the cpu, at any time, can do a speculative read to the memory.
> It also means that the cpu can then write the speculated cacheline back at any time later,
> if some speculation was going to write to the cacheline but didn't actually happen.
> (before you think this is bogus, at least AMD cpus do this and I can't vouch for Intel
> cpus never doing this).
> If the on-the-bus hardware *ever* writes to the memory without being part of the full
> cache coherence protocol it's in trouble. Big time.
> Even if it sends an invalidate first (which PCI and others just don't allow, not sure about MCA though),
> it's not enough because the cpu can just read it right back... one needs a "take for ownership" not
> an "invalidate" for this to work, and that means being part of the full protocol.
>
Although realistically speaking, the OS *can* generally know that the
only transactor is the CPU -- in fact, that will be the normal condition.
-hpa
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-27 23:04 ` H. Peter Anvin
@ 2008-04-30 20:35 ` Eric W. Biederman
0 siblings, 0 replies; 51+ messages in thread
From: Eric W. Biederman @ 2008-04-30 20:35 UTC (permalink / raw)
To: H. Peter Anvin
Cc: David Miller, jeff, James.Bottomley, mingo, tglx, linux-kernel,
torvalds
"H. Peter Anvin" <hpa@zytor.com> writes:
> David Miller wrote:
>> I think this is a reasonable course of action.
>>
>> That leaves one of Jeff's concerns, what to do with pci_iomap(). That
>> was designed to give mappings with caching enabled, and as a result we
>> probably should make it behave that way.
>
> Yes, it should. Just be clear that *that* is a semantic change over what the
> code currently does; it would appear that that is what the code is *trying* to
> do.
>
> I believe on x86 it will still get clamped by the MTRRs, at least initially (I
> don't think we have flipped the default MTRR type to WB yet) but that would
> immediately change the behaviour on non-x86 architectures.
Flipping the default MTRR type to WB has the potential to break an SMM
monitor which does I/O and isn't using our page tables. So we need to
be very careful about that direction. Finally I remember what
the specific and really scary failure mode with SMM and mucking with
the MTRRs is.
There don't appear to be many bars that set IORESOURCE_CACHABLE for use
with pci_iomap so it doesn't appear to be much of an issue.
In addition it looks like we should map IORESOURCE_PREFETCH to ioremap_wc
in pci_iomap, for those places that can take advantage of it. It
might not always work (hardware is like that) but semantically pci
prefetch is the same as the WC attribute (reads and writes have no
side effects). I believe on x86 if we want to get cacheline sized
reads (read prefetch) there is special instruction in the latest
processors we need to use on a wc area.
Eric
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-27 23:01 ` Arjan van de Ven
@ 2008-04-30 21:44 ` James Bottomley
2008-04-30 22:39 ` H. Peter Anvin
0 siblings, 1 reply; 51+ messages in thread
From: James Bottomley @ 2008-04-30 21:44 UTC (permalink / raw)
To: Arjan van de Ven
Cc: David Miller, jeff, mingo, tglx, linux-kernel, hpa, torvalds
On Sun, 2008-04-27 at 16:01 -0700, Arjan van de Ven wrote:
> On Sun, 27 Apr 2008 15:46:20 -0700 (PDT)
> David Miller <davem@davemloft.net> wrote:
>
> > From: Jeff Garzik <jeff@garzik.org>
> > Date: Sun, 27 Apr 2008 18:39:24 -0400
> >
> > > I disagree with this semantics change. A number of code places
> > > _and drivers_ GET IT RIGHT, and these are all broken now?
> >
> > [ Note, James's patch that you quoted is about mapping DMA
> > memory, in dma_declare_coherent_memory(), rather than devices.
> > But I know what you are trying to talk about Jeff. :-) ]
> >
> > Wrt. ioremap() semanics, it is important to realize that if
> > the implementation of this on x86 has been giving non-cached
> > I/O mappings out up until recently, you can expect that there
> > are hundreds of drivers that might now be broken.
>
> it's even worse than that.
> 99% of the time it gave out uncached memory, but if the bios was iffy,
> it would suddenly give out something else.
>
> The changes to ioremap() recently turned that into "100% uncached".
> For me, that's the right (because safe) behavior, exactly for the
> reason you mentioned: it's what happened in practice, and driver writers
> would assume that to happen.
>
> Anything except uncached would just be insanity as default.
> (well the old "whatever mood the bios writer was in, usually uncached" is
> effectively uncached except for weird behaving boxes)
OK, but look; this is why you broke me.
The former meaning of ioremap() was give me whatever caching the mtrrs
set up (assuming you actually have mtrrs. For voyager, we don't so it
always meant give me cached memory).
The change broke the QIC ioremap area because it absolutely *requires*
cached memory. It drops performance on the Q720 SCSI cards because
caching was used essentially like write combining to reduce the
unbursted traffic across the MCA bus.
When the Voyagers were designed, the only way to get them better
performace with the slow bus technology was to do a massive amount of
caching, so they're optimised to the point where every element on the
voyager bus (sort of like the intel frontside bus) is a primary
participant in the caching algorithm, and this includes the MCA
controllers. So for me, I want every invocation of ioremap to mean
ioremap_cached() otherwise I don't benefit from the added caches.
I can fully understand that bus technology has got worse in terms of
caching since the voyager heydays. However, I'd rather not be penalised
for everyone else's hardware failings. Since the old ioremap is only
used in the IORESOURCE_CACHABLE case for PCI, and since that used to
defer to the mtrr settings anyway, it probably does make sense to keep
it as ioremap_cache() as long as that still defers to the mtrr settings.
I suppose I could think about an ioremap platform override for voyager
since, by and large, ioremap_nocache() is the wrong thing on the
platform.
James
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [patch] x86, voyager: fix ioremap_nocache()
2008-04-30 21:44 ` James Bottomley
@ 2008-04-30 22:39 ` H. Peter Anvin
0 siblings, 0 replies; 51+ messages in thread
From: H. Peter Anvin @ 2008-04-30 22:39 UTC (permalink / raw)
To: James Bottomley
Cc: Arjan van de Ven, David Miller, jeff, mingo, tglx, linux-kernel,
torvalds
James Bottomley wrote:
>
> OK, but look; this is why you broke me.
>
> The former meaning of ioremap() was give me whatever caching the mtrrs
> set up (assuming you actually have mtrrs. For voyager, we don't so it
> always meant give me cached memory).
>
Pre-686 generations of x86 had chipset logic (KEN logic) that did the
equivalent. The only change in the 686 was moving this logic into the
CPU itself. So no, it didn't always mean "give me cached memory".
> The change broke the QIC ioremap area because it absolutely *requires*
> cached memory. It drops performance on the Q720 SCSI cards because
> caching was used essentially like write combining to reduce the
> unbursted traffic across the MCA bus.
>
> When the Voyagers were designed, the only way to get them better
> performace with the slow bus technology was to do a massive amount of
> caching, so they're optimised to the point where every element on the
> voyager bus (sort of like the intel frontside bus) is a primary
> participant in the caching algorithm, and this includes the MCA
> controllers. So for me, I want every invocation of ioremap to mean
> ioremap_cached() otherwise I don't benefit from the added caches.
>
> I can fully understand that bus technology has got worse in terms of
> caching since the voyager heydays. However, I'd rather not be penalised
> for everyone else's hardware failings. Since the old ioremap is only
> used in the IORESOURCE_CACHABLE case for PCI, and since that used to
> defer to the mtrr settings anyway, it probably does make sense to keep
> it as ioremap_cache() as long as that still defers to the mtrr settings.
>
> I suppose I could think about an ioremap platform override for voyager
> since, by and large, ioremap_nocache() is the wrong thing on the
> platform.
Sounds like it, or you should just do a global replace of ioremap() with
ioremap_cache() for the Voyager drivers.
-hpa
^ permalink raw reply [flat|nested] 51+ messages in thread
end of thread, other threads:[~2008-04-30 22:43 UTC | newest]
Thread overview: 51+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-04-27 20:51 Breakage caused by unreviewed patch in x86 tree James Bottomley
2008-04-27 20:53 ` David Miller
2008-04-27 21:48 ` [patch] x86, voyager: fix ioremap_nocache() Ingo Molnar
2008-04-27 22:05 ` James Bottomley
2008-04-27 22:36 ` Willy Tarreau
2008-04-27 22:41 ` Ingo Molnar
2008-04-27 23:18 ` Ingo Molnar
2008-04-27 23:31 ` David Miller
2008-04-28 0:31 ` Rik van Riel
2008-04-28 0:45 ` Al Viro
2008-04-28 0:52 ` H. Peter Anvin
2008-04-28 9:01 ` Alan Cox
2008-04-28 9:17 ` David Miller
2008-04-28 9:48 ` Adrian Bunk
2008-04-28 11:50 ` Ingo Molnar
2008-04-28 6:10 ` Christoph Hellwig
2008-04-28 16:55 ` H. Peter Anvin
2008-04-27 22:34 ` James Bottomley
2008-04-27 22:39 ` Jeff Garzik
2008-04-27 22:44 ` H. Peter Anvin
2008-04-27 22:46 ` David Miller
2008-04-27 22:52 ` H. Peter Anvin
2008-04-27 22:58 ` David Miller
2008-04-27 23:04 ` H. Peter Anvin
2008-04-30 20:35 ` Eric W. Biederman
2008-04-27 23:34 ` Jeff Garzik
2008-04-27 23:39 ` H. Peter Anvin
2008-04-27 22:53 ` Jeff Garzik
2008-04-27 22:56 ` H. Peter Anvin
2008-04-27 22:59 ` David Miller
2008-04-27 23:02 ` Jeff Garzik
2008-04-27 23:14 ` Arjan van de Ven
2008-04-27 23:01 ` Arjan van de Ven
2008-04-30 21:44 ` James Bottomley
2008-04-30 22:39 ` H. Peter Anvin
2008-04-27 23:01 ` Thomas Gleixner
2008-04-28 14:10 ` Arjan van de Ven
2008-04-28 14:29 ` James Bottomley
2008-04-28 15:07 ` Arjan van de Ven
2008-04-28 19:59 ` H. Peter Anvin
2008-04-27 22:00 ` Breakage caused by unreviewed patch in x86 tree H. Peter Anvin
2008-04-27 22:10 ` James Bottomley
2008-04-27 22:13 ` H. Peter Anvin
2008-04-27 22:18 ` James Bottomley
2008-04-27 22:31 ` H. Peter Anvin
2008-04-27 22:58 ` Arjan van de Ven
2008-04-27 23:00 ` David Miller
2008-04-27 23:07 ` Arjan van de Ven
2008-04-27 23:03 ` James Bottomley
2008-04-27 23:11 ` Arjan van de Ven
2008-04-27 23:17 ` H. Peter Anvin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).