* [PATCH] Fix early_ioremap on x86-64
@ 2008-01-20 17:28 Andi Kleen
2008-01-20 17:59 ` Ingo Molnar
2008-01-20 18:04 ` [PATCH] Fix early_ioremap on x86-64 II Andi Kleen
0 siblings, 2 replies; 6+ messages in thread
From: Andi Kleen @ 2008-01-20 17:28 UTC (permalink / raw)
To: mingo, Jeremy Fitzhardinge, tglx, linux-kernel,
venkatesh.pallipadi
Fix early_ioremap on x86-64
[Venki, sorry for blaming PAT for this earlier. It was innocent.]
[Note the patch is on top of git-x86 + gbpages applied. I think
there will be an trivial reject without gbpages-direct applied first]
I had ACPI failures on several machines since a few days. Symptom
was NUMA nodes not getting detected or worse cores not getting detected.
They all came from ACPI not being able to read various of its tables. I finally
bisected it down to Jeremy's "put _PAGE_GLOBAL into PAGE_KERNEL" change.
With that the fix was fairly obvious. The problem was that early_ioremap()
didn't use a "_all" flush that would affect the global PTEs too. So
with global bits getting used everywhere now an early_ioremap would
not actually flush a mapping if something else was mapped previously
on that slot (which can happen with early_iounmap inbetween)
This patch changes all flushes in init_64.c to be __flush_tlb_all()
and fixes the problem here.
Signed-off-by: Andi Kleen <ak@suse.de>
Index: linux/arch/x86/mm/init_64.c
===================================================================
--- linux.orig/arch/x86/mm/init_64.c
+++ linux/arch/x86/mm/init_64.c
@@ -242,7 +242,7 @@ void *early_ioremap(unsigned long addr,
addr &= PMD_MASK;
for (i = 0; i < pmds; i++, addr += PMD_SIZE)
set_pmd(pmd+i, __pmd(addr | __PAGE_KERNEL_LARGE_EXEC));
- __flush_tlb();
+ __flush_tlb_all();
printk(KERN_CONT "%016ld\n", vaddr);
return (void *)vaddr;
next:
@@ -265,7 +265,7 @@ void early_iounmap(void *addr, unsigned
pmd = level2_kernel_pgt + pmd_index(vaddr);
for (i = 0; i < pmds; i++)
pmd_clear(pmd + i);
- __flush_tlb();
+ __flush_tlb_all();
}
static unsigned long direct_entry(unsigned long paddr)
@@ -339,7 +339,7 @@ static void __meminit phys_pud_init(pud_
spin_unlock(&init_mm.page_table_lock);
unmap_low_page(pmd);
}
- __flush_tlb();
+ __flush_tlb_all();
}
static void __init find_early_table_space(unsigned long end)
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Fix early_ioremap on x86-64
2008-01-20 17:28 [PATCH] Fix early_ioremap on x86-64 Andi Kleen
@ 2008-01-20 17:59 ` Ingo Molnar
2008-01-20 19:12 ` Andi Kleen
2008-01-20 18:04 ` [PATCH] Fix early_ioremap on x86-64 II Andi Kleen
1 sibling, 1 reply; 6+ messages in thread
From: Ingo Molnar @ 2008-01-20 17:59 UTC (permalink / raw)
To: Andi Kleen; +Cc: Jeremy Fitzhardinge, tglx, linux-kernel, venkatesh.pallipadi
* Andi Kleen <andi@firstfloor.org> wrote:
> Fix early_ioremap on x86-64
>
> [Venki, sorry for blaming PAT for this earlier. It was innocent.]
> [Note the patch is on top of git-x86 + gbpages applied. I think there
> will be an trivial reject without gbpages-direct applied first]
>
> I had ACPI failures on several machines since a few days. Symptom was
> NUMA nodes not getting detected or worse cores not getting detected.
> They all came from ACPI not being able to read various of its tables.
> I finally bisected it down to Jeremy's "put _PAGE_GLOBAL into
> PAGE_KERNEL" change. With that the fix was fairly obvious. The problem
> was that early_ioremap() didn't use a "_all" flush that would affect
> the global PTEs too. So with global bits getting used everywhere now
> an early_ioremap would not actually flush a mapping if something else
> was mapped previously on that slot (which can happen with
> early_iounmap inbetween)
>
> This patch changes all flushes in init_64.c to be __flush_tlb_all()
> and fixes the problem here.
ah, very nice! This is the bad commit:
------------------------->
Subject: x86: fold _PAGE_GLOBAL into __PAGE_KERNEL
From: Jeremy Fitzhardinge <jeremy@goop.org>
With the iounmap problem resolved, it should be OK to always set
_PAGE_GLOBAL in __PAGE_KERNEL*.
[ Did this patch cause problems before? ]
<------------------------
Jeremy did suspect something about this change, as indicated in the
changelog. But because the change was so finegrained, the bisection
almost directly led to the fix. _That_ i think clearly demonstrates the
power of bisection and finegrained changes.
but note what the fundamental problem is - we've turned a previously
safe flushing API into an unsafe one - __flush_tlb() will only be safe
in the rarest of circumstances. There are some other matches:
./mm/init_64.c: __flush_tlb();
./kernel/head64.c: __flush_tlb();
The boot identity mappings zapped do not have PGE set at the moment, but
they could in the future (once we do native pagetable setup straight
from flat mode) - and this is not a performance critical path anyway.
./kernel/cpu/mtrr/generic.c: __flush_tlb();
./kernel/cpu/mtrr/generic.c: __flush_tlb();
these include an open-coded version of __flush_tlb_all() so they are
safe.
and we might as well make the non-PGE flush the 'special API'. I.e.
rename __flush_tlb() to __flush_tlb_partial() and rename
__flush_tlb_all() to __flush_tlb(). This makes it very apparent which
should be used by default and which does what.
Ingo
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Fix early_ioremap on x86-64 II
2008-01-20 17:28 [PATCH] Fix early_ioremap on x86-64 Andi Kleen
2008-01-20 17:59 ` Ingo Molnar
@ 2008-01-20 18:04 ` Andi Kleen
2008-01-21 19:00 ` Jeremy Fitzhardinge
1 sibling, 1 reply; 6+ messages in thread
From: Andi Kleen @ 2008-01-20 18:04 UTC (permalink / raw)
To: Andi Kleen
Cc: mingo, Jeremy Fitzhardinge, tglx, linux-kernel,
venkatesh.pallipadi
> This patch changes all flushes in init_64.c to be __flush_tlb_all()
> and fixes the problem here.
Hmm, I see new early crashes with that patch here in some cases unfortunately:
PANIC: early exception 06 rip 10:ffffffff8082df2d error 0 cr2 0
Pid: 0, comm: swapper Not tainted 2.6.24-rc8-g7ada4254 #39
Call Trace:
[<ffffffff8082df2d>] free_bootmem_core+0x1a/0x6b
[<ffffffff8082f143>] free_bootmem_with_active_regions+0x51/0x67
[<ffffffff8082a0f3>] setup_node_bootmem+0x16b/0x1b3
[<ffffffff8082b3a4>] acpi_scan_nodes+0x136/0x230
[<ffffffff8082a6bb>] numa_initmem_init+0x331/0x459
[<ffffffff80234ef6>] release_console_sem+0x1b4/0x1cd
[<ffffffff80234ef6>] release_console_sem+0x1b4/0x1cd
[<ffffffff80355726>] number+0x10e/0x1f9
[<ffffffff802354d8>] printk+0x4e/0x56
[<ffffffff80355726>] number+0x10e/0x1f9
[<ffffffff803563f8>] vsnprintf+0x53f/0x583
[<ffffffff80355726>] number+0x10e/0x1f9
[<ffffffff80234ef6>] release_console_sem+0x1b4/0x1cd
[<ffffffff803563f8>] vsnprintf+0x53f/0x583
[<ffffffff8022468a>] early_serial_write+0x22/0x32
Not sure what's going wrong. I think the patch is correct, but there
are probably more bugs around triggered by some change. Perhaps it is
better to just revert Jeremy's patch (789bad4ca2b604f2f21ae7e3d64b67325ec8f9bb) for now.
-Andi
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Fix early_ioremap on x86-64
2008-01-20 17:59 ` Ingo Molnar
@ 2008-01-20 19:12 ` Andi Kleen
2008-01-21 0:20 ` Ingo Molnar
0 siblings, 1 reply; 6+ messages in thread
From: Andi Kleen @ 2008-01-20 19:12 UTC (permalink / raw)
To: Ingo Molnar
Cc: Andi Kleen, Jeremy Fitzhardinge, tglx, linux-kernel,
venkatesh.pallipadi
On Sun, Jan 20, 2008 at 06:59:55PM +0100, Ingo Molnar wrote:
>
> Jeremy did suspect something about this change, as indicated in the
> changelog. But because the change was so finegrained, the bisection
> almost directly led to the fix. _That_ i think clearly demonstrates the
> power of bisection and finegrained changes.
Actually it is pretty hard to do bisects in current git-x86 currently because
you don't ammend original buggy changes but just stuff the fixes
at the end. This means there are large windows of breakage inbetween
that make it hard to pinpoint other bugs during bisection.
This is why I cannot easily bisect this new breakage.
Ok I suspect I can cook up some script later that always create a new branch
and reapplies the bug fixes. Or ammend the patch locally. But frankly it is
very unpleasant.
>
> but note what the fundamental problem is - we've turned a previously
> safe flushing API into an unsafe one - __flush_tlb() will only be safe
It works for user space which is by far the most common case.
> in the rarest of circumstances. There are some other matches:
>
> ./mm/init_64.c: __flush_tlb();
> ./kernel/head64.c: __flush_tlb();
You're right I missed some. I tried to just redefine it in tlbflush.h
(which covers user space fork() too so it's too much as a real fix, but
reasonable for debugging) but it unfortunately still results in the same
crash during NUMA node discovery:
There is some garbage in the stack trace; i don't think it's actually
in bootmem. [My kingdom for Jan's unwinder!]
RAT: Node 0 PXM 0 0-a0000
SRAT: Node 0 PXM 0 0-d0000000
SRAT: Node 0 PXM 0 0-130000000
SRAT: Node 1 PXM 1 130000000-330000000
early_iounmap(ffffffff828e08d9, 00000150)
Bootmem setup node 0 0000000000000000-00000000d0000000
PANIC: early exception 06 rip 10:ffffffff8082df35 error 0 cr2 0
Pid: 0, comm: swapper Not tainted 2.6.24-rc8-dirty #47
Call Trace:
[<ffffffff8082df35>] free_bootmem_core+0x1a/0x6b
[<ffffffff8082f14b>] free_bootmem_with_active_regions+0x51/0x67
[<ffffffff8082a0fb>] setup_node_bootmem+0x16b/0x1b3
[<ffffffff8082b3ac>] acpi_scan_nodes+0x136/0x230
[<ffffffff8082a6c3>] numa_initmem_init+0x331/0x459
[<ffffffff80234ef6>] release_console_sem+0x1b4/0x1cd
[<ffffffff80234ef6>] release_console_sem+0x1b4/0x1cd
[<ffffffff80355726>] number+0x10e/0x1f9
[<ffffffff802354d8>] printk+0x4e/0x56
[<ffffffff80355726>] number+0x10e/0x1f9
[<ffffffff803563f8>] vsnprintf+0x53f/0x583
[<ffffffff80355726>] number+0x10e/0x1f9
[<ffffffff80234ef6>] release_console_sem+0x1b4/0x1cd
[<ffffffff803563f8>] vsnprintf+0x53f/0x583
[<ffffffff8022468a>] early_serial_write+0x22/0x32
[<ffffffff80234b32>] __call_console_drivers+0x5e/0x6c
[<ffffffff80234ef6>] release_console_sem+0x1b4/0x1cd
[<ffffffff802354d8>] printk+0x4e/0x56
[<ffffffff802354d8>] printk+0x4e/0x56
[<ffffffff802354d8>] printk+0x4e/0x56
[<ffffffff8082b164>] acpi_numa_memory_affinity_init+0x228/0x33a
[<ffffffff8022635c>] early_iounmap+0x20/0x70
[<ffffffff80836045>] acpi_parse_memory_affinity+0x0/0x1f
[<ffffffff8083470c>] acpi_table_parse_entries+0x115/0x126
[<ffffffff80836011>] acpi_parse_slit+0x0/0x17
[<ffffffff808345d0>] acpi_table_parse+0x4c/0x73
[<ffffffff808215af>] setup_arch+0x266/0x414
[<ffffffff8081b8b7>] start_kernel+0x6f/0x2bd
[<ffffffff8081b1ce>] _sinittext+0x1ce/0x1d5
RIP 0x10
-Andi (seemingly chief QA officer of git-x86 currently)
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Fix early_ioremap on x86-64
2008-01-20 19:12 ` Andi Kleen
@ 2008-01-21 0:20 ` Ingo Molnar
0 siblings, 0 replies; 6+ messages in thread
From: Ingo Molnar @ 2008-01-21 0:20 UTC (permalink / raw)
To: Andi Kleen
Cc: Jeremy Fitzhardinge, tglx, linux-kernel, venkatesh.pallipadi,
H. Peter Anvin
* Andi Kleen <andi@firstfloor.org> wrote:
> > Jeremy did suspect something about this change, as indicated in the
> > changelog. But because the change was so finegrained, the bisection
> > almost directly led to the fix. _That_ i think clearly demonstrates
> > the power of bisection and finegrained changes.
>
> Actually it is pretty hard to do bisects in current git-x86 currently
> because you don't ammend original buggy changes but just stuff the
> fixes at the end. [...]
i backmerge fixes all the time. x86.git consists of more than 800
patches and there are less than 10 tail-fixes in it - they are all at
the tail for a reason: because they have not all been fully qa-ed yet so
they might make things worse than what they try to fix. Once they are
proven i backmerge them to their proper spot.
and i frequently bisect x86.git#mm myself (almost daily) - and others do
it frequently too. There have been dozens of successful bisections done
on x86.git. (I believe it is so successful partly because we run
overnight automated bisectability tests: to make sure x86.git builds and
boots at every bisection point.)
Yours is the first claim of x86.git#mm being hard to bisect - so far all
other feedback was to the contrary. I claim that it is one of the most
bisectable subsystems in the kernel.
But we all err: if something slipped through despite all the testing,
let us know and we'll fix it up - it's just that your vague complaints
about non-bisectability wont help much in resolving the problem. You are
ignoring months of track record of excellent bisectability of x86.git
and your claims are quite unjust.
Reordering of patches is usually easy and trickling back the tail-fixes
to the actual buggy patches is easy and routine as well, except when
someone submits conceptually non-bisectable patches, like:
| Subject: Undo pat cpa patch
| From: Andi Kleen <ak@suse.de>
|
| Going to implement this differently
and then 4 patches later you do:
| Subject: CPA: Implement change_page_attr_addr entry point for i386
| From: Andi Kleen <ak@suse.de>
this breaks PAT completely in that window and makes it non-bisectable.
So i find it a bit dishonest from you that you are now complaining about
the supposed non-bisectability of x86.git ...
> -Andi (seemingly chief QA officer of git-x86 currently)
Andi, could you please stop these snide remarks? x86.git is the
development tip of the new arch/x86 tree and you should know that. I
boot it more than a 1000 times a day on my QA-grid, with different
kernel configs, so it's far from being unusable.
I update x86.git multiple times a day so that people can see the changes
immediately. But you are now abusing this transparency to create a false
impression of flakiness. If you cannot stop being negative then please
just dont deal with x86.git#mm - it's not obligatory to contribute.
Furthermore, your complaint is doubly unjust because we've fixed a good
deal of bugs in x86.git introduced by _you_. You are by far not the only
person who fixes bugs in x86.git and you are by far not the most active
one either - but you are certainly the loudest one :-(
Ingo
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] Fix early_ioremap on x86-64 II
2008-01-20 18:04 ` [PATCH] Fix early_ioremap on x86-64 II Andi Kleen
@ 2008-01-21 19:00 ` Jeremy Fitzhardinge
0 siblings, 0 replies; 6+ messages in thread
From: Jeremy Fitzhardinge @ 2008-01-21 19:00 UTC (permalink / raw)
To: Andi Kleen; +Cc: mingo, tglx, linux-kernel, venkatesh.pallipadi
Andi Kleen wrote:
>> This patch changes all flushes in init_64.c to be __flush_tlb_all()
>> and fixes the problem here.
>>
>
> Hmm, I see new early crashes with that patch here in some cases unfortunately:
>
> PANIC: early exception 06 rip 10:ffffffff8082df2d error 0 cr2 0
> Pid: 0, comm: swapper Not tainted 2.6.24-rc8-g7ada4254 #39
>
> Call Trace:
> [<ffffffff8082df2d>] free_bootmem_core+0x1a/0x6b
> [<ffffffff8082f143>] free_bootmem_with_active_regions+0x51/0x67
> [<ffffffff8082a0f3>] setup_node_bootmem+0x16b/0x1b3
> [<ffffffff8082b3a4>] acpi_scan_nodes+0x136/0x230
> [<ffffffff8082a6bb>] numa_initmem_init+0x331/0x459
> [<ffffffff80234ef6>] release_console_sem+0x1b4/0x1cd
> [<ffffffff80234ef6>] release_console_sem+0x1b4/0x1cd
> [<ffffffff80355726>] number+0x10e/0x1f9
> [<ffffffff802354d8>] printk+0x4e/0x56
> [<ffffffff80355726>] number+0x10e/0x1f9
> [<ffffffff803563f8>] vsnprintf+0x53f/0x583
> [<ffffffff80355726>] number+0x10e/0x1f9
> [<ffffffff80234ef6>] release_console_sem+0x1b4/0x1cd
> [<ffffffff803563f8>] vsnprintf+0x53f/0x583
> [<ffffffff8022468a>] early_serial_write+0x22/0x32
>
> Not sure what's going wrong. I think the patch is correct, but there
> are probably more bugs around triggered by some change. Perhaps it is
> better to just revert Jeremy's patch (789bad4ca2b604f2f21ae7e3d64b67325ec8f9bb) for now.
>
It's a purely "make things prettier" patch, so I have no problem with
that. If the non-global flags used in early 64-bit pagetables is very
deeply ingrained, I wonder if its worth making a separate set of
explicitly non-global _PAGE_KERNEL flags for its use, so that at least
_PAGE_KERNEL* is consistent between 32 and 64 bits. Otherwise its going
to be the sort of difference which gets increasingly awkward as things
get unified (I guess its mitigated by the fact that 32-bit code can
never assume that global mappings are available and present).
J
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2008-01-21 19:00 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-01-20 17:28 [PATCH] Fix early_ioremap on x86-64 Andi Kleen
2008-01-20 17:59 ` Ingo Molnar
2008-01-20 19:12 ` Andi Kleen
2008-01-21 0:20 ` Ingo Molnar
2008-01-20 18:04 ` [PATCH] Fix early_ioremap on x86-64 II Andi Kleen
2008-01-21 19:00 ` Jeremy Fitzhardinge
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).