linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Fix early_ioremap on x86-64
@ 2008-01-20 17:28 Andi Kleen
  2008-01-20 17:59 ` Ingo Molnar
  2008-01-20 18:04 ` [PATCH] Fix early_ioremap on x86-64 II Andi Kleen
  0 siblings, 2 replies; 6+ messages in thread
From: Andi Kleen @ 2008-01-20 17:28 UTC (permalink / raw)
  To: mingo, Jeremy Fitzhardinge, tglx, linux-kernel,
	venkatesh.pallipadi

Fix early_ioremap on x86-64

[Venki, sorry for blaming PAT for this earlier. It was innocent.]
[Note the patch is on top of git-x86 + gbpages applied. I think
there will be an trivial reject without gbpages-direct applied first]

I had ACPI failures on several machines since a few days. Symptom
was NUMA nodes not getting detected or worse cores not getting detected.
They all came from ACPI not being able to read various of its tables. I finally
bisected it down to Jeremy's "put _PAGE_GLOBAL into PAGE_KERNEL" change.
With that the fix was fairly obvious. The problem was that early_ioremap()
didn't use a "_all" flush that would affect the global PTEs too. So 
with global bits getting used everywhere now an early_ioremap would
not actually flush a mapping if something else was mapped previously
on that slot (which can happen with early_iounmap inbetween)

This patch changes all flushes in init_64.c to be __flush_tlb_all()
and fixes the problem here.

Signed-off-by: Andi Kleen <ak@suse.de>

Index: linux/arch/x86/mm/init_64.c
===================================================================
--- linux.orig/arch/x86/mm/init_64.c
+++ linux/arch/x86/mm/init_64.c
@@ -242,7 +242,7 @@ void *early_ioremap(unsigned long addr, 
 		addr &= PMD_MASK;
 		for (i = 0; i < pmds; i++, addr += PMD_SIZE)
 			set_pmd(pmd+i, __pmd(addr | __PAGE_KERNEL_LARGE_EXEC));
-		__flush_tlb();
+		__flush_tlb_all();
 		printk(KERN_CONT "%016ld\n", vaddr);
 		return (void *)vaddr;
 	next:
@@ -265,7 +265,7 @@ void early_iounmap(void *addr, unsigned 
 	pmd = level2_kernel_pgt + pmd_index(vaddr);
 	for (i = 0; i < pmds; i++)
 		pmd_clear(pmd + i);
-	__flush_tlb();
+	__flush_tlb_all();
 }
 
 static unsigned long direct_entry(unsigned long paddr)
@@ -339,7 +339,7 @@ static void __meminit phys_pud_init(pud_
 		spin_unlock(&init_mm.page_table_lock);
 		unmap_low_page(pmd);
 	}
-	__flush_tlb();
+	__flush_tlb_all();
 } 
 
 static void __init find_early_table_space(unsigned long end)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Fix early_ioremap on x86-64
  2008-01-20 17:28 [PATCH] Fix early_ioremap on x86-64 Andi Kleen
@ 2008-01-20 17:59 ` Ingo Molnar
  2008-01-20 19:12   ` Andi Kleen
  2008-01-20 18:04 ` [PATCH] Fix early_ioremap on x86-64 II Andi Kleen
  1 sibling, 1 reply; 6+ messages in thread
From: Ingo Molnar @ 2008-01-20 17:59 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Jeremy Fitzhardinge, tglx, linux-kernel, venkatesh.pallipadi


* Andi Kleen <andi@firstfloor.org> wrote:

> Fix early_ioremap on x86-64
> 
> [Venki, sorry for blaming PAT for this earlier. It was innocent.] 
> [Note the patch is on top of git-x86 + gbpages applied. I think there 
> will be an trivial reject without gbpages-direct applied first]
> 
> I had ACPI failures on several machines since a few days. Symptom was 
> NUMA nodes not getting detected or worse cores not getting detected. 
> They all came from ACPI not being able to read various of its tables. 
> I finally bisected it down to Jeremy's "put _PAGE_GLOBAL into 
> PAGE_KERNEL" change. With that the fix was fairly obvious. The problem 
> was that early_ioremap() didn't use a "_all" flush that would affect 
> the global PTEs too. So with global bits getting used everywhere now 
> an early_ioremap would not actually flush a mapping if something else 
> was mapped previously on that slot (which can happen with 
> early_iounmap inbetween)
> 
> This patch changes all flushes in init_64.c to be __flush_tlb_all() 
> and fixes the problem here.

ah, very nice! This is the bad commit:

  ------------------------->
  Subject:  x86: fold _PAGE_GLOBAL into __PAGE_KERNEL
  From: Jeremy Fitzhardinge <jeremy@goop.org>

  With the iounmap problem resolved, it should be OK to always set
  _PAGE_GLOBAL in __PAGE_KERNEL*.

  [ Did this patch cause problems before? ]
  <------------------------

Jeremy did suspect something about this change, as indicated in the 
changelog. But because the change was so finegrained, the bisection 
almost directly led to the fix. _That_ i think clearly demonstrates the 
power of bisection and finegrained changes.

but note what the fundamental problem is - we've turned a previously 
safe flushing API into an unsafe one - __flush_tlb() will only be safe 
in the rarest of circumstances. There are some other matches:

 ./mm/init_64.c: __flush_tlb();
 ./kernel/head64.c:      __flush_tlb();

The boot identity mappings zapped do not have PGE set at the moment, but 
they could in the future (once we do native pagetable setup straight 
from flat mode) - and this is not a performance critical path anyway.

 ./kernel/cpu/mtrr/generic.c:    __flush_tlb();
 ./kernel/cpu/mtrr/generic.c:    __flush_tlb();

these include an open-coded version of __flush_tlb_all() so they are 
safe.

and we might as well make the non-PGE flush the 'special API'. I.e. 
rename __flush_tlb() to __flush_tlb_partial() and rename 
__flush_tlb_all() to __flush_tlb(). This makes it very apparent which 
should be used by default and which does what.

	Ingo

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Fix early_ioremap on x86-64 II
  2008-01-20 17:28 [PATCH] Fix early_ioremap on x86-64 Andi Kleen
  2008-01-20 17:59 ` Ingo Molnar
@ 2008-01-20 18:04 ` Andi Kleen
  2008-01-21 19:00   ` Jeremy Fitzhardinge
  1 sibling, 1 reply; 6+ messages in thread
From: Andi Kleen @ 2008-01-20 18:04 UTC (permalink / raw)
  To: Andi Kleen
  Cc: mingo, Jeremy Fitzhardinge, tglx, linux-kernel,
	venkatesh.pallipadi

> This patch changes all flushes in init_64.c to be __flush_tlb_all()
> and fixes the problem here.

Hmm, I see new early crashes with that patch here in some cases unfortunately:

PANIC: early exception 06 rip 10:ffffffff8082df2d error 0 cr2 0
Pid: 0, comm: swapper Not tainted 2.6.24-rc8-g7ada4254 #39

Call Trace:
 [<ffffffff8082df2d>] free_bootmem_core+0x1a/0x6b
 [<ffffffff8082f143>] free_bootmem_with_active_regions+0x51/0x67
 [<ffffffff8082a0f3>] setup_node_bootmem+0x16b/0x1b3
 [<ffffffff8082b3a4>] acpi_scan_nodes+0x136/0x230
 [<ffffffff8082a6bb>] numa_initmem_init+0x331/0x459
 [<ffffffff80234ef6>] release_console_sem+0x1b4/0x1cd
 [<ffffffff80234ef6>] release_console_sem+0x1b4/0x1cd
 [<ffffffff80355726>] number+0x10e/0x1f9
 [<ffffffff802354d8>] printk+0x4e/0x56
 [<ffffffff80355726>] number+0x10e/0x1f9
 [<ffffffff803563f8>] vsnprintf+0x53f/0x583
 [<ffffffff80355726>] number+0x10e/0x1f9
 [<ffffffff80234ef6>] release_console_sem+0x1b4/0x1cd
 [<ffffffff803563f8>] vsnprintf+0x53f/0x583
 [<ffffffff8022468a>] early_serial_write+0x22/0x32

Not sure what's going wrong. I think the patch is correct, but there
are probably more bugs around triggered by some change. Perhaps it is 
better to just revert Jeremy's patch (789bad4ca2b604f2f21ae7e3d64b67325ec8f9bb) for now.

-Andi

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Fix early_ioremap on x86-64
  2008-01-20 17:59 ` Ingo Molnar
@ 2008-01-20 19:12   ` Andi Kleen
  2008-01-21  0:20     ` Ingo Molnar
  0 siblings, 1 reply; 6+ messages in thread
From: Andi Kleen @ 2008-01-20 19:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Jeremy Fitzhardinge, tglx, linux-kernel,
	venkatesh.pallipadi

On Sun, Jan 20, 2008 at 06:59:55PM +0100, Ingo Molnar wrote:
> 
> Jeremy did suspect something about this change, as indicated in the 
> changelog. But because the change was so finegrained, the bisection 
> almost directly led to the fix. _That_ i think clearly demonstrates the 
> power of bisection and finegrained changes.

Actually it is pretty hard to do bisects in current git-x86 currently because
you don't ammend original buggy changes but just stuff the fixes 
at the end. This means there are large windows of breakage inbetween
that make it hard to pinpoint other bugs during bisection.

This is why I cannot easily bisect this new breakage.
Ok I suspect I can cook up some script later that always create a new branch 
and reapplies the bug fixes. Or ammend the patch locally. But frankly it is 
very unpleasant.

> 
> but note what the fundamental problem is - we've turned a previously 
> safe flushing API into an unsafe one - __flush_tlb() will only be safe 

It works for user space which is by far the most common case.

> in the rarest of circumstances. There are some other matches:
> 
>  ./mm/init_64.c: __flush_tlb();
>  ./kernel/head64.c:      __flush_tlb();

You're right I missed some. I tried to just redefine it in tlbflush.h
(which covers user space fork() too so it's too much as a real fix, but
reasonable for debugging) but it unfortunately still results in the same 
crash during NUMA node discovery:

There is some garbage in the stack trace; i don't think it's actually
in bootmem. [My kingdom for Jan's unwinder!]

RAT: Node 0 PXM 0 0-a0000
SRAT: Node 0 PXM 0 0-d0000000
SRAT: Node 0 PXM 0 0-130000000
SRAT: Node 1 PXM 1 130000000-330000000
early_iounmap(ffffffff828e08d9, 00000150)
Bootmem setup node 0 0000000000000000-00000000d0000000
PANIC: early exception 06 rip 10:ffffffff8082df35 error 0 cr2 0
Pid: 0, comm: swapper Not tainted 2.6.24-rc8-dirty #47

Call Trace:
 [<ffffffff8082df35>] free_bootmem_core+0x1a/0x6b
 [<ffffffff8082f14b>] free_bootmem_with_active_regions+0x51/0x67
 [<ffffffff8082a0fb>] setup_node_bootmem+0x16b/0x1b3
 [<ffffffff8082b3ac>] acpi_scan_nodes+0x136/0x230
 [<ffffffff8082a6c3>] numa_initmem_init+0x331/0x459
 [<ffffffff80234ef6>] release_console_sem+0x1b4/0x1cd
 [<ffffffff80234ef6>] release_console_sem+0x1b4/0x1cd
 [<ffffffff80355726>] number+0x10e/0x1f9
 [<ffffffff802354d8>] printk+0x4e/0x56
 [<ffffffff80355726>] number+0x10e/0x1f9
 [<ffffffff803563f8>] vsnprintf+0x53f/0x583
 [<ffffffff80355726>] number+0x10e/0x1f9
 [<ffffffff80234ef6>] release_console_sem+0x1b4/0x1cd
 [<ffffffff803563f8>] vsnprintf+0x53f/0x583
 [<ffffffff8022468a>] early_serial_write+0x22/0x32
 [<ffffffff80234b32>] __call_console_drivers+0x5e/0x6c
 [<ffffffff80234ef6>] release_console_sem+0x1b4/0x1cd
 [<ffffffff802354d8>] printk+0x4e/0x56
 [<ffffffff802354d8>] printk+0x4e/0x56
 [<ffffffff802354d8>] printk+0x4e/0x56
 [<ffffffff8082b164>] acpi_numa_memory_affinity_init+0x228/0x33a
 [<ffffffff8022635c>] early_iounmap+0x20/0x70
 [<ffffffff80836045>] acpi_parse_memory_affinity+0x0/0x1f
 [<ffffffff8083470c>] acpi_table_parse_entries+0x115/0x126
 [<ffffffff80836011>] acpi_parse_slit+0x0/0x17
 [<ffffffff808345d0>] acpi_table_parse+0x4c/0x73
 [<ffffffff808215af>] setup_arch+0x266/0x414
 [<ffffffff8081b8b7>] start_kernel+0x6f/0x2bd
 [<ffffffff8081b1ce>] _sinittext+0x1ce/0x1d5

RIP 0x10

-Andi (seemingly chief QA officer of git-x86 currently) 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Fix early_ioremap on x86-64
  2008-01-20 19:12   ` Andi Kleen
@ 2008-01-21  0:20     ` Ingo Molnar
  0 siblings, 0 replies; 6+ messages in thread
From: Ingo Molnar @ 2008-01-21  0:20 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jeremy Fitzhardinge, tglx, linux-kernel, venkatesh.pallipadi,
	H. Peter Anvin


* Andi Kleen <andi@firstfloor.org> wrote:

> > Jeremy did suspect something about this change, as indicated in the 
> > changelog. But because the change was so finegrained, the bisection 
> > almost directly led to the fix. _That_ i think clearly demonstrates 
> > the power of bisection and finegrained changes.
> 
> Actually it is pretty hard to do bisects in current git-x86 currently 
> because you don't ammend original buggy changes but just stuff the 
> fixes at the end. [...]

i backmerge fixes all the time. x86.git consists of more than 800 
patches and there are less than 10 tail-fixes in it - they are all at 
the tail for a reason: because they have not all been fully qa-ed yet so 
they might make things worse than what they try to fix. Once they are 
proven i backmerge them to their proper spot.

and i frequently bisect x86.git#mm myself (almost daily) - and others do 
it frequently too. There have been dozens of successful bisections done 
on x86.git. (I believe it is so successful partly because we run 
overnight automated bisectability tests: to make sure x86.git builds and 
boots at every bisection point.)

Yours is the first claim of x86.git#mm being hard to bisect - so far all 
other feedback was to the contrary. I claim that it is one of the most 
bisectable subsystems in the kernel.

But we all err: if something slipped through despite all the testing, 
let us know and we'll fix it up - it's just that your vague complaints 
about non-bisectability wont help much in resolving the problem. You are 
ignoring months of track record of excellent bisectability of x86.git 
and your claims are quite unjust.

Reordering of patches is usually easy and trickling back the tail-fixes 
to the actual buggy patches is easy and routine as well, except when 
someone submits conceptually non-bisectable patches, like:

|   Subject: Undo pat cpa patch
|   From: Andi Kleen <ak@suse.de>
|
|   Going to implement this differently

and then 4 patches later you do:

|   Subject: CPA: Implement change_page_attr_addr entry point for i386
|   From: Andi Kleen <ak@suse.de>

this breaks PAT completely in that window and makes it non-bisectable.

So i find it a bit dishonest from you that you are now complaining about 
the supposed non-bisectability of x86.git ...

> -Andi (seemingly chief QA officer of git-x86 currently)

Andi, could you please stop these snide remarks? x86.git is the 
development tip of the new arch/x86 tree and you should know that. I 
boot it more than a 1000 times a day on my QA-grid, with different 
kernel configs, so it's far from being unusable.

I update x86.git multiple times a day so that people can see the changes 
immediately. But you are now abusing this transparency to create a false 
impression of flakiness. If you cannot stop being negative then please 
just dont deal with x86.git#mm - it's not obligatory to contribute.

Furthermore, your complaint is doubly unjust because we've fixed a good 
deal of bugs in x86.git introduced by _you_. You are by far not the only 
person who fixes bugs in x86.git and you are by far not the most active 
one either - but you are certainly the loudest one :-(
 
	Ingo

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Fix early_ioremap on x86-64 II
  2008-01-20 18:04 ` [PATCH] Fix early_ioremap on x86-64 II Andi Kleen
@ 2008-01-21 19:00   ` Jeremy Fitzhardinge
  0 siblings, 0 replies; 6+ messages in thread
From: Jeremy Fitzhardinge @ 2008-01-21 19:00 UTC (permalink / raw)
  To: Andi Kleen; +Cc: mingo, tglx, linux-kernel, venkatesh.pallipadi

Andi Kleen wrote:
>> This patch changes all flushes in init_64.c to be __flush_tlb_all()
>> and fixes the problem here.
>>     
>
> Hmm, I see new early crashes with that patch here in some cases unfortunately:
>
> PANIC: early exception 06 rip 10:ffffffff8082df2d error 0 cr2 0
> Pid: 0, comm: swapper Not tainted 2.6.24-rc8-g7ada4254 #39
>
> Call Trace:
>  [<ffffffff8082df2d>] free_bootmem_core+0x1a/0x6b
>  [<ffffffff8082f143>] free_bootmem_with_active_regions+0x51/0x67
>  [<ffffffff8082a0f3>] setup_node_bootmem+0x16b/0x1b3
>  [<ffffffff8082b3a4>] acpi_scan_nodes+0x136/0x230
>  [<ffffffff8082a6bb>] numa_initmem_init+0x331/0x459
>  [<ffffffff80234ef6>] release_console_sem+0x1b4/0x1cd
>  [<ffffffff80234ef6>] release_console_sem+0x1b4/0x1cd
>  [<ffffffff80355726>] number+0x10e/0x1f9
>  [<ffffffff802354d8>] printk+0x4e/0x56
>  [<ffffffff80355726>] number+0x10e/0x1f9
>  [<ffffffff803563f8>] vsnprintf+0x53f/0x583
>  [<ffffffff80355726>] number+0x10e/0x1f9
>  [<ffffffff80234ef6>] release_console_sem+0x1b4/0x1cd
>  [<ffffffff803563f8>] vsnprintf+0x53f/0x583
>  [<ffffffff8022468a>] early_serial_write+0x22/0x32
>
> Not sure what's going wrong. I think the patch is correct, but there
> are probably more bugs around triggered by some change. Perhaps it is 
> better to just revert Jeremy's patch (789bad4ca2b604f2f21ae7e3d64b67325ec8f9bb) for now.
>   

It's a purely "make things prettier" patch, so I have no problem with 
that.   If the non-global flags used in early 64-bit pagetables is very 
deeply ingrained, I wonder if its worth making a separate set of 
explicitly non-global _PAGE_KERNEL flags for its use, so that at least 
_PAGE_KERNEL* is consistent between 32 and 64 bits.  Otherwise its going 
to be the sort of difference which gets increasingly awkward as things 
get unified (I guess its mitigated by the fact that 32-bit code can 
never assume that global mappings are available and present).

    J

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2008-01-21 19:00 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-01-20 17:28 [PATCH] Fix early_ioremap on x86-64 Andi Kleen
2008-01-20 17:59 ` Ingo Molnar
2008-01-20 19:12   ` Andi Kleen
2008-01-21  0:20     ` Ingo Molnar
2008-01-20 18:04 ` [PATCH] Fix early_ioremap on x86-64 II Andi Kleen
2008-01-21 19:00   ` Jeremy Fitzhardinge

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).