linux-arm-kernel.lists.infradead.org archive mirror
* bug with 3.4.6, 3.5.3, 3.6.1
@ 2012-10-11  5:46 Gilles Chanteperdrix
  2012-10-11 10:36 ` Will Deacon
  0 siblings, 1 reply; 9+ messages in thread
From: Gilles Chanteperdrix @ 2012-10-11  5:46 UTC (permalink / raw)
  To: linux-arm-kernel


Hi,

when booting Linux v3.4.6, v3.5.3, or v3.6.1 on a pandaboard with an 
OMAP4430 ES2.1, compiled with the following configuration:
http://xenomai.org/~gch/config-panda

I get the bug below after mounting the root filesystem.

CONFIG_VMSPLIT_2G enabled together with CONFIG_THUMB2_KERNEL disabled
seems to be the combination that triggers the bug.

With this configuration, it seems the increment of init_mm.mm_count
done at the beginning of secondary_start_kernel() is "lost" after the
calls to cpu_switch_mm() and local_flush_tlb_all().
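
To make the reasoning concrete, here is a minimal user-space model of how a lost increment could lead to the BUG in __mmdrop() shown in the trace below. Everything in it is hypothetical (struct mm_model, mm_grab(), mm_drop_hits_bug() are not kernel names), and it assumes that init_mm starts with a count of 1 and that the BUG fires when the last reference to init_mm is dropped; that is one reading of the oops, not something verified against the source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; not the kernel's struct mm_struct. */
struct mm_model {
	int mm_count;		/* reference count, like mm->mm_count */
	bool is_init_mm;	/* true for the model of init_mm */
};

/* Take a reference; "lost" models an increment that never lands. */
static void mm_grab(struct mm_model *mm, bool lost)
{
	if (!lost)
		mm->mm_count++;
}

/*
 * Drop a reference. Returns true if the count reached zero on the
 * init_mm model, which is the case assumed to trigger the BUG.
 */
static bool mm_drop_hits_bug(struct mm_model *mm)
{
	assert(mm->mm_count > 0);
	mm->mm_count--;
	return mm->mm_count == 0 && mm->is_init_mm;
}
```

With a starting count of 1, a normal grab followed by a drop leaves the count at 1 and nothing happens; a lost grab followed by the same drop takes the count to 0, which is the failing case.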

Modifying the secondary_startup() function in head.S to pass the 
swapper pgdir instead of the idmap pgdir in r4 also avoids the issue.

Regards.


init (301): undefined instruction: pc=80027fe0                                   
Code: e59f3058 e1a04000 e1500003 1a000000 (e7f001f2)                             
------------[ cut here ]------------                                             
kernel BUG at kernel/fork.c:558!                                                 
Internal error: Oops - BUG: 0 [#1] SMP ARM                                       
CPU: 1    Tainted: G        W     (3.4.6+ #61)                                   
PC is at __mmdrop+0x1c/0x78                                                      
LR is at finish_task_switch+0xa4/0xec                                            
pc : [<80027fe0>]    lr : [<80049a78>]    psr: 60000113                          
sp : bfa79f60  ip : bfa79f78  fp : bfa79f74                                      
r10: 00000000  r9 : 00000000  r8 : 00000000                                      
r7 : 00000000  r6 : 00000000  r5 : bf83a080  r4 : 803655c8                       
r3 : 803655c8  r2 : 00000000  r1 : 00000000  r0 : 803655c8                       
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user                
Control: 10c53c7d  Table: bfa7c04a  DAC: 00000015                                
Process init (pid: 301, stack limit = 0xbfa782f0)                                
Stack: (0xbfa79f60 to 0xbfa7a000)                                                
9f60: 803655c8 bf83a080 bfa79f94 bfa79f78 80049a78 80027fd0 80be2100 bf83a080    
9f80: 00000000 00000000 bfa79fac bfa79f98 8004b2e0 800499e0 00000000 00000000    
9fa0: 00000000 bfa79fb0 8000de44 8004b2ac 00000000 7ee4aba0 76f734c0 00000000    
9fc0: 000e21e0 7ee4ac7c 00000001 000000be 00000305 00000000 0001196c 00000000    
9fe0: 00000075 7ee4ac30 000a2494 76e07384 60000010 80000000 ffefffff 00000000    
Backtrace:                                                                       
[<80027fc4>] (__mmdrop+0x0/0x78) from [<80049a78>] (finish_task_switch+0xa4/0xec)
 r5:bf83a080 r4:803655c8                                                         
[<800499d4>] (finish_task_switch+0x0/0xec) from [<8004b2e0>] (schedule_tail+0x40/0xd0)                                                                            
 r7:00000000 r6:00000000 r5:bf83a080 r4:80be2100                                 
[<8004b2a0>] (schedule_tail+0x0/0xd0) from [<8000de44>] (ret_from_fork+0x8/0x44) 
 r5:00000000 r4:00000000                                                         
Code: e59f3058 e1a04000 e1500003 1a000000 (e7f001f2)                             
mappings:                                                                        
0x00008000-0x000d7000 r-xp 0x00000000 /bin/busybox                               
0x000df000-0x000e0000 rw-p 0x000cf000 /bin/busybox                               
0x000e0000-0x00103000 rw-p 0x000e0000 [heap]                                     
0x76d66000-0x76ea1000 r-xp 0x00000000 /lib/libc.so.6                             
0x76ea1000-0x76ea9000 ---p 0x0013b000 /lib/libc.so.6                             
0x76ea9000-0x76eab000 r--p 0x0013b000 /lib/libc.so.6                             
0x76eab000-0x76eac000 rw-p 0x0013d000 /lib/libc.so.6                             
0x76eac000-0x76eaf000 rw-p 0x76eac000                                            
0x76eaf000-0x76f47000 r-xp 0x00000000 /lib/libm.so.6                             
0x76f47000-0x76f4e000 ---p 0x00098000 /lib/libm.so.6                             
0x76f4e000-0x76f4f000 r--p 0x00097000 /lib/libm.so.6                             
0x76f4f000-0x76f50000 rw-p 0x00098000 /lib/libm.so.6                             
0x76f50000-0x76f6f000 r-xp 0x00000000 /lib/ld-linux.so.3                         
0x76f73000-0x76f74000 rw-p 0x76f73000                                            
0x76f75000-0x76f76000 rw-p 0x76f75000                                            
0x76f76000-0x76f77000 r--p 0x0001e000 /lib/ld-linux.so.3                         
0x76f77000-0x76f78000 rw-p 0x0001f000 /lib/ld-linux.so.3                         
0x7ee29000-0x7ee4b000 rw-p 0x7efde000 [stack]                                    
---[ end trace 1b75b31a2719ed1e ]---                                             

-- 
                                                                Gilles.


* bug with 3.4.6, 3.5.3, 3.6.1
  2012-10-11  5:46 bug with 3.4.6, 3.5.3, 3.6.1 Gilles Chanteperdrix
@ 2012-10-11 10:36 ` Will Deacon
  2012-10-11 12:54   ` Gilles Chanteperdrix
  2012-10-11 13:32   ` Gilles Chanteperdrix
  0 siblings, 2 replies; 9+ messages in thread
From: Will Deacon @ 2012-10-11 10:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Oct 11, 2012 at 06:46:35AM +0100, Gilles Chanteperdrix wrote:
> Hi,

Hi Gilles,

> when booting Linux v3.4.6, v3.5.3, or v3.6.1 on a pandaboard with an 
> OMAP4430 ES2.1, compiled with the following configuration:
> http://xenomai.org/~gch/config-panda
> 
> I get the bug below after mounting the root filesystem.
> 
> CONFIG_VMSPLIT_2G and CONFIG_THUMB2_KERNEL disabled seems to be the 
> combination which triggers the bug.
> 
> With this configuration, it seems the init_mm.mm_count incrementation
> done at the beginning of secondary_start_kernel() is "lost" after the
> calls to cpu_switch_mm() and local_flush_tlb().
> 
> Modifying the secondary_startup() function in head.S to pass the 
> swapper pgdir instead of the idmap pgdir in r4 also avoids the issue.

What's your PHYS_OFFSET? I suspect it's >= 2GB, in which case I have some
ideas about this problem.

Will


* bug with 3.4.6, 3.5.3, 3.6.1
  2012-10-11 10:36 ` Will Deacon
@ 2012-10-11 12:54   ` Gilles Chanteperdrix
  2012-10-11 13:32   ` Gilles Chanteperdrix
  1 sibling, 0 replies; 9+ messages in thread
From: Gilles Chanteperdrix @ 2012-10-11 12:54 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/11/2012 12:36 PM, Will Deacon wrote:
> On Thu, Oct 11, 2012 at 06:46:35AM +0100, Gilles Chanteperdrix wrote:
>> Hi,
> 
> Hi Gilles,
> 
>> when booting Linux v3.4.6, v3.5.3, or v3.6.1 on a pandaboard with an 
>> OMAP4430 ES2.1, compiled with the following configuration:
>> http://xenomai.org/~gch/config-panda
>>
>> I get the bug below after mounting the root filesystem.
>>
>> CONFIG_VMSPLIT_2G and CONFIG_THUMB2_KERNEL disabled seems to be the 
>> combination which triggers the bug.
>>
>> With this configuration, it seems the init_mm.mm_count incrementation
>> done at the beginning of secondary_start_kernel() is "lost" after the
>> calls to cpu_switch_mm() and local_flush_tlb().
>>
>> Modifying the secondary_startup() function in head.S to pass the 
>> swapper pgdir instead of the idmap pgdir in r4 also avoids the issue.
> 
> What's your PHYS_OFFSET? I suspect it's >= 2GB, in which case I have some
> ideas about this problem.

You mean the physical address of RAM? I believe it is 0x80000000; I will
check.

-- 
					    Gilles.


* bug with 3.4.6, 3.5.3, 3.6.1
  2012-10-11 10:36 ` Will Deacon
  2012-10-11 12:54   ` Gilles Chanteperdrix
@ 2012-10-11 13:32   ` Gilles Chanteperdrix
  2012-10-11 13:59     ` Will Deacon
  1 sibling, 1 reply; 9+ messages in thread
From: Gilles Chanteperdrix @ 2012-10-11 13:32 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/11/2012 12:36 PM, Will Deacon wrote:
> On Thu, Oct 11, 2012 at 06:46:35AM +0100, Gilles Chanteperdrix wrote:
>> Hi,
> 
> Hi Gilles,
> 
>> when booting Linux v3.4.6, v3.5.3, or v3.6.1 on a pandaboard with an 
>> OMAP4430 ES2.1, compiled with the following configuration:
>> http://xenomai.org/~gch/config-panda
>>
>> I get the bug below after mounting the root filesystem.
>>
>> CONFIG_VMSPLIT_2G and CONFIG_THUMB2_KERNEL disabled seems to be the 
>> combination which triggers the bug.
>>
>> With this configuration, it seems the init_mm.mm_count incrementation
>> done at the beginning of secondary_start_kernel() is "lost" after the
>> calls to cpu_switch_mm() and local_flush_tlb().
>>
>> Modifying the secondary_startup() function in head.S to pass the 
>> swapper pgdir instead of the idmap pgdir in r4 also avoids the issue.
> 
> What's your PHYS_OFFSET? I suspect it's >= 2GB, in which case I have some
> ideas about this problem.

Yes, according to /proc/iomem:
80000000-bfefffff : System RAM
  80008000-80339fff : Kernel code
  80364000-803c52e7 : Kernel data

So PHYS_OFFSET is 0x80000000, that is, 2GB.
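
As an illustration of reading that value programmatically, the sketch below parses a single /proc/iomem line and reports the start address when the line is a "System RAM" entry. The helper name system_ram_start() is made up for this example, and it assumes the "start-end : name" layout shown above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one /proc/iomem line of the form "start-end : name".
 * Returns 1 and fills *start if the entry is "System RAM", else 0.
 * Hypothetical helper, written for this example only.
 */
static int system_ram_start(const char *line, unsigned long *start)
{
	unsigned long s, e;
	char name[64];

	assert(line && start);
	/* Leading space in the format skips the indentation of child entries. */
	if (sscanf(line, " %lx-%lx : %63[^\n]", &s, &e, name) != 3)
		return 0;
	if (strcmp(name, "System RAM") != 0)
		return 0;
	*start = s;
	return 1;
}
```

Fed the first line above, it yields 0x80000000; the indented "Kernel code" and "Kernel data" entries are rejected because their names differ.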

-- 
					    Gilles.


* bug with 3.4.6, 3.5.3, 3.6.1
  2012-10-11 13:32   ` Gilles Chanteperdrix
@ 2012-10-11 13:59     ` Will Deacon
  2012-10-11 14:01       ` Gilles Chanteperdrix
  0 siblings, 1 reply; 9+ messages in thread
From: Will Deacon @ 2012-10-11 13:59 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Oct 11, 2012 at 02:32:06PM +0100, Gilles Chanteperdrix wrote:
> On 10/11/2012 12:36 PM, Will Deacon wrote:
> > On Thu, Oct 11, 2012 at 06:46:35AM +0100, Gilles Chanteperdrix wrote:
> >> Hi,
> > 
> > Hi Gilles,
> > 
> >> when booting Linux v3.4.6, v3.5.3, or v3.6.1 on a pandaboard with an 
> >> OMAP4430 ES2.1, compiled with the following configuration:
> >> http://xenomai.org/~gch/config-panda
> >>
> >> I get the bug below after mounting the root filesystem.
> >>
> >> CONFIG_VMSPLIT_2G and CONFIG_THUMB2_KERNEL disabled seems to be the 
> >> combination which triggers the bug.
> >>
> >> With this configuration, it seems the init_mm.mm_count incrementation
> >> done at the beginning of secondary_start_kernel() is "lost" after the
> >> calls to cpu_switch_mm() and local_flush_tlb().
> >>
> >> Modifying the secondary_startup() function in head.S to pass the 
> >> swapper pgdir instead of the idmap pgdir in r4 also avoids the issue.
> > 
> > What's your PHYS_OFFSET? I suspect it's >= 2GB, in which case I have some
> > ideas about this problem.
> 
> Yes, according to /proc/iomem:
> 80000000-bfefffff : System RAM
>   80008000-80339fff : Kernel code
>   80364000-803c52e7 : Kernel data
> 
> So, PHYS_OFFSET is 0x80000000 that is, 2GB.

Argh, then there's something fishy with the interaction between the idmap
and swapper. The overwritten entries *should* be identical, but something is
causing us to corrupt the initial tables from pgd_alloc(). Perhaps something
to do with us mapping in sections...

I'll have to do some digging and get back to you.

Cheers,

Will


* bug with 3.4.6, 3.5.3, 3.6.1
  2012-10-11 13:59     ` Will Deacon
@ 2012-10-11 14:01       ` Gilles Chanteperdrix
  2012-10-11 14:03         ` Will Deacon
  2012-10-11 19:50         ` Will Deacon
  0 siblings, 2 replies; 9+ messages in thread
From: Gilles Chanteperdrix @ 2012-10-11 14:01 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/11/2012 03:59 PM, Will Deacon wrote:
> On Thu, Oct 11, 2012 at 02:32:06PM +0100, Gilles Chanteperdrix wrote:
>> On 10/11/2012 12:36 PM, Will Deacon wrote:
>>> On Thu, Oct 11, 2012 at 06:46:35AM +0100, Gilles Chanteperdrix wrote:
>>>> Hi,
>>>
>>> Hi Gilles,
>>>
>>>> when booting Linux v3.4.6, v3.5.3, or v3.6.1 on a pandaboard with an 
>>>> OMAP4430 ES2.1, compiled with the following configuration:
>>>> http://xenomai.org/~gch/config-panda
>>>>
>>>> I get the bug below after mounting the root filesystem.
>>>>
>>>> CONFIG_VMSPLIT_2G and CONFIG_THUMB2_KERNEL disabled seems to be the 
>>>> combination which triggers the bug.
>>>>
>>>> With this configuration, it seems the init_mm.mm_count incrementation
>>>> done at the beginning of secondary_start_kernel() is "lost" after the
>>>> calls to cpu_switch_mm() and local_flush_tlb().
>>>>
>>>> Modifying the secondary_startup() function in head.S to pass the 
>>>> swapper pgdir instead of the idmap pgdir in r4 also avoids the issue.
>>>
>>> What's your PHYS_OFFSET? I suspect it's >= 2GB, in which case I have some
>>> ideas about this problem.
>>
>> Yes, according to /proc/iomem:
>> 80000000-bfefffff : System RAM
>>   80008000-80339fff : Kernel code
>>   80364000-803c52e7 : Kernel data
>>
>> So, PHYS_OFFSET is 0x80000000 that is, 2GB.
> 
> Argh, then there's something fishy with the interaction between the idmap
> and swapper. The overwritten entries *should* be identical, but something is
> causing us to corrupt the initial tables from pgd_alloc(). Perhaps something
> to do with us mapping in sections...
> 
> I'll have to do some digging and get back to you.

To satisfy my curiosity, what is the difference between VMSPLIT_3G and
VMSPLIT_2G? The fact that with VMSPLIT_2G we have physical == virtual?

-- 
					    Gilles.


* bug with 3.4.6, 3.5.3, 3.6.1
  2012-10-11 14:01       ` Gilles Chanteperdrix
@ 2012-10-11 14:03         ` Will Deacon
  2012-10-11 19:50         ` Will Deacon
  1 sibling, 0 replies; 9+ messages in thread
From: Will Deacon @ 2012-10-11 14:03 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Oct 11, 2012 at 03:01:14PM +0100, Gilles Chanteperdrix wrote:
> On 10/11/2012 03:59 PM, Will Deacon wrote:
> > On Thu, Oct 11, 2012 at 02:32:06PM +0100, Gilles Chanteperdrix wrote:
> >> So, PHYS_OFFSET is 0x80000000 that is, 2GB.
> > 
> > Argh, then there's something fishy with the interaction between the idmap
> > and swapper. The overwritten entries *should* be identical, but something is
> > causing us to corrupt the initial tables from pgd_alloc(). Perhaps something
> > to do with us mapping in sections...
> > 
> > I'll have to do some digging and get back to you.
> 
> To satisfy my curiosity, what is the difference between VMSPLIT_3G and
> VMSPLIT_2G? The fact that with VMSPLIT_2G we have physical == virtual?

Yup, but we still create an idmap in that case (we could detect this case
and avoid doing it instead) and somehow it's corrupting the underlying
kernel mapping, despite mapping the same stuff.
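
For reference, the VA == PA observation follows from the linear-map arithmetic, sketched below. PAGE_OFFSET is 0x80000000 with VMSPLIT_2G and 0xC0000000 with VMSPLIT_3G; PHYS_OFFSET is the 0x80000000 reported earlier in this thread. The macro and function names are made up for the example, and the test address is just an arbitrary lowmem address.

```c
#include <assert.h>
#include <stdint.h>

#define PHYS_OFFSET_PANDA 0x80000000u	/* from /proc/iomem in this thread */
#define PAGE_OFFSET_2G    0x80000000u	/* CONFIG_VMSPLIT_2G */
#define PAGE_OFFSET_3G    0xC0000000u	/* CONFIG_VMSPLIT_3G */

/* Linear-map translation: pa = va - PAGE_OFFSET + PHYS_OFFSET. */
static uint32_t lowmem_virt_to_phys(uint32_t va, uint32_t page_offset)
{
	assert(va >= page_offset);
	return va - page_offset + PHYS_OFFSET_PANDA;
}
```

With the 2G split the translation is the identity, so the idmap and the kernel linear map cover the same virtual addresses; with the 3G split the same physical address sits 1GB higher in virtual space and the two mappings do not overlap.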

Will


* bug with 3.4.6, 3.5.3, 3.6.1
  2012-10-11 14:01       ` Gilles Chanteperdrix
  2012-10-11 14:03         ` Will Deacon
@ 2012-10-11 19:50         ` Will Deacon
  2012-10-11 20:58           ` Gilles Chanteperdrix
  1 sibling, 1 reply; 9+ messages in thread
From: Will Deacon @ 2012-10-11 19:50 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/11/2012 03:59 PM, Will Deacon wrote:
> I'll have to do some digging and get back to you.

Ok, so here's what I think is going on (although note that I'm at home now,
so I've not been able to test anything):

	- Your PHYS_OFFSET is at 2GB, so your static idmap is as follows:
	  idmap: 0x8029c638 - 0x8029c66c and I think your init_mm lives
	  at 0x8037f2b4.

	- The idmap takes up two sections, so actually spans from:
	  0x80200000 - 0x80400000 and is mapped as *strongly ordered*.
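
The arithmetic behind those two points can be sketched as follows. It assumes the mapping granule is a pair of 1MB sections (2MB), which is what the 0x80200000 - 0x80400000 range implies; the names idmap_span() and span_contains() are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTION_PAIR_SIZE 0x200000u	/* two 1MB sections (assumed granule) */

struct span { uint32_t start, end; };	/* half-open: [start, end) */

/* Round the idmap text range out to whole section pairs. */
static struct span idmap_span(uint32_t text_start, uint32_t text_end)
{
	struct span s;

	assert(text_start < text_end);
	s.start = text_start & ~(SECTION_PAIR_SIZE - 1);
	s.end = (text_end + SECTION_PAIR_SIZE - 1) & ~(SECTION_PAIR_SIZE - 1);
	return s;
}

static bool span_contains(struct span s, uint32_t addr)
{
	return addr >= s.start && addr < s.end;
}
```

For the text range 0x8029c638 - 0x8029c66c this gives 0x80200000 - 0x80400000, and 0x8037f2b4 falls inside it, so init_mm is covered by the idmap attributes while the idmap page tables are live.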

This means that the atomic_inc(&mm->mm_count); in secondary_start_kernel
is UNPREDICTABLE, because it results in an exclusive access to
strongly-ordered memory.
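
For anyone wondering where the exclusive access comes from: on ARMv6 and later, atomic_inc() is a load-exclusive/store-exclusive (LDREX/STREX) retry loop. The sketch below is only a portable user-space analogue using C11 atomics, not the kernel code; retry_loop_inc() is a made-up name.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Increment *v with a load / compare-and-swap retry loop and return
 * the new value. On ARMv7 a compiler typically lowers this to an
 * LDREX/STREX pair, the same kind of access atomic_inc() performs.
 */
static int retry_loop_inc(atomic_int *v)
{
	int old;

	assert(v);
	old = atomic_load_explicit(v, memory_order_relaxed);
	while (!atomic_compare_exchange_weak_explicit(v, &old, old + 1,
			memory_order_relaxed, memory_order_relaxed))
		;	/* "old" was refreshed with the current value; retry */
	return old + 1;
}
```

The loop only makes progress if the store half of the exclusive pair can succeed, which is exactly what is not guaranteed when the target lives in strongly-ordered memory.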

There are several ways to solve this:

	1. Avoid exclusives with the idmap (see patch below)
	2. Set idmap_pgd to swapper when VA == PA
	3. Map idmap with pages and round up text section
	4. Switch to swapper before entering secondary_start_kernel
	5. Make idmap normal (cacheable?) shared memory

However, these have some problems:

	(2) means the idmap is cacheable. This is probably not an issue
	when VA == PA, but it's still an oddity compared to other setups

	(3) is really messy

	(4,5) probably have serious issues with SMP

so I've had a crack at (1) below. Please see if it fixes your problem.

Cheers,

Will

--->8

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index d100eac..aa55580 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -290,18 +290,24 @@ static void percpu_timer_setup(void);
 asmlinkage void __cpuinit secondary_start_kernel(void)
 {
 	struct mm_struct *mm = &init_mm;
-	unsigned int cpu = smp_processor_id();
+	unsigned int cpu;
+
+	/*
+	 * The identity mapping is uncached (strongly ordered), so
+	 * switch away from it before attempting any exclusive accesses.
+	 */
+	cpu_switch_mm(mm->pgd, mm);
+	enter_lazy_tlb(mm, current);
+	local_flush_tlb_all();
 
 	/*
 	 * All kernel threads share the same mm context; grab a
 	 * reference and switch to it.
 	 */
+	cpu = smp_processor_id();
 	atomic_inc(&mm->mm_count);
 	current->active_mm = mm;
 	cpumask_set_cpu(cpu, mm_cpumask(mm));
-	cpu_switch_mm(mm->pgd, mm);
-	enter_lazy_tlb(mm, current);
-	local_flush_tlb_all();
 
 	printk("CPU%u: Booted secondary processor\n", cpu);
 


* bug with 3.4.6, 3.5.3, 3.6.1
  2012-10-11 19:50         ` Will Deacon
@ 2012-10-11 20:58           ` Gilles Chanteperdrix
  0 siblings, 0 replies; 9+ messages in thread
From: Gilles Chanteperdrix @ 2012-10-11 20:58 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/11/2012 09:50 PM, Will Deacon wrote:

> On 10/11/2012 03:59 PM, Will Deacon wrote:
>> I'll have to do some digging and get back to you.
> 
> Ok, so here's what I think is going on (although note that I'm at home now,
> so I've not been able to test anything):
> 
> 	- Your PHYS_OFFSET is at 2GB, so your static idmap is as follows:
> 	  idmap: 0x8029c638 - 0x8029c66c and I think your init_mm lives
> 	  at 0x8037f2b4.
> 
> 	- The idmap takes up two sections, so actually spans from:
> 	  0x80200000 - 0x80400000 and is mapped as *strongly ordered*.
> 
> This means that the atomic_inc(&mm->mm_count); in secondary_start_kernel
> is UNPREDICTABLE, because it results in an exclusive access to
> strongly-ordered memory.
> 
> There are several ways to solve this:
> 
> 	1. Avoid exclusives with the idmap (see patch below)
> 	2. Set idmap_pgd to swapper when VA == PA
> 	3. Map idmap with pages and round up text section
> 	4. Switch to swapper before entering secondary_start_kernel
> 	5. Make idmap normal (cacheable?) shared memory
> 
> However, these have some problems:
> 
> 	(2) means the idmap is cacheable. This is probably not an issue
> 	when VA == PA, but it's still an oddity compared to other setups
> 
> 	(3) is really messy
> 
> 	(4,5) probably have serious issues with SMP
> 
> so I've had a crack at (1) below. Please see if it fixes your problem.
> 
> Cheers,
> 
> Will
> 
> --->8
> 
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index d100eac..aa55580 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -290,18 +290,24 @@ static void percpu_timer_setup(void);
>  asmlinkage void __cpuinit secondary_start_kernel(void)
>  {
>  	struct mm_struct *mm = &init_mm;
> -	unsigned int cpu = smp_processor_id();
> +	unsigned int cpu;
> +
> +	/*
> +	 * The identity mapping is uncached (strongly ordered), so
> +	 * switch away from it before attempting any exclusive accesses.
> +	 */
> +	cpu_switch_mm(mm->pgd, mm);
> +	enter_lazy_tlb(mm, current);
> +	local_flush_tlb_all();
>  
>  	/*
>  	 * All kernel threads share the same mm context; grab a
>  	 * reference and switch to it.
>  	 */
> +	cpu = smp_processor_id();
>  	atomic_inc(&mm->mm_count);
>  	current->active_mm = mm;
>  	cpumask_set_cpu(cpu, mm_cpumask(mm));
> -	cpu_switch_mm(mm->pgd, mm);
> -	enter_lazy_tlb(mm, current);
> -	local_flush_tlb_all();
>  
>  	printk("CPU%u: Booted secondary processor\n", cpu);


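Works for me. But note that the comment above atomic_inc() is now wrong: it
still says "grab a reference and switch to it", while the switch now happens
earlier.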

-- 
                                                                Gilles.

