public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Mutex debug lock failure [was Re: Bad gcc-4.1.0 leads to Power4 crashes... and power5 too, actually
       [not found]     ` <1166650134.6673.9.camel@localhost.localdomain>
@ 2006-12-20 23:03       ` Linas Vepstas
  2006-12-20 23:09         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 7+ messages in thread
From: Linas Vepstas @ 2006-12-20 23:03 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Ingo Molnar
  Cc: Peter Bergner, linuxppc-dev, linux-kernel

On Thu, Dec 21, 2006 at 08:28:54AM +1100, Benjamin Herrenschmidt wrote:
> On Wed, 2006-12-20 at 15:19 -0600, Linas Vepstas wrote:
> > On Tue, Dec 19, 2006 at 07:46:50PM -0600, Peter Bergner wrote:
> > 
> > I'm trying to figure out how to try a different compiler,
> > I'm hoping that 3.3 can still compile new kernels.
> > 
> > I'll try to stare at the dump a bit too, now.
> 
> I've been using 4.1.2 from debian/ubuntu happily lately.

Do you have mutex debugging turned on?

It doesn't seem to be a compiler bug. Fails in the same place,
when compiled with gcc-3.3

Reading the code, it looks like the test case double_unlock_mutex() 
fails, because the mutex lock->owner  == NULL.  
whereas, in kernel/mutex-debug.c, it wanted 

void debug_mutex_unlock(struct mutex *lock) {
   DEBUG_LOCKS_WARN_ON(lock->owner != current_thread_info());

This is called from 
lib/locking-selftest.c  circa line 490
#include "locking-selftest-mutex.h"
GENERATE_TESTCASE(double_unlock_mutex)

This may be because include/linux/mutex-debug.h which has
 __DEBUG_MUTEX_INITIALIZER in it, failes to set .owner to
any interesting value.

WHile I desperately want to conclude that the mutex lock debug 
code is broken, it seems to be unchanged fron 2.6.19
(and seems to work fine in 2.6.19-git7), so I am confused.

This also doesn't explain the bizarre form of the failure on
power4 ... does power4 mishandle twi somehow? Will investigate
next. 

--linas

p.s. To recap: above report is for linux-2.6.20-rc1-git6

I get the following on power5:

97742.318323]          A-B-B-C-C-D-D-A deadlock:failed|failed|  ok |failed|failed|failed|
[97742.318352]          A-B-C-D-B-D-D-A deadlock:failed|failed|  ok |failed|failed|failed|
[97742.318380]          A-B-C-D-B-C-D-A deadlock:failed|failed|  ok |failed|failed|failed|
[97742.318408]                     double unlock:  ok  |  ok |failed|<0>------------[ cut here ]------------
[97742.318440] Kernel BUG at c00000000007d574 [verbose debug info
unavailable]
cpu 0x0: Vector: 700 (Program Check) at [c0000000007a3980]
    pc: c00000000007d574: .debug_mutex_unlock+0x5c/0x118
    lr: c000000000468068: .__mutex_unlock_slowpath+0x104/0x198
    sp: c0000000007a3c00
   msr: 8000000000029032
  current = 0xc000000000663690
  paca    = 0xc000000000663f80
    pid   = 0, comm = swapper
enter ? for help
[c0000000007a3c80] c000000000468068 .__mutex_unlock_slowpath+0x104/0x198
[c0000000007a3d20] c000000000231da8 .double_unlock_mutex+0x3c/0x58
[c0000000007a3db0] c00000000023b47c .dotest+0x5c/0x370
[c0000000007a3e50] c00000000023bc0c .locking_selftest+0x47c/0x17fc
[c0000000007a3ef0] c0000000005f06ec .start_kernel+0x1e4/0x344
[c0000000007a3f90] c0000000000084c8 .start_here_common+0x54/0x8c
0:mon>
0:mon> r

0:mon> di c00000000007d518
c00000000007d518  7c0802a6      mflr    r0
c00000000007d51c  fbc1fff0      std     r30,-16(r1)
c00000000007d520  fbe1fff8      std     r31,-8(r1)
c00000000007d524  ebc2d708      ld      r30,-10488(r2)
c00000000007d528  7c7f1b78      mr      r31,r3
c00000000007d52c  f8010010      std     r0,16(r1)
c00000000007d530  f821ff81      stdu    r1,-128(r1)
c00000000007d534  e93e8008      ld      r9,-32760(r30)
  r9 = toc ptr debug_locks
c00000000007d538  80090000      lwz     r0,0(r9)
c00000000007d53c  2f800000      cmpwi   cr7,r0,0
c00000000007d540  419e00d8      beq     cr7,c00000000007d618    # .debug_mutex_unlock+0x100/0x118
   if (unlikely(!debug_locks)) return;

c00000000007d544  e8030030      ld      r0,48(r3)
  r0 = lock-> owner  == NULL
c00000000007d548  78290464      rldicr  r9,r1,0,49
  r9 = bottom 14 bits lopped off of stack pointer, giving current ptr

c00000000007d54c  7fa04800      cmpd    cr7,r0,r9
c00000000007d550  41be0028      beq     cr7,c00000000007d578    # .debug_mutex_unlock+0x60/0x118
c00000000007d554  e93e8000      ld      r9,-32768(r30)
c00000000007d558  80090000      lwz     r0,0(r9)
c00000000007d55c  2f800000      cmpwi   cr7,r0,0
c00000000007d560  409e0014      bne     cr7,c00000000007d574    # .debug_mutex_unlock+0x5c/0x118
c00000000007d564  481b1f4d      bl      c00000000022f4b0        # .debug_locks_off+0x0/0x64
c00000000007d568  60000000      nop
c00000000007d56c  2fa30000      cmpdi   cr7,r3,0
c00000000007d570  419e0008      beq     cr7,c00000000007d578    # .debug_mutex_unlock+0x60/0x118
c00000000007d574  0fe00000      twi     31,r0,0

crash here

void debug_mutex_unlock(struct mutex *lock)
{
   if (unlikely(!debug_locks))
      return;

   DEBUG_LOCKS_WARN_ON(lock->owner != current_thread_info());
   DEBUG_LOCKS_WARN_ON(lock->magic != lock);
   DEBUG_LOCKS_WARN_ON(!lock->wait_list.prev && !lock->wait_list.next);
   DEBUG_LOCKS_WARN_ON(lock->owner != current_thread_info());
}

r3 is struct mutex, which is

0:mon> d c0000000006c12c0
c0000000006c12c0 0000000100000000 80000000dead4ead  |..............N.|
c0000000006c12d0 ffffffff00000000 ffffffffffffffff  |................|
c0000000006c12e0 c0000000006c12e0 c0000000006c12e0  |.....l.......l..|
c0000000006c12f0 0000000000000000 0000000000000000  |................|
c0000000006c1300 c0000000006c12c0 0000000000000000  |.....l..........|
c0000000006c1310 00000000dead4ead ffffffff00000000  |......N.........|
c0000000006c1320 ffffffffffffffff c0000000006c1328  |.............l.(|
c0000000006c1330 c0000000006c1328 0000000100000000  |.....l.(........|

struct mutex {
   /* 1: unlocked, 0: locked, negative: locked, possible waiters */
   atomic_t    count;                   // 1
   spinlock_t     wait_lock;            // 80000000dead4ead ffffffff00000000  ffffffffffffffff
   struct list_head  wait_list;         // c0000000006c12e0 c0000000006c12e0
#ifdef CONFIG_DEBUG_MUTEXES
   struct thread_info   *owner;         // 0000000000000000
   const char     *name;                // 0000000000000000
   void        *magic;                  // c0000000006c12c0
#endif
#ifdef CONFIG_DEBUG_LOCK_ALLOC
   struct lockdep_map   dep_map;
#endif
};

typedef struct {
   raw_spinlock_t raw_lock;         // 80000000
#if defined(CONFIG_PREEMPT) && defined(CONFIG_SMP)  // # CONFIG_PREEMPT
is not set
   unsigned int break_lock;
#endif
#ifdef CONFIG_DEBUG_SPINLOCK
   unsigned int magic, owner_cpu;   // dead4ead  ffffffff
   void *owner;                     // ffffffffffffffff
#endif
#ifdef CONFIG_DEBUG_LOCK_ALLOC
   struct lockdep_map dep_map;
#endif
} spinlock_t;


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Mutex debug lock failure [was Re: Bad gcc-4.1.0 leads to Power4 crashes... and power5 too, actually
  2006-12-20 23:03       ` Mutex debug lock failure [was Re: Bad gcc-4.1.0 leads to Power4 crashes... and power5 too, actually Linas Vepstas
@ 2006-12-20 23:09         ` Benjamin Herrenschmidt
  2006-12-20 23:46           ` Linas Vepstas
  0 siblings, 1 reply; 7+ messages in thread
From: Benjamin Herrenschmidt @ 2006-12-20 23:09 UTC (permalink / raw)
  To: Linas Vepstas; +Cc: Ingo Molnar, Peter Bergner, linuxppc-dev, linux-kernel


> This also doesn't explain the bizarre form of the failure on
> power4 ... does power4 mishandle twi somehow? Will investigate
> next. 

Or you hit it before you have a console ?

Ben.

> --linas
> 
> p.s. To recap: above report is for linux-2.6.20-rc1-git6
> 
> I get the following on power5:
> 
> 97742.318323]          A-B-B-C-C-D-D-A deadlock:failed|failed|  ok |failed|failed|failed|
> [97742.318352]          A-B-C-D-B-D-D-A deadlock:failed|failed|  ok |failed|failed|failed|
> [97742.318380]          A-B-C-D-B-C-D-A deadlock:failed|failed|  ok |failed|failed|failed|
> [97742.318408]                     double unlock:  ok  |  ok |failed|<0>------------[ cut here ]------------
> [97742.318440] Kernel BUG at c00000000007d574 [verbose debug info
> unavailable]
> cpu 0x0: Vector: 700 (Program Check) at [c0000000007a3980]
>     pc: c00000000007d574: .debug_mutex_unlock+0x5c/0x118
>     lr: c000000000468068: .__mutex_unlock_slowpath+0x104/0x198
>     sp: c0000000007a3c00
>    msr: 8000000000029032
>   current = 0xc000000000663690
>   paca    = 0xc000000000663f80
>     pid   = 0, comm = swapper
> enter ? for help
> [c0000000007a3c80] c000000000468068 .__mutex_unlock_slowpath+0x104/0x198
> [c0000000007a3d20] c000000000231da8 .double_unlock_mutex+0x3c/0x58
> [c0000000007a3db0] c00000000023b47c .dotest+0x5c/0x370
> [c0000000007a3e50] c00000000023bc0c .locking_selftest+0x47c/0x17fc
> [c0000000007a3ef0] c0000000005f06ec .start_kernel+0x1e4/0x344
> [c0000000007a3f90] c0000000000084c8 .start_here_common+0x54/0x8c
> 0:mon>
> 0:mon> r
> 
> 0:mon> di c00000000007d518
> c00000000007d518  7c0802a6      mflr    r0
> c00000000007d51c  fbc1fff0      std     r30,-16(r1)
> c00000000007d520  fbe1fff8      std     r31,-8(r1)
> c00000000007d524  ebc2d708      ld      r30,-10488(r2)
> c00000000007d528  7c7f1b78      mr      r31,r3
> c00000000007d52c  f8010010      std     r0,16(r1)
> c00000000007d530  f821ff81      stdu    r1,-128(r1)
> c00000000007d534  e93e8008      ld      r9,-32760(r30)
>   r9 = toc ptr debug_locks
> c00000000007d538  80090000      lwz     r0,0(r9)
> c00000000007d53c  2f800000      cmpwi   cr7,r0,0
> c00000000007d540  419e00d8      beq     cr7,c00000000007d618    # .debug_mutex_unlock+0x100/0x118
>    if (unlikely(!debug_locks)) return;
> 
> c00000000007d544  e8030030      ld      r0,48(r3)
>   r0 = lock-> owner  == NULL
> c00000000007d548  78290464      rldicr  r9,r1,0,49
>   r9 = bottom 14 bits lopped off of stack pointer, giving current ptr
> 
> c00000000007d54c  7fa04800      cmpd    cr7,r0,r9
> c00000000007d550  41be0028      beq     cr7,c00000000007d578    # .debug_mutex_unlock+0x60/0x118
> c00000000007d554  e93e8000      ld      r9,-32768(r30)
> c00000000007d558  80090000      lwz     r0,0(r9)
> c00000000007d55c  2f800000      cmpwi   cr7,r0,0
> c00000000007d560  409e0014      bne     cr7,c00000000007d574    # .debug_mutex_unlock+0x5c/0x118
> c00000000007d564  481b1f4d      bl      c00000000022f4b0        # .debug_locks_off+0x0/0x64
> c00000000007d568  60000000      nop
> c00000000007d56c  2fa30000      cmpdi   cr7,r3,0
> c00000000007d570  419e0008      beq     cr7,c00000000007d578    # .debug_mutex_unlock+0x60/0x118
> c00000000007d574  0fe00000      twi     31,r0,0
> 
> crash here
> 
> void debug_mutex_unlock(struct mutex *lock)
> {
>    if (unlikely(!debug_locks))
>       return;
> 
>    DEBUG_LOCKS_WARN_ON(lock->owner != current_thread_info());
>    DEBUG_LOCKS_WARN_ON(lock->magic != lock);
>    DEBUG_LOCKS_WARN_ON(!lock->wait_list.prev && !lock->wait_list.next);
>    DEBUG_LOCKS_WARN_ON(lock->owner != current_thread_info());
> }
> 
> r3 is struct mutex, which is
> 
> 0:mon> d c0000000006c12c0
> c0000000006c12c0 0000000100000000 80000000dead4ead  |..............N.|
> c0000000006c12d0 ffffffff00000000 ffffffffffffffff  |................|
> c0000000006c12e0 c0000000006c12e0 c0000000006c12e0  |.....l.......l..|
> c0000000006c12f0 0000000000000000 0000000000000000  |................|
> c0000000006c1300 c0000000006c12c0 0000000000000000  |.....l..........|
> c0000000006c1310 00000000dead4ead ffffffff00000000  |......N.........|
> c0000000006c1320 ffffffffffffffff c0000000006c1328  |.............l.(|
> c0000000006c1330 c0000000006c1328 0000000100000000  |.....l.(........|
> 
> struct mutex {
>    /* 1: unlocked, 0: locked, negative: locked, possible waiters */
>    atomic_t    count;                   // 1
>    spinlock_t     wait_lock;            // 80000000dead4ead ffffffff00000000  ffffffffffffffff
>    struct list_head  wait_list;         // c0000000006c12e0 c0000000006c12e0
> #ifdef CONFIG_DEBUG_MUTEXES
>    struct thread_info   *owner;         // 0000000000000000
>    const char     *name;                // 0000000000000000
>    void        *magic;                  // c0000000006c12c0
> #endif
> #ifdef CONFIG_DEBUG_LOCK_ALLOC
>    struct lockdep_map   dep_map;
> #endif
> };
> 
> typedef struct {
>    raw_spinlock_t raw_lock;         // 80000000
> #if defined(CONFIG_PREEMPT) && defined(CONFIG_SMP)  // # CONFIG_PREEMPT
> is not set
>    unsigned int break_lock;
> #endif
> #ifdef CONFIG_DEBUG_SPINLOCK
>    unsigned int magic, owner_cpu;   // dead4ead  ffffffff
>    void *owner;                     // ffffffffffffffff
> #endif
> #ifdef CONFIG_DEBUG_LOCK_ALLOC
>    struct lockdep_map dep_map;
> #endif
> } spinlock_t;


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Mutex debug lock failure [was Re: Bad gcc-4.1.0 leads to Power4 crashes... and power5 too, actually
  2006-12-20 23:09         ` Benjamin Herrenschmidt
@ 2006-12-20 23:46           ` Linas Vepstas
  2006-12-21  0:36             ` Anton Blanchard
  0 siblings, 1 reply; 7+ messages in thread
From: Linas Vepstas @ 2006-12-20 23:46 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ingo Molnar, Peter Bergner, linuxppc-dev, linux-kernel

On Thu, Dec 21, 2006 at 10:09:55AM +1100, Benjamin Herrenschmidt wrote:
> 
> > This also doesn't explain the bizarre form of the failure on
> > power4 ... does power4 mishandle twi somehow? Will investigate
> > next. 
> 
> Or you hit it before you have a console ?

Ahh... I was about to claim I have a console, but on closer
examination: someone has been messing with consoles? 
I don't get it ... The power4 has ony a serial connection.
the power5 has a graphics card in it, unused.

The power5 has "Console: colour dummy device 80x25"
kick in, at which point the time goes crazy as well.

There may be multiple bugs here. 

The weird crash for 2.6.19-git7 on power4 looks like this:

Looking for displays
opening PHB /pci@400000000110... done
opening PHB /pci@400000000111... done
instantiating rtas at 0x000000002fd10000 ... done
0000000000000000 : boot cpu     0000000000000000
copying OF device tree ...
Building dt strings...
Building dt structure...
Device tree strings 0x00000000043a0000 -> 0x00000000043a135c
Device tree struct  0x00000000043b0000 -> 0x00000000043c0000
Calling quiesce ...
returning from prom_init
[    0.000000] Starting Linux PPC64 #0 SMP Tue Dec 5 17:20:17 CST 2006
[    0.000000] -----------------------------------------------------
[    0.000000] ppc64_pft_size                = 0x0
[    0.000000] physicalMemorySize            = 0x400000000
[    0.000000] ppc64_caches.dcache_line_size = 0x80
[    0.000000] ppc64_caches.icache_line_size = 0x80
[    0.000000] htab_address                  = 0xc0000003e0000000
[    0.000000] htab_hash_mask                = 0x1fffff
[    0.000000] -----------------------------------------------------
[    0.000000] Linux version 2.6.19-git7linas (root@venn) (gcc version
4.1.0 (SUSE Linux)) #0 SMP Tue Dec 5 17:20:17 CST 2006
[    0.000000] [boot]0012 Setup Arch
[    0.000000] No ramdisk, default root is /dev/sda2
[    0.000000] EEH: PCI Enhanced I/O Error Handling Enabled
[    0.000000] PPC64 nvram contains 81920 bytes
[    0.000000] Zone PFN ranges:
[    0.000000]   DMA             0 ->   262144
[    0.000000]   Normal     262144 ->   262144
[    0.000000] early_node_map[1] active PFN ranges
[    0.000000]     0:        0 ->   262144
[    0.000000] [boot]0015 Setup Done
[    0.000000] Built 1 zonelists.  Total pages: 261888
[    0.000000] Kernel command line: root=/dev/sda3 console=ttyS0
selinux=0 elevator=cfq kdb=on
[    0.000000] [boot]0020 XICS Init
[    0.000000] i8259 legacy interrupt controller initialized
[    0.000000] [boot]0021 XICS Done
[    0.000000] PID hash table entries: 4096 (order: 12, 32768 bytes)
System assert at:  file: rtas_io_config.c  -- line: 195
rio_hub_num: 10
drawer_num: 6
phb_num: 3
buid: 7


which should be compared to 2.6.20-rc1-git6, build with the same
.config, on power4:

Looking for displays
opening PHB /pci@400000000110... done
opening PHB /pci@400000000111... done
instantiating rtas at 0x000000002fd10000 ... done
0000000000000000 : boot cpu     0000000000000000
copying OF device tree ...
Building dt strings...
Building dt structure...
Device tree strings 0x0000000004380000 -> 0x000000000438135c
Device tree struct  0x0000000004390000 -> 0x00000000043a0000
Calling quiesce ...
returning from prom_init
[    0.000000] Starting Linux PPC64 #0 SMP Tue Dec 19 17:42:44 CST 2006
[    0.000000] -----------------------------------------------------
[    0.000000] ppc64_pft_size                = 0x0
[    0.000000] physicalMemorySize            = 0x400000000
[    0.000000] ppc64_caches.dcache_line_size = 0x80
[    0.000000] ppc64_caches.icache_line_size = 0x80
[    0.000000] htab_address                  = 0xc0000003e0000000
[    0.000000] htab_hash_mask                = 0x1fffff
[    0.000000] -----------------------------------------------------
[    0.000000] Linux version 2.6.20-rc1-git6linas (root@venn) (gcc
version 4.1.0 (SUSE Linux)) #0 SMP Tue Dec 19 17:42:44 CST 2006
[    0.000000] [boot]0012 Setup Arch
[    0.000000] No ramdisk, default root is /dev/sda2
[    0.000000] EEH: PCI Enhanced I/O Error Handling Enabled
[    0.000000] PPC64 nvram contains 81920 bytes
[    0.000000] Zone PFN ranges:
[    0.000000]   DMA             0 ->   262144
[    0.000000]   Normal     262144 ->   262144
[    0.000000] early_node_map[1] active PFN ranges
[    0.000000]     0:        0 ->   262144
[    0.000000] [boot]0015 Setup Done
[    0.000000] Built 1 zonelists.  Total pages: 261888
[    0.000000] Kernel command line: root=/dev/sda3 console=ttyS0
selinux=0 elevator=cfq kdb=on
[    0.000000] [boot]0020 XICS Init
[    0.000000] i8259 legacy interrupt controller initialized
[    0.000000] [boot]0021 XICS Done
[    0.000000] PID hash table entries: 4096 (order: 12, 32768 bytes)
cpu 0x0: Vector: 700 (Program Check) at [c0000000007a3980]
    pc: c00000000007d574: .debug_mutex_unlock+0x5c/0x118
    lr: c000000000468068: .__mutex_unlock_slowpath+0x104/0x198
    sp: c0000000007a3c00
   msr: 9000000000029032
  current = 0xc000000000663690
  paca    = 0xc000000000663f80
    pid   = 0, comm = swapper
enter ? for help
[c0000000007a3c80] c000000000468068 .__mutex_unlock_slowpath+0x104/0x198
[c0000000007a3d20] c000000000231da8 .double_unlock_mutex+0x3c/0x58
[c0000000007a3db0] c00000000023b47c .dotest+0x5c/0x370
[c0000000007a3e50] c00000000023bc0c .locking_selftest+0x47c/0x17fc
[c0000000007a3ef0] c0000000005f06ec .start_kernel+0x1e4/0x344
[c0000000007a3f90] c0000000000084c8 .start_here_common+0x54/0x8c
0:mon>


And, exactly the same kernel, on power5:

Looking for displays
found display   : /pci@800000020000002/pci@2,2/pci@1/display@0, opening
... done
instantiating rtas at 0x0000000007710000 ... done
0000000000000000 : boot cpu     0000000000000000
0000000000000002 : starting cpu hw idx 0000000000000002... done
copying OF device tree ...
Building dt strings...
Building dt structure...
Device tree strings 0x0000000002580000 -> 0x000000000258141e
Device tree struct  0x0000000002590000 -> 0x00000000025a0000
[    0.000000] Using pSeries machine description
[    0.000000] Partition configured for 4 cpus.
[    0.000000] Starting Linux PPC64 #0 SMP Tue Dec 19 17:42:44 CST 2006
[    0.000000] -----------------------------------------------------
[    0.000000] ppc64_pft_size                = 0x19
[    0.000000] physicalMemorySize            = 0x72000000
[    0.000000] ppc64_caches.dcache_line_size = 0x80
[    0.000000] ppc64_caches.icache_line_size = 0x80
[    0.000000] htab_address                  = 0x0000000000000000
[    0.000000] htab_hash_mask                = 0x3ffff
[    0.000000] -----------------------------------------------------
[    0.000000] Linux version 2.6.20-rc1-git6linas (root@venn) (gcc
version 4.1.0 (SUSE Linux)) #0 SMP Tue Dec 19 17:42:446[    0.000000]
[boot]0012 Setup Arch
[    0.000000] No ramdisk, default root is /dev/sda2
[    0.000000] EEH: PCI Enhanced I/O Error Handling Enabled
[    0.000000] PPC64 nvram contains 7168 bytes
[    0.000000] Zone PFN ranges:
[    0.000000]   DMA             0 ->    29184
[    0.000000]   Normal      29184 ->    29184
[    0.000000] early_node_map[1] active PFN ranges
[    0.000000]     0:        0 ->    29184
[    0.000000] [boot]0015 Setup Done
[    0.000000] Built 1 zonelists.  Total pages: 29156
[    0.000000] Kernel command line: root=/dev/sda9 console=hvsi0 selinux=0 elevator=cfq rtas_msgs=1
[    0.000000] [boot]0020 XICS Init
[    0.000000] [boot]0021 XICS Done
[    0.000000] PID hash table entries: 4096 (order: 12, 32768 bytes)
[97741.568069] Console: colour dummy device 80x25
[97742.318175] ------------------------
[97742.318182] | Locking API testsuite:
[97742.318189] ----------------------------------------------------------------------------
[97742.318200]                                  | spin |wlock |rlock |mutex | wsem | rsem |
[97742.318211] --------------------------------------------------------------------------
[97742.318222]                      A-A deadlock:failed|failed|  ok |failed|failed|failed|
[97742.318246]                  A-B-B-A deadlock:failed|failed|  ok |failed|failed|failed|
[97742.318271]              A-B-B-C-C-A deadlock:failed|failed|  ok |failed|failed|failed|
[97742.318297]              A-B-C-A-B-C deadlock:failed|failed|  ok |failed|failed|failed|
[97742.318323]          A-B-B-C-C-D-D-A deadlock:failed|failed|  ok |failed|failed|failed|
[97742.318352]          A-B-C-D-B-D-D-A deadlock:failed|failed|  ok |failed|failed|failed|
[97742.318380]          A-B-C-D-B-C-D-A deadlock:failed|failed|  ok |failed|failed|failed|
[97742.318408]                     double unlock:  ok  |  ok |failed|<0>------------[ cut here ]------------
[97742.318440] Kernel BUG at c00000000007d574 [verbose debug info
unavailable]
cpu 0x0: Vector: 700 (Program Check) at [c0000000007a3980]
    pc: c00000000007d574: .debug_mutex_unlock+0x5c/0x118
    lr: c000000000468068: .__mutex_unlock_slowpath+0x104/0x198
    sp: c0000000007a3c00
   msr: 8000000000029032
  current = 0xc000000000663690
  paca    = 0xc000000000663f80
    pid   = 0, comm = swapper
enter ? for help
[c0000000007a3c80] c000000000468068 .__mutex_unlock_slowpath+0x104/0x198
[c0000000007a3d20] c000000000231da8 .double_unlock_mutex+0x3c/0x58
[c0000000007a3db0] c00000000023b47c .dotest+0x5c/0x370
[c0000000007a3e50] c00000000023bc0c .locking_selftest+0x47c/0x17fc
[c0000000007a3ef0] c0000000005f06ec .start_kernel+0x1e4/0x344
[c0000000007a3f90] c0000000000084c8 .start_here_common+0x54/0x8c
0:mon> r


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Mutex debug lock failure [was Re: Bad gcc-4.1.0 leads to Power4 crashes... and power5 too, actually
  2006-12-20 23:46           ` Linas Vepstas
@ 2006-12-21  0:36             ` Anton Blanchard
  2006-12-21  1:03               ` Linas Vepstas
  0 siblings, 1 reply; 7+ messages in thread
From: Anton Blanchard @ 2006-12-21  0:36 UTC (permalink / raw)
  To: Linas Vepstas
  Cc: Benjamin Herrenschmidt, Ingo Molnar, linux-kernel, linuxppc-dev


On Wed, Dec 20, 2006 at 05:46:47PM -0600, Linas Vepstas wrote:

> System assert at:  file: rtas_io_config.c  -- line: 195
> rio_hub_num: 10
> drawer_num: 6
> phb_num: 3
> buid: 7

Looks like a firmware assert. Did you pass in something dodgy to a
config read/write op? Maybe a bad buid?

Anton

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Mutex debug lock failure [was Re: Bad gcc-4.1.0 leads to Power4 crashes... and power5 too, actually
  2006-12-21  0:36             ` Anton Blanchard
@ 2006-12-21  1:03               ` Linas Vepstas
  2006-12-21 14:41                 ` Ingo Molnar
  0 siblings, 1 reply; 7+ messages in thread
From: Linas Vepstas @ 2006-12-21  1:03 UTC (permalink / raw)
  To: Anton Blanchard
  Cc: Benjamin Herrenschmidt, Ingo Molnar, linux-kernel, linuxppc-dev

On Thu, Dec 21, 2006 at 11:36:59AM +1100, Anton Blanchard wrote:
> 
> On Wed, Dec 20, 2006 at 05:46:47PM -0600, Linas Vepstas wrote:
> 
> > System assert at:  file: rtas_io_config.c  -- line: 195
> > rio_hub_num: 10
> > drawer_num: 6
> > phb_num: 3
> > buid: 7
> 
> Looks like a firmware assert. Did you pass in something dodgy to a
> config read/write op? Maybe a bad buid?

Same kernel runs fine on power5. Although it does have patches
applied, those very same patches boot fine when applied to a slightly
older kernel (2.6.19-rc4).  I haven't been messing with buids or 
pci config space (at least not intentionaly).

I'll try again with an unpatched, unmodified kernel. 

--linas

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Mutex debug lock failure [was Re: Bad gcc-4.1.0 leads to Power4 crashes... and power5 too, actually
  2006-12-21  1:03               ` Linas Vepstas
@ 2006-12-21 14:41                 ` Ingo Molnar
  2006-12-21 21:12                   ` Linas Vepstas
  0 siblings, 1 reply; 7+ messages in thread
From: Ingo Molnar @ 2006-12-21 14:41 UTC (permalink / raw)
  To: Linas Vepstas
  Cc: Anton Blanchard, Benjamin Herrenschmidt, linux-kernel,
	linuxppc-dev, mingo

On Wed, 2006-12-20 at 19:03 -0600, Linas Vepstas wrote:
> Same kernel runs fine on power5. Although it does have patches
> applied, those very same patches boot fine when applied to a slightly
> older kernel (2.6.19-rc4).  I haven't been messing with buids or 
> pci config space (at least not intentionaly).
> 
> I'll try again with an unpatched, unmodified kernel.

there have been a number of fixes to lockdep recently - could you try
the kernel/lockdep.c file from latest -mm, does that fail too?

one possibility would be a chain-hash collision.

	Ingo


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Mutex debug lock failure [was Re: Bad gcc-4.1.0 leads to Power4 crashes... and power5 too, actually
  2006-12-21 14:41                 ` Ingo Molnar
@ 2006-12-21 21:12                   ` Linas Vepstas
  0 siblings, 0 replies; 7+ messages in thread
From: Linas Vepstas @ 2006-12-21 21:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Anton Blanchard, Benjamin Herrenschmidt, linux-kernel,
	linuxppc-dev, mingo

On Thu, Dec 21, 2006 at 03:41:39PM +0100, Ingo Molnar wrote:
> On Wed, 2006-12-20 at 19:03 -0600, Linas Vepstas wrote:
> > Same kernel runs fine on power5. Although it does have patches
> > applied, those very same patches boot fine when applied to a slightly
> > older kernel (2.6.19-rc4).  I haven't been messing with buids or 
> > pci config space (at least not intentionaly).
> > 
> > I'll try again with an unpatched, unmodified kernel.
> 
> there have been a number of fixes to lockdep recently - could you try
> the kernel/lockdep.c file from latest -mm, does that fail too?
> 
> one possibility would be a chain-hash collision.

I see the same problem on linux-2.6.20-rc1-mm1 

The patch below fixes this, although I don't understand why 
this has become an issue just now:

Index: linux-2.6.20-rc1-mm1/kernel/mutex.c
===================================================================
--- linux-2.6.20-rc1-mm1.orig/kernel/mutex.c    2006-12-19
16:19:34.000000000 -0600
+++ linux-2.6.20-rc1-mm1/kernel/mutex.c 2006-12-21 14:31:33.000000000
-0600
@@ -249,7 +249,7 @@ __mutex_unlock_common_slowpath(atomic_t
                wake_up_process(waiter->task);
        }

-       debug_mutex_clear_owner(lock);
+       // debug_mutex_clear_owner(lock);

        spin_unlock_mutex(&lock->wait_lock, flags);
 }


It obvious that this is the proximal cause of the failure of 
the double_unlock_mutex() mutex self-test.  However, both
the double-unlock test, and this clear_owner() call, are 
in linux-2.6.19-git7, which doesn't fail this test. So I conclude
that __mutex_unlock_common_slowpath() is never taken in 2.6.19
but is always taken on 2.6.20-rc1 (in particular, is taken
during the double-unlock test).

I don't know why that would be. 

It might be wise to add a test to make sure the slowpath
is taken only when it should be taken? Its sort of scary 
to think that it might be always taken, and that no one 
notices the problem...

I'm gonna be out until after Christmas. -- and so, 

Merry Christmas! 
 
--linas



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2006-12-21 21:12 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20061220004653.GL5506@austin.ibm.com>
     [not found] ` <1166579210.4963.15.camel@otta>
     [not found]   ` <20061220211931.GB16860@austin.ibm.com>
     [not found]     ` <1166650134.6673.9.camel@localhost.localdomain>
2006-12-20 23:03       ` Mutex debug lock failure [was Re: Bad gcc-4.1.0 leads to Power4 crashes... and power5 too, actually Linas Vepstas
2006-12-20 23:09         ` Benjamin Herrenschmidt
2006-12-20 23:46           ` Linas Vepstas
2006-12-21  0:36             ` Anton Blanchard
2006-12-21  1:03               ` Linas Vepstas
2006-12-21 14:41                 ` Ingo Molnar
2006-12-21 21:12                   ` Linas Vepstas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox