From mboxrd@z Thu Jan 1 00:00:00 1970
From: Peter Hurley
Date: Sun, 01 Dec 2013 01:18:50 +0000
Subject: Re: RED state exception (trap type 0x64) on U5 reboot
Message-Id: <529A8E7A.3070009@hurleysoftware.com>
List-Id:
References:
In-Reply-To:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: sparclinux@vger.kernel.org

On 11/30/2013 04:42 PM, Meelis Roos wrote:
>>>> Another strange symptom is that the problem did not happen when
>>>> openpromfs is compiled in statically, not loaded as module. When loaded
>>>> as module, its memory is vmalloc()ed... but that's probably too weak
>>>> connection to conclude anything.
>>>
>>> What happens with the not-even-compile-tested debug patch below?
>>
>> Now I have the results of that test. It does not trigger at all during
>> normal shutdown since module is not unloaded. When I unmount openpromfs
>> and rmmod openpromfs, both lines are promted to dmesg. After that, the
>> RED state still happens on reboot.
>
> Played around some more (to reproduce the slab BUG with newer kernel
> for reporting) and found 2 things:
>
> 1. When I apply the kzmalloc vs vmalloc revert patch to 3.12.0, it
> breaks the serial layer with fireworks - did not investigate further.

kmalloc() should work fine on top of 3.12.0+. Don't revert. Just change
vmalloc->kmalloc and vfree->kfree. I can supply you with a patch if you'd
prefer; just let me know.

And please provide copies of the fireworks.

> 2. When trying plain 3.12 with no debug patches but most debug options
> except SLAB ones, the RED state exception is still present but I do get
> a meaningful lockdep warning just before the exception. This is very
> similar to the warning I posted today for sparc64 startup on another
> machine (copied below). Maybe this is just some unannotated irq stuff
> (or 2 independent ones) but it happens in exactly the right spot...

I think the hardirqs warnings below and on the E3500 are because NMI is
still enabled in p1275_cmd_direct() and
arch/sparc:arch_irqs_disabled_flags() doesn't differentiate irqs on from
nmi on, which triggers the WARNING.

Does the RED state exception trigger if you manually break to the prom
command line and issue a boot command?

I'll continue to follow the SLAB bug thread in case some additional
promising lead develops there.

Regards,
Peter Hurley

> The warning from Ultra 5 with RED State Exception (full dmesg and
> config are below):
>
> [info] Will now restart.
> sd 0:0:0:0: [sda] Synchronizing SCSI cache
> reboot: Restarting system
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 2826 at kernel/lockdep.c:3535 check_flags+0x7c/0x240()
> DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
> Modules linked in: openpromfs
> CPU: 0 PID: 2826 Comm: reboot Tainted: G W 3.12.0 #133
> Call Trace:
> [0000000000454b6c] warn_slowpath_common+0x4c/0x80
> [0000000000454c4c] warn_slowpath_fmt+0x2c/0x40
> [0000000000499e3c] check_flags+0x7c/0x240
> [000000000049d000] lock_acquire+0x20/0x80
> [000000000077afe8] _raw_spin_lock+0x28/0x40
> [00000000005f8ef4] p1275_cmd_direct+0x14/0x60
> [00000000005f8980] prom_reboot+0x20/0x40
> [0000000000434c88] machine_restart+0x48/0x60
> [000000000047d9cc] kernel_restart+0x4c/0x60
> [000000000047db34] SyS_reboot+0x134/0x200
> [00000000004060b4] linux_sparc_syscall32+0x34/0x40
> ---[ end trace 4759822ebc3658d5 ]---
> possible reason: unannotated irqs-off.
> irq event stamp: 3799
> hardirqs last enabled at (3799): [<0000000000404b1c>] rtrap_xcall+0x18/0x20
> hardirqs last disabled at (3797): [<0000000000459380>] __do_softirq+0x100/0x180
> softirqs last enabled at (3798): [<00000000004593dc>] __do_softirq+0x15c/0x180
> softirqs last disabled at (3791): [<000000000042b89c>] do_softirq+0x5c/0xa0
>
> RED State Exception
>
> TL00.0000.0000.0005 TT00.0000.0000.0064
> TPC00.0000.f000.4c80 TnPC00.0000.f000.4c84 TSTATE00.0099.1104.1403
> TL00.0000.0000.0004 TT00.0000.0000.0064
> TPC00.0000.f000.4c80 TnPC00.0000.f000.4c84 TSTATE00.0099.1104.1403
> TL00.0000.0000.0003 TT00.0000.0000.0064
> TPC00.0000.f000.4c80 TnPC00.0000.f000.4c84 TSTATE00.0099.1104.1403
> TL00.0000.0000.0002 TT00.0000.0000.0064
> TPC00.0000.f000.0c80 TnPC00.0000.f000.0c84 TSTATE00.0099.1104.1403
> TL00.0000.0000.0001 TT00.0000.0000.0064
> TPC00.0000.f004.55c0 TnPC00.0000.f004.55c4 TSTATE00.0099.1100.1603
>
>
> The warning from Sun E3500 startup:
>
> WARNING: CPU: 6 PID: 1 at kernel/locking/lockdep.c:3535 check_flags+0x7c/0x240()
> DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
> Modules linked in:
> CPU: 6 PID: 1 Comm: swapper/6 Not tainted 3.13.0-rc1-dirty #17
> Call Trace:
> [00000000004585cc] warn_slowpath_common+0x4c/0x80
> [00000000004586ac] warn_slowpath_fmt+0x2c/0x40
> [0000000000498d9c] check_flags+0x7c/0x240
> [000000000049bf40] lock_acquire+0x20/0x80
> [000000000081e188] _raw_spin_lock+0x28/0x40
> [000000000061bd74] p1275_cmd_direct+0x14/0x60
> [000000000061bc0c] prom_startcpu+0x2c/0x40
> [000000000043e3bc] __cpu_up+0x5c/0x180
> [0000000000458830] _cpu_up.constprop.1+0xd0/0x160
> [0000000000458958] cpu_up+0x58/0x80
> [00000000009fe2b4] smp_init+0x74/0xbc
> [00000000009f49e4] kernel_init_freeable+0x7c/0x110
> [000000000080af24] kernel_init+0x4/0x120
> [00000000004060c4] ret_from_fork+0x1c/0x2c
> [0000000000000000] (null)
> ---[ end trace e61cc8445001155f ]---
> possible reason: unannotated irqs-off.
> irq event stamp: 2051
> hardirqs last enabled at (2051): [<000000000081e9d8>] _raw_spin_unlock_irqrestore+0x38/0x60
> hardirqs last disabled at (2050): [<000000000081e234>] _raw_spin_lock_irqsave+0x14/0x60
> softirqs last enabled at (398): [<000000000045d098>] __do_softirq+0x178/0x200
> softirqs last disabled at (393): [<000000000042bb8c>] do_softirq_own_stack+0x2c/0x40