From mboxrd@z Thu Jan 1 00:00:00 1970
From: Peter Hurley
Date: Wed, 23 Oct 2013 15:00:24 +0000
Subject: Re: RED state exception (trap type 0x64) on U5 reboot
Message-Id: <5267E488.9060606@hurleysoftware.com>
List-Id:
References:
In-Reply-To:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: sparclinux@vger.kernel.org

On 10/21/2013 04:58 AM, Meelis Roos wrote:
>> Somewhere between 3.11.0 and 3.12-rc2, my U5-360 has consistently been
>> hanging on reboot. Today I connected a serial cable and learned about a
>> RED state exception. 3.10.0 and 3.11.0 are OK, 3.12-rc2 and later hang
>> reliably. I have not yet started bisecting since this will need remote
>> power cycle setup.
>
> Another data point: the same problem happens on Sun Blade 100 with ALI
> IDE. Does not happen on Fire V100 and Netra X1 that are also ALI IDE
> based. The configs may be different too of course.
>
> I did a bisect for full tree. It landed into tty commits, some of them
> being untestable without a compile fix

Hi Meelis,

What tty commits required a compile fix?

> but it came out clearly finally
> (each bad commit was clearly bad, each good commit was tested for 3
> reboots without a problem). Bisect resulted in this commit being at
> fault:
>
> 8cb06c983822103da1cfe57b9901e60a00e61f67 is the first bad commit
> commit 8cb06c983822103da1cfe57b9901e60a00e61f67
> Author: Peter Hurley
> Date: Sat Jun 15 10:21:18 2013 -0400
>
>     n_tty: Remove alias ptrs in __receive_buf()
>
>     The char and flag buffer local alias pointers, p and f, are
>     unnecessary; remove them.
>
>     Signed-off-by: Peter Hurley
>     Signed-off-by: Greg Kroah-Hartman
>
> :040000 040000 ddc901fe810f43bc06a64397735b469b11e403e8 96d92e4e242c4b2ff11b25c005bccd093865b350 M drivers
>
> Reading the commit suggests that commit is not at fault - it seems so
> unrelated. It just modifies on-stack function parameters instead of
> local copies.

As you note, this is an unlikely culprit.
Does a repeat bisect from different good/bad starts give the same result?

> Just reverting this patch in current master would not work, the code has
> changed a lot.
>
>
> Also, my matching with oops_enter was bad - the addresses differ by one
> more '5' so it has nothing to do with oops_enter. And there is no
> '00455c0' in System.map of these kernels so I have no idea what this TPC
> corresponds to.
>
>>
>> reboot: Restarting system
>>
>> RED State Exception
>>
>> TL00.0000.0000.0005 TT00.0000.0000.0064
>>  TPC00.0000.f000.4c80 TnPC00.0000.f000.4c84 TSTATE00.0099.1104.1402
>> TL00.0000.0000.0004 TT00.0000.0000.0064
>>  TPC00.0000.f000.4c80 TnPC00.0000.f000.4c84 TSTATE00.0099.1104.1402
>> TL00.0000.0000.0003 TT00.0000.0000.0064
>>  TPC00.0000.f000.4c80 TnPC00.0000.f000.4c84 TSTATE00.0099.1104.1402
>> TL00.0000.0000.0002 TT00.0000.0000.0064
>>  TPC00.0000.f000.0c80 TnPC00.0000.f000.0c84 TSTATE00.0099.1104.1402
>> TL00.0000.0000.0001 TT00.0000.0000.0064
>>  TPC00.0000.f004.55c0 TnPC00.0000.f004.55c4 TSTATE00.0099.1100.1602
>>
>> Trap Type 0x64 seems to be fast_instruction_access_MMU_miss. It keeps
>> trapping until 5 levels deep. The first one is from f00455c0 that may
>> be the System.map entry
>>
>> 00000000004555c0 T oops_enter
>>
>> meaning we get late oops but the MMU setup has already been torn down?
>> Is this a sensible way to decode this RED data (matching TPC against
>> System.map)?
>>
>> Is full bisect recommended or does arch/sparc bisect look more
>> promising?

Is any of the above exception information useful in diagnosing this?

Regards,
Peter Hurley
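
For reference, the TPC-versus-System.map matching asked about in the quoted
message can be sketched as below. This is only an illustration, not something
posted in the thread: the helper names (parse_red_value, parse_system_map,
lookup) and the max_offset cutoff are hypothetical, and the only System.map
entry used is the oops_enter line quoted above. It assumes the dotted values
in the RED state dump are plain hexadecimal.

```python
import bisect


def parse_red_value(field):
    """Convert a dotted dump value such as '00.0000.f004.55c0' to an int."""
    return int(field.replace(".", ""), 16)


def parse_system_map(text):
    """Return (address, name) tuples from System.map text, sorted by address."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:  # "<hex address> <type> <name>"
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def lookup(symbols, addr, max_offset=None):
    """Return (name, offset) of the nearest symbol at or below addr.

    Returns None if no symbol lies at or below addr, or if the offset
    exceeds max_offset (a hypothetical cutoff for implausible matches).
    """
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0:
        return None
    base, name = symbols[i]
    offset = addr - base
    if max_offset is not None and offset > max_offset:
        return None
    return name, offset


if __name__ == "__main__":
    # The single System.map entry quoted in the thread.
    symbols = parse_system_map("00000000004555c0 T oops_enter\n")
    # TPC of the TL=1 frame, as printed in the quoted dump.
    tpc = parse_red_value("00.0000.f004.55c0")
    print(hex(tpc), lookup(symbols, tpc))
```

An exact address match is not required for a hit; what matters is the offset
from the nearest lower symbol. For the TL=1 TPC above, the offset from
oops_enter is far larger than any function, which is consistent with the
correction in the quoted text that this address has nothing to do with
oops_enter.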