Linux MIPS Architecture development
* epc register reported zero
@ 2014-08-28  0:45 Lin Ming
  2014-08-28  1:15 ` David Daney
  0 siblings, 1 reply; 3+ messages in thread
From: Lin Ming @ 2014-08-28  0:45 UTC (permalink / raw)
  To: linux-mips

Hi list,

Board: Broadcom 963268
CPU model: Broadcom BMIPS4350 V8.0
Kernel: 2.6.30
Toolchain: uclibc-crosstools-gcc-4.4.2-1

I encountered a userspace application crash with epc reported as zero.
I don't understand how the epc register could be zero.

Any help is appreciated.

wps_monitor/1699: potentially unexpected fatal signal 11.

Cpu 1
$ 0   : 00000000 10008d00 00000004 0000000a
$ 4   : 0000000a 7f88a55c 00000000 00000001
$ 8   : 00000000 00000000 00000001 00000000
$12   : 00000001 00000000 00000008 12182430
$16   : 00438968 00000001 00409620 00000000
$20   : 00000000 00000000 00000000 00406404
$24   : 00000002 2aaecc00
$28   : 2ab39a70 7f88a4c0 7f88a4f0 0041a838
Hi    : 00000000
Lo    : 00000000
epc   : 00000000 (null)
    Tainted: P
ra    : 0041a838 0x41a838
Status: 00008d13    USER EXL IE
Cause : 00000008
BadVA : 00000000
PrId  : 0002a080 (Broadcom4350)

mips-linux-addr2line -e wps_monitor 0041a838
This shows that the "ra" address maps to line 328 below.

322         if (max_fd == -1) {
323                 TUTRACE((TUTRACE_ERR, "wpsm_readData: no fd set!\n"));
324                 return NULL;
325         }
326
327         /* Do select */
328         n = select(max_fd + 1, &fdvar, NULL, NULL, &timeout);
329         if (n <= 0) {
330                 /*
331                  * to avoid the select operation interferenced by led lighting timer.
332                  * this will be removed after led lighting timer is replaced by wireless driver
333                  */
334                 if (n < 0 && errno != EINTR) {
335                         TUTRACE((TUTRACE_ERR, "wpsm_readData: select recv failed\n"));
336                 }
337                 goto out;
338         }


0000eac0 <__libc_select>:
    eac0:       3c1c0006        lui     gp,0x6
    eac4:       279c1aa0        addiu   gp,gp,6816
    eac8:       0399e021        addu    gp,gp,t9
    eacc:       27bdffd8        addiu   sp,sp,-40
    ead0:       afbe0020        sw      s8,32(sp)
    ead4:       03a0f021        move    s8,sp
    ead8:       afbf0024        sw      ra,36(sp)
    eadc:       afb0001c        sw      s0,28(sp)
    eae0:       afbc0010        sw      gp,16(sp)
    eae4:       27bdfff0        addiu   sp,sp,-16
    eae8:       8fc20038        lw      v0,56(s8)
    eaec:       27bdffe0        addiu   sp,sp,-32
    eaf0:       afa20010        sw      v0,16(sp)
    eaf4:       2402102e        li      v0,4142
    eaf8:       0000000c        syscall
    eafc:       27bd0020        addiu   sp,sp,32
    eb00:       10e00006        beqz    a3,eb1c <__libc_select+0x5c>
    eb04:       00408021        move    s0,v0
    eb08:       8f9988d0        lw      t9,-30512(gp)
    eb0c:       0320f809        jalr    t9
    eb10:       00000000        nop
    eb14:       ac500000        sw      s0,0(v0)
    eb18:       2402ffff        li      v0,-1
    eb1c:       03c0e821        move    sp,s8
    eb20:       8fbf0024        lw      ra,36(sp)
    eb24:       8fbe0020        lw      s8,32(sp)
    eb28:       8fb0001c        lw      s0,28(sp)
    eb2c:       03e00008        jr      ra
    eb30:       27bd0028        addiu   sp,sp,40

Regards,
Ming


* Re: epc register reported zero
  2014-08-28  0:45 epc register reported zero Lin Ming
@ 2014-08-28  1:15 ` David Daney
  2014-08-28  1:33   ` Lin Ming
  0 siblings, 1 reply; 3+ messages in thread
From: David Daney @ 2014-08-28  1:15 UTC (permalink / raw)
  To: Lin Ming; +Cc: linux-mips

On 08/27/2014 05:45 PM, Lin Ming wrote:
> Hi list,
>
> Board: Broadcom 963268
> CPU model: Broadcom BMIPS4350 V8.0
> Kernel: 2.6.30
> Toolchain: uclibc-crosstools-gcc-4.4.2-1
>
> I encountered an userspace application crash with epc reported zero.
> I don't understand how epc register could be zero.
>
> Any help is appreciated.
>
> wps_monitor/1699: potentially unexpected fatal signal 11.
>
> Cpu 1
> $ 0   : 00000000 10008d00 00000004 0000000a
> $ 4   : 0000000a 7f88a55c 00000000 00000001
> $ 8   : 00000000 00000000 00000001 00000000
> $12   : 00000001 00000000 00000008 12182430
> $16   : 00438968 00000001 00409620 00000000
> $20   : 00000000 00000000 00000000 00406404
> $24   : 00000002 2aaecc00
> $28   : 2ab39a70 7f88a4c0 7f88a4f0 0041a838

Disassemble the code surrounding the address in $31.

I am guessing that at 0x41a830, you have an indirect jump (JR 
instruction) and that 'rs' contains a value of zero.  So the EPC when 
you get the SIGSEGV will be ... zero.

This is called a call through a NULL function pointer.




* Re: epc register reported zero
  2014-08-28  1:15 ` David Daney
@ 2014-08-28  1:33   ` Lin Ming
  0 siblings, 0 replies; 3+ messages in thread
From: Lin Ming @ 2014-08-28  1:33 UTC (permalink / raw)
  To: David Daney; +Cc: linux-mips

On Wed, Aug 27, 2014 at 6:15 PM, David Daney <ddaney.cavm@gmail.com> wrote:
> On 08/27/2014 05:45 PM, Lin Ming wrote:
>>
>> Hi list,
>>
>> Board: Broadcom 963268
>> CPU model: Broadcom BMIPS4350 V8.0
>> Kernel: 2.6.30
>> Toolchain: uclibc-crosstools-gcc-4.4.2-1
>>
>> I encountered an userspace application crash with epc reported zero.
>> I don't understand how epc register could be zero.
>>
>> Any help is appreciated.
>>
>> wps_monitor/1699: potentially unexpected fatal signal 11.
>>
>> Cpu 1
>> $ 0   : 00000000 10008d00 00000004 0000000a
>> $ 4   : 0000000a 7f88a55c 00000000 00000001
>> $ 8   : 00000000 00000000 00000001 00000000
>> $12   : 00000001 00000000 00000008 12182430
>> $16   : 00438968 00000001 00409620 00000000
>> $20   : 00000000 00000000 00000000 00406404
>> $24   : 00000002 2aaecc00
>> $28   : 2ab39a70 7f88a4c0 7f88a4f0 0041a838
>
>
> Disassemble the surrounding the address in $31
>
> I am guessing that at 0x41a830, you have an indirect jump (JR instruction)
> and that 'rs' contains a value of zero.  So the EPC when you get the SIGSEGV
> will be ... zero.
>
> This is called a call through a NULL function pointer.

Here it is.
There is only a "jalr t9", which I think is the call to __libc_select().

        /* Do select */
        n = select(max_fd + 1, &fdvar, NULL, NULL, &timeout);
  41a804:       8fc20034        lw      v0,52(s8)
  41a808:       24430001        addiu   v1,v0,1
  41a80c:       27c20044        addiu   v0,s8,68
  41a810:       27c400c4        addiu   a0,s8,196
  41a814:       afa40010        sw      a0,16(sp)
  41a818:       00602021        move    a0,v1
  41a81c:       00402821        move    a1,v0
  41a820:       00003021        move    a2,zero
  41a824:       00003821        move    a3,zero
  41a828:       8f82843c        lw      v0,-31684(gp)
  41a82c:       0040c821        move    t9,v0
  41a830:       0320f809        jalr    t9
  41a834:       00000000        nop
  41a838:       8fdc0018        lw      gp,24(s8)
  41a83c:       afc20038        sw      v0,56(s8)
        if (n <= 0) {
  41a840:       8fc20038        lw      v0,56(s8)
  41a844:       1c40000b        bgtz    v0,41a874 <wps_osl_wait_for_all_packets+0x21c>
  41a848:       00000000        nop

Here is my crazy thought:

One possibility is:
1. The select() syscall enters kernel mode, and the epc register is saved
on the kernel-mode stack.
2. After the select() syscall finishes, kernel code reads the epc value from
the stack and restores it to the epc register.
3. The CPU jumps to the instruction pointed to by the epc register.

Maybe there's some bug in the kernel that corrupted the kernel-mode stack, so
the restored epc value became zero.

I added the crazy code below to simulate it.

diff --git a/bcmcpe2/kernel/linux-3.4rt/fs/select.c b/bcmcpe2/kernel/linux-3.4rt/fs/select.c
index 0baa0a3..cd41c4d 100644
--- a/bcmcpe2/kernel/linux-3.4rt/fs/select.c
+++ b/bcmcpe2/kernel/linux-3.4rt/fs/select.c
@@ -597,6 +597,11 @@ SYSCALL_DEFINE5(select, int, n, fd_set __user *, inp, fd_set __user *, outp,
        struct timeval tv;
        int ret;

+       if (!strcmp(current->comm, "wps_monitor")) {
+               printk("LINMING: hack wps_monitor epc\n");
+               task_pt_regs(current)->cp0_epc = 0;
+       }
+

And got the output below:

wps_monitor/1315: potentially unexpected fatal signal 11.

Cpu 1
$ 0   : 00000000 10008d00 00000000 0000f9d8
$ 4   : 00000008 7f7fe624 00000000 00000000
$ 8   : 00000000 7f7fe5f8 00000000 87c78000
$12   : 00504303 00000043 0000000e 0000dd18
$16   : 00000000 0043db30 0043bff8 0043bffc
$20   : 7f7fe624 7f7fe5f0 00000007 00000000
$24   : 00000000 77c59960
$28   : 77cc94d0 7f7fe578 7f7fe5a8 004090a8
Hi    : 00000000
Lo    : 00000000
epc   : 00000000   (null)
    Tainted: P
ra    : 004090a8 0x4090a8
Status: 00008d13    USER EXL IE
Cause : 00000008
BadVA : 00000000
PrId  : 0002a080 (Broadcom BMIPS4350)

