From: "Kevin D. Kissell" <kevink@mips.com>
To: "Chuck Meade" <chuckmeade@mindspring.com>, <linux-mips@linux-mips.org>
Cc: "Chuck Meade (mindspring)" <chuckmeade@mindspring.com>
Subject: Re: corruption of load instruction offset
Date: Mon, 3 Apr 2006 09:25:42 +0200
Message-ID: <000f01c656ef$d2963670$10eca8c0@grendel>
In-Reply-To: IIEEICKJLNEPBBDJICNGEECHKIAA.chuckmeade@mindspring.com
That's pretty twisted - one could almost believe that the fetch from
0x8021e28c got corrupted to pick up the most significant 16 bits
of the instruction at 0x8021e22c or 0x8021e26c - but given that
instructions are fetched and issued word-by-word, it's hard to see
where that could happen, in either CPU hardware or software.
What is the I-cache line size? If it were me, I'd check my clocks,
voltages, and above all my RAM timing, and I'd re-seat my CPU
and RAM in their sockets...
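
As an illustration of the hypothesis above (a sketch added for clarity, not part of the original analysis): splicing the upper halfword of either lw t1 instruction onto the lower halfword of the addu at the faulting epc gives a word that decodes as lw t1,0x1021(a0), whose effective address is exactly the reported BadVA. The helper names splice and decode_itype are hypothetical, and the 32-byte line size is only an assumption, since the real I-cache line size is the open question here.

```python
# Instruction words and register values copied from the quoted oops report.
LW_T1_4 = 0x8C890004      # 8021e22c: lw   t1,4(a0)
LW_T1_20 = 0x8C890014     # 8021e26c: lw   t1,20(a0)
ADDU_AT_EPC = 0x00431021  # 8021e28c: addu v0,v0,v1
A0 = 0x87E38660
BAD_VA = 0x87E39681


def splice(upper_src, lower_src):
    """Hypothetical corruption: upper 16 bits of one word, lower 16 of another."""
    return (upper_src & 0xFFFF0000) | (lower_src & 0xFFFF)


def decode_itype(word):
    """Split a MIPS I-type word into (opcode, rs, rt, signed 16-bit immediate)."""
    imm = word & 0xFFFF
    if imm & 0x8000:
        imm -= 0x10000
    return (word >> 26) & 0x3F, (word >> 21) & 0x1F, (word >> 16) & 0x1F, imm


hybrid = splice(LW_T1_4, ADDU_AT_EPC)
opcode, rs, rt, imm = decode_itype(hybrid)   # opcode 0x23 = lw, rs 4 = a0, rt 9 = t1
effective = (A0 + imm) & 0xFFFFFFFF

print(f"hybrid word     : {hybrid:#010x}")
print(f"effective addr  : {effective:#010x} (BadVA {BAD_VA:#010x})")

# Offset of each candidate address within an ASSUMED 32-byte I-cache line.
LINE_SIZE = 32
offsets = {a: a % LINE_SIZE for a in (0x8021E22C, 0x8021E26C, 0x8021E28C)}
for addr, off in offsets.items():
    print(f"{addr:#010x} -> line offset {off:#x}")
```

Under that assumed line size, all three addresses fall at the same offset within their cache lines. That is consistent with a fetch picking up half of a word from the wrong line, but it does not by itself show where such a mix-up would occur.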
Regards,
Kevin K.
----- Original Message -----
From: "Chuck Meade" <chuckmeade@mindspring.com>
To: <linux-mips@linux-mips.org>
Cc: "Chuck Meade (mindspring)" <chuckmeade@mindspring.com>
Sent: Monday, April 03, 2006 6:12 AM
Subject: corruption of load instruction offset
> Hello,
>
> I am seeing a very interesting/worrisome bug on an RM7965 cpu, which has
> an E9000 core. I am running 2.6.14-rc1. Please take a look at the
> behavior I describe and send me your thoughts. Thanks.
>
> The error message is immediately below. Notice that the epc is 8021e28c,
> and the BadVA is 87e39681, and register 4 (a0) is 87e38660.
>
> Now scan down below the error message, to the disassembly of move_32bytes.
> If you look at the instruction at 8021e28c, it appears harmless enough.
> Nothing to cause an unaligned access or invalid instruction. But look
> about 6 lines above that, and we are loading at offsets from a0. The
> offsets from a0 in those 4 load instructions are 16, 20, 24, and 28. If
> you look at the instruction words in the column to the left, those offsets appear in
> the least significant 16 bits of each word.
>
> Now look again at the value of a0 in the register dump: 87e38660. And
> at the BadVA value: 87e39681. The BadVA is offset exactly 0x1021 from
> a0. This indicates that we somehow tried to access memory at offset
> 0x1021 from a0. However, we never should have done that according to
> the disassembly. *But* there are many instructions in the vicinity whose
> least significant 16 bits are 0x1021. None of them are loads from a0,
> but I believe that this is the root of the problem. Something is happening
> here, possibly an interrupt, or a cpu bug(?) that is causing the load from
> a0 to use an offset of 0x1021 (the least significant 16-bits of many of
> the nearby instructions) rather than the correct offset for the load
> instruction, which is found in the least significant 16-bits of the actual
> load instructions.
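
The arithmetic in the paragraphs above can be checked with a short Python sketch (an illustration added here, not part of the original report). The addresses and instruction words are copied from the move_32bytes disassembly in this message; the variable names are made up for the example.

```python
A0 = 0x87E38660      # register $4 from the dump
BAD_VA = 0x87E39681  # BadVA from the dump

# (address, instruction word) pairs for the ten instructions ending at the epc.
words = [
    (0x8021E268, 0x8C880010),  # lw   t0,16(a0)
    (0x8021E26C, 0x8C890014),  # lw   t1,20(a0)
    (0x8021E270, 0x8C8B0018),  # lw   t3,24(a0)
    (0x8021E274, 0x8C8C001C),  # lw   t4,28(a0)
    (0x8021E278, 0x00481021),  # addu v0,v0,t0
    (0x8021E27C, 0x0048182B),  # sltu v1,v0,t0
    (0x8021E280, 0x00431021),  # addu v0,v0,v1
    (0x8021E284, 0x00491021),  # addu v0,v0,t1
    (0x8021E288, 0x0049182B),  # sltu v1,v0,t1
    (0x8021E28C, 0x00431021),  # addu v0,v0,v1  (epc)
]

LW_OPCODE = 0x23  # major opcode of lw (bits 31..26)

# Distance between the faulting address and the base register.
offset = BAD_VA - A0

# Offsets actually encoded in the loads: low 16 bits of each lw word.
load_offsets = [w & 0xFFFF for _, w in words if (w >> 26) == LW_OPCODE]

# Non-load instructions whose low 16 bits happen to equal the bad offset.
lookalikes = [a for a, w in words if (w & 0xFFFF) == offset]

print(f"BadVA - a0   = {offset:#x}")
print(f"lw offsets   = {load_offsets}")
print("low16 match  =", [f"{a:#010x}" for a in lookalikes])
```

None of the loads encodes 0x1021, while four of the six ALU instructions in this window carry it in their low halfword, including the one at the epc.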
>
> This is not "quickly" reproducible. I run a TCP blaster/blastee test between
> this machine and a Linux PC, and at some point during the run (sometimes much
> later) this error appears.
>
> Thanks for your ideas,
> Chuck
>
> Error message:
>
> Unhandled kernel unaligned access or invalid instruction in arch/mips/kernel/unaligned.c::emulate_load_store_insn, line 487[#1]:
> Cpu 0
> $ 0 : 00000000 10004ce8 00000000 00000000
> $ 4 : 87e38660 000005a8 00000000 00000000
> $ 8 : 00000000 00000000 00000020 00000000
> $12 : 00000000 80402000 00000001 00000000
> $16 : 00000000 87e171a0 000005a8 87c1f060
> $20 : 87e380e0 004009e0 10004740 00002ad8
> $24 : 00000008 803171c0
> $28 : 8120a000 8120bd48 00000000 802deb30
> Hi : 0000000c
> Lo : 000d4bf8
> epc : 8021e28c move_32bytes+0x64/0x88 Not tainted
> ra : 802deb30 tcp_sendmsg+0x460/0xd80
> Status: 90018403 KERNEL EXL IE
> Cause : 00000010
> BadVA : 87e39681
> PrId : 00003422
> Modules linked in:
> Process blaster (pid: 162, threadinfo=8120a000, task=8050b3f8)
> Stack : 8120bdd0 00000000 812fd4a0 8120bdf0 8120bd70 87e18520 00000001 00000000
> 8120be40 7fffffff 00000000 8120bf18 8120be14 00000000 000005a8 000005a8
> 000032e8 00000001 00000000 90018400 8120be40 00005dc0 10001458 8120bf18
> 00000005 004009e0 10011044 10010000 10010fd4 8028e7a8 00000020 ffffffff
> 00000001 00000000 00005dc0 10001458 87e18520 00005dc0 812fd4a0 004009e0
> ...
> Call Trace:
> [<8028e7a8>] sock_aio_write+0x10c/0x12c
> [<8016bef8>] do_sync_write+0xd0/0x128
> [<801037d4>] do_IRQ+0x24/0x34
> [<804203cc>] init+0xd8/0xe4
> [<8013cf78>] autoremove_wake_function+0x0/0x44
> [<8016c020>] vfs_write+0xd0/0x144
> [<8016c020>] vfs_write+0xd0/0x144
> [<8016c074>] vfs_write+0x124/0x144
> [<8016c150>] sys_write+0x24/0x98
> [<8016c180>] sys_write+0x54/0x98
> [<8016c154>] sys_write+0x28/0x98
> [<801037d4>] do_IRQ+0x24/0x34
> [<8010b260>] stack_done+0x20/0x3c
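
For reference, the Cause, Status, and BadVA values in the dump above can be decoded with a few bit operations. The Python sketch below is an added illustration, not part of the original report, and it assumes the standard R4000-style field layout (ExcCode in Cause bits 6..2, BD in Cause bit 31, IE/EXL/KSU in Status bits 0, 1, and 4..3) applies to this CPU.

```python
CAUSE = 0x00000010
STATUS = 0x90018403
BAD_VA = 0x87E39681

# Only the two address-error codes are named here; other codes are left numeric.
EXC_NAMES = {
    4: "AdEL (address error on load or instruction fetch)",
    5: "AdES (address error on store)",
}

exc_code = (CAUSE >> 2) & 0x1F        # exception code field
in_delay_slot = bool(CAUSE >> 31)     # BD bit: fault in a branch delay slot?

ie = STATUS & 1                       # interrupts enabled
exl = (STATUS >> 1) & 1               # exception level
ksu = (STATUS >> 3) & 3               # 0 = kernel mode

misalignment = BAD_VA & 3             # a word load needs this to be 0

print("ExcCode      :", exc_code, EXC_NAMES.get(exc_code, "other"))
print("delay slot   :", in_delay_slot)
print("IE/EXL/KSU   :", ie, exl, ksu)
print("BadVA mod 4  :", misalignment)
```

Under that layout the exception is an address error on a load or fetch, taken outside a branch delay slot in kernel mode, and the faulting address is one byte past a word boundary, which agrees with the "KERNEL EXL IE" annotation and the unaligned-access message.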
>
>
>
> Disassembly of relevant portion of move_32bytes:
>
> 8021e228 <move_32bytes>:
> 8021e228: 8c880000 lw t0,0(a0)
> 8021e22c: 8c890004 lw t1,4(a0)
> 8021e230: 8c8b0008 lw t3,8(a0)
> 8021e234: 8c8c000c lw t4,12(a0)
> 8021e238: 00481021 addu v0,v0,t0
> 8021e23c: 0048182b sltu v1,v0,t0
> 8021e240: 00431021 addu v0,v0,v1
> 8021e244: 00491021 addu v0,v0,t1
> 8021e248: 0049182b sltu v1,v0,t1
> 8021e24c: 00431021 addu v0,v0,v1
> 8021e250: 004b1021 addu v0,v0,t3
> 8021e254: 004b182b sltu v1,v0,t3
> 8021e258: 00431021 addu v0,v0,v1
> 8021e25c: 004c1021 addu v0,v0,t4
> 8021e260: 004c182b sltu v1,v0,t4
> 8021e264: 00431021 addu v0,v0,v1
> 8021e268: 8c880010 lw t0,16(a0)
> 8021e26c: 8c890014 lw t1,20(a0)
> 8021e270: 8c8b0018 lw t3,24(a0)
> 8021e274: 8c8c001c lw t4,28(a0)
> 8021e278: 00481021 addu v0,v0,t0
> 8021e27c: 0048182b sltu v1,v0,t0
> 8021e280: 00431021 addu v0,v0,v1
> 8021e284: 00491021 addu v0,v0,t1
> 8021e288: 0049182b sltu v1,v0,t1
> 8021e28c: 00431021 addu v0,v0,v1
> 8021e290: 004b1021 addu v0,v0,t3
> 8021e294: 004b182b sltu v1,v0,t3
> 8021e298: 00431021 addu v0,v0,v1
> 8021e29c: 004c1021 addu v0,v0,t4
> 8021e2a0: 004c182b sltu v1,v0,t4
> 8021e2a4: 00431021 addu v0,v0,v1
> 8021e2a8: 30b8001c andi t8,a1,0x1c
> 8021e2ac: 24840020 addiu a0,a0,32
>