From mboxrd@z Thu Jan 1 00:00:00 1970
From: aric@sdgsystems.com (Aric D. Blumer)
Date: Wed, 01 Dec 2010 21:35:51 -0500
Subject: bad pmd
In-Reply-To: <20101201201440.GD29347@n2100.arm.linux.org.uk>
References: <4CF6A7F2.80206@sdgsystems.com> <20101201201440.GD29347@n2100.arm.linux.org.uk>
Message-ID: <4CF70607.1010905@sdgsystems.com>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On 12/01/2010 03:14 PM, Russell King - ARM Linux wrote:
> On Wed, Dec 01, 2010 at 02:54:26PM -0500, Aric D. Blumer wrote:
>> Hi. I'm using the long-term stable kernel 2.6.32 on a PXA320 platform,
>> and I'm seeing errors like the following:
>>
>> /home/aric/sdg/git/linux/mm/memory.c:144: bad pmd 8040542e.
>>
>> I have seen these messages on both the 2.6.32.15 and 2.6.32.24 kernels
>> (haven't tried others). Can someone tell me what the message means? I
>> suspect memory is being clobbered. One interesting thing is that
>> whenever that message is printed, the 8040542e is always the same. I
>> have not been able to establish any correlation yet with what causes it.
> A pmd value of 0x8040542e is a section mapping, which the generic MM
> code will not understand.
>
> It is for address 0x80400000, is read/writable from SVC mode, inaccessible
> from user mode, domain 1 (which is normally for 'user' memory), and has
> a memory type of TEXCB=10111.
>
> As standard mainline doesn't create mappings with TEX=101, and we don't
> create mappings with the 'user' domain using sections, the question this
> immediately raises is: have you modified this kernel?

Thanks for the info, Russell. We have modified this kernel in two ways:

1) We have added code to support the platform (GPIOs, touchscreen,
   bluetooth UART, etc.).
2) It has the patches for Android merged in.

It doesn't look like the Android patches do any mappings different from
mainline, but the bad entry looks very much like a real page table entry.

But, supposing that memory is being trampled, can any driver mess up the
page tables, or is a special processor mode required? Could a rogue DMA
trample page table memory? Can you suggest how to determine what the
address of the bad page table entry is?

I'll start removing non-critical drivers to see if I can isolate the
cause, but I have a bit more information in the meantime: I put
__backtrace() into pmd_clear_bad(), and I always see a read() system call
sequence like this when the error occurs:

[ 8.894213] /home/aric/sdg/git/linux/mm/memory.c:144: bad pmd 8040542e.
[ 8.901133] [] (pmd_clear_bad+0x0/0x40) from [] (walk_page_range+0x22c/0x230)
[ 8.910128] r4:80600000
[ 8.912839] [] (walk_page_range+0x0/0x230) from [] (show_smap+0x84/0x17c)
[ 8.921589] [] (show_smap+0x0/0x17c) from [] (seq_read+0x314/0x48c)
[ 8.929810] [] (seq_read+0x0/0x48c) from [] (vfs_read+0xb8/0x16c)
[ 8.937761] [] (vfs_read+0x0/0x16c) from [] (sys_read+0x44/0x74)
[ 8.945603] r8:c002e108 r7:00000000 r6:00012000 r5:fffffff7 r4:cf32bf80
[ 8.952644] [] (sys_read+0x0/0x74) from [] (ret_fast_syscall+0x0/0x28)
[ 8.961019] r7:00000003 r6:5a5cbc68 r5:afe14cfd r4:afe3bdfc
[ 8.968572] /home/aric/sdg/git/linux/mm/memory.c:144: bad pmd 8040542e.
[ 8.975402] [] (pmd_clear_bad+0x0/0x40) from [] (walk_page_range+0x22c/0x230)
[ 8.984386] r4:80c00000
[ 8.987062] [] (walk_page_range+0x0/0x230) from [] (show_smap+0x84/0x17c)
[ 8.995789] [] (show_smap+0x0/0x17c) from [] (seq_read+0x314/0x48c)
[ 9.003994] [] (seq_read+0x0/0x48c) from [] (vfs_read+0xb8/0x16c)
[ 9.012013] [] (vfs_read+0x0/0x16c) from [] (sys_read+0x44/0x74)
[ 9.019908] r8:c002e108 r7:00000000 r6:00012800 r5:fffffff7 r4:cf32bf80
[ 9.026846] [] (sys_read+0x0/0x74) from [] (ret_fast_syscall+0x0/0x28)
[ 9.035224] r7:00000003 r6:5a5cbc40 r5:afe14cfd r4:afe3bdfc
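
For reference, the field breakdown Russell gives for 0x8040542e in the quoted text can be reproduced with a few shifts and masks. The following is only a sketch: it assumes the ARM first-level "section" descriptor layout with the TEX extension (type in bits 1:0, B in bit 2, C in bit 3, domain in bits 8:5, AP in bits 11:10, TEX in bits 14:12, section base in bits 31:20). The helper names are made up for illustration and are not kernel functions.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed layout: ARM first-level section descriptor with TEX bits.
 * All names below are hypothetical helpers, not kernel API. */
#define PMD_TYPE_MASK     0x3u
#define PMD_TYPE_SECTION  0x2u

/* Bits 1:0 == 0b10 marks a 1MB section mapping. */
static inline int pmd_is_section(uint32_t pmd)
{
    return (pmd & PMD_TYPE_MASK) == PMD_TYPE_SECTION;
}

/* Bits 31:20 hold the 1MB-aligned base address of the section. */
static inline uint32_t section_base(uint32_t pmd)
{
    return pmd & 0xfff00000u;
}

/* Bits 8:5 select one of the 16 domains. */
static inline unsigned section_domain(uint32_t pmd)
{
    return (pmd >> 5) & 0xfu;
}

/* Bits 11:10 are the access permission (AP) field. */
static inline unsigned section_ap(uint32_t pmd)
{
    return (pmd >> 10) & 0x3u;
}

/* TEX (bits 14:12) concatenated with C (bit 3) and B (bit 2). */
static inline unsigned section_texcb(uint32_t pmd)
{
    unsigned tex = (pmd >> 12) & 0x7u;
    unsigned cb  = (pmd >> 2) & 0x3u;
    return (tex << 2) | cb;
}

/* Print the decoded fields of a raw first-level entry. */
void print_section(uint32_t pmd)
{
    if (!pmd_is_section(pmd)) {
        printf("0x%08" PRIx32 ": not a section descriptor\n", pmd);
        return;
    }
    printf("0x%08" PRIx32 ": base=0x%08" PRIx32 " domain=%u AP=%u TEXCB=0x%02x\n",
           pmd, section_base(pmd), section_domain(pmd),
           section_ap(pmd), section_texcb(pmd));
}
```

Calling print_section(0x8040542e) from a small main() should report base 0x80400000, domain 1, AP 01 (read/write from privileged modes, no user access) and TEXCB 10111 (0x17), which matches the values quoted above.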