From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1753859Ab1FPGDG (ORCPT ); Thu, 16 Jun 2011 02:03:06 -0400
Received: from gate.crashing.org ([63.228.1.57]:52278 "EHLO
	gate.crashing.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1753420Ab1FPGDE (ORCPT );
	Thu, 16 Jun 2011 02:03:04 -0400
Subject: Re: Oops in VMA code
From: Benjamin Herrenschmidt
To: Alexander Graf
Cc: Linus Torvalds, linux-mm@kvack.org,
	"linux-kernel@vger.kernel.org List"
In-Reply-To:
References: <47FAB15C-B113-40FD-9CE0-49566AACC0DF@suse.de>
Content-Type: text/plain; charset="UTF-8"
Date: Thu, 16 Jun 2011 16:02:51 +1000
Message-ID: <1308204171.2516.65.camel@pasglop>
Mime-Version: 1.0
X-Mailer: Evolution 2.30.3
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 2011-06-16 at 07:32 +0200, Alexander Graf wrote:
> On 16.06.2011, at 06:32, Linus Torvalds wrote:

> Thanks a lot for looking at it either way :).

Yeah thanks ;-)

Let me see what I can dig out. First it's a load from what looks like a
valid pointer to the linear mapping that had one byte corrupted (or more
but it looks reasonably "clean"). It's not a one bit error, there's at
least 2 bad bits (the 09):

DAR: c00090026236bbc0

Alex, how much RAM do you have ? If that was just a one byte corruption,
the above would imply you have something valid between 9 and 10G. From
the look of other registers, it seems that it could be a genuine pointer
with just that stray "09" byte that landed onto it.

> The latter is the one I'm executing, while the former still has all
> the symbols. But you're right. It looks like this is simply an inlined
> function - which is why it got stripped away. Here's the disassembly
> of the whole do_unmap function. I hope it's of use despite your fading
> PPC asm skills :). Host compiler is gcc 4.3.4 from SLES11SP1.

 .../...
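
As a sanity check on the DAR reasoning above, here is a minimal C sketch of the arithmetic. It rests on assumptions that are not stated in the mail: that the ppc64 linear mapping starts at 0xc000000000000000, that the stray "09" sits in bits 44..47 of the address, and that the intended pointer had zeros there. The helper names are made up for illustration.

```c
#include <stdint.h>

/* Assumed base of the ppc64 kernel linear mapping. */
#define PAGE_OFFSET 0xc000000000000000ULL

/* Assumed position of the stray "09": the nibble at bits 44..47. */
#define STRAY_MASK (0xfULL << 44)

/* Hypothetical helper: drop the suspect nibble from a faulting address. */
uint64_t strip_stray_nibble(uint64_t addr)
{
	return addr & ~STRAY_MASK;
}

/* Count how many bits differ between two addresses. */
int bits_changed(uint64_t a, uint64_t b)
{
	uint64_t diff = a ^ b;
	int n = 0;

	while (diff) {
		diff &= diff - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Offset of a linear-mapping address from the start of the mapping. */
uint64_t linear_offset(uint64_t vaddr)
{
	return vaddr - PAGE_OFFSET;
}
```

Feeding it DAR = 0xc00090026236bbc0 gives a cleaned pointer of 0xc00000026236bbc0, a difference of 2 bits, and a linear offset of 0x26236bbc0, which is a bit over 9.5 GiB. That is consistent with "at least 2 bad bits" and "something valid between 9 and 10G".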
Ok, so let's see what we can dig from here. It -looks- like:

	if (!mm)
		goto out :

> 0xc000000000190554 :  cmpdi   cr7,r3,0
> 0xc000000000190558 :  beq     cr7,0xc0000000001907f0

	rb_node = mm->mm_rb.rb_node; (rb_node in r9):

> 0xc00000000019055c :  ld      r9,8(r3)

	vma = mm->mmap (vma in r28)

> 0xc000000000190560 :  ld      r28,0(r3)
> 0xc000000000190564 :  li      r11,0
> 0xc000000000190568 :  li      r26,0

	while(rb_node)...

> 0xc00000000019056c :  cmpdi   cr7,r9,0
> 0xc000000000190570 :  bne     cr7,0xc000000000190594
> 0xc000000000190574 :  b       0xc0000000001905d0
> 0xc000000000190578 :  nop
> 0xc00000000019057c :  nop
> 0xc000000000190580 :  ld      r9,16(r9)
> 0xc000000000190584 :  mr      r26,r11
> 0xc000000000190588 :  cmpdi   cr7,r9,0
> 0xc00000000019058c :  mr      r11,r26
> 0xc000000000190590 :  beq     cr7,0xc0000000001905c4

	vma_tmp = rb_entry(rb_node, struct vm_area_struct, vm_rb);

> 0xc000000000190594 :  addi    r26,r9,-56

	if (vma_tmp->vm_end)

> 0xc000000000190598 :  ld      r0,16(r26)

Here we go. So here vma_tmp is crap, which we got out of the rb_tree, so
it's either corruption or use after free I'd say.

It could also be a completely unrelated memory corruption of course....

If you had xmon we could have dug a little bit more to see what's
before/after etc... but like this it doesn't ring any special bell to
me.

Cheers,
Ben.
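
For reference, the two offsets in the faulting sequence above (the -56 in the addi and the 16 in the following ld) are what an rb_entry()-style pointer adjustment produces when the embedded rb node sits 56 bytes into the structure and vm_end sits 16 bytes in. The sketch below uses mock types whose field order is an assumption chosen to reproduce those offsets; it is not copied from the kernel headers, and every field is a plain 64-bit integer so the layout does not depend on the build host.

```c
#include <stddef.h>
#include <stdint.h>

/* Mock of an rb node: three 64-bit words, rb_left at offset 16. */
struct mock_rb_node {
	uint64_t rb_parent_color;
	uint64_t rb_right;
	uint64_t rb_left;
};

/* Mock VMA, laid out (by assumption) to match the disassembly. */
struct mock_vma {
	uint64_t vm_mm;			/* offset  0 */
	uint64_t vm_start;		/* offset  8 */
	uint64_t vm_end;		/* offset 16: the "ld r0,16(r26)" */
	uint64_t vm_next;		/* offset 24 */
	uint64_t vm_prev;		/* offset 32 */
	uint64_t vm_page_prot;		/* offset 40 */
	uint64_t vm_flags;		/* offset 48 */
	struct mock_rb_node vm_rb;	/* offset 56: the "addi r26,r9,-56" */
};

/* Same idea as rb_entry(): step back from the member to its container. */
#define mock_rb_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct mock_vma *vma_from_node(struct mock_rb_node *node)
{
	return mock_rb_entry(node, struct mock_vma, vm_rb);
}

/* Returns 1 if going node -> container lands back on the original VMA. */
int roundtrip_ok(void)
{
	struct mock_vma vma = { 0 };

	return vma_from_node(&vma.vm_rb) == &vma;
}
```

The point is that r26 is computed purely from the rb_node pointer in r9, so a bad node pointer coming out of the tree gives a bad vma_tmp, and the first dereference of it (the vm_end load) is where the fault shows up.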