Subject: Re: ftrace introduces instability into kernel 2.6.27(-rc2,-rc3)
From: Benjamin Herrenschmidt
Reply-To: benh@kernel.crashing.org
To: Steven Rostedt
Cc: "Paul E. McKenney", Mathieu Desnoyers, linux-kernel@vger.kernel.org, linuxppc-dev@ozlabs.org, Steven Rostedt, Scott Wood, Eran Liberty, Alan Modra, Segher Boessenkool
Date: Wed, 20 Aug 2008 17:18:25 +1000
Message-Id: <1219216705.21386.46.camel@pasglop>

Found the problem (or at least -a- problem): it's a gcc bug.
Well, first I must say the code generated by -pg is just plain horrible :-)

Apart from that, look at the exit of, for example, __d_lookup, as generated by gcc when ftrace is enabled:

  c00c0498:  38 60 00 00   li      r3,0
  c00c049c:  81 61 00 00   lwz     r11,0(r1)
  c00c04a0:  80 0b 00 04   lwz     r0,4(r11)
  c00c04a4:  7d 61 5b 78   mr      r1,r11
  c00c04a8:  bb 0b ff e0   lmw     r24,-32(r11)
  c00c04ac:  7c 08 03 a6   mtlr    r0
  c00c04b0:  4e 80 00 20   blr

As you can see, it restores r1 -before- it pops r24..r31 off the stack! I'll let you imagine what happens if an interrupt occurs right between those two instructions (mr and lmw). Our ABI has no red zone. Once r1 has been moved back up, the saved r24..r31 sit below the stack pointer, in space the interrupt is free to use, so the registers end up corrupted by the interrupt.

Cheers,
Ben.