From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from gate.crashing.org (gate.crashing.org [63.228.1.57])
    (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
    (Client did not present a certificate)
    by ozlabs.org (Postfix) with ESMTPS id 91CF4DDF86
    for ; Wed, 20 Aug 2008 17:23:57 +1000 (EST)
Subject: Re: ftrace introduces instability into kernel 2.6.27(-rc2,-rc3)
From: Benjamin Herrenschmidt
To: Steven Rostedt
In-Reply-To: <1219119431.8062.35.camel@pasglop>
References: <48591941.4070408@extricom.com>
    <48A92E15.2080709@extricom.com>
    <48A9901B.1080900@redhat.com>
    <20080818154746.GA26835@Krystal>
    <48A9AFA7.8080508@freescale.com>
    <1219110814.8062.2.camel@pasglop>
    <1219113549.8062.13.camel@pasglop>
    <1219114600.8062.15.camel@pasglop>
    <1219119431.8062.35.camel@pasglop>
Content-Type: text/plain
Date: Wed, 20 Aug 2008 17:18:25 +1000
Message-Id: <1219216705.21386.46.camel@pasglop>
Mime-Version: 1.0
Cc: Eran Liberty, Mathieu Desnoyers, linux-kernel@vger.kernel.org,
    linuxppc-dev@ozlabs.org, Steven Rostedt, Alan Modra, Scott Wood,
    "Paul E. McKenney"
Reply-To: benh@kernel.crashing.org
List-Id: Linux on PowerPC Developers Mail List
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,

Found the problem (or at least -a- problem): it's a gcc bug.

Well, first I must say the code generated by -pg is just plain
horrible :-)

Apart from that, look at the exit of, for example, __d_lookup, as
generated by gcc when ftrace is enabled:

    c00c0498:   38 60 00 00     li      r3,0
    c00c049c:   81 61 00 00     lwz     r11,0(r1)
    c00c04a0:   80 0b 00 04     lwz     r0,4(r11)
    c00c04a4:   7d 61 5b 78     mr      r1,r11
    c00c04a8:   bb 0b ff e0     lmw     r24,-32(r11)
    c00c04ac:   7c 08 03 a6     mtlr    r0
    c00c04b0:   4e 80 00 20     blr

As you can see, it restores r1 -before- it pops r24..r31 off the stack!
I'll let you imagine what happens if an interrupt arrives right between
those two instructions (mr and lmw). Our ABI has no red zone, so once r1
has been moved up, the interrupt is free to build its frame over the save
area at -32(r11), and lmw then reloads r24..r31 with whatever the
interrupt left there.

Cheers,
Ben.
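The race described above can be reproduced in a toy model. What follows is only a sketch under loud assumptions: the stack is a Python list of 4-byte words growing toward lower indices, the "interrupt" simply poisons a fixed number of words below the current stack pointer, and every name in it (`make_stack`, `interrupt`, `epilogue`, `POISON`) is made up for illustration. It is not kernel or gcc code; it only shows why the order of `mr r1,r11` and `lmw r24,-32(r11)` matters when there is no red zone.

```python
# Toy model of a downward-growing stack, indexed in 4-byte words.
# All names and sizes are illustrative assumptions, not kernel code.

STACK_WORDS = 64
NUM_SAVED = 8          # r24..r31, i.e. the 32 bytes read by "lmw r24,-32(r11)"
POISON = 0xDEADBEEF    # what the simulated interrupt leaves behind


def make_stack(frame_sp, caller_sp):
    """Build a stack whose frame at frame_sp chains back to caller_sp,
    with r24..r31 saved in the 8 words just below caller_sp."""
    stack = [0] * STACK_WORDS
    stack[frame_sp] = caller_sp                      # back chain at 0(r1)
    for i in range(NUM_SAVED):
        stack[caller_sp - NUM_SAVED + i] = 24 + i    # saved value of r(24+i)
    return stack


def interrupt(stack, sp, words=16):
    """With no red zone, an interrupt may overwrite anything below sp."""
    for i in range(max(0, sp - words), sp):
        stack[i] = POISON


def epilogue(stack, sp, restore_sp_first, interrupt_between):
    """Run the function exit; return (new_sp, values loaded into r24..r31)."""
    r11 = stack[sp]                                  # lwz r11,0(r1)
    if restore_sp_first:
        sp = r11                                     # mr  r1,r11  (ordering in the disassembly)
        if interrupt_between:
            interrupt(stack, sp)                     # clobbers the save area
        regs = stack[r11 - NUM_SAVED:r11]            # lmw r24,-32(r11)
    else:
        regs = stack[r11 - NUM_SAVED:r11]            # lmw r24,-32(r11) first
        if interrupt_between:
            interrupt(stack, sp)                     # only touches words below the old r1
        sp = r11                                     # mr  r1,r11
    return sp, regs


if __name__ == "__main__":
    for first in (True, False):
        _, regs = epilogue(make_stack(16, 48), 16, first, True)
        print("r1 restored first:" if first else "lmw first:",
              [hex(v) for v in regs])
```

In this model the first ordering hands back poisoned values for r24..r31 whenever the interrupt lands between the two steps, while loading the registers before moving the stack pointer keeps the save area above r1 and out of the interrupt's reach.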