From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753057AbcHQKdN (ORCPT ); Wed, 17 Aug 2016 06:33:13 -0400 Received: from bombadil.infradead.org ([198.137.202.9]:46812 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752629AbcHQKdM (ORCPT ); Wed, 17 Aug 2016 06:33:12 -0400 Date: Wed, 17 Aug 2016 12:33:06 +0200 From: Peter Zijlstra To: Steven Rostedt , Thomas Gleixner , Ingo Molnar , Alexander Shishkin Cc: linux-kernel@vger.kernel.org Subject: Re: [RFC] ftrace / perf 'recursion' Message-ID: <20160817103306.GI7141@twins.programming.kicks-ass.net> References: <20160817091953.GH7141@twins.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160817091953.GH7141@twins.programming.kicks-ass.net> User-Agent: Mutt/1.5.23.1 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Aug 17, 2016 at 11:19:53AM +0200, Peter Zijlstra wrote: > > blergh, now with LKML added... > > --- > > Much like: d525211f9d1b ("perf: Fix irq_work 'tail' recursion") > > I found another infinite recursion problem with irq_work: > > [] ? perf_output_begin_forward+0x5/0x1e0 > [] ? arch_irq_work_raise+0x5/0x40 > [] ? perf_event_output_forward+0x30/0x60 > [] arch_irq_work_raise+0x5/0x40 > [] irq_work_queue+0x97/0xa0 > [] ? arch_irq_work_raise+0x5/0x40 > [] ? irq_work_queue+0x97/0xa0 > [] __perf_event_overflow+0xcf/0x1b0 > [] perf_swevent_overflow+0x9a/0xc0 > [] perf_swevent_event+0x5d/0x80 > [] perf_tp_event+0x1a2/0x1b0 > [] ? _raw_spin_trylock+0x30/0x30 > [] ? perf_ftrace_function_call+0x83/0xd0 > [] ? ftrace_ops_assist_func+0xb5/0x110 > [] ? ftrace_ops_assist_func+0xb5/0x110 > [] ? do_send_sig_info+0x5d/0x80 > [] ? _raw_spin_trylock+0x30/0x30 > [] ? perf_trace_buf_alloc+0x1f/0xa0 > [] ? kill_fasync+0x6b/0x90 > [] ? _raw_spin_trylock+0x30/0x30 > [] ? perf_ftrace_function_call+0x83/0xd0 > [] ? smp_irq_work_interrupt+0x33/0x40 > [] ? irq_enter+0x70/0x70 > [] perf_ftrace_function_call+0xbf/0xd0 > [] ? ftrace_ops_assist_func+0xb5/0x110 > [] ftrace_ops_assist_func+0xb5/0x110 > [] ? smp_irq_work_interrupt+0x33/0x40 > [] ? irq_enter+0x70/0x70 > [] 0xffffffffa157e077 > [] ? kill_fasync+0x6b/0x90 > [] ? irq_exit+0x5/0xb0 > [] irq_exit+0x5/0xb0 > [] smp_irq_work_interrupt+0x33/0x40 > [] ? irq_exit+0x5/0xb0 > [] ? smp_irq_work_interrupt+0x33/0x40 > [] irq_work_interrupt+0x89/0x90 > > > Here every irq_work execution triggers another irq_work queue, which > gets us stuck in an IRQ loop ad infinitum. > > This is through function tracing of irq_exit(), but the same can be done > through function tracing of pretty much anything else around there and > through the explicit IRQ_WORK_VECTOR tracepoints. > > The only 'solution' is something like the below, which I absolutely > detest because it makes the irq_work code slower for everyone. > > Also, this doesn't fix the problem for any other arch :/ > > I would much rather tag the whole irq_work thing notrace and remove the > tracepoints, but I'm sure that'll not be a popular solution either :/ So I also found: d5b5f391d434 ("ftrace, perf: Avoid infinite event generation loop") which is very similar. I suppose that the entry tracepoint is harmless because you cannot queue an irq_work that is already queued, so there it doesn't cause the recursion. So how to extend the same to function tracer .... we'd have to mark exiting_irq() -> irq_exit() and everything from that as notrace, which seems somewhat excessive, fragile and undesired because tracing those functions is useful in other context :/