Date: Tue, 3 Nov 2020 15:10:43 +0100
From: Petr Mladek <pmladek@suse.com>
To: Steven Rostedt
Subject: Re: [PATCH 11/11 v2.2] ftrace: Add recording of functions that caused recursion
Message-ID: <20201103141043.GO20201@alley>
References: <20201030213142.096102821@goodmis.org>
 <20201030214014.801706340@goodmis.org>
 <20201102164147.GJ20201@alley>
 <20201102123721.4fcce2cb@gandalf.local.home>
 <20201102124606.72bd89c5@gandalf.local.home>
 <20201102142254.7e148f8a@gandalf.local.home>
In-Reply-To: <20201102142254.7e148f8a@gandalf.local.home>
List-Id: Linux on PowerPC Developers Mail List <linuxppc-dev@lists.ozlabs.org>
On Mon 2020-11-02 14:23:14, Steven Rostedt wrote:
> From c532ff6b048dd4a12943b05c7b8ce30666c587c8 Mon Sep 17 00:00:00 2001
> From: "Steven Rostedt (VMware)"
> Date: Thu, 29 Oct 2020 15:27:06 -0400
> Subject: [PATCH] ftrace: Add recording of functions that caused recursion
>
> This adds CONFIG_FTRACE_RECORD_RECURSION that will record to a file
> "recursed_functions" all the functions that caused recursion while a
> callback to the function tracer was running.
>
> diff --git a/include/linux/trace_recursion.h b/include/linux/trace_recursion.h
> index ac3d73484cb2..1cba5fe8777a 100644
> --- a/include/linux/trace_recursion.h
> +++ b/include/linux/trace_recursion.h
> @@ -142,7 +142,28 @@ static __always_inline int trace_get_context_bit(void)
>  		pc & HARDIRQ_MASK ? TRACE_CTX_IRQ : TRACE_CTX_SOFTIRQ;
>  }
>
> -static __always_inline int trace_test_and_set_recursion(int start, int max)
> +#ifdef CONFIG_FTRACE_RECORD_RECURSION
> +extern void ftrace_record_recursion(unsigned long ip, unsigned long parent_ip);
> +/*
> +* The paranoid_test check can cause dropped reports (unlikely), but
> +* if the recursion is common, it will likely still be recorded later.
> +* But the paranoid_test is needed to make sure we don't crash.
> +*/
> +# define do_ftrace_record_recursion(ip, pip)			\
> +	do {							\
> +		static atomic_t paranoid_test;			\
> +		if (!atomic_read(&paranoid_test)) {		\
> +			atomic_inc(&paranoid_test);		\
> +			ftrace_record_recursion(ip, pip);	\
> +			atomic_dec(&paranoid_test);		\

BTW: What is actually the purpose of paranoid_test, please?

It prevents nested ftrace_record_recursion() calls on the same CPU
(recursion, or nesting from IRQ or NMI context). Parallel calls from
different CPUs are still possible:

	CPU0					CPU1

	if (!atomic_read(&paranoid_test))
						if (!atomic_read(&paranoid_test))
	// passes
						// passes
	atomic_inc(&paranoid_test);
						atomic_inc(&paranoid_test);

I do not see how a nested call could cause a crash while a parallel
one would be OK.

> --- /dev/null
> +++ b/kernel/trace/trace_recursion_record.c
> @@ -0,0 +1,236 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#include "trace_output.h"
> +
> +struct recursed_functions {
> +	unsigned long		ip;
> +	unsigned long		parent_ip;
> +};
> +
> +static struct recursed_functions recursed_functions[CONFIG_FTRACE_RECORD_RECURSION_SIZE];
> +static atomic_t nr_records;
> +
> +/*
> + * Cache the last found function. Yes, updates to this is racey, but
> + * so is memory cache ;-)
> + */
> +static unsigned long cached_function;
> +
> +void ftrace_record_recursion(unsigned long ip, unsigned long parent_ip)
> +{
> +	int index = 0;
> +	int i;
> +	unsigned long old;
> +
> + again:
> +	/* First check the last one recorded */
> +	if (ip == cached_function)
> +		return;
> +
> +	i = atomic_read(&nr_records);
> +	/* nr_records is -1 when clearing records */
> +	smp_mb__after_atomic();
> +	if (i < 0)
> +		return;
> +
> +	/*
> +	 * If there's two writers and this writer comes in second,
> +	 * the cmpxchg() below to update the ip will fail. Then this
> +	 * writer will try again. It is possible that index will now
> +	 * be greater than nr_records. This is because the writer
> +	 * that succeeded has not updated the nr_records yet.
> +	 * This writer could keep trying again until the other writer
> +	 * updates nr_records. But if the other writer takes an
> +	 * interrupt, and that interrupt locks up that CPU, we do
> +	 * not want this CPU to lock up due to the recursion protection,
> +	 * and have a bug report showing this CPU as the cause of
> +	 * locking up the computer. To not lose this record, this
> +	 * writer will simply use the next position to update the
> +	 * recursed_functions, and it will update the nr_records
> +	 * accordingly.
> +	 */
> +	if (index < i)
> +		index = i;
> +	if (index >= CONFIG_FTRACE_RECORD_RECURSION_SIZE)
> +		return;
> +
> +	for (i = index - 1; i >= 0; i--) {
> +		if (recursed_functions[i].ip == ip) {
> +			cached_function = ip;
> +			return;
> +		}
> +	}
> +
> +	cached_function = ip;
> +
> +	/*
> +	 * We only want to add a function if it hasn't been added before.
> +	 * Add to the current location before incrementing the count.
> +	 * If it fails to add, then increment the index (save in i)
> +	 * and try again.
> +	 */
> +	old = cmpxchg(&recursed_functions[index].ip, 0, ip);
> +	if (old != 0) {
> +		/* Did something else already added this for us? */
> +		if (old == ip)
> +			return;
> +		/* Try the next location (use i for the next index) */
> +		index++;
> +		goto again;
> +	}
> +
> +	recursed_functions[index].parent_ip = parent_ip;
> +
> +	/*
> +	 * It's still possible that we could race with the clearing
> +	 *    CPU0                                    CPU1
> +	 *    ----                                    ----
> +	 *    ip = func
> +	 *                                            nr_records = -1;
> +	 *                                            recursed_functions[0] = 0;
> +	 *    i = -1
> +	 *    if (i < 0)
> +	 *                                            nr_records = 0;
> +	 *    (new recursion detected)
> +	 *                                            recursed_functions[0] = func
> +	 *    cmpxchg(recursed_functions[0],
> +	 *            func, 0)
> +	 *
> +	 * But the worse that could happen is that we get a zero in
> +	 * the recursed_functions array, and it's likely that "func" will
> +	 * be recorded again.
> +	 */
> +	i = atomic_read(&nr_records);
> +	smp_mb__after_atomic();
> +	if (i < 0)
> +		cmpxchg(&recursed_functions[index].ip, ip, 0);
> +	else if (i <= index)
> +		atomic_cmpxchg(&nr_records, i, index + 1);

Are you aware of the following race, please?

	CPU0					CPU1

	ftrace_record_recursion()
	  i = atomic_read(&nr_records);
	  // i = 20 (for example)
	  if (index < i)
		index = i;
	  // index = 20

						recursed_function_open()
						  atomic_set(&nr_records, -1);
						  memset(recursed_functions, 0, );
						  atomic_set(&nr_records, 0);

	  // successfully store ip at index == 20
	  cmpxchg(&recursed_functions[index].ip, 0, ip);
	  recursed_functions[index].parent_ip = parent_ip;

	  // check the race with clearing
	  i = atomic_read(&nr_records);
	  // i == 0
	  if (i < 0)
		// no
	  else
		atomic_cmpxchg(&nr_records, i, index + 1);

RESULT:

  + nr_records == 21
  + slots 0..19 are zeroed

I played with the code and ended up with my head entangled in
chicken-and-egg problems. I believe that a solution might be a combined
atomic variable built from nr_records + cleanup_count.
ftrace_record_recursion() would be allowed to increase nr_records only
when cleanup_count is still the same. cleanup_count would be incremented
together with clearing nr_records.

Well, I am not sure if it is worth the effort. The race is rather
theoretical. In the worst case, the recursed_functions array might
contain many zero values.

Anyway, it is yet another experience for me that lockless algorithms
are trickier than one would expect.

Best Regards,
Petr