From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9898AC4332F for ; Tue, 13 Dec 2022 14:09:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235857AbiLMOJn (ORCPT ); Tue, 13 Dec 2022 09:09:43 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36234 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235896AbiLMOJA (ORCPT ); Tue, 13 Dec 2022 09:09:00 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2C77C639A for ; Tue, 13 Dec 2022 06:08:50 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id B43EC6154C for ; Tue, 13 Dec 2022 14:08:49 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id F1FE6C433D2; Tue, 13 Dec 2022 14:08:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1670940529; bh=+KAyifsdNsFUdcOitS6tkBdqEqiit/PqqTCsG5drDeo=; h=From:To:Cc:Subject:Date:From; b=ruTtT1smoFSxj2PeK8gls3GGwev1uknK+nXB4oWju5JzXOuXzjSZXlEOV07GkEAOR U2c7kql+l7rm4HUzTjEM4gW+mWeUnS2/0Y4JTlKQfjK8jgHYo1+ORz0OPk4r1hG7vD yLfIDJ31IUxk6CUWPYrmap7qu5eYhpgM3aKpfakss7GdLw2AgyGJSvRcsoRCpxqS2D zhbAd0FOMtDSB0LBog63KFiNssQhDBBlIT4q9WmgyIyme3xqW5eg3hTPX2Cs0OJ0vD VUx0wkwP/zHD6yWOxqkQSNok1IcWVWLaISB8AhVhiKn8eg9uip/Ek7sieBrdNXZKoD yE+HnK0o2JuQg== From: Jiri Olsa To: Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko Cc: Hao Sun , bpf@vger.kernel.org, Martin KaFai Lau , Song Liu , Yonghong Song , John Fastabend , KP Singh , Stanislav Fomichev , Hao Luo Subject: [PATCH bpf-next] bpf: Remove trace_printk_lock lock Date: Tue, 13 Dec 2022 15:08:43 +0100 Message-Id: <20221213140843.803293-1-jolsa@kernel.org> X-Mailer: git-send-email 2.38.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org Both bpf_trace_printk and bpf_trace_vprintk helpers use static buffer guarded with trace_printk_lock spin lock. The spin lock contention causes issues with bpf programs attached to contention_begin tracepoint [1] [2]. Andrii suggested we could get rid of the contention by using trylock, but we could actually get rid of the spinlock completely by using percpu buffers the same way as for bin_args in bpf_bprintf_prepare function. Adding 4 per cpu buffers (1k each) which should be enough for all possible nesting contexts (normal, softirq, irq, nmi) or possible (yet unlikely) probe within the printk helpers. In very unlikely case we'd run out of the nesting levels the printk will be omitted. [1] https://lore.kernel.org/bpf/CACkBjsakT_yWxnSWr4r-0TpPvbKm9-OBmVUhJb7hV3hY8fdCkw@mail.gmail.com/ [2] https://lore.kernel.org/bpf/CACkBjsaCsTovQHFfkqJKto6S4Z8d02ud1D7MPESrHa1cVNNTrw@mail.gmail.com/ Reported-by: Hao Sun Suggested-by: Andrii Nakryiko Signed-off-by: Jiri Olsa --- kernel/trace/bpf_trace.c | 61 +++++++++++++++++++++++++++++++--------- 1 file changed, 47 insertions(+), 14 deletions(-) diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index 3bbd3f0c810c..b9287b3a5540 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -369,33 +369,62 @@ static const struct bpf_func_proto *bpf_get_probe_write_proto(void) return &bpf_probe_write_user_proto; } -static DEFINE_RAW_SPINLOCK(trace_printk_lock); - #define MAX_TRACE_PRINTK_VARARGS 3 #define BPF_TRACE_PRINTK_SIZE 1024 +#define BPF_TRACE_PRINTK_LEVELS 4 + +struct trace_printk_buf { + char data[BPF_TRACE_PRINTK_LEVELS][BPF_TRACE_PRINTK_SIZE]; + int level; +}; +static DEFINE_PER_CPU(struct trace_printk_buf, printk_buf); + +static void put_printk_buf(struct trace_printk_buf __percpu *buf) +{ + if (WARN_ON_ONCE(this_cpu_read(buf->level) == 0)) + return; + this_cpu_dec(buf->level); + preempt_enable(); +} + +static bool get_printk_buf(struct trace_printk_buf __percpu *buf, char **data) +{ + int level; + + preempt_disable(); + level = this_cpu_inc_return(buf->level); + if (level > BPF_TRACE_PRINTK_LEVELS) { + put_printk_buf(buf); + return false; + } + *data = (char *) this_cpu_ptr(&buf->data[level - 1]); + return true; +} BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1, u64, arg2, u64, arg3) { u64 args[MAX_TRACE_PRINTK_VARARGS] = { arg1, arg2, arg3 }; u32 *bin_args; - static char buf[BPF_TRACE_PRINTK_SIZE]; - unsigned long flags; + char *buf; int ret; + if (!get_printk_buf(&printk_buf, &buf)) + return -EBUSY; + ret = bpf_bprintf_prepare(fmt, fmt_size, args, &bin_args, MAX_TRACE_PRINTK_VARARGS); if (ret < 0) - return ret; + goto out; - raw_spin_lock_irqsave(&trace_printk_lock, flags); - ret = bstr_printf(buf, sizeof(buf), fmt, bin_args); + ret = bstr_printf(buf, BPF_TRACE_PRINTK_SIZE, fmt, bin_args); trace_bpf_trace_printk(buf); - raw_spin_unlock_irqrestore(&trace_printk_lock, flags); bpf_bprintf_cleanup(); +out: + put_printk_buf(&printk_buf); return ret; } @@ -427,31 +456,35 @@ const struct bpf_func_proto *bpf_get_trace_printk_proto(void) return &bpf_trace_printk_proto; } +static DEFINE_PER_CPU(struct trace_printk_buf, vprintk_buf); + BPF_CALL_4(bpf_trace_vprintk, char *, fmt, u32, fmt_size, const void *, data, u32, data_len) { - static char buf[BPF_TRACE_PRINTK_SIZE]; - unsigned long flags; int ret, num_args; u32 *bin_args; + char *buf; if (data_len & 7 || data_len > MAX_BPRINTF_VARARGS * 8 || (data_len && !data)) return -EINVAL; num_args = data_len / 8; + if (!get_printk_buf(&vprintk_buf, &buf)) + return -EBUSY; + ret = bpf_bprintf_prepare(fmt, fmt_size, data, &bin_args, num_args); if (ret < 0) - return ret; + goto out; - raw_spin_lock_irqsave(&trace_printk_lock, flags); - ret = bstr_printf(buf, sizeof(buf), fmt, bin_args); + ret = bstr_printf(buf, BPF_TRACE_PRINTK_SIZE, fmt, bin_args); trace_bpf_trace_printk(buf); - raw_spin_unlock_irqrestore(&trace_printk_lock, flags); bpf_bprintf_cleanup(); +out: + put_printk_buf(&vprintk_buf); return ret; } -- 2.38.1