From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-17.4 required=3.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT, USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 12DDFC432C2 for ; Wed, 25 Sep 2019 23:43:31 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id D1B5021D7B for ; Wed, 25 Sep 2019 23:43:30 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="Xd94kAYY" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1733045AbfIYXna (ORCPT ); Wed, 25 Sep 2019 19:43:30 -0400 Received: from mail-pl1-f202.google.com ([209.85.214.202]:52041 "EHLO mail-pl1-f202.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1733001AbfIYXnX (ORCPT ); Wed, 25 Sep 2019 19:43:23 -0400 Received: by mail-pl1-f202.google.com with SMTP id 99so260259plc.18 for ; Wed, 25 Sep 2019 16:43:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=FpMTONf9jSbZVp/Ti1EaLgmO9nC11TmbgGftl+3PnOg=; b=Xd94kAYYME5hfFpdpbnsQg5ljRv8Qr/2aDSVVIWujRZV8DcbjVdowRQrZkwyXE7ZAf PsUJW8UIj7R9IYyn/klgxHUEWDrFWKvHHbFmE7pqzRQsuSnEEAbhqLQo9HsuC4V/gOCZ VhWvuEfVfkkW0HqEP7JU06o4gYLeN20Jc2S/3+Zl6januyJ4AQltm51bgOu8ctSNQ+Nw RHBZg8PBnkujl5T3oG7r5+Pjf7efoWSqeu4fsrvSIoii+8LyRmBdkrQyp02ua0RA9XUm IG/7KIKM/KIj0UN902pRhd05emQdifzS7xOp6rGKmi4aAei5loASuBV44XXyVdmJVeQo C31Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=FpMTONf9jSbZVp/Ti1EaLgmO9nC11TmbgGftl+3PnOg=; b=CZa5Fd51tW/KTuTAHRI1+5ww1aLWFwIZf6JAvH4ZL34JAzV6c1fNIS1nzZc2e7UeHo 3hRd54fLskyJQmbkIVYaHy/ns4cSVg+GA9cVAlfIyowS1E9P7IpzrTU3F8F0ue9j9vkK Q4gHz11Kz2eKjxFN3k8e9nUIhcbCxV3ABd+WnS48xAqmntQ3FV1CACLzc87lW6eeZtpA 7ZkzsF0qlai+5qhABJN8miuIYiWNqCdsMutE44RES3YY5OIVWkt+7Bhk0nbOF1xSn0XL 42V2UMkGxzXsDbch6bGBwIwcRiByQ3PzW/93ztdevYmvEBPCWPSY5G649tRp9wZAkW3E ImLg== X-Gm-Message-State: APjAAAUVTtDS8XtGgCTuC1DNOSEHaIArhkMJtWG7Sea22um22LXQjbW5 T9rMj7x0n+Sjqh8Kf55D/ej+E4/JZQTPC3E/ X-Google-Smtp-Source: APXvYqzge8vN4NVhKPBt/B36x1Xuk/WmMjWR8uBN2di5VXdnfgKMfW2UV0huV5Z4RBc9/qeU1rU+4xYJDrzhhzqm X-Received: by 2002:a63:e745:: with SMTP id j5mr417957pgk.302.1569455001448; Wed, 25 Sep 2019 16:43:21 -0700 (PDT) Date: Wed, 25 Sep 2019 16:43:12 -0700 In-Reply-To: <20190925234312.94063-1-allanzhang@google.com> Message-Id: <20190925234312.94063-2-allanzhang@google.com> Mime-Version: 1.0 References: <20190925234312.94063-1-allanzhang@google.com> X-Mailer: git-send-email 2.23.0.351.gc4317032e6-goog Subject: [PATCH 1/1] bpf: Fix bpf_event_output re-entry issue From: Allan Zhang To: daniel@iogearbox.net, songliubraving@fb.com, netdev@vger.kernel.org, bpf@vger.kernel.org Cc: linux-kernel@vger.kernel.org, Allan Zhang , Stanislav Fomichev , Eric Dumazet Content-Type: text/plain; charset="UTF-8" Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org BPF_PROG_TYPE_SOCK_OPS program can reenter bpf_event_output because it can be called from atomic and non-atomic contexts since we don't have bpf_prog_active to prevent it happen. This patch enables 3 level of nesting to support normal, irq and nmi context. We can easily reproduce the issue by running neper crr mode with 100 flows and 10 threads from neper client side. Here is the whole stack dump: [ 515.228898] WARNING: CPU: 20 PID: 14686 at kernel/trace/bpf_trace.c:549 bpf_event_output+0x1f9/0x220 [ 515.228903] CPU: 20 PID: 14686 Comm: tcp_crr Tainted: G W 4.15.0-smp-fixpanic #44 [ 515.228904] Hardware name: Intel TBG,ICH10/Ikaria_QC_1b, BIOS 1.22.0 06/04/2018 [ 515.228905] RIP: 0010:bpf_event_output+0x1f9/0x220 [ 515.228906] RSP: 0018:ffff9a57ffc03938 EFLAGS: 00010246 [ 515.228907] RAX: 0000000000000012 RBX: 0000000000000001 RCX: 0000000000000000 [ 515.228907] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffffffff836b0f80 [ 515.228908] RBP: ffff9a57ffc039c8 R08: 0000000000000004 R09: 0000000000000012 [ 515.228908] R10: ffff9a57ffc1de40 R11: 0000000000000000 R12: 0000000000000002 [ 515.228909] R13: ffff9a57e13bae00 R14: 00000000ffffffff R15: ffff9a57ffc1e2c0 [ 515.228910] FS: 00007f5a3e6ec700(0000) GS:ffff9a57ffc00000(0000) knlGS:0000000000000000 [ 515.228910] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 515.228911] CR2: 0000537082664fff CR3: 000000061fed6002 CR4: 00000000000226f0 [ 515.228911] Call Trace: [ 515.228913] [ 515.228919] [] bpf_sockopt_event_output+0x3b/0x50 [ 515.228923] [] ? bpf_ktime_get_ns+0xe/0x10 [ 515.228927] [] ? __cgroup_bpf_run_filter_sock_ops+0x85/0x100 [ 515.228930] [] ? tcp_init_transfer+0x125/0x150 [ 515.228933] [] ? tcp_finish_connect+0x89/0x110 [ 515.228936] [] ? tcp_rcv_state_process+0x704/0x1010 [ 515.228939] [] ? sk_filter_trim_cap+0x53/0x2a0 [ 515.228942] [] ? tcp_v6_inbound_md5_hash+0x6f/0x1d0 [ 515.228945] [] ? tcp_v6_do_rcv+0x1c0/0x460 [ 515.228947] [] ? tcp_v6_rcv+0x9f8/0xb30 [ 515.228951] [] ? ip6_route_input+0x190/0x220 [ 515.228955] [] ? ip6_protocol_deliver_rcu+0x6d/0x450 [ 515.228958] [] ? ip6_rcv_finish+0xb6/0x170 [ 515.228961] [] ? ip6_protocol_deliver_rcu+0x450/0x450 [ 515.228963] [] ? ipv6_rcv+0x61/0xe0 [ 515.228966] [] ? ipv6_list_rcv+0x330/0x330 [ 515.228969] [] ? __netif_receive_skb_one_core+0x5b/0xa0 [ 515.228972] [] ? __netif_receive_skb+0x21/0x70 [ 515.228975] [] ? process_backlog+0xb2/0x150 [ 515.228978] [] ? net_rx_action+0x16f/0x410 [ 515.228982] [] ? __do_softirq+0xdd/0x305 [ 515.228986] [] ? irq_exit+0x9c/0xb0 [ 515.228989] [] ? smp_call_function_single_interrupt+0x65/0x120 [ 515.228991] [] ? call_function_single_interrupt+0x81/0x90 [ 515.228992] [ 515.228996] [] ? io_serial_in+0x20/0x20 [ 515.229000] [] ? console_unlock+0x230/0x490 [ 515.229003] [] ? vprintk_emit+0x26a/0x2a0 [ 515.229006] [] ? vprintk_default+0x1f/0x30 [ 515.229008] [] ? vprintk_func+0x35/0x70 [ 515.229011] [] ? printk+0x50/0x66 [ 515.229013] [] ? bpf_event_output+0xb7/0x220 [ 515.229016] [] ? bpf_sockopt_event_output+0x3b/0x50 [ 515.229019] [] ? bpf_ktime_get_ns+0xe/0x10 [ 515.229023] [] ? release_sock+0x97/0xb0 [ 515.229026] [] ? tcp_recvmsg+0x31a/0xda0 [ 515.229029] [] ? __cgroup_bpf_run_filter_sock_ops+0x85/0x100 [ 515.229032] [] ? tcp_set_state+0x191/0x1b0 [ 515.229035] [] ? tcp_disconnect+0x2e/0x600 [ 515.229038] [] ? tcp_close+0x3eb/0x460 [ 515.229040] [] ? inet_release+0x42/0x70 [ 515.229043] [] ? inet6_release+0x39/0x50 [ 515.229046] [] ? __sock_release+0x4d/0xd0 [ 515.229049] [] ? sock_close+0x15/0x20 [ 515.229052] [] ? __fput+0xe7/0x1f0 [ 515.229055] [] ? ____fput+0xe/0x10 [ 515.229058] [] ? task_work_run+0x82/0xb0 [ 515.229061] [] ? exit_to_usermode_loop+0x7e/0x11f [ 515.229064] [] ? do_syscall_64+0x111/0x130 [ 515.229067] [] ? entry_SYSCALL_64_after_hwframe+0x3d/0xa2 Fixes: a5a3a828cd00 ("bpf: add perf event notificaton support for sock_ops") Effort: BPF Signed-off-by: Allan Zhang Reviewed-by: Stanislav Fomichev Reviewed-by: Eric Dumazet --- kernel/trace/bpf_trace.c | 26 +++++++++++++++++++++----- 1 file changed, 21 insertions(+), 5 deletions(-) diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index ca1255d14576..3e38a010003c 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -500,14 +500,17 @@ static const struct bpf_func_proto bpf_perf_event_output_proto = { .arg5_type = ARG_CONST_SIZE_OR_ZERO, }; -static DEFINE_PER_CPU(struct pt_regs, bpf_pt_regs); -static DEFINE_PER_CPU(struct perf_sample_data, bpf_misc_sd); +static DEFINE_PER_CPU(int, bpf_event_output_nest_level); +struct bpf_nested_pt_regs { + struct pt_regs regs[3]; +}; +static DEFINE_PER_CPU(struct bpf_nested_pt_regs, bpf_pt_regs); +static DEFINE_PER_CPU(struct bpf_trace_sample_data, bpf_misc_sds); u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size, void *ctx, u64 ctx_size, bpf_ctx_copy_t ctx_copy) { - struct perf_sample_data *sd = this_cpu_ptr(&bpf_misc_sd); - struct pt_regs *regs = this_cpu_ptr(&bpf_pt_regs); + int nest_level = this_cpu_inc_return(bpf_event_output_nest_level); struct perf_raw_frag frag = { .copy = ctx_copy, .size = ctx_size, @@ -522,12 +525,25 @@ u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size, .data = meta, }, }; + struct perf_sample_data *sd; + struct pt_regs *regs; + u64 ret; + + if (WARN_ON_ONCE(nest_level > ARRAY_SIZE(bpf_misc_sds.sds))) { + ret = -EBUSY; + goto out; + } + sd = this_cpu_ptr(&bpf_misc_sds.sds[nest_level - 1]); + regs = this_cpu_ptr(&bpf_pt_regs.regs[nest_level - 1]); perf_fetch_caller_regs(regs); perf_sample_data_init(sd, 0, 0); sd->raw = &raw; - return __bpf_perf_event_output(regs, map, flags, sd); + ret = __bpf_perf_event_output(regs, map, flags, sd); +out: + this_cpu_dec(bpf_event_output_nest_level); + return ret; } BPF_CALL_0(bpf_get_current_task) -- 2.23.0.351.gc4317032e6-goog