Date: Wed, 21 Apr 2021 15:40:37 +0200
From: Jiri Olsa
To: Alexei Starovoitov
Cc: Steven Rostedt, Masami Hiramatsu, Andrii Nakryiko, Daniel Borkmann,
    Networking, bpf, Martin KaFai Lau, Song Liu, Yonghong Song,
    John Fastabend, KP Singh, Daniel Xu, Jesper Brouer,
    Toke Høiland-Jørgensen, Viktor Malik
Subject: Re: [PATCHv2 RFC bpf-next 0/7] bpf: Add support for ftrace probe

On Tue, Apr 20, 2021 at 04:38:45PM -0700, Alexei Starovoitov wrote:

SNIP

> > > I don't see how you can do it without BTF.
> > > The mass-attach feature should prepare a generic 6 or so arguments
> > > from all the functions it attaches to.
> > > On x86-64 it's trivial because the 6 regs are the same.
> > > On arm64 it's more challenging since the return value reg overlaps
> > > with the first argument, so the bpf trampoline (when it's ready for
> > > arm64) will look a bit different than the bpf trampoline on x86-64,
> > > to preserve the arg0, arg1, ..arg6, ret 64-bit values that the bpf
> > > prog expects to see.
> > > On x86-32 it's even trickier, since the same 6 args need to be
> > > copied from a combination of regs and stack.
> > > This is not some hypothetical case. We already use BTF in the x86-32
> > > JIT, and btf_func_model was introduced specifically to handle such
> > > cases.
> > > So I really don't see how ftrace can do that just yet. It has to
> > > understand the BTF of all the funcs it attaches to, otherwise it's
> > > just saving all regs. That approach was a pain to deal with.
> >
> > ok, my idea was to get regs from ftrace and have arch specific code
> > prepare 6 (or fewer) args for the ebpf program.. that part would
> > already be in bpf code
> >
> > so you'd like to see this functionality directly in ftrace, so we
> > don't save unneeded regs, is that right?
>
> What do you mean by "already in bpf code"? That it would not be part
> of the ftrace code?
>
> The main question is the api across the layers.
> If ftrace doesn't use BTF, it has to prepare all the regs that could
> be used, meaning on x86-64 that has to be 6 regs for args, 1 reg for
> the return value, and the stack pointer.
> That would be enough to discover the input args and the return value
> in fexit.
> On arm64 it has to be similar, but while x86-64 can do with a single
> pt_regs where %rax is updated on fexit, arm64 cannot do so, since the
> same register is used as arg1 and as the return value.
> The most generic api between the ftrace and bpf layers would be two
> sets of pt_regs, one on entry and one on exit, but that's going to be
> very expensive.

that's what I was going for, and I think it's the only way if we use
ftrace graph_ops for mass attaching
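
fwiw the arch specific code I mentioned above, which would prepare the
args from ftrace's pt_regs, could stay pretty small.. an untested
sketch for x86-64, where the SysV ABI puts the first 6 integer args in
rdi, rsi, rdx, rcx, r8, r9 (the function name is made up):

  static u64 nth_arg_x86_64(struct pt_regs *regs, int n)
  {
          /* first 6 integer args live in fixed registers on x86-64 */
          switch (n) {
          case 0: return regs->di;
          case 1: return regs->si;
          case 2: return regs->dx;
          case 3: return regs->cx;
          case 4: return regs->r8;
          case 5: return regs->r9;
          }
          return 0;
  }

on x86-32 the same thing would have to pull anything past the regparm
registers from the stack, so it won't be this simple there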
> On x86-32 it would have to be 3 regs plus the stack pointer and
> another 2 regs to cover all the input args and the return value.
> So there will be plenty of per-arch differences.
>
> Jiri, if you're thinking of a bpf helper like:
>   u64 bpf_read_argN(pt_regs, ip, arg_num)
> that will do a lookup of the btf_id from the ip, then parse the btf_id
> and the function proto, then translate that to a btf_func_model, and
> finally extract the right argument value from a combination of stack
> and regs?
> That's doable, but it's a lot of run-time overhead.
> It would be usable by bpf progs that don't care much about run-time
> perf and don't care that they're not usable 24/7 on production
> systems. Such tools exist and they're useful,
> but I'd like this mass-attach facility to be usable everywhere,
> including production and 24/7 tracing.

I did not think of this option, but yep, it seems expensive as well

> Hence I think it's better to do this per-arch translation during bpf
> prog attach.
> That's exactly what the bpf trampoline is doing.
> Currently it does that for a single btf_id, a single trampoline, and
> a single bpf prog.
> To make the same logic work across N attach points, the trampoline
> logic would need to iterate over the btf_func_model-s of all the
> btf_id-s and generate M trampolines (where M < N) for the possible
> combinations of argument passing.
> On x86-64, M will be equal to 1. On arm64 it will be equal to 1 as
> well. But on x86-32 it will depend on the set of btf_ids. It could be
> 1, 2, .. 10.
> Since bpf doesn't allow attaching to struct-by-value functions,
> there are only 32-bit and 64-bit integers to deal with, and the
> number of combinations of possible calling conventions is actually
> very small. I suspect it won't be more than 10.
> This way there will be no additional run-time overhead, and bpf
> programs can be portable. They will work as-is on x86-64, x86-32 and
> arm64, just like fentry/fexit work today. Or rather, they will be
> portable once the bpf trampoline is supported on these archs.
> This portability is the key feature of the bpf trampoline design. The
> bpf trampoline has been implemented for x86-64 only so far. The arm64
> patches are still wip.
> btf_func_model is used by both the x86-64 and x86-32 JITs.

ok, I understand why this would be the best solution for calling the
program from multiple probes

I think it's the 'attach' layer which is the source of the problems

currently there is ftrace's fgraph_ops support that allows fast mass
attach and calls callbacks on function entry and exit:
  https://lore.kernel.org/lkml/20190525031633.811342628@goodmis.org/

these callbacks get ip/parent_ip and can get pt_regs (that's not
implemented at the moment)

but that gets us to the situation of having full pt_regs on both
entry/exit callbacks, which you described above and want to avoid..
I think it's the price for having this on top of the generic tracing
layer
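
for reference, the fgraph_ops interface looks roughly like this
(the callback names and the commented-out registration are just my
sketch, and as said above there's no pt_regs access yet):

  #include <linux/ftrace.h>

  static int entry_cb(struct ftrace_graph_ent *trace)
  {
          /* function entry; trace->func is the traced function's ip */
          return 1;       /* non-zero: we also want the exit callback */
  }

  static void exit_cb(struct ftrace_graph_ret *trace)
  {
          /* function exit; no pt_regs available here today */
  }

  static struct fgraph_ops bpf_gops = {
          .entryfunc = entry_cb,
          .retfunc   = exit_cb,
  };

  /* register_ftrace_graph(&bpf_gops); */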
the way ftrace's fgraph_ops is implemented, I'm not sure it can be as
fast as the current bpf entry/exit trampoline

but to better understand the pain points, I think I'll try to
implement the 'mass trampolines' call to the bpf program you described
above and attach it for now to the fgraph_ops callbacks

perhaps this is a good topic to discuss in one of the Thursday BPF
meetings?

thanks,
jirka