All of lore.kernel.org
 help / color / mirror / Atom feed
From: Guo Ren <guoren@kernel.org>
To: Andy Chiu <andy.chiu@sifive.com>
Cc: "Puranjay Mohan" <puranjay12@gmail.com>,
	"Björn Töpel" <bjorn@kernel.org>,
	"Paul Walmsley" <paul.walmsley@sifive.com>,
	"Palmer Dabbelt" <palmer@dabbelt.com>,
	"Albert Ou" <aou@eecs.berkeley.edu>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Masami Hiramatsu" <mhiramat@kernel.org>,
	"Mark Rutland" <mark.rutland@arm.com>,
	"Sami Tolvanen" <samitolvanen@google.com>,
	"Ley Foon Tan" <leyfoon.tan@starfivetech.com>,
	"Deepak Gupta" <debug@rivosinc.com>,
	"Sia Jee Heng" <jeeheng.sia@starfivetech.com>,
	"Björn Töpel" <bjorn@rivosinc.com>,
	"Song Shuai" <suagrfillet@gmail.com>,
	"Clément Léger" <cleger@rivosinc.com>,
	"Al Viro" <viro@zeniv.linux.org.uk>,
	"Jisheng Zhang" <jszhang@kernel.org>,
	linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] riscv: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
Date: Sat, 9 Mar 2024 20:37:51 -0500	[thread overview]
Message-ID: <Ze0O77ze56DGffzD@gmail.com> (raw)
In-Reply-To: <CABgGipUJU_joVOjPi4WZ4JJM72zaSrCA1QUAaP8hob3+LXkS0g@mail.gmail.com>

On Fri, Mar 08, 2024 at 05:18:21PM +0800, Andy Chiu wrote:
> Hi Puranjay,
> 
> On Fri, Mar 8, 2024 at 3:53 AM Puranjay Mohan <puranjay12@gmail.com> wrote:
> >
> > Hi Björn,
> >
> > On Thu, Mar 7, 2024 at 8:27 PM Björn Töpel <bjorn@kernel.org> wrote:
> > >
> > > Puranjay!
> > >
> > > Puranjay Mohan <puranjay12@gmail.com> writes:
> > >
> > > > This patch enables support for DYNAMIC_FTRACE_WITH_CALL_OPS on RISC-V.
> > > > This allows each ftrace callsite to provide an ftrace_ops to the common
> > > > ftrace trampoline, allowing each callsite to invoke distinct tracer
> > > > functions without the need to fall back to list processing or to
> > > > allocate custom trampolines for each callsite. This significantly speeds
> > > > up cases where multiple distinct trace functions are used and callsites
> > > > are mostly traced by a single tracer.
> > > >
> > > > The idea and most of the implementation is taken from the ARM64's
> > > > implementation of the same feature. The idea is to place a pointer to
> > > > the ftrace_ops as a literal at a fixed offset from the function entry
> > > > point, which can be recovered by the common ftrace trampoline.
> > >
> > > Not really a review, but some more background; Another rationale (on-top
> > > of the improved per-call performance!) for CALL_OPS was to use it to
> > > build ftrace direct call support (which BPF uses a lot!). Mark, please
> > > correct me if I'm lying here!
> > >
> > > On Arm64, CALL_OPS makes it possible to implement direct calls, while
> > > only patching one BL instruction -- nice!
> > >
> > > On RISC-V we cannot use use the same ideas as Arm64 straight off,
> > > because the range of jal (compare to BL) is simply too short (+/-1M).
> > > So, on RISC-V we need to use a full auipc/jal pair (the text patching
> > > story is another chapter, but let's leave that aside for now). Since we
> > > have to patch multiple instructions, the cmodx situation doesn't really
> > > improve with CALL_OPS.
> > >
> > > Let's say that we continue building on your patch and implement direct
> > > calls on CALL_OPS for RISC-V as well.
> > >
> > > From Florent's commit message for direct calls:
> > >
> > >   |    There are a few cases to distinguish:
> > >   |    - If a direct call ops is the only one tracing a function:
> > >   |      - If the direct called trampoline is within the reach of a BL
> > >   |        instruction
> > >   |         -> the ftrace patchsite jumps to the trampoline
> > >   |      - Else
> > >   |         -> the ftrace patchsite jumps to the ftrace_caller trampoline which
> > >   |            reads the ops pointer in the patchsite and jumps to the direct
> > >   |            call address stored in the ops
> > >   |    - Else
> > >   |      -> the ftrace patchsite jumps to the ftrace_caller trampoline and its
> > >   |         ops literal points to ftrace_list_ops so it iterates over all
> > >   |         registered ftrace ops, including the direct call ops and calls its
> > >   |         call_direct_funcs handler which stores the direct called
> > >   |         trampoline's address in the ftrace_regs and the ftrace_caller
> > >   |         trampoline will return to that address instead of returning to the
> > >   |         traced function
> > >
> > > On RISC-V, where auipc/jalr is used, the direct called trampoline would
> > > always be reachable, and then first Else-clause would never be entered.
> > > This means the the performance for direct calls would be the same as the
> > > one we have today (i.e. no regression!).
> > >
> > > RISC-V does like x86 does (-ish) -- patch multiple instructions, long
> > > reach.
> > >
> > > Arm64 uses CALL_OPS and patch one instruction BL.
> > >
> > > Now, with this background in mind, compared to what we have today,
> > > CALL_OPS would give us (again assuming we're using it for direct calls):
> > >
> > > * Better performance for tracer per-call (faster ops lookup) GOOD
> >
> > ^ this was the only motivation for me to implement this patch.
> >
> > I don't think implementing direct calls over call ops is fruitful for
> > RISC-V because once
> > the auipc/jalr can be patched atomically, the direct call trampoline
> > is always reachable.
> 
> Yes, the auipc/jalr instruction pair can be patched atomically just as
> long as their size is naturally aligned on. However, we cannot prevent
> 2 instructions from being fetched atomically :P
There are some micro-arch optimization methods here, such as:
 - Disable interrupt when auipc retired.
 - When auipc -> auipc, the second one still could cause an
   interruption.

> 
> > Solving the atomic text patching problem would be fun!! I am eager to
> > see how it will be
> > solved.
> 
> I have a patch series to solve the atomic code patching issue, which I
> am about to respin [1]. The idea is to solve it with yet another layer
> of indirection. We add a 8-B aligned space at each function entry. The
> space is a pointer to the ftrace entry. During boot, each function
> entry code is updated to perform a load and then take the jump from
> the 8-B space. When ftrace is disabled, we patch the first 4B-aligned
> instruction to a jump so as to skip the ftrace entry.
> 
> We are still discussing with Alex to see if we have a better way to do
> it. If not then I'd update the patchset and re-send it. There's a
> pending improvement in the series to reduce complexity. The 8-B
> aligned space can be added before the function entry (just like your
> patch).
> 
> >
> > > * Larger text size (function alignment + extra nops) BAD
> > > * Same direct call performance NEUTRAL
> > > * Same complicated text patching required NEUTRAL
> > >
> > > It would be interesting to see how the per-call performance would
> > > improve on x86 with CALL_OPS! ;-)
> >
> > If I remember from Steven's talk, x86 uses dynamically allocated trampolines
> > for per callsite tracers, would CALL_OPS provide better performance than that?
> >
> > >
> > > I'm trying to wrap my head if it makes sense to have it on RISC-V, given
> > > that we're a bit different from Arm64. Does the scale tip to the GOOD
> > > side?
> > >
> > > Oh, and we really need to see performance numbers on real HW! I have a
> > > VF2 that I could try this series on.
> >
> > It would be great if you can do it :D.
> >
> > Thanks,
> > Puranjay
> >
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
> 
> - [1] https://yhbt.net/lore/all/CAJF2gTSn3_cDYsF9D8dt-br2Wf_M8y02A09xgRq8kXi91sN3Aw@mail.gmail.com/T/
> 
> Regards,
> Andy
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

WARNING: multiple messages have this Message-ID (diff)
From: Guo Ren <guoren@kernel.org>
To: Andy Chiu <andy.chiu@sifive.com>
Cc: "Puranjay Mohan" <puranjay12@gmail.com>,
	"Björn Töpel" <bjorn@kernel.org>,
	"Paul Walmsley" <paul.walmsley@sifive.com>,
	"Palmer Dabbelt" <palmer@dabbelt.com>,
	"Albert Ou" <aou@eecs.berkeley.edu>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Masami Hiramatsu" <mhiramat@kernel.org>,
	"Mark Rutland" <mark.rutland@arm.com>,
	"Sami Tolvanen" <samitolvanen@google.com>,
	"Ley Foon Tan" <leyfoon.tan@starfivetech.com>,
	"Deepak Gupta" <debug@rivosinc.com>,
	"Sia Jee Heng" <jeeheng.sia@starfivetech.com>,
	"Björn Töpel" <bjorn@rivosinc.com>,
	"Song Shuai" <suagrfillet@gmail.com>,
	"Clément Léger" <cleger@rivosinc.com>,
	"Al Viro" <viro@zeniv.linux.org.uk>,
	"Jisheng Zhang" <jszhang@kernel.org>,
	linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] riscv: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
Date: Sat, 9 Mar 2024 20:37:51 -0500	[thread overview]
Message-ID: <Ze0O77ze56DGffzD@gmail.com> (raw)
In-Reply-To: <CABgGipUJU_joVOjPi4WZ4JJM72zaSrCA1QUAaP8hob3+LXkS0g@mail.gmail.com>

On Fri, Mar 08, 2024 at 05:18:21PM +0800, Andy Chiu wrote:
> Hi Puranjay,
> 
> On Fri, Mar 8, 2024 at 3:53 AM Puranjay Mohan <puranjay12@gmail.com> wrote:
> >
> > Hi Björn,
> >
> > On Thu, Mar 7, 2024 at 8:27 PM Björn Töpel <bjorn@kernel.org> wrote:
> > >
> > > Puranjay!
> > >
> > > Puranjay Mohan <puranjay12@gmail.com> writes:
> > >
> > > > This patch enables support for DYNAMIC_FTRACE_WITH_CALL_OPS on RISC-V.
> > > > This allows each ftrace callsite to provide an ftrace_ops to the common
> > > > ftrace trampoline, allowing each callsite to invoke distinct tracer
> > > > functions without the need to fall back to list processing or to
> > > > allocate custom trampolines for each callsite. This significantly speeds
> > > > up cases where multiple distinct trace functions are used and callsites
> > > > are mostly traced by a single tracer.
> > > >
> > > > The idea and most of the implementation is taken from the ARM64's
> > > > implementation of the same feature. The idea is to place a pointer to
> > > > the ftrace_ops as a literal at a fixed offset from the function entry
> > > > point, which can be recovered by the common ftrace trampoline.
> > >
> > > Not really a review, but some more background; Another rationale (on-top
> > > of the improved per-call performance!) for CALL_OPS was to use it to
> > > build ftrace direct call support (which BPF uses a lot!). Mark, please
> > > correct me if I'm lying here!
> > >
> > > On Arm64, CALL_OPS makes it possible to implement direct calls, while
> > > only patching one BL instruction -- nice!
> > >
> > > On RISC-V we cannot use use the same ideas as Arm64 straight off,
> > > because the range of jal (compare to BL) is simply too short (+/-1M).
> > > So, on RISC-V we need to use a full auipc/jal pair (the text patching
> > > story is another chapter, but let's leave that aside for now). Since we
> > > have to patch multiple instructions, the cmodx situation doesn't really
> > > improve with CALL_OPS.
> > >
> > > Let's say that we continue building on your patch and implement direct
> > > calls on CALL_OPS for RISC-V as well.
> > >
> > > From Florent's commit message for direct calls:
> > >
> > >   |    There are a few cases to distinguish:
> > >   |    - If a direct call ops is the only one tracing a function:
> > >   |      - If the direct called trampoline is within the reach of a BL
> > >   |        instruction
> > >   |         -> the ftrace patchsite jumps to the trampoline
> > >   |      - Else
> > >   |         -> the ftrace patchsite jumps to the ftrace_caller trampoline which
> > >   |            reads the ops pointer in the patchsite and jumps to the direct
> > >   |            call address stored in the ops
> > >   |    - Else
> > >   |      -> the ftrace patchsite jumps to the ftrace_caller trampoline and its
> > >   |         ops literal points to ftrace_list_ops so it iterates over all
> > >   |         registered ftrace ops, including the direct call ops and calls its
> > >   |         call_direct_funcs handler which stores the direct called
> > >   |         trampoline's address in the ftrace_regs and the ftrace_caller
> > >   |         trampoline will return to that address instead of returning to the
> > >   |         traced function
> > >
> > > On RISC-V, where auipc/jalr is used, the direct called trampoline would
> > > always be reachable, and then first Else-clause would never be entered.
> > > This means the the performance for direct calls would be the same as the
> > > one we have today (i.e. no regression!).
> > >
> > > RISC-V does like x86 does (-ish) -- patch multiple instructions, long
> > > reach.
> > >
> > > Arm64 uses CALL_OPS and patch one instruction BL.
> > >
> > > Now, with this background in mind, compared to what we have today,
> > > CALL_OPS would give us (again assuming we're using it for direct calls):
> > >
> > > * Better performance for tracer per-call (faster ops lookup) GOOD
> >
> > ^ this was the only motivation for me to implement this patch.
> >
> > I don't think implementing direct calls over call ops is fruitful for
> > RISC-V because once
> > the auipc/jalr can be patched atomically, the direct call trampoline
> > is always reachable.
> 
> Yes, the auipc/jalr instruction pair can be patched atomically just as
> long as their size is naturally aligned on. However, we cannot prevent
> 2 instructions from being fetched atomically :P
There are some micro-arch optimization methods here, such as:
 - Disable interrupt when auipc retired.
 - When auipc -> auipc, the second one still could cause an
   interruption.

> 
> > Solving the atomic text patching problem would be fun!! I am eager to
> > see how it will be
> > solved.
> 
> I have a patch series to solve the atomic code patching issue, which I
> am about to respin [1]. The idea is to solve it with yet another layer
> of indirection. We add a 8-B aligned space at each function entry. The
> space is a pointer to the ftrace entry. During boot, each function
> entry code is updated to perform a load and then take the jump from
> the 8-B space. When ftrace is disabled, we patch the first 4B-aligned
> instruction to a jump so as to skip the ftrace entry.
> 
> We are still discussing with Alex to see if we have a better way to do
> it. If not then I'd update the patchset and re-send it. There's a
> pending improvement in the series to reduce complexity. The 8-B
> aligned space can be added before the function entry (just like your
> patch).
> 
> >
> > > * Larger text size (function alignment + extra nops) BAD
> > > * Same direct call performance NEUTRAL
> > > * Same complicated text patching required NEUTRAL
> > >
> > > It would be interesting to see how the per-call performance would
> > > improve on x86 with CALL_OPS! ;-)
> >
> > If I remember from Steven's talk, x86 uses dynamically allocated trampolines
> > for per callsite tracers, would CALL_OPS provide better performance than that?
> >
> > >
> > > I'm trying to wrap my head if it makes sense to have it on RISC-V, given
> > > that we're a bit different from Arm64. Does the scale tip to the GOOD
> > > side?
> > >
> > > Oh, and we really need to see performance numbers on real HW! I have a
> > > VF2 that I could try this series on.
> >
> > It would be great if you can do it :D.
> >
> > Thanks,
> > Puranjay
> >
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
> 
> - [1] https://yhbt.net/lore/all/CAJF2gTSn3_cDYsF9D8dt-br2Wf_M8y02A09xgRq8kXi91sN3Aw@mail.gmail.com/T/
> 
> Regards,
> Andy
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

  parent reply	other threads:[~2024-03-10  1:38 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-06 16:59 [RFC PATCH] riscv: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS Puranjay Mohan
2024-03-06 16:59 ` Puranjay Mohan
2024-03-06 20:35 ` Alexandre Ghiti
2024-03-06 20:35   ` Alexandre Ghiti
2024-03-06 20:38   ` Alexandre Ghiti
2024-03-06 20:38     ` Alexandre Ghiti
2024-03-07  0:17   ` Puranjay Mohan
2024-03-07  0:17     ` Puranjay Mohan
2024-03-08  8:48     ` Andy Chiu
2024-03-08  8:48       ` Andy Chiu
2024-03-11 13:56     ` [DMARC Error] " Evgenii Shatokhin
2024-03-11 13:56       ` Evgenii Shatokhin
2024-03-07 19:27 ` Björn Töpel
2024-03-07 19:27   ` Björn Töpel
2024-03-07 19:51   ` Puranjay Mohan
2024-03-07 19:51     ` Puranjay Mohan
2024-03-08  9:18     ` Andy Chiu
2024-03-08  9:18       ` Andy Chiu
2024-03-08 14:13       ` Puranjay Mohan
2024-03-08 14:13         ` Puranjay Mohan
2024-03-10  1:37       ` Guo Ren [this message]
2024-03-10  1:37         ` Guo Ren
2024-03-08 10:16     ` Björn Töpel
2024-03-08 10:16       ` Björn Töpel
2024-03-08 14:22       ` Puranjay Mohan
2024-03-08 14:22         ` Puranjay Mohan
2024-03-08 15:15         ` Björn Töpel
2024-03-08 15:15           ` Björn Töpel
2024-03-12 13:42   ` Mark Rutland
2024-03-12 13:42     ` Mark Rutland
2024-03-13 11:23     ` Björn Töpel
2024-03-13 11:23       ` Björn Töpel
2024-03-14 14:16       ` Puranjay Mohan
2024-03-14 14:16         ` Puranjay Mohan
2024-03-14 15:07         ` Björn Töpel
2024-03-14 15:07           ` Björn Töpel
2024-03-14 20:50           ` Björn Töpel
2024-03-14 20:50             ` Björn Töpel
2024-03-20 16:41             ` Andy Chiu
2024-03-20 16:41               ` Andy Chiu
2024-03-21  8:48               ` Björn Töpel
2024-03-21  8:48                 ` Björn Töpel
2024-03-21 17:39                 ` Andy Chiu
2024-03-21 17:39                   ` Andy Chiu
2024-03-21 18:10                   ` Björn Töpel
2024-03-21 18:10                     ` Björn Töpel
2024-03-25 12:50                   ` Robbin Ehn
2024-03-25 12:50                     ` Robbin Ehn
2024-03-20 18:03           ` Mark Rutland
2024-03-20 18:03             ` Mark Rutland
2024-03-21  8:58             ` Björn Töpel
2024-03-21  8:58               ` Björn Töpel
2024-03-21 14:49     ` Steven Rostedt
2024-03-21 14:49       ` Steven Rostedt
2024-03-21 20:02     ` Steven Rostedt
2024-03-21 20:02       ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Ze0O77ze56DGffzD@gmail.com \
    --to=guoren@kernel.org \
    --cc=andy.chiu@sifive.com \
    --cc=aou@eecs.berkeley.edu \
    --cc=bjorn@kernel.org \
    --cc=bjorn@rivosinc.com \
    --cc=cleger@rivosinc.com \
    --cc=debug@rivosinc.com \
    --cc=jeeheng.sia@starfivetech.com \
    --cc=jszhang@kernel.org \
    --cc=leyfoon.tan@starfivetech.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mhiramat@kernel.org \
    --cc=palmer@dabbelt.com \
    --cc=paul.walmsley@sifive.com \
    --cc=puranjay12@gmail.com \
    --cc=rostedt@goodmis.org \
    --cc=samitolvanen@google.com \
    --cc=suagrfillet@gmail.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.