linux-trace-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Tao Chiu <andybnac@gmail.com>
To: "Björn Töpel" <bjorn@kernel.org>
Cc: alexghiti@rivosinc.com, andy.chiu@sifive.com,
	aou@eecs.berkeley.edu,  justinstitt@google.com,
	linux-kernel@vger.kernel.org,  linux-riscv@lists.infradead.org,
	linux-trace-kernel@vger.kernel.org,  llvm@lists.linux.dev,
	mark.rutland@arm.com, mhiramat@kernel.org,  morbo@google.com,
	nathan@kernel.org, ndesaulniers@google.com,  palmer@dabbelt.com,
	palmer@rivosinc.com, paul.walmsley@sifive.com,
	 puranjay@kernel.org, rostedt@goodmis.org, zong.li@sifive.com,
	 yongxuan.wang@sifive.com
Subject: Re: [PATCH v2 3/6] riscv: ftrace: prepare ftrace for atomic code patching
Date: Wed, 11 Sep 2024 23:03:48 +0800	[thread overview]
Message-ID: <CAFTtA3PYsdSio0EqkUKQpyLiMuY9iynOdWQAkebAPjL6THEaMQ@mail.gmail.com> (raw)
In-Reply-To: <87v7z2dzo6.fsf@all.your.base.are.belong.to.us>

Björn Töpel <bjorn@kernel.org> 於 2024年9月11日 週三 下午10:37寫道:

>
> Andy Chiu <andybnac@gmail.com> writes:
>
> > On Wed, Aug 14, 2024 at 02:57:52PM +0200, Björn Töpel wrote:
> >> Björn Töpel <bjorn@kernel.org> writes:
> >>
> >> > Andy Chiu <andy.chiu@sifive.com> writes:
> >> >
> >> >> We use an AUIPC+JALR pair to jump into a ftrace trampoline. Since
> >> >> instruction fetch can break down to 4 byte at a time, it is impossible
> >> >> to update two instructions without a race. In order to mitigate it, we
> >> >> initialize the patchable entry to AUIPC + NOP4. Then, the run-time code
> >> >> patching can change NOP4 to JALR to eable/disable ftrcae from a
> >> >                                       enable        ftrace
> >> >
> >> >> function. This limits the reach of each ftrace entry to +-2KB displacing
> >> >> from ftrace_caller.
> >> >>
> >> >> Starting from the trampoline, we add a level of indirection for it to
> >> >> reach ftrace caller target. Now, it loads the target address from a
> >> >> memory location, then perform the jump. This enable the kernel to update
> >> >> the target atomically.
> >> >
> >> > The +-2K limit is for direct calls, right?
> >> >
> >> > ...and this I would say breaks DIRECT_CALLS (which should be implemented
> >> > using call_ops later)?
> >>
> >> Thinking a bit more, and re-reading the series.
> >>
> >> This series is good work, and it's a big improvement for DYNAMIC_FTRACE,
> >> but
> >>
> >> +int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr)
> >> +{
> >> +    unsigned long distance, orig_addr;
> >> +
> >> +    orig_addr = (unsigned long)&ftrace_caller;
> >> +    distance = addr > orig_addr ? addr - orig_addr : orig_addr - addr;
> >> +    if (distance > JALR_RANGE)
> >> +            return -EINVAL;
> >> +
> >> +    return __ftrace_modify_call(rec->ip, addr, false);
> >> +}
> >> +
> >>
> >> breaks WITH_DIRECT_CALLS. The direct trampoline will *never* be within
> >> the JALR_RANGE.
> >
> >
> > Yes, it is hardly possible that a direct trampoline will be within the
> > range.
> >
> > Recently I have been thinking some solutions to address the issue. One
> > solution is replaying AUIPC at function entries. The idea has two sides.
> > First, if we are returning back to the second instruction at trap return,
> > then do sepc -= 4 so it executes the up-to-date AUIPC. The other side is
> > to fire synchronous IPI that does remote fence.i at right timings to
> > prevent concurrent executing on a mix of old and new instructions.
> >
> > Consider replacing instructions at a function's patchable entry with the
> > following sequence:
> >
> > Initial state:
> > --------------
> > 0: AUIPC
> > 4: JALR
> >
> > Step1:
> > write(0, "J +8")
> > fence w,w
> > send sync local+remote fence.i
> > ------------------------
> > 0: J +8
> > 4: JALR
> >
> > Step2:
> > write(4, "JALR'")
> > fence w,w
> > send sync local+remote fence.i
> > ------------------------
> > 0: J +8
> > 4: JALR'
> >
> > Step3:
> > write(0, "AUIPC'")
> > fence w,w
> > send sync local+remote fence.i (to activate the call)
> > -----------------------
> > 0: AUIPC'
> > 4: JALR'
> >
> > The following execution sequences are acceptable:
> > - AUIPC, JALR
> > - J +8, (skipping {JALR | JALR'})
> > - AUIPC', JALR'
> >
> > And here are sequences that we want to prevent:
> > - AUIPC', JALR
> > - AUIPC, JALR'
> >
> > The local core should never execute the forbidden sequence.
> >
> > By listing all possible combinations of executing sequence on a remote
> > core, we can find that the dangerous seqence is impossible to happen:
> >
> > let f be the fence.i at step 1, 2, 3. And let numbers be the location of
> > code being executed. Mathematically, here are all combinations at a site
> > happening on a remote core:
> >
> > fff04 -- updated seq
> > ff0f4 -- impossible, would be ff0f04, updated seq
> > ff04f -- impossible, would be ff08f, safe seq
> > f0ff4 -- impossible, would be f0ff04, updated seq
> > f0f4f -- impossible, would be f0f08f (safe), or f0f0f04 (updated)
> > f04ff -- impossible, would be f08ff, safe seq
> > 0fff4 -- impossible, would be 0fff04, updated seq
> > 0ff4f -- impossible, would be 0ff08f (safe), or 0ff0f04 (updated)
> > 0f4ff -- impossible, would be 0f08ff (safe), 0f0f08f (safe), 0f0f0f04 (updated)
> > 04fff -- old seq
> >
> > After the 1st 'fence.i', remote cores should observe (J +8, JALR) or (J +8, JALR')
> > After the 2nd 'fence.i', remote cores should observe (J +8, JALR') or (AUIPC', JALR')
> > After the 3rd 'fence.i', remote cores should observe (AUIPC', JALR')
> >
> > Remote cores should never execute (AUIPC',JALR) or (AUIPC,JALR')
> >
> > To correctly implement the solution, the trap return code must match JALR
> > and adjust sepc only for patchable function entries. This is undocumently
> > possible because we use t0 as source and destination registers for JALR
> > at function entries. Compiler never generates JALR that uses the same
> > register pattern.
> >
> > Another solution is inspired by zcmt, and perhaps we can optimize it if
> > the hardware does support zcmt. First, we allocate a page and divide it
> > into two halves. The first half of the page are 255 x 8B destination
> > addresses. Then, starting from offset 2056, the second half of the page
> > is composed by a series of 2 x 4 Byte instructions:
> >
> > 0:    ftrace_tramp_1
> > 8:    ftrace_tramp_2
> > ...
> > 2040: ftrace_tramp_255
> > 2048: ftrace_tramp_256 (not used when configured with 255 tramps)
> > 2056:
> > ld    t1, -2048(t1)
> > jr    t1
> > ld    t1, -2048(t1)
> > jr    t1
> > ...
> > 4088:
> > ld    t1, -2048(t1)
> > jr    t1
> > 4096:
> >
> > It is possible to expand to 511 trampolines by adding a page
> > below, and making a load+jr sequence from +2040 offset.
> >
> > When the kernel boots, we direct AUIPCs at patchable entries to the page,
> > and disable the call by setting the second instruction to NOP4. Then, we
> > can effectively enable/disable/modify a call by setting only the
> > instruction at JALR. It is possible to utilize most of the current patch
> > set to achieve atomic patching. A missing part is to allocate and manage
> > trampolines for ftrace users.
>
> (I will need to digest above in detail!)
>
> I don't think it's a good idea to try to handle direct calls w/o
> call_ops. What I was trying to say is "add the call_ops patch to your
> series, so that direct calls aren't broken". If direct calls depend on
> call_ops -- sure, no worries. But don't try to get direct calls W/O
> call_ops. That's a whole new bag of worms.
>
> Some more high-level thoughts: ...all this to workaround where we don't
> want the call_ops overhead? Is there really a use-case with a platform
> that doesn't handle the text overhead of call_ops?

Sorry for making any confusions. I have no strong personal preference
on what we should do. Just want to have a technical discussion on what
is possible if we want to optimize code size.

>
> Maybe I'm missing context here... but I'd say, let's follow what arm64
> did (but obviously w/o the BL direct call optimization, and always jump
> to a trampoline -- since that's not possible with RISC-V branch length),
> and just do the call_ops way.
>
> Then, as a second step, and if there are platforms that care, think
> about a variant w/o call_ops.
>
> Or what I wrote in the first section:
>
> 1. Keep this patch set
> 2. ...but add call_ops to it, and require call_ops for direct calls.
>
> Just my $.02.
>
>
> Björn

  reply	other threads:[~2024-09-11 15:04 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-28 11:47 [PATCH v2 0/6] riscv: ftrace: atmoic patching and preempt improvements Andy Chiu
2024-06-28 11:47 ` [PATCH v2 1/6] riscv: ftrace: support fastcc in Clang for WITH_ARGS Andy Chiu
2024-08-13 11:09   ` Björn Töpel
2024-08-22  8:39     ` Andy Chiu
2024-06-28 11:47 ` [PATCH v2 2/6] riscv: ftrace: align patchable functions to 4 Byte boundary Andy Chiu
2024-08-13 11:11   ` Björn Töpel
2024-06-28 11:47 ` [PATCH v2 3/6] riscv: ftrace: prepare ftrace for atomic code patching Andy Chiu
2024-08-13 12:59   ` Björn Töpel
2024-08-14 12:57     ` Björn Töpel
2024-09-11 10:57       ` Andy Chiu
2024-09-11 14:37         ` Björn Töpel
2024-09-11 15:03           ` Tao Chiu [this message]
2024-09-11 17:16             ` Björn Töpel
2024-06-28 11:47 ` [PATCH v2 4/6] riscv: ftrace: do not use stop_machine to update code Andy Chiu
2024-06-28 11:47 ` [PATCH v2 5/6] riscv: vector: Support calling schedule() for preemptible Vector Andy Chiu
2024-06-28 11:47 ` [PATCH v2 6/6] riscv: ftrace: support PREEMPT Andy Chiu
2024-08-13 11:00 ` [PATCH v2 0/6] riscv: ftrace: atmoic patching and preempt improvements Björn Töpel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAFTtA3PYsdSio0EqkUKQpyLiMuY9iynOdWQAkebAPjL6THEaMQ@mail.gmail.com \
    --to=andybnac@gmail.com \
    --cc=alexghiti@rivosinc.com \
    --cc=andy.chiu@sifive.com \
    --cc=aou@eecs.berkeley.edu \
    --cc=bjorn@kernel.org \
    --cc=justinstitt@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=llvm@lists.linux.dev \
    --cc=mark.rutland@arm.com \
    --cc=mhiramat@kernel.org \
    --cc=morbo@google.com \
    --cc=nathan@kernel.org \
    --cc=ndesaulniers@google.com \
    --cc=palmer@dabbelt.com \
    --cc=palmer@rivosinc.com \
    --cc=paul.walmsley@sifive.com \
    --cc=puranjay@kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=yongxuan.wang@sifive.com \
    --cc=zong.li@sifive.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).