Date: Tue, 12 Mar 2024 13:42:28 +0000
From: Mark Rutland
To: Björn Töpel
Cc: Puranjay Mohan, Paul Walmsley, Palmer Dabbelt, Albert Ou,
    Steven Rostedt, Masami Hiramatsu, Sami Tolvanen, Guo Ren,
    Ley Foon Tan, Deepak Gupta, Sia Jee Heng, Björn Töpel, Song Shuai,
    Clément Léger, Al Viro, Jisheng Zhang,
    linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org,
    linux-trace-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] riscv: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
References: <20240306165904.108141-1-puranjay12@gmail.com>
 <87ttlhdeqb.fsf@all.your.base.are.belong.to.us>
In-Reply-To: <87ttlhdeqb.fsf@all.your.base.are.belong.to.us>

Hi Bjorn (apologies, my corporate mail server has butchered your name
here).

There's a big info dump below; I realise this sounds like a sales pitch
for CALL_OPS, but my intent is more to say "here are some dragons you
may not have spotted".

On Thu, Mar 07, 2024 at 08:27:40PM +0100, Björn Töpel wrote:
> Puranjay!
>
> Puranjay Mohan writes:
>
> > This patch enables support for DYNAMIC_FTRACE_WITH_CALL_OPS on RISC-V.
> > This allows each ftrace callsite to provide an ftrace_ops to the common
> > ftrace trampoline, allowing each callsite to invoke distinct tracer
> > functions without the need to fall back to list processing or to
> > allocate custom trampolines for each callsite.
> > This significantly speeds
> > up cases where multiple distinct trace functions are used and callsites
> > are mostly traced by a single tracer.
> >
> > The idea and most of the implementation is taken from the arm64
> > implementation of the same feature. The idea is to place a pointer to
> > the ftrace_ops as a literal at a fixed offset from the function entry
> > point, which can be recovered by the common ftrace trampoline.
>
> Not really a review, but some more background; another rationale (on
> top of the improved per-call performance!) for CALL_OPS was to use it
> to build ftrace direct call support (which BPF uses a lot!). Mark,
> please correct me if I'm lying here!

Yep; it gives us the ability to do a number of per-callsite things,
including direct calls.

> On Arm64, CALL_OPS makes it possible to implement direct calls, while
> only patching one BL instruction -- nice!

The key thing here isn't that we patch a single instruction (since we
have to patch the ops pointer too!); it's that we can safely patch
either of the ops pointer or the BL/NOP at any time while threads are
concurrently executing.

If you have a multi-instruction sequence, then threads can be preempted
mid-sequence, and it's very painful/complex to handle all of the races
that entails.

For example, if your callsites use a sequence:

	AUIPC <tmp>, <funcptr>
	JALR <link>, <funcptr>(<tmp>)

Using stop_machine() won't allow you to patch that safely as some
threads could be stuck mid-sequence, e.g.

	AUIPC <tmp>, <funcptr>

	[ preempted here ]

	JALR <link>, <funcptr>(<tmp>)

... and you can't update the JALR to use a new funcptr immediate until
those threads have completed the sequence.

There are ways around that, but they're complicated and/or expensive,
e.g.

* Use a sequence of multiple patches, starting with replacing the JALR
  with an exception-generating instruction with a fixup handler, which
  is sort-of what x86 does with UD2.
  This may require multiple passes with synchronize_rcu_tasks() to make
  sure all threads have seen the latest instructions, and that cannot be
  done under stop_machine(), so if you need stop_machine() for CMODX
  reasons, you may need to use that several times with intervening calls
  to synchronize_rcu_tasks().

* Have the patching logic manually go over each thread and fix up the
  pt_regs for the interrupted thread. This is pretty horrid since you
  could have nested exceptions and a task could have several pt_regs
  which might require updating.

The CALL_OPS approach is a bit easier to deal with as we can patch the
per-callsite pointer atomically, then we can (possibly) enable/disable
the callsite's branch, then wait once for threads to drain.

As a heads-up, there are some latent/generic issues with DYNAMIC_FTRACE
generally in this area (CALL_OPS happens to side-step those, but
trampoline usage is currently affected):

  https://lore.kernel.org/lkml/Zenx_Q0UiwMbSAdP@FVFF77S0Q05N/

... I'm looking into fixing that at the moment, and it looks like that's
likely to require some per-architecture changes.

> On RISC-V we cannot use the same ideas as Arm64 straight off, because
> the range of jal (compared to BL) is simply too short (+/-1M). So, on
> RISC-V we need to use a full auipc/jalr pair (the text patching story
> is another chapter, but let's leave that aside for now). Since we have
> to patch multiple instructions, the cmodx situation doesn't really
> improve with CALL_OPS.

The branch range thing is annoying, but I think this boils down to the
same problem as arm64 has with needing a "MOV X9, LR" instruction that
we have to patch in once at boot time. You could do the same and patch
in the AUIPC once, e.g. have

|	NOP
|	NOP
| func:
|	AUIPC <tmp>, <ftrace_caller>
|	JALR <link>, <ftrace_caller>(<tmp>)	// patched with NOP

... which'd look very similar to arm64's sequence:

|	NOP
|	NOP
| func:
|	MOV X9, LR
|	BL ftrace_caller	// patched with NOP

... which I think means it *might* be better from a cmodx perspective?
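[Editor's note: to make the ops-literal idea above concrete, here is a
rough userspace C sketch. All names and offsets are hypothetical; in the
kernel the trampoline is written in per-arch assembly and the layout is
fixed by the architecture code, not by a struct like this.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout, mirroring the CALL_OPS scheme: the per-callsite
 * ftrace_ops pointer is stored as a literal in the padding immediately
 * before the function's entry point. */
#define OPS_LITERAL_OFFSET sizeof(uintptr_t)

/* Emulate what the common trampoline does: given the address of the
 * function entry, read back the per-callsite ops pointer from the
 * literal at a fixed negative offset. */
static uintptr_t recover_ops(const unsigned char *func_entry)
{
	uintptr_t ops;

	/* memcpy stands in for a single aligned load in real code. */
	memcpy(&ops, func_entry - OPS_LITERAL_OFFSET, sizeof(ops));
	return ops;
}
```

Because the literal is a single naturally-aligned word, retargeting one
callsite to a different tracer is a single store to that word; the
NOP/branch instruction only needs patching to enable or disable the
callsite entirely.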
> Let's say that we continue building on your patch and implement direct
> calls on CALL_OPS for RISC-V as well.
>
> From Florent's commit message for direct calls:
>
> | There are a few cases to distinguish:
> | - If a direct call ops is the only one tracing a function:
> |   - If the direct called trampoline is within the reach of a BL
> |     instruction
> |     -> the ftrace patchsite jumps to the trampoline
> |   - Else
> |     -> the ftrace patchsite jumps to the ftrace_caller trampoline which
> |        reads the ops pointer in the patchsite and jumps to the direct
> |        call address stored in the ops
> | - Else
> |   -> the ftrace patchsite jumps to the ftrace_caller trampoline and its
> |      ops literal points to ftrace_list_ops so it iterates over all
> |      registered ftrace ops, including the direct call ops and calls its
> |      call_direct_funcs handler which stores the direct called
> |      trampoline's address in the ftrace_regs and the ftrace_caller
> |      trampoline will return to that address instead of returning to the
> |      traced function
>
> On RISC-V, where auipc/jalr is used, the direct called trampoline would
> always be reachable, and then the first Else-clause would never be
> entered. This means that the performance for direct calls would be the
> same as the one we have today (i.e. no regression!).
>
> RISC-V does it like x86 does (-ish) -- patch multiple instructions,
> long reach.
>
> Arm64 uses CALL_OPS and patches one BL instruction.
>
> Now, with this background in mind, compared to what we have today,
> CALL_OPS would give us (again assuming we're using it for direct
> calls):
>
> * Better performance for tracer per-call (faster ops lookup)  GOOD
> * Larger text size (function alignment + extra nops)          BAD
> * Same direct call performance                                NEUTRAL
> * Same complicated text patching required                     NEUTRAL

Is your current sequence safe for preemptible kernels (i.e. with
PREEMPT_FULL=y or PREEMPT_DYNAMIC=y + "preempt=full" on the kernel
cmdline)?
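[Editor's note: the hazard behind this question can be sketched in toy
userspace C. The types below are illustrative stand-ins, not kernel
code: the kernel patches instructions and waits with primitives like
synchronize_rcu_tasks(), not C11 atomics. The point is only that a
one-word callsite can be swapped atomically, while a two-word
(auipc/jalr-style) callsite has a window in which a preempted thread
can observe half of an update.]

```c
#include <assert.h>
#include <stdatomic.h>

/* Like AUIPC + JALR: two separately-written words. A reader preempted
 * between them can combine the old `hi` with a new `lo` (or execute a
 * stale second half after the site was repatched). */
struct two_word_site {
	unsigned int hi;	/* upper immediate */
	unsigned int lo;	/* lower immediate */
};

/* Like a CALL_OPS ops literal: one word, updated atomically, so no
 * thread can ever observe a half-updated target. */
struct one_word_site {
	_Atomic(void *) ops;
};

static void write_one_word(struct one_word_site *s, void *ops)
{
	atomic_store_explicit(&s->ops, ops, memory_order_release);
}

static void *read_one_word(struct one_word_site *s)
{
	return atomic_load_explicit(&s->ops, memory_order_acquire);
}
```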
Looking at v6.8, IIUC you have:

	// unpatched		// patched
	NOP			AUIPC
	NOP			JALR

What prevents a thread being preempted mid-sequence such that it
executes:

	NOP

	[ preempted ]

	[ stop_machine() used to patch text ]

	[ restored ]

	JALR

... ?

... or when changing the call:

	AUIPC		// old funcptr

	[ preempted ]

	[ stop_machine() used to patch text ]

	[ restored ]

	JALR		// new funcptr

... ?

I suspect those might both be broken, but it's difficult to hit the race
and so testing hasn't revealed that so far.

> It would be interesting to see how the per-call performance would
> improve on x86 with CALL_OPS! ;-)

Heh. ;)

> I'm trying to wrap my head around whether it makes sense to have it on
> RISC-V, given that we're a bit different from Arm64. Does the scale tip
> to the GOOD side?
>
> Oh, and we really need to see performance numbers on real HW! I have a
> VF2 that I could try this series on.

I'll have to leave those two to you. I suspect the preempt/stop_machine()
issue might just tip that from NEUTRAL to GOOD.

Mark.

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv