All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jiri Olsa <olsajiri@gmail.com>
To: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Oleg Nesterov" <oleg@redhat.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org, x86@kernel.org,
	"Song Liu" <songliubraving@fb.com>, "Yonghong Song" <yhs@fb.com>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"Hao Luo" <haoluo@google.com>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Alan Maguire" <alan.maguire@oracle.com>,
	"David Laight" <David.Laight@aculab.com>,
	"Thomas Weißschuh" <thomas@t-8ch.de>,
	"Ingo Molnar" <mingo@kernel.org>
Subject: Re: [PATCHv4 perf/core 08/22] uprobes/x86: Add mapping for optimized uprobe trampolines
Date: Thu, 10 Jul 2025 15:42:33 +0200	[thread overview]
Message-ID: <aG_DSXNFol_03h75@krava> (raw)
In-Reply-To: <20250710160233.8750ccd59b3b9d62e78491e5@kernel.org>

On Thu, Jul 10, 2025 at 04:02:33PM +0900, Masami Hiramatsu wrote:
> Hi Jiri,
> 
> On Tue,  8 Jul 2025 15:23:17 +0200
> Jiri Olsa <jolsa@kernel.org> wrote:
> 
> > Adding support to add special mapping for user space trampoline with
> > following functions:
> > 
> >   uprobe_trampoline_get - find or add uprobe_trampoline
> >   uprobe_trampoline_put - remove or destroy uprobe_trampoline
> > 
> > The user space trampoline is exported as arch specific user space special
> > mapping through tramp_mapping, which is initialized in following changes
> > with new uprobe syscall.
> > 
> > The uprobe trampoline needs to be callable/reachable from the probed address,
> > so while searching for available address we use is_reachable_by_call function
> > to decide if the uprobe trampoline is callable from the probe address.
> > 
> > All uprobe_trampoline objects are stored in uprobes_state object and are
> > cleaned up when the process mm_struct goes down. Adding new arch hooks
> > for that, because this change is x86_64 specific.
> > 
> > Locking is provided by callers in following changes.
> > 
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> >  arch/x86/kernel/uprobes.c | 169 ++++++++++++++++++++++++++++++++++++++
> >  include/linux/uprobes.h   |   6 ++
> >  kernel/events/uprobes.c   |  10 +++
> >  kernel/fork.c             |   1 +
> >  4 files changed, 186 insertions(+)
> > 
> > diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> > index 77050e5a4680..6336bb961907 100644
> > --- a/arch/x86/kernel/uprobes.c
> > +++ b/arch/x86/kernel/uprobes.c
> > @@ -608,6 +608,175 @@ static void riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
> >  		*sr = utask->autask.saved_scratch_register;
> >  	}
> >  }
> > +
> > +static int tramp_mremap(const struct vm_special_mapping *sm, struct vm_area_struct *new_vma)
> > +{
> > +	return -EPERM;
> > +}
> > +
> > +static struct page *tramp_mapping_pages[2] __ro_after_init;
> > +
> > +static struct vm_special_mapping tramp_mapping = {
> > +	.name   = "[uprobes-trampoline]",
> > +	.mremap = tramp_mremap,
> > +	.pages  = tramp_mapping_pages,
> > +};
> > +
> > +struct uprobe_trampoline {
> > +	struct hlist_node	node;
> > +	unsigned long		vaddr;
> > +};
> > +
> > +static bool is_reachable_by_call(unsigned long vtramp, unsigned long vaddr)
> > +{
> > +	long delta = (long)(vaddr + 5 - vtramp);
> > +
> > +	return delta >= INT_MIN && delta <= INT_MAX;
> > +}
> > +
> > +#define __4GB		 (1UL << 32)
> > +#define MASK_4GB	~(__4GB - 1)
> > +#define PAGE_COUNT(addr) ((addr & ~MASK_4GB) >> PAGE_SHIFT)
> > +
> > +static unsigned long find_nearest_trampoline(unsigned long vaddr)
> > +{
> > +	struct vm_unmapped_area_info info = {
> > +		.length     = PAGE_SIZE,
> > +		.align_mask = ~PAGE_MASK,
> > +	};
> > +	unsigned long limit, low_limit = PAGE_SIZE, high_limit = TASK_SIZE;
> > +	unsigned long cross_4GB, low_4GB, high_4GB;
> > +	unsigned long low_tramp, high_tramp;
> > +	unsigned long call_end = vaddr + 5;
> > +
> > +	/*
> > +	 * The idea is to create a trampoline every 4GB, so we need to find free
> > +	 * page closest to the 4GB alignment. We find intersecting 4GB alignment
> > +	 * address and search up and down to find the closest free page.
> 
> It is not guaranteed to be able to find unmapped 4GB aligned page.
> I still think just finding the nearest area is better (simpler and
> good enough.)
> 
> 	if (check_add_overflow(call_end, INT_MIN, &low_limit))
> 		low_limit = PAGE_SIZE;
> 
> 	high_limit = call_end + INT_MAX;
> 
> 	/* Search up from intersecting 4GB alignment address. */
> 	info.low_limit = call_end;
> 	info.high_limit = min(high_limit, TASK_SIZE);
> 	high_tramp = vm_unmapped_area(&info);
> 
> 	/* Search down from intersecting 4GB alignment address. */
> 	info.low_limit = max(low_limit, PAGE_SIZE);
> 	info.high_limit = call_end;
> 	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
> 	low_tramp = vm_unmapped_area(&info);
> 
> See below;
> 
> > +	 */
> > +
> > +	low_4GB = call_end & MASK_4GB;
> > +	high_4GB = low_4GB + __4GB;
> > +
> > +	/* Restrict limits to be within (PAGE_SIZE,TASK_SIZE) boundaries. */
> > +	if (!check_add_overflow(call_end, INT_MIN, &limit))
> > +		low_limit = limit;
> 
> if not overflow, low_limit = limit = call_end - 2GB.
> 
> * if call_end := 2GB + 4095, limit can be 4095 < PAGE_SIZE. 
>   at the same time, low_4G == 0.
> 
> Note that low_limit can be > low_4G or < low_4G.
> 
> > +	if (low_limit == PAGE_SIZE)
> > +		low_4GB = low_limit;
> 
> If overflow, low_4GB = PAGE_SIZE too.
> 
> In summary, 
> 
> (a) 0 < call_end < 2GB: (overflow)
>   low_limit := PAGE_SIZE
>   low_4GB := PAGE_SIZE
> 
> (b) 2GB <= call_end < 2GB + PAGE_SIZE:
>   low_limit := call_end - 2GB (>= 0, < PAGE_SIZE)
>   low_4GB := 0 (= call_end & MASK_4GB)
> 
> (c) call_end == 2GB + PAGE_SIZE:
>   low_limit := PAGE_SIZE
>   low_4GB := PAGE_SIZE
> 
> (d) 2GB + PAGE_SIZE <= call_end < 4GB:
>   low_limit := call_end - 2GB (> PAGE_SIZE)
>   low_4GB := 0
> 
> (e) 4GB <= call_end:
>   low_limit := call_end - 2GB (> 2GB)
>   low_4GB := call_end & MASK_4GB (> 4GB)
> 
> Maybe (b) and (d) cases are unexpected?
> 
> 
> > +
> > +	high_limit = call_end + INT_MAX;
> 
> This should not overflow, OK.
> 
> > +	if (high_limit > TASK_SIZE)
> > +		high_limit = high_4GB = TASK_SIZE;
> > +
> > +	/* Get 4GB alligned address that's within 2GB distance from call_end */
> > +	if (low_limit <= low_4GB)
> 
> This means call_end is within the [low_4GB, low_4GB + 2GB).
> Call this case as (A)
> 
> > +		cross_4GB = low_4GB;
> > +	else
> > +		cross_4GB = high_4GB;
> 
> And this case as (B).
> 
> > +
> > +	/* Search up from intersecting 4GB alignment address. */
> > +	info.low_limit = cross_4GB;
> > +	info.high_limit = high_limit;
> > +	high_tramp = vm_unmapped_area(&info);
> 
> This searches the unmapped pages from low_limit.
> In (A) case, this starts from low_4GB to high_limit.
> In (B) case, this starts from high_4GB to high_limit.
> 
> So basically you search the unmapped area around the 4GB
> aligned address instead of the nearest area of the vaddr.
> But it is not guarantee that can find unmapped area near
> the 4GB aligned address.

ok, as you said the current code does the same logic but from 4GB
aligned address, while you suggest nearest page from the caller

I can't think of any benefit one way or the other apart from that
your change is less code, I ended up with code below

thanks,
jirka


---
+static unsigned long find_nearest_trampoline(unsigned long vaddr)
+{
+	struct vm_unmapped_area_info info = {
+		.length     = PAGE_SIZE,
+		.align_mask = ~PAGE_MASK,
+	};
+	unsigned long low_limit, high_limit;
+	unsigned long low_tramp, high_tramp;
+	unsigned long call_end = vaddr + 5;
+
+	if (check_add_overflow(call_end, INT_MIN, &low_limit))
+		low_limit = PAGE_SIZE;
+
+	high_limit = call_end + INT_MAX;
+
+	/* Search up from the caller address. */
+	info.low_limit = call_end;
+	info.high_limit = min(high_limit, TASK_SIZE);
+	high_tramp = vm_unmapped_area(&info);
+
+	/* Search down from the caller address. */
+	info.low_limit = max(low_limit, PAGE_SIZE);
+	info.high_limit = call_end;
+	info.flags = VM_UNMAPPED_AREA_TOPDOWN;
+	low_tramp = vm_unmapped_area(&info);
+
+	if (IS_ERR_VALUE(high_tramp) && IS_ERR_VALUE(low_tramp))
+		return -ENOMEM;
+	if (IS_ERR_VALUE(high_tramp))
+		return low_tramp;
+	if (IS_ERR_VALUE(low_tramp))
+		return high_tramp;
+
+	/* Return address that's closest to the caller address. */
+	if (call_end - low_tramp < high_tramp - call_end)
+		return low_tramp;
+	return high_tramp;
+}

  reply	other threads:[~2025-07-10 13:42 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-08 13:23 [PATCHv4 perf/core 00/22] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
2025-07-08 13:23 ` [PATCHv4 perf/core 01/22] uprobes: Remove breakpoint in unapply_uprobe under mmap_write_lock Jiri Olsa
2025-07-08 13:23 ` [PATCHv4 perf/core 02/22] uprobes: Rename arch_uretprobe_trampoline function Jiri Olsa
2025-07-08 13:23 ` [PATCHv4 perf/core 03/22] uprobes: Make copy_from_page global Jiri Olsa
2025-07-08 13:23 ` [PATCHv4 perf/core 04/22] uprobes: Add uprobe_write function Jiri Olsa
2025-07-08 13:23 ` [PATCHv4 perf/core 05/22] uprobes: Add nbytes argument to uprobe_write Jiri Olsa
2025-07-08 13:23 ` [PATCHv4 perf/core 06/22] uprobes: Add is_register argument to uprobe_write and uprobe_write_opcode Jiri Olsa
2025-07-08 13:23 ` [PATCHv4 perf/core 07/22] uprobes: Add do_ref_ctr argument to uprobe_write function Jiri Olsa
2025-07-08 13:23 ` [PATCHv4 perf/core 08/22] uprobes/x86: Add mapping for optimized uprobe trampolines Jiri Olsa
2025-07-10  7:02   ` Masami Hiramatsu
2025-07-10 13:42     ` Jiri Olsa [this message]
2025-07-11  5:26       ` Masami Hiramatsu
2025-07-08 13:23 ` [PATCHv4 perf/core 09/22] uprobes/x86: Add uprobe syscall to speed up uprobe Jiri Olsa
2025-07-08 13:23 ` [PATCHv4 perf/core 10/22] uprobes/x86: Add support to optimize uprobes Jiri Olsa
2025-07-08 13:23 ` [PATCHv4 perf/core 11/22] selftests/bpf: Import usdt.h from libbpf/usdt project Jiri Olsa
2025-07-08 13:23 ` [PATCHv4 perf/core 12/22] selftests/bpf: Reorg the uprobe_syscall test function Jiri Olsa
2025-07-08 13:23 ` [PATCHv4 perf/core 13/22] selftests/bpf: Rename uprobe_syscall_executed prog to test_uretprobe_multi Jiri Olsa
2025-07-08 13:23 ` [PATCHv4 perf/core 14/22] selftests/bpf: Add uprobe/usdt syscall tests Jiri Olsa
2025-07-08 13:23 ` [PATCHv4 perf/core 15/22] selftests/bpf: Add hit/attach/detach race optimized uprobe test Jiri Olsa
2025-07-08 13:23 ` [PATCHv4 perf/core 16/22] selftests/bpf: Add uprobe syscall sigill signal test Jiri Olsa
2025-07-08 13:23 ` [PATCHv4 perf/core 17/22] selftests/bpf: Add optimized usdt variant for basic usdt test Jiri Olsa
2025-07-08 13:23 ` [PATCHv4 perf/core 18/22] selftests/bpf: Add uprobe_regs_equal test Jiri Olsa
2025-07-08 13:23 ` [PATCHv4 perf/core 19/22] selftests/bpf: Change test_uretprobe_regs_change for uprobe and uretprobe Jiri Olsa
2025-07-08 13:23 ` [PATCHv4 perf/core 20/22] seccomp: passthrough uprobe systemcall without filtering Jiri Olsa
2025-07-08 13:23 ` [PATCHv4 perf/core 21/22] selftests/seccomp: validate uprobe syscall passes through seccomp Jiri Olsa
2025-07-08 13:23 ` [PATCHv4 22/22] man2: Add uprobe syscall page Jiri Olsa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aG_DSXNFol_03h75@krava \
    --to=olsajiri@gmail.com \
    --cc=David.Laight@aculab.com \
    --cc=alan.maguire@oracle.com \
    --cc=andrii@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=haoluo@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=songliubraving@fb.com \
    --cc=thomas@t-8ch.de \
    --cc=x86@kernel.org \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.