From: Jiri Olsa <olsajiri@gmail.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Andrii Nakryiko <andrii@kernel.org>,
bpf@vger.kernel.org, Song Liu <songliubraving@fb.com>,
Yonghong Song <yhs@fb.com>,
John Fastabend <john.fastabend@gmail.com>,
Hao Luo <haoluo@google.com>, Steven Rostedt <rostedt@goodmis.org>,
Masami Hiramatsu <mhiramat@kernel.org>,
Alan Maguire <alan.maguire@oracle.com>,
linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org
Subject: Re: [RFC perf/core 05/11] uprobes: Add mapping for optimized uprobe trampolines
Date: Sat, 16 Nov 2024 22:44:26 +0100 [thread overview]
Message-ID: <ZzkSOhQIMg_lzwiT@krava> (raw)
In-Reply-To: <CAEf4BzYycU7_8uNgi9XrnnPSAvP7iyWwNA7cHu0aLTcAUxsBFA@mail.gmail.com>
On Thu, Nov 14, 2024 at 03:44:14PM -0800, Andrii Nakryiko wrote:
> On Tue, Nov 5, 2024 at 5:35 AM Jiri Olsa <jolsa@kernel.org> wrote:
> >
> > Adding interface to add special mapping for user space page that will be
> > used as place holder for uprobe trampoline in following changes.
> >
> > The get_tramp_area(vaddr) function either finds 'callable' page or create
> > new one. The 'callable' means it's reachable by call instruction (from
> > vaddr argument) and is decided by each arch via new arch_uprobe_is_callable
> > function.
> >
> > The put_tramp_area function either drops refcount or destroys the special
> > mapping and all the maps are clean up when the process goes down.
> >
> > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > ---
> > include/linux/uprobes.h | 12 ++++
> > kernel/events/uprobes.c | 141 ++++++++++++++++++++++++++++++++++++++++
> > kernel/fork.c | 2 +
> > 3 files changed, 155 insertions(+)
> >
> > diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> > index be306028ed59..222d8e82cee2 100644
> > --- a/include/linux/uprobes.h
> > +++ b/include/linux/uprobes.h
> > @@ -172,6 +172,15 @@ struct xol_area;
> >
> > struct uprobes_state {
> > struct xol_area *xol_area;
> > + struct hlist_head tramp_head;
> > + struct mutex tramp_mutex;
> > +};
> > +
> > +struct tramp_area {
> > + unsigned long vaddr;
> > + struct page *page;
> > + struct hlist_node node;
> > + refcount_t ref;
>
> nit: any reason we are unnecessarily trying to save 4 bytes on
> refcount (and we don't actually, due to padding)
hum, I'm not sure what you mean.. what's the alternative?
>
> > };
> >
> > extern void __init uprobes_init(void);
> > @@ -219,6 +228,9 @@ extern int uprobe_verify_opcode(struct page *page, unsigned long vaddr, uprobe_o
> > extern int arch_uprobe_verify_opcode(struct page *page, unsigned long vaddr,
> > uprobe_opcode_t *new_opcode, void *data);
> > extern bool arch_uprobe_is_register(uprobe_opcode_t *insn, int len, void *data);
> > +struct tramp_area *get_tramp_area(unsigned long vaddr);
>
> uprobe_get_tramp_area() to make it clear this is uprobe specific,
> given this is exposed function?
>
> and add that extern like we do for other functions?
ok
>
> > +void put_tramp_area(struct tramp_area *area);
>
> uprobe_put_tramp_area() ?
ok
>
> > +bool arch_uprobe_is_callable(unsigned long vtramp, unsigned long vaddr);
> > #else /* !CONFIG_UPROBES */
> > struct uprobes_state {
> > };
> > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> > index 944d9df1f081..a44305c559a4 100644
> > --- a/kernel/events/uprobes.c
> > +++ b/kernel/events/uprobes.c
> > @@ -616,6 +616,145 @@ set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long v
> > (uprobe_opcode_t *)&auprobe->insn, UPROBE_SWBP_INSN_SIZE, NULL);
> > }
> >
> > +bool __weak arch_uprobe_is_callable(unsigned long vtramp, unsigned long vaddr)
> > +{
> > + return false;
> > +}
> > +
> > +static unsigned long find_nearest_page(unsigned long vaddr)
> > +{
> > + struct mm_struct *mm = current->mm;
> > + struct vm_area_struct *vma, *prev;
> > + VMA_ITERATOR(vmi, mm, 0);
> > +
> > + prev = vma_next(&vmi);
> > + vma = vma_next(&vmi);
> > + while (vma) {
> > + if (vma->vm_start - prev->vm_end >= PAGE_SIZE &&
> > + arch_uprobe_is_callable(prev->vm_end, vaddr))
> > + return prev->vm_end;
>
> shouldn't we try both `prev->vm_end` and `vma->vm_start - PAGE_SIZE`
> as two possible places
right, we should do that
SNIP
> > +static struct tramp_area *create_tramp_area(unsigned long vaddr)
> > +{
> > + struct mm_struct *mm = current->mm;
> > + struct vm_area_struct *vma;
> > + struct tramp_area *area;
> > +
> > + vaddr = find_nearest_page(vaddr);
> > + if (!vaddr)
> > + return NULL;
> > +
> > + area = kzalloc(sizeof(*area), GFP_KERNEL);
> > + if (unlikely(!area))
> > + return NULL;
> > +
> > + area->page = alloc_page(GFP_HIGHUSER);
> > + if (!area->page)
> > + goto free_area;
> > +
> > + refcount_set(&area->ref, 1);
> > + area->vaddr = vaddr;
> > +
> > + vma = _install_special_mapping(mm, area->vaddr, PAGE_SIZE,
> > + VM_READ|VM_EXEC|VM_MAYEXEC|VM_MAYREAD|VM_DONTCOPY|VM_IO,
> > + &tramp_mapping);
> > + if (!IS_ERR(vma))
> > + return area;
>
> please keep a pattern, it's less surprising that way
>
> if (IS_ERR(vma))
> goto free_page;
>
> return area;
>
> free_page:
ok
SNIP
> > +static void destroy_tramp_area(struct tramp_area *area)
> > +{
> > + hlist_del(&area->node);
> > + put_page(area->page);
> > + kfree(area);
> > +}
> > +
> > +void put_tramp_area(struct tramp_area *area)
> > +{
> > + struct mm_struct *mm = current->mm;
> > + struct uprobes_state *state = &mm->uprobes_state;
> > +
> > + if (area == NULL)
>
> nit: !area
ok
thanks,
jirka
next prev parent reply other threads:[~2024-11-16 21:44 UTC|newest]
Thread overview: 51+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-05 13:33 [RFC 00/11] uprobes: Add support to optimize usdt probes on x86_64 Jiri Olsa
2024-11-05 13:33 ` [RFC perf/core 01/11] uprobes: Rename arch_uretprobe_trampoline function Jiri Olsa
2024-11-05 13:33 ` [RFC perf/core 02/11] uprobes: Make copy_from_page global Jiri Olsa
2024-11-14 23:40 ` Andrii Nakryiko
2024-11-16 21:41 ` Jiri Olsa
2024-11-05 13:33 ` [RFC perf/core 03/11] uprobes: Add len argument to uprobe_write_opcode Jiri Olsa
2024-11-14 23:41 ` Andrii Nakryiko
2024-11-16 21:41 ` Jiri Olsa
2024-11-05 13:33 ` [RFC perf/core 04/11] uprobes: Add data argument to uprobe_write_opcode function Jiri Olsa
2024-11-14 23:41 ` Andrii Nakryiko
2024-11-16 21:43 ` Jiri Olsa
2024-11-05 13:33 ` [RFC perf/core 05/11] uprobes: Add mapping for optimized uprobe trampolines Jiri Olsa
2024-11-05 14:23 ` Peter Zijlstra
2024-11-05 16:33 ` Jiri Olsa
2024-11-14 23:44 ` Andrii Nakryiko
2024-11-16 21:44 ` Jiri Olsa
2024-11-19 6:06 ` Andrii Nakryiko
2024-11-19 9:13 ` Peter Zijlstra
2024-11-19 15:15 ` Jiri Olsa
2024-11-21 0:07 ` Andrii Nakryiko
2024-11-21 11:53 ` Peter Zijlstra
2024-11-21 16:02 ` Alexei Starovoitov
2024-11-21 16:34 ` Peter Zijlstra
2024-11-21 16:47 ` Alexei Starovoitov
2024-11-21 19:38 ` Mark Rutland
2024-11-14 23:44 ` Andrii Nakryiko
2024-11-16 21:44 ` Jiri Olsa [this message]
2024-11-19 6:05 ` Andrii Nakryiko
2024-11-19 15:14 ` Jiri Olsa
2024-11-21 0:10 ` Andrii Nakryiko
2024-11-05 13:34 ` [RFC perf/core 06/11] uprobes: Add uprobe syscall to speed up uprobe Jiri Olsa
2024-11-05 13:34 ` [RFC perf/core 07/11] uprobes/x86: Add support to optimize uprobes Jiri Olsa
2024-11-14 23:44 ` Andrii Nakryiko
2024-11-16 21:44 ` Jiri Olsa
2024-11-18 8:18 ` Masami Hiramatsu
2024-11-18 9:39 ` Jiri Olsa
2024-11-05 13:34 ` [RFC bpf-next 08/11] selftests/bpf: Use 5-byte nop for x86 usdt probes Jiri Olsa
2024-11-05 13:34 ` [RFC bpf-next 09/11] selftests/bpf: Add usdt trigger bench Jiri Olsa
2024-11-14 23:40 ` Andrii Nakryiko
2024-11-16 21:45 ` Jiri Olsa
2024-11-19 6:08 ` Andrii Nakryiko
2024-11-05 13:34 ` [RFC bpf-next 10/11] selftests/bpf: Add uprobe/usdt optimized test Jiri Olsa
2024-11-05 13:34 ` [RFC bpf-next 11/11] selftests/bpf: Add hit/attach/detach race optimized uprobe test Jiri Olsa
2024-11-17 11:49 ` [RFC 00/11] uprobes: Add support to optimize usdt probes on x86_64 Peter Zijlstra
2024-11-18 9:29 ` Jiri Olsa
2024-11-18 10:06 ` Mark Rutland
2024-11-19 6:13 ` Andrii Nakryiko
2024-11-21 18:18 ` Mark Rutland
2024-11-26 19:13 ` Andrii Nakryiko
2024-11-18 8:04 ` Masami Hiramatsu
2024-11-18 9:52 ` Jiri Olsa
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZzkSOhQIMg_lzwiT@krava \
--to=olsajiri@gmail.com \
--cc=alan.maguire@oracle.com \
--cc=andrii.nakryiko@gmail.com \
--cc=andrii@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=haoluo@google.com \
--cc=john.fastabend@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=mhiramat@kernel.org \
--cc=oleg@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=songliubraving@fb.com \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.