* [PATCH v6 bpf-next 00/10] Harden and extend ELF build ID parsing logic
@ 2024-08-14 18:54 Andrii Nakryiko
2024-08-14 18:54 ` [PATCH v6 bpf-next 01/10] lib/buildid: harden " Andrii Nakryiko
` (10 more replies)
0 siblings, 11 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2024-08-14 18:54 UTC (permalink / raw)
To: bpf
Cc: linux-mm, akpm, adobriyan, shakeel.butt, hannes, ak, osandov,
song, jannh, linux-fsdevel, willy, Andrii Nakryiko
The goal of this patch set is to extend existing ELF build ID parsing logic,
currently mostly used by BPF subsystem, with support for working in sleepable
mode in which memory faults are allowed and can be relied upon to fetch
relevant parts of ELF file to find and fetch .note.gnu.build-id information.
This is useful and important for BPF subsystem itself, but also for
PROCMAP_QUERY ioctl(), built atop of /proc/<pid>/maps functionality (see [0]),
which makes use of the same build_id_parse() functionality. PROCMAP_QUERY is
always called from sleepable user process context, so it doesn't have to
suffer from current restrictions of build_id_parse() which are due to the NMI
context assumption.
Along the way, we harden the logic to avoid TOCTOU, overflow, out-of-bounds
access problems. This is the very first patch, which can be backported to
older releases, if necessary.
We also lift existing limitations of only working as long as ELF program
headers and build ID note section is contained strictly within the very first
page of ELF file.
We achieve all of the above without duplication of logic between sleepable and
non-sleepable modes through freader abstraction that manages underlying folio
from page cache (on demand) and gives a simple to use direct memory access
interface. With that, single page restrictions and adding sleepable mode
support is rather straightforward.
We also extend existing set of BPF selftests with a few tests targeting build
ID logic across sleepable and non-sleepabe contexts (we utilize sleepable and
non-sleepable uprobes for that).
[0] https://lore.kernel.org/linux-mm/20240627170900.1672542-4-andrii@kernel.org/
v5->v6:
- use local phnum variable in get_build_id_32() (Jann);
- switch memcmp() instead of strcmp() in parse_build_id() (Jann);
v4->v5:
- pass proper file reference to read_cache_folio() (Shakeel);
- fix another potential overflow due to two u32 additions (Andi);
- add PageUptodate() check to patch #1 (Jann);
v3->v4:
- fix few more potential overflow and out-of-bounds access issues (Andi);
- use purely folio-based implementation for freader (Matthew);
v2->v3:
- remove unneeded READ_ONCE()s and force phoff to u64 for 32-bit mode (Andi);
- moved hardening fixes to the front for easier backporting (Jann);
- call freader_cleanup() from build_id_parse_buf() for consistency (Jiri);
v1->v2:
- ensure MADV_PAGEOUT works reliably by paging data in first (Shakeel);
- to fix BPF CI build optionally define MADV_POPULATE_READ in selftest.
Andrii Nakryiko (10):
lib/buildid: harden build ID parsing logic
lib/buildid: add single folio-based file reader abstraction
lib/buildid: take into account e_phoff when fetching program headers
lib/buildid: remove single-page limit for PHDR search
lib/buildid: rename build_id_parse() into build_id_parse_nofault()
lib/buildid: implement sleepable build_id_parse() API
lib/buildid: don't limit .note.gnu.build-id to the first page in ELF
bpf: decouple stack_map_get_build_id_offset() from
perf_callchain_entry
bpf: wire up sleepable bpf_get_stack() and bpf_get_task_stack()
helpers
selftests/bpf: add build ID tests
include/linux/bpf.h | 2 +
include/linux/buildid.h | 4 +-
kernel/bpf/stackmap.c | 131 ++++--
kernel/events/core.c | 2 +-
kernel/trace/bpf_trace.c | 5 +-
lib/buildid.c | 395 +++++++++++++-----
tools/testing/selftests/bpf/Makefile | 5 +-
.../selftests/bpf/prog_tests/build_id.c | 118 ++++++
.../selftests/bpf/progs/test_build_id.c | 31 ++
tools/testing/selftests/bpf/uprobe_multi.c | 41 ++
tools/testing/selftests/bpf/uprobe_multi.ld | 11 +
11 files changed, 603 insertions(+), 142 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/build_id.c
create mode 100644 tools/testing/selftests/bpf/progs/test_build_id.c
create mode 100644 tools/testing/selftests/bpf/uprobe_multi.ld
--
2.43.5
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v6 bpf-next 01/10] lib/buildid: harden build ID parsing logic
2024-08-14 18:54 [PATCH v6 bpf-next 00/10] Harden and extend ELF build ID parsing logic Andrii Nakryiko
@ 2024-08-14 18:54 ` Andrii Nakryiko
2024-08-14 18:54 ` [PATCH v6 bpf-next 02/10] lib/buildid: add single folio-based file reader abstraction Andrii Nakryiko
` (9 subsequent siblings)
10 siblings, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2024-08-14 18:54 UTC (permalink / raw)
To: bpf
Cc: linux-mm, akpm, adobriyan, shakeel.butt, hannes, ak, osandov,
song, jannh, linux-fsdevel, willy, Andrii Nakryiko, stable
Harden build ID parsing logic, adding explicit READ_ONCE() where it's
important to have a consistent value read and validated just once.
Also, as pointed out by Andi Kleen, we need to make sure that entire ELF
note is within a page bounds, so move the overflow check up and add an
extra note_size boundaries validation.
Fixes tag below points to the code that moved this code into
lib/buildid.c, and then subsequently was used in perf subsystem, making
this code exposed to perf_event_open() users in v5.12+.
Cc: stable@vger.kernel.org
Cc: Jann Horn <jannh@google.com>
Reviewed-by: Jann Horn <jannh@google.com>
Suggested-by: Andi Kleen <ak@linux.intel.com>
Fixes: bd7525dacd7e ("bpf: Move stack_map_get_build_id into lib")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
lib/buildid.c | 76 +++++++++++++++++++++++++++++----------------------
1 file changed, 44 insertions(+), 32 deletions(-)
diff --git a/lib/buildid.c b/lib/buildid.c
index e02b5507418b..26007cc99a38 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -18,31 +18,37 @@ static int parse_build_id_buf(unsigned char *build_id,
const void *note_start,
Elf32_Word note_size)
{
- Elf32_Word note_offs = 0, new_offs;
-
- while (note_offs + sizeof(Elf32_Nhdr) < note_size) {
- Elf32_Nhdr *nhdr = (Elf32_Nhdr *)(note_start + note_offs);
+ const char note_name[] = "GNU";
+ const size_t note_name_sz = sizeof(note_name);
+ u64 note_off = 0, new_off, name_sz, desc_sz;
+ const char *data;
+
+ while (note_off + sizeof(Elf32_Nhdr) < note_size &&
+ note_off + sizeof(Elf32_Nhdr) > note_off /* overflow */) {
+ Elf32_Nhdr *nhdr = (Elf32_Nhdr *)(note_start + note_off);
+
+ name_sz = READ_ONCE(nhdr->n_namesz);
+ desc_sz = READ_ONCE(nhdr->n_descsz);
+
+ new_off = note_off + sizeof(Elf32_Nhdr);
+ if (check_add_overflow(new_off, ALIGN(name_sz, 4), &new_off) ||
+ check_add_overflow(new_off, ALIGN(desc_sz, 4), &new_off) ||
+ new_off > note_size)
+ break;
if (nhdr->n_type == BUILD_ID &&
- nhdr->n_namesz == sizeof("GNU") &&
- !strcmp((char *)(nhdr + 1), "GNU") &&
- nhdr->n_descsz > 0 &&
- nhdr->n_descsz <= BUILD_ID_SIZE_MAX) {
- memcpy(build_id,
- note_start + note_offs +
- ALIGN(sizeof("GNU"), 4) + sizeof(Elf32_Nhdr),
- nhdr->n_descsz);
- memset(build_id + nhdr->n_descsz, 0,
- BUILD_ID_SIZE_MAX - nhdr->n_descsz);
+ name_sz == note_name_sz &&
+ memcmp(nhdr + 1, note_name, note_name_sz) == 0 &&
+ desc_sz > 0 && desc_sz <= BUILD_ID_SIZE_MAX) {
+ data = note_start + note_off + ALIGN(note_name_sz, 4);
+ memcpy(build_id, data, desc_sz);
+ memset(build_id + desc_sz, 0, BUILD_ID_SIZE_MAX - desc_sz);
if (size)
- *size = nhdr->n_descsz;
+ *size = desc_sz;
return 0;
}
- new_offs = note_offs + sizeof(Elf32_Nhdr) +
- ALIGN(nhdr->n_namesz, 4) + ALIGN(nhdr->n_descsz, 4);
- if (new_offs <= note_offs) /* overflow */
- break;
- note_offs = new_offs;
+
+ note_off = new_off;
}
return -EINVAL;
@@ -71,7 +77,7 @@ static int get_build_id_32(const void *page_addr, unsigned char *build_id,
{
Elf32_Ehdr *ehdr = (Elf32_Ehdr *)page_addr;
Elf32_Phdr *phdr;
- int i;
+ __u32 i, phnum;
/*
* FIXME
@@ -80,18 +86,19 @@ static int get_build_id_32(const void *page_addr, unsigned char *build_id,
*/
if (ehdr->e_phoff != sizeof(Elf32_Ehdr))
return -EINVAL;
+
+ phnum = READ_ONCE(ehdr->e_phnum);
/* only supports phdr that fits in one page */
- if (ehdr->e_phnum >
- (PAGE_SIZE - sizeof(Elf32_Ehdr)) / sizeof(Elf32_Phdr))
+ if (phnum > (PAGE_SIZE - sizeof(Elf32_Ehdr)) / sizeof(Elf32_Phdr))
return -EINVAL;
phdr = (Elf32_Phdr *)(page_addr + sizeof(Elf32_Ehdr));
- for (i = 0; i < ehdr->e_phnum; ++i) {
+ for (i = 0; i < phnum; ++i) {
if (phdr[i].p_type == PT_NOTE &&
!parse_build_id(page_addr, build_id, size,
- page_addr + phdr[i].p_offset,
- phdr[i].p_filesz))
+ page_addr + READ_ONCE(phdr[i].p_offset),
+ READ_ONCE(phdr[i].p_filesz)))
return 0;
}
return -EINVAL;
@@ -103,7 +110,7 @@ static int get_build_id_64(const void *page_addr, unsigned char *build_id,
{
Elf64_Ehdr *ehdr = (Elf64_Ehdr *)page_addr;
Elf64_Phdr *phdr;
- int i;
+ __u32 i, phnum;
/*
* FIXME
@@ -112,18 +119,19 @@ static int get_build_id_64(const void *page_addr, unsigned char *build_id,
*/
if (ehdr->e_phoff != sizeof(Elf64_Ehdr))
return -EINVAL;
+
+ phnum = READ_ONCE(ehdr->e_phnum);
/* only supports phdr that fits in one page */
- if (ehdr->e_phnum >
- (PAGE_SIZE - sizeof(Elf64_Ehdr)) / sizeof(Elf64_Phdr))
+ if (phnum > (PAGE_SIZE - sizeof(Elf64_Ehdr)) / sizeof(Elf64_Phdr))
return -EINVAL;
phdr = (Elf64_Phdr *)(page_addr + sizeof(Elf64_Ehdr));
- for (i = 0; i < ehdr->e_phnum; ++i) {
+ for (i = 0; i < phnum; ++i) {
if (phdr[i].p_type == PT_NOTE &&
!parse_build_id(page_addr, build_id, size,
- page_addr + phdr[i].p_offset,
- phdr[i].p_filesz))
+ page_addr + READ_ONCE(phdr[i].p_offset),
+ READ_ONCE(phdr[i].p_filesz)))
return 0;
}
return -EINVAL;
@@ -152,6 +160,10 @@ int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
page = find_get_page(vma->vm_file->f_mapping, 0);
if (!page)
return -EFAULT; /* page not mapped */
+ if (!PageUptodate(page)) {
+ put_page(page);
+ return -EFAULT;
+ }
ret = -EINVAL;
page_addr = kmap_local_page(page);
--
2.43.5
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v6 bpf-next 02/10] lib/buildid: add single folio-based file reader abstraction
2024-08-14 18:54 [PATCH v6 bpf-next 00/10] Harden and extend ELF build ID parsing logic Andrii Nakryiko
2024-08-14 18:54 ` [PATCH v6 bpf-next 01/10] lib/buildid: harden " Andrii Nakryiko
@ 2024-08-14 18:54 ` Andrii Nakryiko
2024-08-14 18:54 ` [PATCH v6 bpf-next 03/10] lib/buildid: take into account e_phoff when fetching program headers Andrii Nakryiko
` (8 subsequent siblings)
10 siblings, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2024-08-14 18:54 UTC (permalink / raw)
To: bpf
Cc: linux-mm, akpm, adobriyan, shakeel.butt, hannes, ak, osandov,
song, jannh, linux-fsdevel, willy, Andrii Nakryiko
Add freader abstraction that transparently manages fetching and local
mapping of the underlying file page(s) and provides a simple direct data
access interface.
freader_fetch() is the only and single interface necessary. It accepts
file offset and desired number of bytes that should be accessed, and
will return a kernel mapped pointer that caller can use to dereference
data up to requested size. Requested size can't be bigger than the size
of the extra buffer provided during initialization (because, worst case,
all requested data has to be copied into it, so it's better to flag
wrongly sized buffer unconditionally, regardless if requested data range
is crossing page boundaries or not).
If folio is not paged in, or some of the conditions are not satisfied,
NULL is returned and more detailed error code can be accessed through
freader->err field. This approach makes the usage of freader_fetch()
cleaner.
To accommodate accessing file data that crosses folio boundaries, user
has to provide an extra buffer that will be used to make a local copy,
if necessary. This is done to maintain a simple linear pointer data
access interface.
We switch existing build ID parsing logic to it, without changing or
lifting any of the existing constraints, yet. This will be done
separately.
Given existing code was written with the assumption that it's always
working with a single (first) page of the underlying ELF file, logic
passes direct pointers around, which doesn't really work well with
freader approach and would be limiting when removing the single page (folio)
limitation. So we adjust all the logic to work in terms of file offsets.
There is also a memory buffer-based version (freader_init_from_mem())
for cases when desired data is already available in kernel memory. This
is used for parsing vmlinux's own build ID note. In this mode assumption
is that provided data starts at "file offset" zero, which works great
when parsing ELF notes sections, as all the parsing logic is relative to
note section's start.
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
lib/buildid.c | 263 ++++++++++++++++++++++++++++++++++++++++----------
1 file changed, 210 insertions(+), 53 deletions(-)
diff --git a/lib/buildid.c b/lib/buildid.c
index 26007cc99a38..bfe00b66b1e8 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -8,24 +8,155 @@
#define BUILD_ID 3
+struct freader {
+ void *buf;
+ u32 buf_sz;
+ int err;
+ union {
+ struct {
+ struct address_space *mapping;
+ struct folio *folio;
+ void *addr;
+ loff_t folio_off;
+ };
+ struct {
+ const char *data;
+ u64 data_sz;
+ };
+ };
+};
+
+static void freader_init_from_file(struct freader *r, void *buf, u32 buf_sz,
+ struct address_space *mapping)
+{
+ memset(r, 0, sizeof(*r));
+ r->buf = buf;
+ r->buf_sz = buf_sz;
+ r->mapping = mapping;
+}
+
+static void freader_init_from_mem(struct freader *r, const char *data, u64 data_sz)
+{
+ memset(r, 0, sizeof(*r));
+ r->data = data;
+ r->data_sz = data_sz;
+}
+
+static void freader_put_folio(struct freader *r)
+{
+ if (!r->folio)
+ return;
+ kunmap_local(r->addr);
+ folio_put(r->folio);
+ r->folio = NULL;
+}
+
+static int freader_get_folio(struct freader *r, loff_t file_off)
+{
+ /* check if we can just reuse current folio */
+ if (r->folio && file_off >= r->folio_off &&
+ file_off < r->folio_off + folio_size(r->folio))
+ return 0;
+
+ freader_put_folio(r);
+
+ r->folio = filemap_get_folio(r->mapping, file_off >> PAGE_SHIFT);
+ if (IS_ERR(r->folio) || !folio_test_uptodate(r->folio)) {
+ if (!IS_ERR(r->folio))
+ folio_put(r->folio);
+ r->folio = NULL;
+ return -EFAULT;
+ }
+
+ r->folio_off = folio_pos(r->folio);
+ r->addr = kmap_local_folio(r->folio, 0);
+
+ return 0;
+}
+
+static const void *freader_fetch(struct freader *r, loff_t file_off, size_t sz)
+{
+ size_t folio_sz;
+
+ /* provided internal temporary buffer should be sized correctly */
+ if (WARN_ON(r->buf && sz > r->buf_sz)) {
+ r->err = -E2BIG;
+ return NULL;
+ }
+
+ if (unlikely(file_off + sz < file_off)) {
+ r->err = -EOVERFLOW;
+ return NULL;
+ }
+
+ /* working with memory buffer is much more straightforward */
+ if (!r->buf) {
+ if (file_off + sz > r->data_sz) {
+ r->err = -ERANGE;
+ return NULL;
+ }
+ return r->data + file_off;
+ }
+
+ /* fetch or reuse folio for given file offset */
+ r->err = freader_get_folio(r, file_off);
+ if (r->err)
+ return NULL;
+
+ /* if requested data is crossing folio boundaries, we have to copy
+ * everything into our local buffer to keep a simple linear memory
+ * access interface
+ */
+ folio_sz = folio_size(r->folio);
+ if (file_off + sz > r->folio_off + folio_sz) {
+ int part_sz = r->folio_off + folio_sz - file_off;
+
+ /* copy the part that resides in the current folio */
+ memcpy(r->buf, r->addr + (file_off - r->folio_off), part_sz);
+
+ /* fetch next folio */
+ r->err = freader_get_folio(r, r->folio_off + folio_sz);
+ if (r->err)
+ return NULL;
+
+ /* copy the rest of requested data */
+ memcpy(r->buf + part_sz, r->addr, sz - part_sz);
+
+ return r->buf;
+ }
+
+ /* if data fits in a single folio, just return direct pointer */
+ return r->addr + (file_off - r->folio_off);
+}
+
+static void freader_cleanup(struct freader *r)
+{
+ if (!r->buf)
+ return; /* non-file-backed mode */
+
+ freader_put_folio(r);
+}
+
/*
* Parse build id from the note segment. This logic can be shared between
* 32-bit and 64-bit system, because Elf32_Nhdr and Elf64_Nhdr are
* identical.
*/
-static int parse_build_id_buf(unsigned char *build_id,
- __u32 *size,
- const void *note_start,
- Elf32_Word note_size)
+static int parse_build_id_buf(struct freader *r,
+ unsigned char *build_id, __u32 *size,
+ loff_t note_off, Elf32_Word note_size)
{
const char note_name[] = "GNU";
const size_t note_name_sz = sizeof(note_name);
- u64 note_off = 0, new_off, name_sz, desc_sz;
+ u32 build_id_off, new_off, note_end, name_sz, desc_sz;
+ const Elf32_Nhdr *nhdr;
const char *data;
- while (note_off + sizeof(Elf32_Nhdr) < note_size &&
- note_off + sizeof(Elf32_Nhdr) > note_off /* overflow */) {
- Elf32_Nhdr *nhdr = (Elf32_Nhdr *)(note_start + note_off);
+ note_end = note_off + note_size;
+ while (note_end - note_off > sizeof(Elf32_Nhdr) + note_name_sz) {
+ nhdr = freader_fetch(r, note_off, sizeof(Elf32_Nhdr) + note_name_sz);
+ if (!nhdr)
+ return r->err;
name_sz = READ_ONCE(nhdr->n_namesz);
desc_sz = READ_ONCE(nhdr->n_descsz);
@@ -33,14 +164,20 @@ static int parse_build_id_buf(unsigned char *build_id,
new_off = note_off + sizeof(Elf32_Nhdr);
if (check_add_overflow(new_off, ALIGN(name_sz, 4), &new_off) ||
check_add_overflow(new_off, ALIGN(desc_sz, 4), &new_off) ||
- new_off > note_size)
+ new_off > note_end)
break;
if (nhdr->n_type == BUILD_ID &&
name_sz == note_name_sz &&
memcmp(nhdr + 1, note_name, note_name_sz) == 0 &&
desc_sz > 0 && desc_sz <= BUILD_ID_SIZE_MAX) {
- data = note_start + note_off + ALIGN(note_name_sz, 4);
+ build_id_off = note_off + sizeof(Elf32_Nhdr) + ALIGN(note_name_sz, 4);
+
+ /* freader_fetch() will invalidate nhdr pointer */
+ data = freader_fetch(r, build_id_off, desc_sz);
+ if (!data)
+ return r->err;
+
memcpy(build_id, data, desc_sz);
memset(build_id + desc_sz, 0, BUILD_ID_SIZE_MAX - desc_sz);
if (size)
@@ -54,30 +191,33 @@ static int parse_build_id_buf(unsigned char *build_id,
return -EINVAL;
}
-static inline int parse_build_id(const void *page_addr,
+static inline int parse_build_id(struct freader *r,
unsigned char *build_id,
__u32 *size,
- const void *note_start,
+ loff_t note_start_off,
Elf32_Word note_size)
{
/* check for overflow */
- if (note_start < page_addr || note_start + note_size < note_start)
+ if (note_start_off + note_size < note_start_off)
return -EINVAL;
/* only supports note that fits in the first page */
- if (note_start + note_size > page_addr + PAGE_SIZE)
+ if (note_start_off + note_size > PAGE_SIZE)
return -EINVAL;
- return parse_build_id_buf(build_id, size, note_start, note_size);
+ return parse_build_id_buf(r, build_id, size, note_start_off, note_size);
}
/* Parse build ID from 32-bit ELF */
-static int get_build_id_32(const void *page_addr, unsigned char *build_id,
- __u32 *size)
+static int get_build_id_32(struct freader *r, unsigned char *build_id, __u32 *size)
{
- Elf32_Ehdr *ehdr = (Elf32_Ehdr *)page_addr;
- Elf32_Phdr *phdr;
- __u32 i, phnum;
+ const Elf32_Ehdr *ehdr;
+ const Elf32_Phdr *phdr;
+ __u32 phnum, i;
+
+ ehdr = freader_fetch(r, 0, sizeof(Elf32_Ehdr));
+ if (!ehdr)
+ return r->err;
/*
* FIXME
@@ -87,30 +227,35 @@ static int get_build_id_32(const void *page_addr, unsigned char *build_id,
if (ehdr->e_phoff != sizeof(Elf32_Ehdr))
return -EINVAL;
+ /* subsequent freader_fetch() calls invalidate pointers, so remember locally */
phnum = READ_ONCE(ehdr->e_phnum);
/* only supports phdr that fits in one page */
if (phnum > (PAGE_SIZE - sizeof(Elf32_Ehdr)) / sizeof(Elf32_Phdr))
return -EINVAL;
- phdr = (Elf32_Phdr *)(page_addr + sizeof(Elf32_Ehdr));
-
for (i = 0; i < phnum; ++i) {
- if (phdr[i].p_type == PT_NOTE &&
- !parse_build_id(page_addr, build_id, size,
- page_addr + READ_ONCE(phdr[i].p_offset),
- READ_ONCE(phdr[i].p_filesz)))
+ phdr = freader_fetch(r, i * sizeof(Elf32_Phdr), sizeof(Elf32_Phdr));
+ if (!phdr)
+ return r->err;
+
+ if (phdr->p_type == PT_NOTE &&
+ !parse_build_id(r, build_id, size, READ_ONCE(phdr->p_offset),
+ READ_ONCE(phdr->p_filesz)))
return 0;
}
return -EINVAL;
}
/* Parse build ID from 64-bit ELF */
-static int get_build_id_64(const void *page_addr, unsigned char *build_id,
- __u32 *size)
+static int get_build_id_64(struct freader *r, unsigned char *build_id, __u32 *size)
{
- Elf64_Ehdr *ehdr = (Elf64_Ehdr *)page_addr;
- Elf64_Phdr *phdr;
- __u32 i, phnum;
+ const Elf64_Ehdr *ehdr;
+ const Elf64_Phdr *phdr;
+ __u32 phnum, i;
+
+ ehdr = freader_fetch(r, 0, sizeof(Elf64_Ehdr));
+ if (!ehdr)
+ return r->err;
/*
* FIXME
@@ -120,23 +265,29 @@ static int get_build_id_64(const void *page_addr, unsigned char *build_id,
if (ehdr->e_phoff != sizeof(Elf64_Ehdr))
return -EINVAL;
+ /* subsequent freader_fetch() calls invalidate pointers, so remember locally */
phnum = READ_ONCE(ehdr->e_phnum);
/* only supports phdr that fits in one page */
if (phnum > (PAGE_SIZE - sizeof(Elf64_Ehdr)) / sizeof(Elf64_Phdr))
return -EINVAL;
- phdr = (Elf64_Phdr *)(page_addr + sizeof(Elf64_Ehdr));
-
for (i = 0; i < phnum; ++i) {
- if (phdr[i].p_type == PT_NOTE &&
- !parse_build_id(page_addr, build_id, size,
- page_addr + READ_ONCE(phdr[i].p_offset),
- READ_ONCE(phdr[i].p_filesz)))
+ phdr = freader_fetch(r, i * sizeof(Elf64_Phdr), sizeof(Elf64_Phdr));
+ if (!phdr)
+ return r->err;
+
+ if (phdr->p_type == PT_NOTE &&
+ !parse_build_id(r, build_id, size, READ_ONCE(phdr->p_offset),
+ READ_ONCE(phdr->p_filesz)))
return 0;
}
+
return -EINVAL;
}
+/* enough for Elf64_Ehdr, Elf64_Phdr, and all the smaller requests */
+#define MAX_FREADER_BUF_SZ 64
+
/*
* Parse build ID of ELF file mapped to vma
* @vma: vma object
@@ -148,26 +299,25 @@ static int get_build_id_64(const void *page_addr, unsigned char *build_id,
int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
__u32 *size)
{
- Elf32_Ehdr *ehdr;
- struct page *page;
- void *page_addr;
+ const Elf32_Ehdr *ehdr;
+ struct freader r;
+ char buf[MAX_FREADER_BUF_SZ];
int ret;
/* only works for page backed storage */
if (!vma->vm_file)
return -EINVAL;
- page = find_get_page(vma->vm_file->f_mapping, 0);
- if (!page)
- return -EFAULT; /* page not mapped */
- if (!PageUptodate(page)) {
- put_page(page);
- return -EFAULT;
+ freader_init_from_file(&r, buf, sizeof(buf), vma->vm_file->f_mapping);
+
+ /* fetch first 18 bytes of ELF header for checks */
+ ehdr = freader_fetch(&r, 0, offsetofend(Elf32_Ehdr, e_type));
+ if (!ehdr) {
+ ret = r.err;
+ goto out;
}
ret = -EINVAL;
- page_addr = kmap_local_page(page);
- ehdr = (Elf32_Ehdr *)page_addr;
/* compare magic x7f "ELF" */
if (memcmp(ehdr->e_ident, ELFMAG, SELFMAG) != 0)
@@ -178,12 +328,11 @@ int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
goto out;
if (ehdr->e_ident[EI_CLASS] == ELFCLASS32)
- ret = get_build_id_32(page_addr, build_id, size);
+ ret = get_build_id_32(&r, build_id, size);
else if (ehdr->e_ident[EI_CLASS] == ELFCLASS64)
- ret = get_build_id_64(page_addr, build_id, size);
+ ret = get_build_id_64(&r, build_id, size);
out:
- kunmap_local(page_addr);
- put_page(page);
+ freader_cleanup(&r);
return ret;
}
@@ -197,7 +346,15 @@ int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
*/
int build_id_parse_buf(const void *buf, unsigned char *build_id, u32 buf_size)
{
- return parse_build_id_buf(build_id, NULL, buf, buf_size);
+ struct freader r;
+ int err;
+
+ freader_init_from_mem(&r, buf, buf_size);
+
+ err = parse_build_id(&r, build_id, NULL, 0, buf_size);
+
+ freader_cleanup(&r);
+ return err;
}
#if IS_ENABLED(CONFIG_STACKTRACE_BUILD_ID) || IS_ENABLED(CONFIG_VMCORE_INFO)
--
2.43.5
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v6 bpf-next 03/10] lib/buildid: take into account e_phoff when fetching program headers
2024-08-14 18:54 [PATCH v6 bpf-next 00/10] Harden and extend ELF build ID parsing logic Andrii Nakryiko
2024-08-14 18:54 ` [PATCH v6 bpf-next 01/10] lib/buildid: harden " Andrii Nakryiko
2024-08-14 18:54 ` [PATCH v6 bpf-next 02/10] lib/buildid: add single folio-based file reader abstraction Andrii Nakryiko
@ 2024-08-14 18:54 ` Andrii Nakryiko
2024-08-14 18:54 ` [PATCH v6 bpf-next 04/10] lib/buildid: remove single-page limit for PHDR search Andrii Nakryiko
` (7 subsequent siblings)
10 siblings, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2024-08-14 18:54 UTC (permalink / raw)
To: bpf
Cc: linux-mm, akpm, adobriyan, shakeel.butt, hannes, ak, osandov,
song, jannh, linux-fsdevel, willy, Andrii Nakryiko
Current code assumption is that program (segment) headers are following
ELF header immediately. This is a common case, but is not guaranteed. So
take into account e_phoff field of the ELF header when accessing program
headers.
Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
lib/buildid.c | 35 ++++++++++++++++-------------------
1 file changed, 16 insertions(+), 19 deletions(-)
diff --git a/lib/buildid.c b/lib/buildid.c
index bfe00b66b1e8..7fb08a1d98bd 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -213,28 +213,26 @@ static int get_build_id_32(struct freader *r, unsigned char *build_id, __u32 *si
{
const Elf32_Ehdr *ehdr;
const Elf32_Phdr *phdr;
- __u32 phnum, i;
+ __u32 phnum, phoff, i;
ehdr = freader_fetch(r, 0, sizeof(Elf32_Ehdr));
if (!ehdr)
return r->err;
- /*
- * FIXME
- * Neither ELF spec nor ELF loader require that program headers
- * start immediately after ELF header.
- */
- if (ehdr->e_phoff != sizeof(Elf32_Ehdr))
- return -EINVAL;
-
/* subsequent freader_fetch() calls invalidate pointers, so remember locally */
phnum = READ_ONCE(ehdr->e_phnum);
+ phoff = READ_ONCE(ehdr->e_phoff);
+
/* only supports phdr that fits in one page */
if (phnum > (PAGE_SIZE - sizeof(Elf32_Ehdr)) / sizeof(Elf32_Phdr))
return -EINVAL;
+ /* check that phoff is not large enough to cause an overflow */
+ if (phoff + phnum * sizeof(Elf32_Phdr) < phoff)
+ return -EINVAL;
+
for (i = 0; i < phnum; ++i) {
- phdr = freader_fetch(r, i * sizeof(Elf32_Phdr), sizeof(Elf32_Phdr));
+ phdr = freader_fetch(r, phoff + i * sizeof(Elf32_Phdr), sizeof(Elf32_Phdr));
if (!phdr)
return r->err;
@@ -252,27 +250,26 @@ static int get_build_id_64(struct freader *r, unsigned char *build_id, __u32 *si
const Elf64_Ehdr *ehdr;
const Elf64_Phdr *phdr;
__u32 phnum, i;
+ __u64 phoff;
ehdr = freader_fetch(r, 0, sizeof(Elf64_Ehdr));
if (!ehdr)
return r->err;
- /*
- * FIXME
- * Neither ELF spec nor ELF loader require that program headers
- * start immediately after ELF header.
- */
- if (ehdr->e_phoff != sizeof(Elf64_Ehdr))
- return -EINVAL;
-
/* subsequent freader_fetch() calls invalidate pointers, so remember locally */
phnum = READ_ONCE(ehdr->e_phnum);
+ phoff = READ_ONCE(ehdr->e_phoff);
+
/* only supports phdr that fits in one page */
if (phnum > (PAGE_SIZE - sizeof(Elf64_Ehdr)) / sizeof(Elf64_Phdr))
return -EINVAL;
+ /* check that phoff is not large enough to cause an overflow */
+ if (phoff + phnum * sizeof(Elf64_Phdr) < phoff)
+ return -EINVAL;
+
for (i = 0; i < phnum; ++i) {
- phdr = freader_fetch(r, i * sizeof(Elf64_Phdr), sizeof(Elf64_Phdr));
+ phdr = freader_fetch(r, phoff + i * sizeof(Elf64_Phdr), sizeof(Elf64_Phdr));
if (!phdr)
return r->err;
--
2.43.5
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v6 bpf-next 04/10] lib/buildid: remove single-page limit for PHDR search
2024-08-14 18:54 [PATCH v6 bpf-next 00/10] Harden and extend ELF build ID parsing logic Andrii Nakryiko
` (2 preceding siblings ...)
2024-08-14 18:54 ` [PATCH v6 bpf-next 03/10] lib/buildid: take into account e_phoff when fetching program headers Andrii Nakryiko
@ 2024-08-14 18:54 ` Andrii Nakryiko
2024-08-14 18:54 ` [PATCH v6 bpf-next 05/10] lib/buildid: rename build_id_parse() into build_id_parse_nofault() Andrii Nakryiko
` (6 subsequent siblings)
10 siblings, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2024-08-14 18:54 UTC (permalink / raw)
To: bpf
Cc: linux-mm, akpm, adobriyan, shakeel.butt, hannes, ak, osandov,
song, jannh, linux-fsdevel, willy, Andrii Nakryiko
Now that freader allows to access multiple pages transparently, there is
no need to limit program headers to the very first ELF file page. Remove
this limitation, but still put some sane limit on amount of program
headers that we are willing to iterate over (set arbitrarily to 256).
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
lib/buildid.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/lib/buildid.c b/lib/buildid.c
index 7fb08a1d98bd..e8fc4aeb01f2 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -8,6 +8,8 @@
#define BUILD_ID 3
+#define MAX_PHDR_CNT 256
+
struct freader {
void *buf;
u32 buf_sz;
@@ -223,9 +225,9 @@ static int get_build_id_32(struct freader *r, unsigned char *build_id, __u32 *si
phnum = READ_ONCE(ehdr->e_phnum);
phoff = READ_ONCE(ehdr->e_phoff);
- /* only supports phdr that fits in one page */
- if (phnum > (PAGE_SIZE - sizeof(Elf32_Ehdr)) / sizeof(Elf32_Phdr))
- return -EINVAL;
+ /* set upper bound on amount of segments (phdrs) we iterate */
+ if (phnum > MAX_PHDR_CNT)
+ phnum = MAX_PHDR_CNT;
/* check that phoff is not large enough to cause an overflow */
if (phoff + phnum * sizeof(Elf32_Phdr) < phoff)
@@ -260,9 +262,9 @@ static int get_build_id_64(struct freader *r, unsigned char *build_id, __u32 *si
phnum = READ_ONCE(ehdr->e_phnum);
phoff = READ_ONCE(ehdr->e_phoff);
- /* only supports phdr that fits in one page */
- if (phnum > (PAGE_SIZE - sizeof(Elf64_Ehdr)) / sizeof(Elf64_Phdr))
- return -EINVAL;
+ /* set upper bound on amount of segments (phdrs) we iterate */
+ if (phnum > MAX_PHDR_CNT)
+ phnum = MAX_PHDR_CNT;
/* check that phoff is not large enough to cause an overflow */
if (phoff + phnum * sizeof(Elf64_Phdr) < phoff)
--
2.43.5
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v6 bpf-next 05/10] lib/buildid: rename build_id_parse() into build_id_parse_nofault()
2024-08-14 18:54 [PATCH v6 bpf-next 00/10] Harden and extend ELF build ID parsing logic Andrii Nakryiko
` (3 preceding siblings ...)
2024-08-14 18:54 ` [PATCH v6 bpf-next 04/10] lib/buildid: remove single-page limit for PHDR search Andrii Nakryiko
@ 2024-08-14 18:54 ` Andrii Nakryiko
2024-08-14 18:54 ` [PATCH v6 bpf-next 06/10] lib/buildid: implement sleepable build_id_parse() API Andrii Nakryiko
` (5 subsequent siblings)
10 siblings, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2024-08-14 18:54 UTC (permalink / raw)
To: bpf
Cc: linux-mm, akpm, adobriyan, shakeel.butt, hannes, ak, osandov,
song, jannh, linux-fsdevel, willy, Andrii Nakryiko
Make it clear that build_id_parse() assumes that it can take no page
fault by renaming it and current few users to build_id_parse_nofault().
Also add build_id_parse() stub which for now falls back to non-sleepable
implementation, but will be changed in subsequent patches to take
advantage of sleepable context. PROCMAP_QUERY ioctl() on
/proc/<pid>/maps file is using build_id_parse() and will automatically
take advantage of more reliable sleepable context implementation.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
include/linux/buildid.h | 4 ++--
kernel/bpf/stackmap.c | 2 +-
kernel/events/core.c | 2 +-
lib/buildid.c | 25 ++++++++++++++++++++++---
4 files changed, 26 insertions(+), 7 deletions(-)
diff --git a/include/linux/buildid.h b/include/linux/buildid.h
index 20aa3c2d89f7..014a88c41073 100644
--- a/include/linux/buildid.h
+++ b/include/linux/buildid.h
@@ -7,8 +7,8 @@
#define BUILD_ID_SIZE_MAX 20
struct vm_area_struct;
-int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
- __u32 *size);
+int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id, __u32 *size);
+int build_id_parse_nofault(struct vm_area_struct *vma, unsigned char *build_id, __u32 *size);
int build_id_parse_buf(const void *buf, unsigned char *build_id, u32 buf_size);
#if IS_ENABLED(CONFIG_STACKTRACE_BUILD_ID) || IS_ENABLED(CONFIG_VMCORE_INFO)
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index c99f8e5234ac..770ae8e88016 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -156,7 +156,7 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
goto build_id_valid;
}
vma = find_vma(current->mm, ips[i]);
- if (!vma || build_id_parse(vma, id_offs[i].build_id, NULL)) {
+ if (!vma || build_id_parse_nofault(vma, id_offs[i].build_id, NULL)) {
/* per entry fall back to ips */
id_offs[i].status = BPF_STACK_BUILD_ID_IP;
id_offs[i].ip = ips[i];
diff --git a/kernel/events/core.c b/kernel/events/core.c
index aa3450bdc227..c263a8b0ce54 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8851,7 +8851,7 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event)
mmap_event->event_id.header.size = sizeof(mmap_event->event_id) + size;
if (atomic_read(&nr_build_id_events))
- build_id_parse(vma, mmap_event->build_id, &mmap_event->build_id_size);
+ build_id_parse_nofault(vma, mmap_event->build_id, &mmap_event->build_id_size);
perf_iterate_sb(perf_event_mmap_output,
mmap_event,
diff --git a/lib/buildid.c b/lib/buildid.c
index e8fc4aeb01f2..c1cbd34f3685 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -293,10 +293,12 @@ static int get_build_id_64(struct freader *r, unsigned char *build_id, __u32 *si
* @build_id: buffer to store build id, at least BUILD_ID_SIZE long
* @size: returns actual build id size in case of success
*
- * Return: 0 on success, -EINVAL otherwise
+ * Assumes no page fault can be taken, so if relevant portions of ELF file are
+ * not already paged in, fetching of build ID fails.
+ *
+ * Return: 0 on success; negative error, otherwise
*/
-int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
- __u32 *size)
+int build_id_parse_nofault(struct vm_area_struct *vma, unsigned char *build_id, __u32 *size)
{
const Elf32_Ehdr *ehdr;
struct freader r;
@@ -335,6 +337,23 @@ int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
return ret;
}
+/*
+ * Parse build ID of ELF file mapped to VMA
+ * @vma: vma object
+ * @build_id: buffer to store build id, at least BUILD_ID_SIZE long
+ * @size: returns actual build id size in case of success
+ *
+ * Assumes faultable context and can cause page faults to bring in file data
+ * into page cache.
+ *
+ * Return: 0 on success; negative error, otherwise
+ */
+int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id, __u32 *size)
+{
+ /* fallback to non-faultable version for now */
+ return build_id_parse_nofault(vma, build_id, size);
+}
+
/**
* build_id_parse_buf - Get build ID from a buffer
* @buf: ELF note section(s) to parse
--
2.43.5
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v6 bpf-next 06/10] lib/buildid: implement sleepable build_id_parse() API
2024-08-14 18:54 [PATCH v6 bpf-next 00/10] Harden and extend ELF build ID parsing logic Andrii Nakryiko
` (4 preceding siblings ...)
2024-08-14 18:54 ` [PATCH v6 bpf-next 05/10] lib/buildid: rename build_id_parse() into build_id_parse_nofault() Andrii Nakryiko
@ 2024-08-14 18:54 ` Andrii Nakryiko
2024-08-14 18:54 ` [PATCH v6 bpf-next 07/10] lib/buildid: don't limit .note.gnu.build-id to the first page in ELF Andrii Nakryiko
` (4 subsequent siblings)
10 siblings, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2024-08-14 18:54 UTC (permalink / raw)
To: bpf
Cc: linux-mm, akpm, adobriyan, shakeel.butt, hannes, ak, osandov,
song, jannh, linux-fsdevel, willy, Andrii Nakryiko, Omar Sandoval
Extend freader with a flag specifying whether it's OK to cause page
fault to fetch file data that is not already physically present in
memory. With this, it's now easy to wait for data if the caller is
running in sleepable (faultable) context.
We utilize read_cache_folio() to bring the desired folio into page
cache, after which the rest of the logic works just the same at folio level.
Suggested-by: Omar Sandoval <osandov@fb.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
lib/buildid.c | 52 ++++++++++++++++++++++++++++++++-------------------
1 file changed, 33 insertions(+), 19 deletions(-)
diff --git a/lib/buildid.c b/lib/buildid.c
index c1cbd34f3685..5da5a32a1af8 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -16,10 +16,11 @@ struct freader {
int err;
union {
struct {
- struct address_space *mapping;
+ struct file *file;
struct folio *folio;
void *addr;
loff_t folio_off;
+ bool may_fault;
};
struct {
const char *data;
@@ -29,12 +30,13 @@ struct freader {
};
static void freader_init_from_file(struct freader *r, void *buf, u32 buf_sz,
- struct address_space *mapping)
+ struct file *file, bool may_fault)
{
memset(r, 0, sizeof(*r));
r->buf = buf;
r->buf_sz = buf_sz;
- r->mapping = mapping;
+ r->file = file;
+ r->may_fault = may_fault;
}
static void freader_init_from_mem(struct freader *r, const char *data, u64 data_sz)
@@ -62,7 +64,14 @@ static int freader_get_folio(struct freader *r, loff_t file_off)
freader_put_folio(r);
- r->folio = filemap_get_folio(r->mapping, file_off >> PAGE_SHIFT);
+ r->folio = filemap_get_folio(r->file->f_mapping, file_off >> PAGE_SHIFT);
+
+ /* if sleeping is allowed, wait for the page, if necessary */
+ if (r->may_fault && (IS_ERR(r->folio) || !folio_test_uptodate(r->folio))) {
+ r->folio = read_cache_folio(r->file->f_mapping, file_off >> PAGE_SHIFT,
+ NULL, r->file);
+ }
+
if (IS_ERR(r->folio) || !folio_test_uptodate(r->folio)) {
if (!IS_ERR(r->folio))
folio_put(r->folio);
@@ -287,18 +296,8 @@ static int get_build_id_64(struct freader *r, unsigned char *build_id, __u32 *si
/* enough for Elf64_Ehdr, Elf64_Phdr, and all the smaller requests */
#define MAX_FREADER_BUF_SZ 64
-/*
- * Parse build ID of ELF file mapped to vma
- * @vma: vma object
- * @build_id: buffer to store build id, at least BUILD_ID_SIZE long
- * @size: returns actual build id size in case of success
- *
- * Assumes no page fault can be taken, so if relevant portions of ELF file are
- * not already paged in, fetching of build ID fails.
- *
- * Return: 0 on success; negative error, otherwise
- */
-int build_id_parse_nofault(struct vm_area_struct *vma, unsigned char *build_id, __u32 *size)
+static int __build_id_parse(struct vm_area_struct *vma, unsigned char *build_id,
+ __u32 *size, bool may_fault)
{
const Elf32_Ehdr *ehdr;
struct freader r;
@@ -309,7 +308,7 @@ int build_id_parse_nofault(struct vm_area_struct *vma, unsigned char *build_id,
if (!vma->vm_file)
return -EINVAL;
- freader_init_from_file(&r, buf, sizeof(buf), vma->vm_file->f_mapping);
+ freader_init_from_file(&r, buf, sizeof(buf), vma->vm_file, may_fault);
/* fetch first 18 bytes of ELF header for checks */
ehdr = freader_fetch(&r, 0, offsetofend(Elf32_Ehdr, e_type));
@@ -337,6 +336,22 @@ int build_id_parse_nofault(struct vm_area_struct *vma, unsigned char *build_id,
return ret;
}
+/*
+ * Parse build ID of ELF file mapped to vma
+ * @vma: vma object
+ * @build_id: buffer to store build id, at least BUILD_ID_SIZE long
+ * @size: returns actual build id size in case of success
+ *
+ * Assumes no page fault can be taken, so if relevant portions of ELF file are
+ * not already paged in, fetching of build ID fails.
+ *
+ * Return: 0 on success; negative error, otherwise
+ */
+int build_id_parse_nofault(struct vm_area_struct *vma, unsigned char *build_id, __u32 *size)
+{
+ return __build_id_parse(vma, build_id, size, false /* !may_fault */);
+}
+
/*
* Parse build ID of ELF file mapped to VMA
* @vma: vma object
@@ -350,8 +365,7 @@ int build_id_parse_nofault(struct vm_area_struct *vma, unsigned char *build_id,
*/
int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id, __u32 *size)
{
- /* fallback to non-faultable version for now */
- return build_id_parse_nofault(vma, build_id, size);
+ return __build_id_parse(vma, build_id, size, true /* may_fault */);
}
/**
--
2.43.5
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v6 bpf-next 07/10] lib/buildid: don't limit .note.gnu.build-id to the first page in ELF
2024-08-14 18:54 [PATCH v6 bpf-next 00/10] Harden and extend ELF build ID parsing logic Andrii Nakryiko
` (5 preceding siblings ...)
2024-08-14 18:54 ` [PATCH v6 bpf-next 06/10] lib/buildid: implement sleepable build_id_parse() API Andrii Nakryiko
@ 2024-08-14 18:54 ` Andrii Nakryiko
2024-08-14 18:54 ` [PATCH v6 bpf-next 08/10] bpf: decouple stack_map_get_build_id_offset() from perf_callchain_entry Andrii Nakryiko
` (3 subsequent siblings)
10 siblings, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2024-08-14 18:54 UTC (permalink / raw)
To: bpf
Cc: linux-mm, akpm, adobriyan, shakeel.butt, hannes, ak, osandov,
song, jannh, linux-fsdevel, willy, Andrii Nakryiko
With freader we don't need to restrict ourselves to a single page, so
let's allow ELF notes to be at any valid position with the file.
We also merge parse_build_id() and parse_build_id_buf() as now the only
difference between them is note offset overflow, which makes sense to
check in all situations.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
lib/buildid.c | 26 +++++---------------------
1 file changed, 5 insertions(+), 21 deletions(-)
diff --git a/lib/buildid.c b/lib/buildid.c
index 5da5a32a1af8..b404b89f61a3 100644
--- a/lib/buildid.c
+++ b/lib/buildid.c
@@ -153,9 +153,8 @@ static void freader_cleanup(struct freader *r)
* 32-bit and 64-bit system, because Elf32_Nhdr and Elf64_Nhdr are
* identical.
*/
-static int parse_build_id_buf(struct freader *r,
- unsigned char *build_id, __u32 *size,
- loff_t note_off, Elf32_Word note_size)
+static int parse_build_id(struct freader *r, unsigned char *build_id, __u32 *size,
+ loff_t note_off, Elf32_Word note_size)
{
const char note_name[] = "GNU";
const size_t note_name_sz = sizeof(note_name);
@@ -163,7 +162,9 @@ static int parse_build_id_buf(struct freader *r,
const Elf32_Nhdr *nhdr;
const char *data;
- note_end = note_off + note_size;
+ if (check_add_overflow(note_off, note_size, ¬e_end))
+ return -EINVAL;
+
while (note_end - note_off > sizeof(Elf32_Nhdr) + note_name_sz) {
nhdr = freader_fetch(r, note_off, sizeof(Elf32_Nhdr) + note_name_sz);
if (!nhdr)
@@ -202,23 +203,6 @@ static int parse_build_id_buf(struct freader *r,
return -EINVAL;
}
-static inline int parse_build_id(struct freader *r,
- unsigned char *build_id,
- __u32 *size,
- loff_t note_start_off,
- Elf32_Word note_size)
-{
- /* check for overflow */
- if (note_start_off + note_size < note_start_off)
- return -EINVAL;
-
- /* only supports note that fits in the first page */
- if (note_start_off + note_size > PAGE_SIZE)
- return -EINVAL;
-
- return parse_build_id_buf(r, build_id, size, note_start_off, note_size);
-}
-
/* Parse build ID from 32-bit ELF */
static int get_build_id_32(struct freader *r, unsigned char *build_id, __u32 *size)
{
--
2.43.5
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v6 bpf-next 08/10] bpf: decouple stack_map_get_build_id_offset() from perf_callchain_entry
2024-08-14 18:54 [PATCH v6 bpf-next 00/10] Harden and extend ELF build ID parsing logic Andrii Nakryiko
` (6 preceding siblings ...)
2024-08-14 18:54 ` [PATCH v6 bpf-next 07/10] lib/buildid: don't limit .note.gnu.build-id to the first page in ELF Andrii Nakryiko
@ 2024-08-14 18:54 ` Andrii Nakryiko
2024-08-22 20:32 ` Eduard Zingerman
2024-08-14 18:54 ` [PATCH v6 bpf-next 09/10] bpf: wire up sleepable bpf_get_stack() and bpf_get_task_stack() helpers Andrii Nakryiko
` (2 subsequent siblings)
10 siblings, 1 reply; 20+ messages in thread
From: Andrii Nakryiko @ 2024-08-14 18:54 UTC (permalink / raw)
To: bpf
Cc: linux-mm, akpm, adobriyan, shakeel.butt, hannes, ak, osandov,
song, jannh, linux-fsdevel, willy, Andrii Nakryiko
Change stack_map_get_build_id_offset() which is used to convert stack
trace IP addresses into build ID+offset pairs. Right now this function
accepts an array of u64s as an input, and uses array of
struct bpf_stack_build_id as an output.
This is problematic because u64 array is coming from
perf_callchain_entry, which is (non-sleepable) RCU protected, so once we
allows sleepable build ID fetching, this all breaks down.
But its actually pretty easy to make stack_map_get_build_id_offset()
works with array of struct bpf_stack_build_id as both input and output.
Which is what this patch is doing, eliminating the dependency on
perf_callchain_entry. We require caller to fill out
bpf_stack_build_id.ip fields (all other can be left uninitialized), and
update in place as we do build ID resolution.
We make sure to READ_ONCE() and cache locally current IP value as we
used it in a few places to find matching VMA and so on. Given this data
is directly accessible and modifiable by user's BPF code, we should make
sure to have a consistent view of it.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
kernel/bpf/stackmap.c | 49 +++++++++++++++++++++++++++++--------------
1 file changed, 33 insertions(+), 16 deletions(-)
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 770ae8e88016..6457222b0b46 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -124,8 +124,18 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
return ERR_PTR(err);
}
+/*
+ * Expects all id_offs[i].ip values to be set to correct initial IPs.
+ * They will be subsequently:
+ * - either adjusted in place to a file offset, if build ID fetching
+ * succeeds; in this case id_offs[i].build_id is set to correct build ID,
+ * and id_offs[i].status is set to BPF_STACK_BUILD_ID_VALID;
+ * - or IP will be kept intact, if build ID fetching failed; in this case
+ * id_offs[i].build_id is zeroed out and id_offs[i].status is set to
+ * BPF_STACK_BUILD_ID_IP.
+ */
static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
- u64 *ips, u32 trace_nr, bool user)
+ u32 trace_nr, bool user)
{
int i;
struct mmap_unlock_irq_work *work = NULL;
@@ -142,30 +152,28 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
/* cannot access current->mm, fall back to ips */
for (i = 0; i < trace_nr; i++) {
id_offs[i].status = BPF_STACK_BUILD_ID_IP;
- id_offs[i].ip = ips[i];
memset(id_offs[i].build_id, 0, BUILD_ID_SIZE_MAX);
}
return;
}
for (i = 0; i < trace_nr; i++) {
- if (range_in_vma(prev_vma, ips[i], ips[i])) {
+ u64 ip = READ_ONCE(id_offs[i].ip);
+
+ if (range_in_vma(prev_vma, ip, ip)) {
vma = prev_vma;
- memcpy(id_offs[i].build_id, prev_build_id,
- BUILD_ID_SIZE_MAX);
+ memcpy(id_offs[i].build_id, prev_build_id, BUILD_ID_SIZE_MAX);
goto build_id_valid;
}
- vma = find_vma(current->mm, ips[i]);
+ vma = find_vma(current->mm, ip);
if (!vma || build_id_parse_nofault(vma, id_offs[i].build_id, NULL)) {
/* per entry fall back to ips */
id_offs[i].status = BPF_STACK_BUILD_ID_IP;
- id_offs[i].ip = ips[i];
memset(id_offs[i].build_id, 0, BUILD_ID_SIZE_MAX);
continue;
}
build_id_valid:
- id_offs[i].offset = (vma->vm_pgoff << PAGE_SHIFT) + ips[i]
- - vma->vm_start;
+ id_offs[i].offset = (vma->vm_pgoff << PAGE_SHIFT) + ip - vma->vm_start;
id_offs[i].status = BPF_STACK_BUILD_ID_VALID;
prev_vma = vma;
prev_build_id = id_offs[i].build_id;
@@ -216,7 +224,7 @@ static long __bpf_get_stackid(struct bpf_map *map,
struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
struct stack_map_bucket *bucket, *new_bucket, *old_bucket;
u32 skip = flags & BPF_F_SKIP_FIELD_MASK;
- u32 hash, id, trace_nr, trace_len;
+ u32 hash, id, trace_nr, trace_len, i;
bool user = flags & BPF_F_USER_STACK;
u64 *ips;
bool hash_matches;
@@ -238,15 +246,18 @@ static long __bpf_get_stackid(struct bpf_map *map,
return id;
if (stack_map_use_build_id(map)) {
+ struct bpf_stack_build_id *id_offs;
+
/* for build_id+offset, pop a bucket before slow cmp */
new_bucket = (struct stack_map_bucket *)
pcpu_freelist_pop(&smap->freelist);
if (unlikely(!new_bucket))
return -ENOMEM;
new_bucket->nr = trace_nr;
- stack_map_get_build_id_offset(
- (struct bpf_stack_build_id *)new_bucket->data,
- ips, trace_nr, user);
+ id_offs = (struct bpf_stack_build_id *)new_bucket->data;
+ for (i = 0; i < trace_nr; i++)
+ id_offs[i].ip = ips[i];
+ stack_map_get_build_id_offset(id_offs, trace_nr, user);
trace_len = trace_nr * sizeof(struct bpf_stack_build_id);
if (hash_matches && bucket->nr == trace_nr &&
memcmp(bucket->data, new_bucket->data, trace_len) == 0) {
@@ -445,10 +456,16 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
copy_len = trace_nr * elem_size;
ips = trace->ip + skip;
- if (user && user_build_id)
- stack_map_get_build_id_offset(buf, ips, trace_nr, user);
- else
+ if (user && user_build_id) {
+ struct bpf_stack_build_id *id_offs = buf;
+ u32 i;
+
+ for (i = 0; i < trace_nr; i++)
+ id_offs[i].ip = ips[i];
+ stack_map_get_build_id_offset(buf, trace_nr, user);
+ } else {
memcpy(buf, ips, copy_len);
+ }
if (size > copy_len)
memset(buf + copy_len, 0, size - copy_len);
--
2.43.5
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v6 bpf-next 09/10] bpf: wire up sleepable bpf_get_stack() and bpf_get_task_stack() helpers
2024-08-14 18:54 [PATCH v6 bpf-next 00/10] Harden and extend ELF build ID parsing logic Andrii Nakryiko
` (7 preceding siblings ...)
2024-08-14 18:54 ` [PATCH v6 bpf-next 08/10] bpf: decouple stack_map_get_build_id_offset() from perf_callchain_entry Andrii Nakryiko
@ 2024-08-14 18:54 ` Andrii Nakryiko
2024-08-23 22:22 ` Eduard Zingerman
2024-08-14 18:54 ` [PATCH v6 bpf-next 10/10] selftests/bpf: add build ID tests Andrii Nakryiko
2024-08-23 23:22 ` [PATCH v6 bpf-next 00/10] Harden and extend ELF build ID parsing logic Eduard Zingerman
10 siblings, 1 reply; 20+ messages in thread
From: Andrii Nakryiko @ 2024-08-14 18:54 UTC (permalink / raw)
To: bpf
Cc: linux-mm, akpm, adobriyan, shakeel.butt, hannes, ak, osandov,
song, jannh, linux-fsdevel, willy, Andrii Nakryiko
Add sleepable implementations of bpf_get_stack() and
bpf_get_task_stack() helpers and allow them to be used from sleepable
BPF program (e.g., sleepable uprobes).
Note, the stack trace IPs capturing itself is not sleepable (that would
need to be a separate project), only build ID fetching is sleepable and
thus more reliable, as it will wait for data to be paged in, if
necessary. For that we make use of sleepable build_id_parse()
implementation.
Now that build ID related internals in kernel/bpf/stackmap.c can be used
both in sleepable and non-sleepable contexts, we need to add additional
rcu_read_lock()/rcu_read_unlock() protection around fetching
perf_callchain_entry, but with the refactoring in previous commit it's
now pretty straightforward. We make sure to do rcu_read_unlock (in
sleepable mode only) right before stack_map_get_build_id_offset() call
which can sleep. By that time we don't have any more use of
perf_callchain_entry.
Note, bpf_get_task_stack() will fail for user mode if task != current.
And for kernel mode build ID are irrelevant. So in that sense adding
sleepable bpf_get_task_stack() implementation is a no-op. It feel right
to wire this up for symmetry and completeness, but I'm open to just
dropping it until we support `user && crosstask` condition.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
include/linux/bpf.h | 2 +
kernel/bpf/stackmap.c | 90 ++++++++++++++++++++++++++++++++--------
kernel/trace/bpf_trace.c | 5 ++-
3 files changed, 77 insertions(+), 20 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index b9425e410bcb..0f3dc903bea8 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3198,7 +3198,9 @@ extern const struct bpf_func_proto bpf_get_current_uid_gid_proto;
extern const struct bpf_func_proto bpf_get_current_comm_proto;
extern const struct bpf_func_proto bpf_get_stackid_proto;
extern const struct bpf_func_proto bpf_get_stack_proto;
+extern const struct bpf_func_proto bpf_get_stack_sleepable_proto;
extern const struct bpf_func_proto bpf_get_task_stack_proto;
+extern const struct bpf_func_proto bpf_get_task_stack_sleepable_proto;
extern const struct bpf_func_proto bpf_get_stackid_proto_pe;
extern const struct bpf_func_proto bpf_get_stack_proto_pe;
extern const struct bpf_func_proto bpf_sock_map_update_proto;
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 6457222b0b46..3615c06b7dfa 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -124,6 +124,12 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
return ERR_PTR(err);
}
+static int fetch_build_id(struct vm_area_struct *vma, unsigned char *build_id, bool may_fault)
+{
+ return may_fault ? build_id_parse(vma, build_id, NULL)
+ : build_id_parse_nofault(vma, build_id, NULL);
+}
+
/*
* Expects all id_offs[i].ip values to be set to correct initial IPs.
* They will be subsequently:
@@ -135,7 +141,7 @@ static struct bpf_map *stack_map_alloc(union bpf_attr *attr)
* BPF_STACK_BUILD_ID_IP.
*/
static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
- u32 trace_nr, bool user)
+ u32 trace_nr, bool user, bool may_fault)
{
int i;
struct mmap_unlock_irq_work *work = NULL;
@@ -166,7 +172,7 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs,
goto build_id_valid;
}
vma = find_vma(current->mm, ip);
- if (!vma || build_id_parse_nofault(vma, id_offs[i].build_id, NULL)) {
+ if (!vma || fetch_build_id(vma, id_offs[i].build_id, may_fault)) {
/* per entry fall back to ips */
id_offs[i].status = BPF_STACK_BUILD_ID_IP;
memset(id_offs[i].build_id, 0, BUILD_ID_SIZE_MAX);
@@ -257,7 +263,7 @@ static long __bpf_get_stackid(struct bpf_map *map,
id_offs = (struct bpf_stack_build_id *)new_bucket->data;
for (i = 0; i < trace_nr; i++)
id_offs[i].ip = ips[i];
- stack_map_get_build_id_offset(id_offs, trace_nr, user);
+ stack_map_get_build_id_offset(id_offs, trace_nr, user, false /* !may_fault */);
trace_len = trace_nr * sizeof(struct bpf_stack_build_id);
if (hash_matches && bucket->nr == trace_nr &&
memcmp(bucket->data, new_bucket->data, trace_len) == 0) {
@@ -398,7 +404,7 @@ const struct bpf_func_proto bpf_get_stackid_proto_pe = {
static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
struct perf_callchain_entry *trace_in,
- void *buf, u32 size, u64 flags)
+ void *buf, u32 size, u64 flags, bool may_fault)
{
u32 trace_nr, copy_len, elem_size, num_elem, max_depth;
bool user_build_id = flags & BPF_F_USER_BUILD_ID;
@@ -416,8 +422,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
if (kernel && user_build_id)
goto clear;
- elem_size = (user && user_build_id) ? sizeof(struct bpf_stack_build_id)
- : sizeof(u64);
+ elem_size = user_build_id ? sizeof(struct bpf_stack_build_id) : sizeof(u64);
if (unlikely(size % elem_size))
goto clear;
@@ -438,6 +443,9 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
if (sysctl_perf_event_max_stack < max_depth)
max_depth = sysctl_perf_event_max_stack;
+ if (may_fault)
+ rcu_read_lock(); /* need RCU for perf's callchain below */
+
if (trace_in)
trace = trace_in;
else if (kernel && task)
@@ -445,28 +453,35 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
else
trace = get_perf_callchain(regs, 0, kernel, user, max_depth,
crosstask, false);
- if (unlikely(!trace))
- goto err_fault;
- if (trace->nr < skip)
+ if (unlikely(!trace) || trace->nr < skip) {
+ if (may_fault)
+ rcu_read_unlock();
goto err_fault;
+ }
trace_nr = trace->nr - skip;
trace_nr = (trace_nr <= num_elem) ? trace_nr : num_elem;
copy_len = trace_nr * elem_size;
ips = trace->ip + skip;
- if (user && user_build_id) {
+ if (user_build_id) {
struct bpf_stack_build_id *id_offs = buf;
u32 i;
for (i = 0; i < trace_nr; i++)
id_offs[i].ip = ips[i];
- stack_map_get_build_id_offset(buf, trace_nr, user);
} else {
memcpy(buf, ips, copy_len);
}
+ /* trace/ips should not be dereferenced after this point */
+ if (may_fault)
+ rcu_read_unlock();
+
+ if (user_build_id)
+ stack_map_get_build_id_offset(buf, trace_nr, user, may_fault);
+
if (size > copy_len)
memset(buf + copy_len, 0, size - copy_len);
return copy_len;
@@ -481,7 +496,7 @@ static long __bpf_get_stack(struct pt_regs *regs, struct task_struct *task,
BPF_CALL_4(bpf_get_stack, struct pt_regs *, regs, void *, buf, u32, size,
u64, flags)
{
- return __bpf_get_stack(regs, NULL, NULL, buf, size, flags);
+ return __bpf_get_stack(regs, NULL, NULL, buf, size, flags, false /* !may_fault */);
}
const struct bpf_func_proto bpf_get_stack_proto = {
@@ -494,8 +509,24 @@ const struct bpf_func_proto bpf_get_stack_proto = {
.arg4_type = ARG_ANYTHING,
};
-BPF_CALL_4(bpf_get_task_stack, struct task_struct *, task, void *, buf,
- u32, size, u64, flags)
+BPF_CALL_4(bpf_get_stack_sleepable, struct pt_regs *, regs, void *, buf, u32, size,
+ u64, flags)
+{
+ return __bpf_get_stack(regs, NULL, NULL, buf, size, flags, true /* may_fault */);
+}
+
+const struct bpf_func_proto bpf_get_stack_sleepable_proto = {
+ .func = bpf_get_stack_sleepable,
+ .gpl_only = true,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_PTR_TO_UNINIT_MEM,
+ .arg3_type = ARG_CONST_SIZE_OR_ZERO,
+ .arg4_type = ARG_ANYTHING,
+};
+
+static long __bpf_get_task_stack(struct task_struct *task, void *buf, u32 size,
+ u64 flags, bool may_fault)
{
struct pt_regs *regs;
long res = -EINVAL;
@@ -505,12 +536,18 @@ BPF_CALL_4(bpf_get_task_stack, struct task_struct *, task, void *, buf,
regs = task_pt_regs(task);
if (regs)
- res = __bpf_get_stack(regs, task, NULL, buf, size, flags);
+ res = __bpf_get_stack(regs, task, NULL, buf, size, flags, may_fault);
put_task_stack(task);
return res;
}
+BPF_CALL_4(bpf_get_task_stack, struct task_struct *, task, void *, buf,
+ u32, size, u64, flags)
+{
+ return __bpf_get_task_stack(task, buf, size, flags, false /* !may_fault */);
+}
+
const struct bpf_func_proto bpf_get_task_stack_proto = {
.func = bpf_get_task_stack,
.gpl_only = false,
@@ -522,6 +559,23 @@ const struct bpf_func_proto bpf_get_task_stack_proto = {
.arg4_type = ARG_ANYTHING,
};
+BPF_CALL_4(bpf_get_task_stack_sleepable, struct task_struct *, task, void *, buf,
+ u32, size, u64, flags)
+{
+ return __bpf_get_task_stack(task, buf, size, flags, true /* !may_fault */);
+}
+
+const struct bpf_func_proto bpf_get_task_stack_sleepable_proto = {
+ .func = bpf_get_task_stack_sleepable,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_BTF_ID,
+ .arg1_btf_id = &btf_tracing_ids[BTF_TRACING_TYPE_TASK],
+ .arg2_type = ARG_PTR_TO_UNINIT_MEM,
+ .arg3_type = ARG_CONST_SIZE_OR_ZERO,
+ .arg4_type = ARG_ANYTHING,
+};
+
BPF_CALL_4(bpf_get_stack_pe, struct bpf_perf_event_data_kern *, ctx,
void *, buf, u32, size, u64, flags)
{
@@ -533,7 +587,7 @@ BPF_CALL_4(bpf_get_stack_pe, struct bpf_perf_event_data_kern *, ctx,
__u64 nr_kernel;
if (!(event->attr.sample_type & PERF_SAMPLE_CALLCHAIN))
- return __bpf_get_stack(regs, NULL, NULL, buf, size, flags);
+ return __bpf_get_stack(regs, NULL, NULL, buf, size, flags, false /* !may_fault */);
if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK |
BPF_F_USER_BUILD_ID)))
@@ -553,7 +607,7 @@ BPF_CALL_4(bpf_get_stack_pe, struct bpf_perf_event_data_kern *, ctx,
__u64 nr = trace->nr;
trace->nr = nr_kernel;
- err = __bpf_get_stack(regs, NULL, trace, buf, size, flags);
+ err = __bpf_get_stack(regs, NULL, trace, buf, size, flags, false /* !may_fault */);
/* restore nr */
trace->nr = nr;
@@ -565,7 +619,7 @@ BPF_CALL_4(bpf_get_stack_pe, struct bpf_perf_event_data_kern *, ctx,
goto clear;
flags = (flags & ~BPF_F_SKIP_FIELD_MASK) | skip;
- err = __bpf_get_stack(regs, NULL, trace, buf, size, flags);
+ err = __bpf_get_stack(regs, NULL, trace, buf, size, flags, false /* !may_fault */);
}
return err;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index d557bb11e0ff..87fc35778131 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1530,7 +1530,8 @@ bpf_tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
case BPF_FUNC_jiffies64:
return &bpf_jiffies64_proto;
case BPF_FUNC_get_task_stack:
- return &bpf_get_task_stack_proto;
+ return prog->sleepable ? &bpf_get_task_stack_sleepable_proto
+ : &bpf_get_task_stack_proto;
case BPF_FUNC_copy_from_user:
return &bpf_copy_from_user_proto;
case BPF_FUNC_copy_from_user_task:
@@ -1586,7 +1587,7 @@ kprobe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
case BPF_FUNC_get_stackid:
return &bpf_get_stackid_proto;
case BPF_FUNC_get_stack:
- return &bpf_get_stack_proto;
+ return prog->sleepable ? &bpf_get_stack_sleepable_proto : &bpf_get_stack_proto;
#ifdef CONFIG_BPF_KPROBE_OVERRIDE
case BPF_FUNC_override_return:
return &bpf_override_return_proto;
--
2.43.5
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v6 bpf-next 10/10] selftests/bpf: add build ID tests
2024-08-14 18:54 [PATCH v6 bpf-next 00/10] Harden and extend ELF build ID parsing logic Andrii Nakryiko
` (8 preceding siblings ...)
2024-08-14 18:54 ` [PATCH v6 bpf-next 09/10] bpf: wire up sleepable bpf_get_stack() and bpf_get_task_stack() helpers Andrii Nakryiko
@ 2024-08-14 18:54 ` Andrii Nakryiko
2024-08-22 22:30 ` Eduard Zingerman
2024-08-23 23:22 ` [PATCH v6 bpf-next 00/10] Harden and extend ELF build ID parsing logic Eduard Zingerman
10 siblings, 1 reply; 20+ messages in thread
From: Andrii Nakryiko @ 2024-08-14 18:54 UTC (permalink / raw)
To: bpf
Cc: linux-mm, akpm, adobriyan, shakeel.butt, hannes, ak, osandov,
song, jannh, linux-fsdevel, willy, Andrii Nakryiko
Add a new set of tests validating behavior of capturing stack traces
with build ID. We extend uprobe_multi target binary with ability to
trigger uprobe (so that we can capture stack traces from it), but also
we allow to force build ID data to be either resident or non-resident in
memory (see also a comment about quirks of MADV_PAGEOUT).
That way we can validate that in non-sleepable context we won't get
build ID (as expected), but with sleepable uprobes we will get that
build ID regardless of it being physically present in memory.
Also, we add a small add-on linker script which reorders
.note.gnu.build-id section and puts it after (big) .text section,
putting build ID data outside of the very first page of ELF file. This
will test all the relaxations we did in build ID parsing logic in kernel
thanks to freader abstraction.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
tools/testing/selftests/bpf/Makefile | 5 +-
.../selftests/bpf/prog_tests/build_id.c | 118 ++++++++++++++++++
.../selftests/bpf/progs/test_build_id.c | 31 +++++
tools/testing/selftests/bpf/uprobe_multi.c | 41 ++++++
tools/testing/selftests/bpf/uprobe_multi.ld | 11 ++
5 files changed, 204 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/build_id.c
create mode 100644 tools/testing/selftests/bpf/progs/test_build_id.c
create mode 100644 tools/testing/selftests/bpf/uprobe_multi.ld
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 7e4b107b37b4..e47d983d2694 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -787,9 +787,10 @@ $(OUTPUT)/veristat: $(OUTPUT)/veristat.o
# Linking uprobe_multi can fail due to relocation overflows on mips.
$(OUTPUT)/uprobe_multi: CFLAGS += $(if $(filter mips, $(ARCH)),-mxgot)
-$(OUTPUT)/uprobe_multi: uprobe_multi.c
+$(OUTPUT)/uprobe_multi: uprobe_multi.c uprobe_multi.ld
$(call msg,BINARY,,$@)
- $(Q)$(CC) $(CFLAGS) -O0 $(LDFLAGS) $^ $(LDLIBS) -o $@
+ $(Q)$(CC) $(CFLAGS) -Wl,-T,uprobe_multi.ld -O0 $(LDFLAGS) \
+ $(filter-out %.ld,$^) $(LDLIBS) -o $@
EXTRA_CLEAN := $(SCRATCH_DIR) $(HOST_SCRATCH_DIR) \
prog_tests/tests.h map_tests/tests.h verifier/tests.h \
diff --git a/tools/testing/selftests/bpf/prog_tests/build_id.c b/tools/testing/selftests/bpf/prog_tests/build_id.c
new file mode 100644
index 000000000000..aec9c8d6bc96
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/build_id.c
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
+#include <test_progs.h>
+
+#include "test_build_id.skel.h"
+
+static char build_id[BPF_BUILD_ID_SIZE];
+static int build_id_sz;
+
+static void print_stack(struct bpf_stack_build_id *stack, int frame_cnt)
+{
+ int i, j;
+
+ for (i = 0; i < frame_cnt; i++) {
+ printf("FRAME #%02d: ", i);
+ switch (stack[i].status) {
+ case BPF_STACK_BUILD_ID_EMPTY:
+ printf("<EMPTY>\n");
+ break;
+ case BPF_STACK_BUILD_ID_VALID:
+ printf("BUILD ID = ");
+ for (j = 0; j < BPF_BUILD_ID_SIZE; j++)
+ printf("%02hhx", (unsigned)stack[i].build_id[j]);
+ printf(" OFFSET = %llx", (unsigned long long)stack[i].offset);
+ break;
+ case BPF_STACK_BUILD_ID_IP:
+ printf("IP = %llx", (unsigned long long)stack[i].ip);
+ break;
+ default:
+ printf("UNEXPECTED STATUS %d ", stack[i].status);
+ break;
+ }
+ printf("\n");
+ }
+}
+
+static void subtest_nofault(bool build_id_resident)
+{
+ struct test_build_id *skel;
+ struct bpf_stack_build_id *stack;
+ int frame_cnt;
+
+ skel = test_build_id__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel_open"))
+ return;
+
+ skel->links.uprobe_nofault = bpf_program__attach(skel->progs.uprobe_nofault);
+ if (!ASSERT_OK_PTR(skel->links.uprobe_nofault, "link"))
+ goto cleanup;
+
+ if (build_id_resident)
+ ASSERT_OK(system("./uprobe_multi uprobe-paged-in"), "trigger_uprobe");
+ else
+ ASSERT_OK(system("./uprobe_multi uprobe-paged-out"), "trigger_uprobe");
+
+ if (!ASSERT_GT(skel->bss->res_nofault, 0, "res"))
+ goto cleanup;
+
+ stack = skel->bss->stack_nofault;
+ frame_cnt = skel->bss->res_nofault / sizeof(struct bpf_stack_build_id);
+ if (env.verbosity >= VERBOSE_NORMAL)
+ print_stack(stack, frame_cnt);
+
+ if (build_id_resident) {
+ ASSERT_EQ(stack[0].status, BPF_STACK_BUILD_ID_VALID, "build_id_status");
+ ASSERT_EQ(memcmp(stack[0].build_id, build_id, build_id_sz), 0, "build_id_match");
+ } else {
+ ASSERT_EQ(stack[0].status, BPF_STACK_BUILD_ID_IP, "build_id_status");
+ }
+
+cleanup:
+ test_build_id__destroy(skel);
+}
+
+static void subtest_sleepable(void)
+{
+ struct test_build_id *skel;
+ struct bpf_stack_build_id *stack;
+ int frame_cnt;
+
+ skel = test_build_id__open_and_load();
+ if (!ASSERT_OK_PTR(skel, "skel_open"))
+ return;
+
+ skel->links.uprobe_sleepable = bpf_program__attach(skel->progs.uprobe_sleepable);
+ if (!ASSERT_OK_PTR(skel->links.uprobe_sleepable, "link"))
+ goto cleanup;
+
+ /* force build ID to not be paged in */
+ ASSERT_OK(system("./uprobe_multi uprobe-paged-out"), "trigger_uprobe");
+
+ if (!ASSERT_GT(skel->bss->res_sleepable, 0, "res"))
+ goto cleanup;
+
+ stack = skel->bss->stack_sleepable;
+ frame_cnt = skel->bss->res_sleepable / sizeof(struct bpf_stack_build_id);
+ if (env.verbosity >= VERBOSE_NORMAL)
+ print_stack(stack, frame_cnt);
+
+ ASSERT_EQ(stack[0].status, BPF_STACK_BUILD_ID_VALID, "build_id_status");
+ ASSERT_EQ(memcmp(stack[0].build_id, build_id, build_id_sz), 0, "build_id_match");
+
+cleanup:
+ test_build_id__destroy(skel);
+}
+
+void serial_test_build_id(void)
+{
+ build_id_sz = read_build_id("uprobe_multi", build_id, sizeof(build_id));
+ ASSERT_EQ(build_id_sz, BPF_BUILD_ID_SIZE, "parse_build_id");
+
+ if (test__start_subtest("nofault-paged-out"))
+ subtest_nofault(false /* not resident */);
+ if (test__start_subtest("nofault-paged-in"))
+ subtest_nofault(true /* resident */);
+ if (test__start_subtest("sleepable"))
+ subtest_sleepable();
+}
diff --git a/tools/testing/selftests/bpf/progs/test_build_id.c b/tools/testing/selftests/bpf/progs/test_build_id.c
new file mode 100644
index 000000000000..32ce59f9aa27
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_build_id.c
@@ -0,0 +1,31 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */
+
+#include "vmlinux.h"
+#include <bpf/bpf_helpers.h>
+
+struct bpf_stack_build_id stack_sleepable[128];
+int res_sleepable;
+
+struct bpf_stack_build_id stack_nofault[128];
+int res_nofault;
+
+SEC("uprobe.multi/./uprobe_multi:uprobe")
+int uprobe_nofault(struct pt_regs *ctx)
+{
+ res_nofault = bpf_get_stack(ctx, stack_nofault, sizeof(stack_nofault),
+ BPF_F_USER_STACK | BPF_F_USER_BUILD_ID);
+
+ return 0;
+}
+
+SEC("uprobe.multi.s/./uprobe_multi:uprobe")
+int uprobe_sleepable(struct pt_regs *ctx)
+{
+ res_sleepable = bpf_get_stack(ctx, stack_sleepable, sizeof(stack_sleepable),
+ BPF_F_USER_STACK | BPF_F_USER_BUILD_ID);
+
+ return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/uprobe_multi.c b/tools/testing/selftests/bpf/uprobe_multi.c
index 7ffa563ffeba..c7828b13e5ff 100644
--- a/tools/testing/selftests/bpf/uprobe_multi.c
+++ b/tools/testing/selftests/bpf/uprobe_multi.c
@@ -2,8 +2,21 @@
#include <stdio.h>
#include <string.h>
+#include <stdbool.h>
+#include <stdint.h>
+#include <sys/mman.h>
+#include <unistd.h>
#include <sdt.h>
+#ifndef MADV_POPULATE_READ
+#define MADV_POPULATE_READ 22
+#endif
+
+int __attribute__((weak)) uprobe(void)
+{
+ return 0;
+}
+
#define __PASTE(a, b) a##b
#define PASTE(a, b) __PASTE(a, b)
@@ -75,6 +88,30 @@ static int usdt(void)
return 0;
}
+extern char build_id_start[];
+extern char build_id_end[];
+
+int __attribute__((weak)) trigger_uprobe(bool build_id_resident)
+{
+ int page_sz = sysconf(_SC_PAGESIZE);
+ void *addr;
+
+ /* page-align build ID start */
+ addr = (void *)((uintptr_t)&build_id_start & ~(page_sz - 1));
+
+ /* to guarantee MADV_PAGEOUT work reliably, we need to ensure that
+ * memory range is mapped into current process, so we unconditionally
+ * do MADV_POPULATE_READ, and then MADV_PAGEOUT, if necessary
+ */
+ madvise(addr, page_sz, MADV_POPULATE_READ);
+ if (!build_id_resident)
+ madvise(addr, page_sz, MADV_PAGEOUT);
+
+ (void)uprobe();
+
+ return 0;
+}
+
int main(int argc, char **argv)
{
if (argc != 2)
@@ -84,6 +121,10 @@ int main(int argc, char **argv)
return bench();
if (!strcmp("usdt", argv[1]))
return usdt();
+ if (!strcmp("uprobe-paged-out", argv[1]))
+ return trigger_uprobe(false /* page-out build ID */);
+ if (!strcmp("uprobe-paged-in", argv[1]))
+ return trigger_uprobe(true /* page-in build ID */);
error:
fprintf(stderr, "usage: %s <bench|usdt>\n", argv[0]);
diff --git a/tools/testing/selftests/bpf/uprobe_multi.ld b/tools/testing/selftests/bpf/uprobe_multi.ld
new file mode 100644
index 000000000000..a2e94828bc8c
--- /dev/null
+++ b/tools/testing/selftests/bpf/uprobe_multi.ld
@@ -0,0 +1,11 @@
+SECTIONS
+{
+ . = ALIGN(4096);
+ .note.gnu.build-id : { *(.note.gnu.build-id) }
+ . = ALIGN(4096);
+}
+INSERT AFTER .text;
+
+build_id_start = ADDR(.note.gnu.build-id);
+build_id_end = ADDR(.note.gnu.build-id) + SIZEOF(.note.gnu.build-id);
+
--
2.43.5
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v6 bpf-next 08/10] bpf: decouple stack_map_get_build_id_offset() from perf_callchain_entry
2024-08-14 18:54 ` [PATCH v6 bpf-next 08/10] bpf: decouple stack_map_get_build_id_offset() from perf_callchain_entry Andrii Nakryiko
@ 2024-08-22 20:32 ` Eduard Zingerman
0 siblings, 0 replies; 20+ messages in thread
From: Eduard Zingerman @ 2024-08-22 20:32 UTC (permalink / raw)
To: Andrii Nakryiko, bpf
Cc: linux-mm, akpm, adobriyan, shakeel.butt, hannes, ak, osandov,
song, jannh, linux-fsdevel, willy
On Wed, 2024-08-14 at 11:54 -0700, Andrii Nakryiko wrote:
> Change stack_map_get_build_id_offset() which is used to convert stack
> trace IP addresses into build ID+offset pairs. Right now this function
> accepts an array of u64s as an input, and uses array of
> struct bpf_stack_build_id as an output.
>
> This is problematic because u64 array is coming from
> perf_callchain_entry, which is (non-sleepable) RCU protected, so once we
> allows sleepable build ID fetching, this all breaks down.
>
> But its actually pretty easy to make stack_map_get_build_id_offset()
> works with array of struct bpf_stack_build_id as both input and output.
> Which is what this patch is doing, eliminating the dependency on
> perf_callchain_entry. We require caller to fill out
> bpf_stack_build_id.ip fields (all other can be left uninitialized), and
> update in place as we do build ID resolution.
>
> We make sure to READ_ONCE() and cache locally current IP value as we
> used it in a few places to find matching VMA and so on. Given this data
> is directly accessible and modifiable by user's BPF code, we should make
> sure to have a consistent view of it.
>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
[...]
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v6 bpf-next 10/10] selftests/bpf: add build ID tests
2024-08-14 18:54 ` [PATCH v6 bpf-next 10/10] selftests/bpf: add build ID tests Andrii Nakryiko
@ 2024-08-22 22:30 ` Eduard Zingerman
2024-08-22 22:55 ` Andrii Nakryiko
0 siblings, 1 reply; 20+ messages in thread
From: Eduard Zingerman @ 2024-08-22 22:30 UTC (permalink / raw)
To: Andrii Nakryiko, bpf
Cc: linux-mm, akpm, adobriyan, shakeel.butt, hannes, ak, osandov,
song, jannh, linux-fsdevel, willy
On Wed, 2024-08-14 at 11:54 -0700, Andrii Nakryiko wrote:
> Add a new set of tests validating behavior of capturing stack traces
> with build ID. We extend uprobe_multi target binary with ability to
> trigger uprobe (so that we can capture stack traces from it), but also
> we allow to force build ID data to be either resident or non-resident in
> memory (see also a comment about quirks of MADV_PAGEOUT).
>
> That way we can validate that in non-sleepable context we won't get
> build ID (as expected), but with sleepable uprobes we will get that
> build ID regardless of it being physically present in memory.
>
> Also, we add a small add-on linker script which reorders
> .note.gnu.build-id section and puts it after (big) .text section,
> putting build ID data outside of the very first page of ELF file. This
> will test all the relaxations we did in build ID parsing logic in kernel
> thanks to freader abstraction.
>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
[...]
> diff --git a/tools/testing/selftests/bpf/uprobe_multi.c b/tools/testing/selftests/bpf/uprobe_multi.c
> index 7ffa563ffeba..c7828b13e5ff 100644
> --- a/tools/testing/selftests/bpf/uprobe_multi.c
> +++ b/tools/testing/selftests/bpf/uprobe_multi.c
[...]
> +int __attribute__((weak)) trigger_uprobe(bool build_id_resident)
> +{
> + int page_sz = sysconf(_SC_PAGESIZE);
> + void *addr;
> +
> + /* page-align build ID start */
> + addr = (void *)((uintptr_t)&build_id_start & ~(page_sz - 1));
> +
> + /* to guarantee MADV_PAGEOUT work reliably, we need to ensure that
> + * memory range is mapped into current process, so we unconditionally
> + * do MADV_POPULATE_READ, and then MADV_PAGEOUT, if necessary
> + */
> + madvise(addr, page_sz, MADV_POPULATE_READ);
Nit: check error code?
> + if (!build_id_resident)
> + madvise(addr, page_sz, MADV_PAGEOUT);
> +
> + (void)uprobe();
> +
> + return 0;
> +}
> +
[...]
Silly question, unrelated to the patch-set itself.
When I do ./test_progs -vvv -t build_id/sleepable five stack frames
are printed:
FRAME #00: BUILD ID = 46d2568fe293274105f9dad0cc73de54a176f368 OFFSET = 2c4156
FRAME #01: BUILD ID = 46d2568fe293274105f9dad0cc73de54a176f368 OFFSET = 393aef
FRAME #02: BUILD ID = 8f53abaad945a669f2bdcd25f471d80e077568ef OFFSET = 2a088
FRAME #03: BUILD ID = 8f53abaad945a669f2bdcd25f471d80e077568ef OFFSET = 2a14b
FRAME #04: BUILD ID = 46d2568fe293274105f9dad0cc73de54a176f368 OFFSET = 2c4095
The ...6f368 is build-id of the uprobe_multi.
How do I check where ...568ef comes from?
Also, why are there 5 frames when nesting level for uprobe() is 3?
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v6 bpf-next 10/10] selftests/bpf: add build ID tests
2024-08-22 22:30 ` Eduard Zingerman
@ 2024-08-22 22:55 ` Andrii Nakryiko
2024-08-22 23:07 ` Eduard Zingerman
0 siblings, 1 reply; 20+ messages in thread
From: Andrii Nakryiko @ 2024-08-22 22:55 UTC (permalink / raw)
To: Eduard Zingerman
Cc: Andrii Nakryiko, bpf, linux-mm, akpm, adobriyan, shakeel.butt,
hannes, ak, osandov, song, jannh, linux-fsdevel, willy
On Thu, Aug 22, 2024 at 3:30 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Wed, 2024-08-14 at 11:54 -0700, Andrii Nakryiko wrote:
> > Add a new set of tests validating behavior of capturing stack traces
> > with build ID. We extend uprobe_multi target binary with ability to
> > trigger uprobe (so that we can capture stack traces from it), but also
> > we allow to force build ID data to be either resident or non-resident in
> > memory (see also a comment about quirks of MADV_PAGEOUT).
> >
> > That way we can validate that in non-sleepable context we won't get
> > build ID (as expected), but with sleepable uprobes we will get that
> > build ID regardless of it being physically present in memory.
> >
> > Also, we add a small add-on linker script which reorders
> > .note.gnu.build-id section and puts it after (big) .text section,
> > putting build ID data outside of the very first page of ELF file. This
> > will test all the relaxations we did in build ID parsing logic in kernel
> > thanks to freader abstraction.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
>
> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
>
> [...]
>
> > diff --git a/tools/testing/selftests/bpf/uprobe_multi.c b/tools/testing/selftests/bpf/uprobe_multi.c
> > index 7ffa563ffeba..c7828b13e5ff 100644
> > --- a/tools/testing/selftests/bpf/uprobe_multi.c
> > +++ b/tools/testing/selftests/bpf/uprobe_multi.c
>
> [...]
>
> > +int __attribute__((weak)) trigger_uprobe(bool build_id_resident)
> > +{
> > + int page_sz = sysconf(_SC_PAGESIZE);
> > + void *addr;
> > +
> > + /* page-align build ID start */
> > + addr = (void *)((uintptr_t)&build_id_start & ~(page_sz - 1));
> > +
> > + /* to guarantee MADV_PAGEOUT work reliably, we need to ensure that
> > + * memory range is mapped into current process, so we unconditionally
> > + * do MADV_POPULATE_READ, and then MADV_PAGEOUT, if necessary
> > + */
> > + madvise(addr, page_sz, MADV_POPULATE_READ);
>
> Nit: check error code?
>
Well, even if this errors out there is no one to notice and do
anything about it, given this is in a forked process. The idea,
though, is that if this doesn't work, we'll catch it as part of the
actual selftest.
> > + if (!build_id_resident)
> > + madvise(addr, page_sz, MADV_PAGEOUT);
> > +
> > + (void)uprobe();
> > +
> > + return 0;
> > +}
> > +
>
> [...]
>
> Silly question, unrelated to the patch-set itself.
> When I do ./test_progs -vvv -t build_id/sleepable five stack frames
> are printed:
>
> FRAME #00: BUILD ID = 46d2568fe293274105f9dad0cc73de54a176f368 OFFSET = 2c4156
> FRAME #01: BUILD ID = 46d2568fe293274105f9dad0cc73de54a176f368 OFFSET = 393aef
> FRAME #02: BUILD ID = 8f53abaad945a669f2bdcd25f471d80e077568ef OFFSET = 2a088
> FRAME #03: BUILD ID = 8f53abaad945a669f2bdcd25f471d80e077568ef OFFSET = 2a14b
> FRAME #04: BUILD ID = 46d2568fe293274105f9dad0cc73de54a176f368 OFFSET = 2c4095
In my QEMU I only get 3:
FRAME #00: BUILD ID = d370860567af6d28316d45726045f1c59bbfc416 OFFSET = 2c4156
FRAME #01: BUILD ID = d370860567af6d28316d45726045f1c59bbfc416 OFFSET = 393ac7
FRAME #02: BUILD ID = 8bfe03f6bf9b6a6e2591babd0bbc266837d8f658 OFFSET = 27cd0
But see below, for my actual devserver there are 4 frames. My bet
would be that 568ef is libc. A bit confused why you get frame 04 from
uprobe_multi, but maybe that's how things work with musl or whatever?
Don't know. Check libc.so.
>
> The ...6f368 is build-id of the uprobe_multi.
> How do I check where ...568ef comes from?
> Also, why are there 5 frames when nesting level for uprobe() is 3?
>
Well, libc has some function calls before it gets to main. E.g., for
my local machine:
$ sudo bpftrace -e 'uprobe:./uprobe_multi:uprobe { print(ustack()); }'
Attaching 1 probe...
uprobe+4
trigger_uprobe+113
main+176
__libc_start_call_main+128
Note that you won't have trigger_uprobe in your stack trace until your
kernel has [0]
[0] https://lore.kernel.org/linux-trace-kernel/20240729175223.23914-1-andrii@kernel.org/
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v6 bpf-next 10/10] selftests/bpf: add build ID tests
2024-08-22 22:55 ` Andrii Nakryiko
@ 2024-08-22 23:07 ` Eduard Zingerman
0 siblings, 0 replies; 20+ messages in thread
From: Eduard Zingerman @ 2024-08-22 23:07 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Andrii Nakryiko, bpf, linux-mm, akpm, adobriyan, shakeel.butt,
hannes, ak, osandov, song, jannh, linux-fsdevel, willy
On Thu, 2024-08-22 at 15:55 -0700, Andrii Nakryiko wrote:
> > > + madvise(addr, page_sz, MADV_POPULATE_READ);
> >
> > Nit: check error code?
>
> Well, even if this errors out there is no one to notice and do
> anything about it, given this is in a forked process. The idea,
> though, is that if this doesn't work, we'll catch it as part of the
> actual selftest.
Ok.
[...]
> In my QEMU I only get 3:
>
> FRAME #00: BUILD ID = d370860567af6d28316d45726045f1c59bbfc416 OFFSET = 2c4156
> FRAME #01: BUILD ID = d370860567af6d28316d45726045f1c59bbfc416 OFFSET = 393ac7
> FRAME #02: BUILD ID = 8bfe03f6bf9b6a6e2591babd0bbc266837d8f658 OFFSET = 27cd0
>
> But see below, for my actual devserver there are 4 frames. My bet
> would be that 568ef is libc. A bit confused why you get frame 04 from
> uprobe_multi, but maybe that's how things work with musl or whatever?
> Don't know. Check libc.so.
Oh, right, I had to check libc build-id inside QEMU, not outside...
Yes, this is libc signature.
This figures, thank you for explaining.
[...]
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v6 bpf-next 09/10] bpf: wire up sleepable bpf_get_stack() and bpf_get_task_stack() helpers
2024-08-14 18:54 ` [PATCH v6 bpf-next 09/10] bpf: wire up sleepable bpf_get_stack() and bpf_get_task_stack() helpers Andrii Nakryiko
@ 2024-08-23 22:22 ` Eduard Zingerman
2024-08-26 16:19 ` Andrii Nakryiko
0 siblings, 1 reply; 20+ messages in thread
From: Eduard Zingerman @ 2024-08-23 22:22 UTC (permalink / raw)
To: Andrii Nakryiko, bpf
Cc: linux-mm, akpm, adobriyan, shakeel.butt, hannes, ak, osandov,
song, jannh, linux-fsdevel, willy
On Wed, 2024-08-14 at 11:54 -0700, Andrii Nakryiko wrote:
> Add sleepable implementations of bpf_get_stack() and
> bpf_get_task_stack() helpers and allow them to be used from sleepable
> BPF program (e.g., sleepable uprobes).
>
> Note, the stack trace IPs capturing itself is not sleepable (that would
> need to be a separate project), only build ID fetching is sleepable and
> thus more reliable, as it will wait for data to be paged in, if
> necessary. For that we make use of sleepable build_id_parse()
> implementation.
>
> Now that build ID related internals in kernel/bpf/stackmap.c can be used
> both in sleepable and non-sleepable contexts, we need to add additional
> rcu_read_lock()/rcu_read_unlock() protection around fetching
> perf_callchain_entry, but with the refactoring in previous commit it's
> now pretty straightforward. We make sure to do rcu_read_unlock (in
> sleepable mode only) right before stack_map_get_build_id_offset() call
> which can sleep. By that time we don't have any more use of
> perf_callchain_entry.
>
> Note, bpf_get_task_stack() will fail for user mode if task != current.
> And for kernel mode build ID are irrelevant. So in that sense adding
> sleepable bpf_get_task_stack() implementation is a no-op. It feel right
> to wire this up for symmetry and completeness, but I'm open to just
> dropping it until we support `user && crosstask` condition.
>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> ---
All seems logical.
You skip wiring up support for sleepable bpf_get_task_stack() in
tp_prog_func_proto(), pe_prog_func_proto() and
raw_tp_prog_func_proto(), this is because these are used for programs
that are never run in sleepable context, right?
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
[...]
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v6 bpf-next 00/10] Harden and extend ELF build ID parsing logic
2024-08-14 18:54 [PATCH v6 bpf-next 00/10] Harden and extend ELF build ID parsing logic Andrii Nakryiko
` (9 preceding siblings ...)
2024-08-14 18:54 ` [PATCH v6 bpf-next 10/10] selftests/bpf: add build ID tests Andrii Nakryiko
@ 2024-08-23 23:22 ` Eduard Zingerman
2024-08-25 19:35 ` Alexei Starovoitov
2024-08-26 21:30 ` Andrii Nakryiko
10 siblings, 2 replies; 20+ messages in thread
From: Eduard Zingerman @ 2024-08-23 23:22 UTC (permalink / raw)
To: Andrii Nakryiko, bpf
Cc: linux-mm, akpm, adobriyan, shakeel.butt, hannes, ak, osandov,
song, jannh, linux-fsdevel, willy
On Wed, 2024-08-14 at 11:54 -0700, Andrii Nakryiko wrote:
[...]
> Andrii Nakryiko (10):
> lib/buildid: harden build ID parsing logic
> lib/buildid: add single folio-based file reader abstraction
> lib/buildid: take into account e_phoff when fetching program headers
> lib/buildid: remove single-page limit for PHDR search
> lib/buildid: rename build_id_parse() into build_id_parse_nofault()
> lib/buildid: implement sleepable build_id_parse() API
> lib/buildid: don't limit .note.gnu.build-id to the first page in ELF
Never worked with lib/buildid before, so not sure how valuable my input is.
Anyways:
- I compared the resulting parser with ELF specification and available
documentation for buildid, all seems correct.
(with a small caveat that ELF defines Elf{32,64}_Ehdr->e_ehsize field
to encode actual size of the elf header, and e_phentsize
to encode actual size of the program header.
Parser uses sizeof(Elf{32,64}_{Ehdr,Phdr}) instead,
and this is how it was before, so probably does not matter).
- The `freader` abstraction nicely hides away difference between
sleepable and non-sleepable contexts.
(with a caveat, that freader_get_folio() uses read_cache_folio()
which is documented as expecting mapping->invalidate_lock to be held.
I assume that this is true for vma's passed to build_id_parse(), right?)
For what it's worth, full patch-set looks good to me.
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
[...]
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v6 bpf-next 00/10] Harden and extend ELF build ID parsing logic
2024-08-23 23:22 ` [PATCH v6 bpf-next 00/10] Harden and extend ELF build ID parsing logic Eduard Zingerman
@ 2024-08-25 19:35 ` Alexei Starovoitov
2024-08-26 21:30 ` Andrii Nakryiko
1 sibling, 0 replies; 20+ messages in thread
From: Alexei Starovoitov @ 2024-08-25 19:35 UTC (permalink / raw)
To: Eduard Zingerman
Cc: Andrii Nakryiko, bpf, linux-mm, Andrew Morton, Alexey Dobriyan,
shakeel.butt, Johannes Weiner, Andi Kleen, Omar Sandoval,
Song Liu, Jann Horn, Linux-Fsdevel, Matthew Wilcox
On Fri, Aug 23, 2024 at 4:23 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Wed, 2024-08-14 at 11:54 -0700, Andrii Nakryiko wrote:
>
> [...]
>
> > Andrii Nakryiko (10):
> > lib/buildid: harden build ID parsing logic
> > lib/buildid: add single folio-based file reader abstraction
> > lib/buildid: take into account e_phoff when fetching program headers
> > lib/buildid: remove single-page limit for PHDR search
> > lib/buildid: rename build_id_parse() into build_id_parse_nofault()
> > lib/buildid: implement sleepable build_id_parse() API
> > lib/buildid: don't limit .note.gnu.build-id to the first page in ELF
>
> Never worked with lib/buildid before, so not sure how valuable my input is.
> Anyways:
> - I compared the resulting parser with ELF specification and available
> documentation for buildid, all seems correct.
> (with a small caveat that ELF defines Elf{32,64}_Ehdr->e_ehsize field
> to encode actual size of the elf header, and e_phentsize
> to encode actual size of the program header.
> Parser uses sizeof(Elf{32,64}_{Ehdr,Phdr}) instead,
> and this is how it was before, so probably does not matter).
>
> - The `freader` abstraction nicely hides away difference between
> sleepable and non-sleepable contexts.
> (with a caveat, that freader_get_folio() uses read_cache_folio()
> which is documented as expecting mapping->invalidate_lock to be held.
> I assume that this is true for vma's passed to build_id_parse(), right?)
>
> For what it's worth, full patch-set looks good to me.
>
> Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Thank you for the review.
The patch set looks good to me as well, but I think it needs
a bit more Acks to land it through bpf-next.
Andrew,
since lib/ is under your supervision, please review and hopefully ack.
Matthew,
since you commented on the previous version pls double check
that patch 2 plus patch 6 make the right use of folio apis.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v6 bpf-next 09/10] bpf: wire up sleepable bpf_get_stack() and bpf_get_task_stack() helpers
2024-08-23 22:22 ` Eduard Zingerman
@ 2024-08-26 16:19 ` Andrii Nakryiko
0 siblings, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2024-08-26 16:19 UTC (permalink / raw)
To: Eduard Zingerman
Cc: Andrii Nakryiko, bpf, linux-mm, akpm, adobriyan, shakeel.butt,
hannes, ak, osandov, song, jannh, linux-fsdevel, willy
On Fri, Aug 23, 2024 at 3:22 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Wed, 2024-08-14 at 11:54 -0700, Andrii Nakryiko wrote:
> > Add sleepable implementations of bpf_get_stack() and
> > bpf_get_task_stack() helpers and allow them to be used from sleepable
> > BPF program (e.g., sleepable uprobes).
> >
> > Note, the stack trace IPs capturing itself is not sleepable (that would
> > need to be a separate project), only build ID fetching is sleepable and
> > thus more reliable, as it will wait for data to be paged in, if
> > necessary. For that we make use of sleepable build_id_parse()
> > implementation.
> >
> > Now that build ID related internals in kernel/bpf/stackmap.c can be used
> > both in sleepable and non-sleepable contexts, we need to add additional
> > rcu_read_lock()/rcu_read_unlock() protection around fetching
> > perf_callchain_entry, but with the refactoring in previous commit it's
> > now pretty straightforward. We make sure to do rcu_read_unlock (in
> > sleepable mode only) right before stack_map_get_build_id_offset() call
> > which can sleep. By that time we don't have any more use of
> > perf_callchain_entry.
> >
> > Note, bpf_get_task_stack() will fail for user mode if task != current.
> > And for kernel mode build ID are irrelevant. So in that sense adding
> > sleepable bpf_get_task_stack() implementation is a no-op. It feel right
> > to wire this up for symmetry and completeness, but I'm open to just
> > dropping it until we support `user && crosstask` condition.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > ---
>
> All seems logical.
> You skip wiring up support for sleepable bpf_get_task_stack() in
> tp_prog_func_proto(), pe_prog_func_proto() and
> raw_tp_prog_func_proto(), this is because these are used for programs
> that are never run in sleepable context, right?
Correct. I contemplated for a bit wiring it up for tracepoints, as we
might get sleepable tracepoints eventually, but we can always do it as
a follow up.
>
> Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
>
> [...]
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v6 bpf-next 00/10] Harden and extend ELF build ID parsing logic
2024-08-23 23:22 ` [PATCH v6 bpf-next 00/10] Harden and extend ELF build ID parsing logic Eduard Zingerman
2024-08-25 19:35 ` Alexei Starovoitov
@ 2024-08-26 21:30 ` Andrii Nakryiko
1 sibling, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2024-08-26 21:30 UTC (permalink / raw)
To: Eduard Zingerman
Cc: Andrii Nakryiko, bpf, linux-mm, akpm, adobriyan, shakeel.butt,
hannes, ak, osandov, song, jannh, linux-fsdevel, willy
On Fri, Aug 23, 2024 at 4:23 PM Eduard Zingerman <eddyz87@gmail.com> wrote:
>
> On Wed, 2024-08-14 at 11:54 -0700, Andrii Nakryiko wrote:
>
> [...]
>
> > Andrii Nakryiko (10):
> > lib/buildid: harden build ID parsing logic
> > lib/buildid: add single folio-based file reader abstraction
> > lib/buildid: take into account e_phoff when fetching program headers
> > lib/buildid: remove single-page limit for PHDR search
> > lib/buildid: rename build_id_parse() into build_id_parse_nofault()
> > lib/buildid: implement sleepable build_id_parse() API
> > lib/buildid: don't limit .note.gnu.build-id to the first page in ELF
>
> Never worked with lib/buildid before, so not sure how valuable my input is.
> Anyways:
> - I compared the resulting parser with ELF specification and available
> documentation for buildid, all seems correct.
> (with a small caveat that ELF defines Elf{32,64}_Ehdr->e_ehsize field
> to encode actual size of the elf header, and e_phentsize
> to encode actual size of the program header.
> Parser uses sizeof(Elf{32,64}_{Ehdr,Phdr}) instead,
> and this is how it was before, so probably does not matter).
>
> - The `freader` abstraction nicely hides away difference between
> sleepable and non-sleepable contexts.
> (with a caveat, that freader_get_folio() uses read_cache_folio()
> which is documented as expecting mapping->invalidate_lock to be held.
> I assume that this is true for vma's passed to build_id_parse(), right?)
No, I don't think it's automatically true. So good catch, I think I'll
need to add filemap_invalidate_lock_shared() +
filemap_invalidate_unlock_shared() around read_cache_folio().
I'll give Matthew and Andrew a chance to reply to Alexei, and will
post a new revision tomorrow. Thanks for a thorough review!
>
> For what it's worth, full patch-set looks good to me.
>
> Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
>
> [...]
>
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2024-08-26 21:31 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-14 18:54 [PATCH v6 bpf-next 00/10] Harden and extend ELF build ID parsing logic Andrii Nakryiko
2024-08-14 18:54 ` [PATCH v6 bpf-next 01/10] lib/buildid: harden " Andrii Nakryiko
2024-08-14 18:54 ` [PATCH v6 bpf-next 02/10] lib/buildid: add single folio-based file reader abstraction Andrii Nakryiko
2024-08-14 18:54 ` [PATCH v6 bpf-next 03/10] lib/buildid: take into account e_phoff when fetching program headers Andrii Nakryiko
2024-08-14 18:54 ` [PATCH v6 bpf-next 04/10] lib/buildid: remove single-page limit for PHDR search Andrii Nakryiko
2024-08-14 18:54 ` [PATCH v6 bpf-next 05/10] lib/buildid: rename build_id_parse() into build_id_parse_nofault() Andrii Nakryiko
2024-08-14 18:54 ` [PATCH v6 bpf-next 06/10] lib/buildid: implement sleepable build_id_parse() API Andrii Nakryiko
2024-08-14 18:54 ` [PATCH v6 bpf-next 07/10] lib/buildid: don't limit .note.gnu.build-id to the first page in ELF Andrii Nakryiko
2024-08-14 18:54 ` [PATCH v6 bpf-next 08/10] bpf: decouple stack_map_get_build_id_offset() from perf_callchain_entry Andrii Nakryiko
2024-08-22 20:32 ` Eduard Zingerman
2024-08-14 18:54 ` [PATCH v6 bpf-next 09/10] bpf: wire up sleepable bpf_get_stack() and bpf_get_task_stack() helpers Andrii Nakryiko
2024-08-23 22:22 ` Eduard Zingerman
2024-08-26 16:19 ` Andrii Nakryiko
2024-08-14 18:54 ` [PATCH v6 bpf-next 10/10] selftests/bpf: add build ID tests Andrii Nakryiko
2024-08-22 22:30 ` Eduard Zingerman
2024-08-22 22:55 ` Andrii Nakryiko
2024-08-22 23:07 ` Eduard Zingerman
2024-08-23 23:22 ` [PATCH v6 bpf-next 00/10] Harden and extend ELF build ID parsing logic Eduard Zingerman
2024-08-25 19:35 ` Alexei Starovoitov
2024-08-26 21:30 ` Andrii Nakryiko
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).