Re: [PATCH v2] bpf, sockmap: keep sk_msg copy state in sync

All of lore.kernel.org
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Zhang Cen" <rollkingzzc@gmail.com>
Cc: bpf@vger.kernel.org
Subject: Re: [PATCH v2] bpf, sockmap: keep sk_msg copy state in sync
Date: Sun, 17 May 2026 12:50:01 +0000	[thread overview]
Message-ID: <20260517125002.82952C2BCB0@smtp.kernel.org> (raw)
In-Reply-To: <20260517121626.406516-1-rollkingzzc@gmail.com>

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [Critical] Integer Overflow in bpf_msg_push_data() leading to Out-of-bounds Write
- [High] Uninitialized Memory Leak (Information Disclosure) in bpf_msg_push_data()
- [High] Ring Buffer Overflow and Data Corruption in bpf_msg_push_data()
--

commit 9081fa26db59dd3001e41a16c3e730205c716696
Author: Zhang Cen <rollkingzzc@gmail.com>

bpf, sockmap: keep sk_msg copy state in sync

sk_msg uses msg->sg.copy as per-scatterlist-entry provenance. Entries with
this bit set are copied before data/data_end are exposed to sk_msg bpf
programs for direct packet access. bpf_msg_pull_data(), bpf_msg_push_data()
and bpf_msg_pop_data() rewrite the sk_msg scatterlist ring by collapsing,
splitting and shifting entries.

[ ... ]

> diff --git a/net/core/filter.c b/net/core/filter.c
> index 9590877b0714f..018c30a0d71fb 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c

[ ... ]

> @@ -2794,6 +2827,8 @@ BPF_CALL_4(bpf_msg_push_data, struct sk_msg *, msg, u32, start,
>  {
>  	struct scatterlist sge, nsge, nnsge, rsge = {0}, *psge;
>  	u32 new, i = 0, l = 0, space, copy = 0, offset = 0;
> +	bool sge_copy = false, nsge_copy = false, nnsge_copy = false;
> +	bool rsge_copy = false;
>  	u8 *raw, *to, *from;
>  	struct page *page;

This isn't a bug in this patch, but does bpf_msg_push_data() leak
uninitialized kernel memory?

bpf_msg_push_data() allocates a new memory page for the pushed data and the
copied bytes:

net/core/filter.c:bpf_msg_push_data() {
    ...
	page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC | __GFP_COMP,
			   get_order(copy + len));
    ...
	sg_set_page(&msg->sg.data[new], page, len + copy, 0);
    ...
}

Since the allocation does not include the zeroing flag, the new len bytes
inserted into the message are never explicitly zeroed or initialized.

If the bpf program does not manually overwrite these bytes and the message is
transmitted over a socket, does this regression leak uninitialized kernel
memory to the network or userspace?

Also in bpf_msg_push_data(), could len cause an integer overflow leading to an
out-of-bounds write?

The len parameter is an unbounded u32 value passed from a bpf program. When
calculating the allocation size, copy + len can overflow a 32-bit integer:

net/core/filter.c:bpf_msg_push_data() {
    ...
	page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC | __GFP_COMP,
			   get_order(copy + len));
    ...
		if (back) {
			from += front;
			to = raw + front + len;

			memcpy(to, from, back);
		}
    ...
}

If copy + len wraps around, alloc_pages() will allocate an undersized buffer.

Later, the destination pointer for shifting tail data is calculated as
to = raw + front + len. Due to len being a large value, front + len wraps
around and results in a large positive 32-bit offset (zero-extended to
64-bit).

memcpy(to, from, back) then performs a controlled out-of-bounds write well
past the allocated page.

Could this regression lead to memory corruption?

Could the ring buffer state become corrupted when data is appended exactly at
the end of the message and the ring buffer is full?

In bpf_msg_push_data(), if data is appended exactly at the end of the message
(start == offset + l), the loop finishes with i pointing to the unused slot
msg->sg.end:

net/core/filter.c:bpf_msg_push_data() {
    ...
	if (!space || (space == 1 && start != offset))
		copy = msg->sg.data[i].length;
    ...
}

If the ring buffer is full (!space), the coalescing fallback logic reads
copy = msg->sg.data[i].length. Because i points to an unused slot, its length
is 0, making copy == 0, and the fallback coalescing block (if (copy) { ... })
is bypassed.

The function then unconditionally calls sk_msg_iter_next(msg, end), which
increments msg->sg.end.

Since the ring buffer was already full, advancing msg->sg.end causes it to
wrap around and equal msg->sg.start.

Does this regression corrupt the ring buffer state and overwrite the first
element without calling put_page(), leading to page leaks?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260517121626.406516-1-rollkingzzc@gmail.com?part=1

next prev parent reply	other threads:[~2026-05-17 12:50 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-17 12:16 [PATCH v2] bpf, sockmap: keep sk_msg copy state in sync Zhang Cen
2026-05-17 12:50 ` sashiko-bot [this message]
2026-05-18 21:32 ` John Fastabend
2026-05-19 12:49   ` Cen Zhang
2026-05-19 19:29     ` John Fastabend
2026-05-20  4:28       ` Cen Zhang
2026-05-20  3:13 ` Jiayuan Chen
2026-05-20  3:29   ` Han Guidong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260517125002.82952C2BCB0@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=rollkingzzc@gmail.com \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.