[PATCH v2] skbuff: Fix a race between coalescing and releasing SKBs

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Liang Chen <liangchen.linux@gmail.com>
To: kuba@kernel.org, alexander.duyck@gmail.com,
	ilias.apalodimas@linaro.org, hawk@kernel.org
Cc: davem@davemloft.net, edumazet@google.com, pabeni@redhat.com,
	netdev@vger.kernel.org, liangchen.linux@gmail.com
Subject: [PATCH v2] skbuff: Fix a race between coalescing and releasing SKBs
Date: Thu,  6 Apr 2023 19:48:25 +0800	[thread overview]
Message-ID: <20230406114825.18597-1-liangchen.linux@gmail.com> (raw)

Commit 1effe8ca4e34 ("skbuff: fix coalescing for page_pool fragment
recycling") allowed coalescing to proceed with non page pool page and
page pool page when @from is cloned, i.e.

to->pp_recycle    --> false
from->pp_recycle  --> true
skb_cloned(from)  --> true

However, it actually requires skb_cloned(@from) to hold true until
coalescing finishes in this situation. If the other cloned SKB is
released while the merging is in process, from_shinfo->nr_frags will be
set to 0 towards the end of the function, causing the increment of frag
page _refcount to be unexpectedly skipped resulting in inconsistent
reference counts. Later when SKB(@to) is released, it frees the page
directly even though the page pool page is still in use, leading to
use-after-free or double-free errors.

So it needs to be specially handled at where the ref count may get lost.

The double-free error message below prompted us to investigate:
BUG: Bad page state in process swapper/1  pfn:0e0d1
page:00000000c6548b28 refcount:-1 mapcount:0 mapping:0000000000000000
index:0x2 pfn:0xe0d1
flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
raw: 000fffffc0000000 0000000000000000 ffffffff00000101 0000000000000000
raw: 0000000000000002 0000000000000000 ffffffffffffffff 0000000000000000
page dumped because: nonzero _refcount

CPU: 1 PID: 0 Comm: swapper/1 Tainted: G            E      6.2.0+
Call Trace:
 <IRQ>
dump_stack_lvl+0x32/0x50
bad_page+0x69/0xf0
free_pcp_prepare+0x260/0x2f0
free_unref_page+0x20/0x1c0
skb_release_data+0x10b/0x1a0
napi_consume_skb+0x56/0x150
net_rx_action+0xf0/0x350
? __napi_schedule+0x79/0x90
__do_softirq+0xc8/0x2b1
__irq_exit_rcu+0xb9/0xf0
common_interrupt+0x82/0xa0
</IRQ>
<TASK>
asm_common_interrupt+0x22/0x40
RIP: 0010:default_idle+0xb/0x20

Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
---
Changes from v1:
- deal with the ref count problem instead of return back to give more opportunities to coalesce skbs.
---
 net/core/skbuff.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 050a875d09c5..77da8ce74a1e 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5643,7 +5643,19 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
 
 		skb_fill_page_desc(to, to_shinfo->nr_frags,
 				   page, offset, skb_headlen(from));
-		*fragstolen = true;
+
+		/* When @from is pp_recycle and @to isn't, coalescing is
+		 * allowed to proceed if @from is cloned. However if the
+		 * execution reaches this point, @from is already transitioned
+		 * into non-cloned because the other cloned skb is released
+		 * somewhere else concurrently. In this case, we need to make
+		 * sure the ref count is incremented, not directly stealing
+		 * from page pool.
+		 */
+		if (to->pp_recycle != from->pp_recycle)
+			get_page(page);
+		else
+			*fragstolen = true;
 	} else {
 		if (to_shinfo->nr_frags +
 		    from_shinfo->nr_frags > MAX_SKB_FRAGS)
@@ -5659,7 +5671,13 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
 	       from_shinfo->nr_frags * sizeof(skb_frag_t));
 	to_shinfo->nr_frags += from_shinfo->nr_frags;
 
-	if (!skb_cloned(from))
+	/* Same situation as above where head data presents. When @from is
+	 * pp_recycle and @to isn't, coalescing is allowed to proceed if @from
+	 * is cloned. However @from can be transitioned into non-cloned
+	 * concurrently by this point. If it does happen, we need to make sure
+	 * the ref count is properly incremented.
+	 */
+	if (to->pp_recycle == from->pp_recycle && !skb_cloned(from))
 		from_shinfo->nr_frags = 0;
 
 	/* if the skb is not cloned this does nothing
-- 
2.18.2

next             reply	other threads:[~2023-04-06 11:53 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-06 11:48 Liang Chen [this message]
2023-04-06 15:25 ` [PATCH v2] skbuff: Fix a race between coalescing and releasing SKBs Alexander H Duyck
2023-04-07  8:02   ` Ilias Apalodimas
2023-04-10  7:26     ` Liang Chen
2023-04-07  8:06   ` Yunsheng Lin
2023-04-07 15:01     ` Alexander Duyck

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:050a875d09c dfblob:77da8ce74a1 )
 OR (
bs:"[PATCH v2] skbuff: Fix a race between coalescing and releasing SKBs" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230406114825.18597-1-liangchen.linux@gmail.com \
    --to=liangchen.linux@gmail.com \
    --cc=alexander.duyck@gmail.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=hawk@kernel.org \
    --cc=ilias.apalodimas@linaro.org \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.