[PATCH net v4] net: skbuff: add usercopy region to skbuff_fclone

netdev.vger.kernel.org archive mirror

* [PATCH net v4] net: skbuff: add usercopy region to skbuff_fclone_cache
From: bestswngs @ 2025-12-16  8:44 UTC
  To: security
  Cc: davem, edumazet, kuba, pabeni, horms, netdev, linux-kernel, xmei5,
	Weiming Shi

From: Weiming Shi <bestswngs@gmail.com>

skbuff_fclone_cache is created without a usercopy region [1], unlike
skbuff_head_cache, which whitelists the cb[] field [2].
This causes a usercopy BUG() when CONFIG_HARDENED_USERCOPY is enabled
and the kernel attempts to copy sk_buff.cb data to userspace via
sock_recv_errqueue() -> put_cmsg().

The crash occurs when:
1. TCP allocates an skb using alloc_skb_fclone()
   (from skbuff_fclone_cache) [1]
2. The skb is cloned via skb_clone() using the pre-allocated fclone [3]
3. The cloned skb is queued to sk_error_queue for timestamp reporting
4. Userspace reads the error queue via recvmsg(MSG_ERRQUEUE)
5. sock_recv_errqueue() calls put_cmsg() to copy serr->ee from skb->cb [4]
6. __check_heap_object() fails because skbuff_fclone_cache has no
   usercopy whitelist [5]

When cloned skbs allocated from skbuff_fclone_cache are queued on the
socket error queue, copying the sock_exterr_skb data in skb->cb to
userspace via put_cmsg() triggers a usercopy hardening violation:

[    5.379589] usercopy: Kernel memory exposure attempt detected from SLUB object 'skbuff_fclone_cache' (offset 296, size 16)!
[    5.382796] kernel BUG at mm/usercopy.c:102!
[    5.383923] Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
[    5.384903] CPU: 1 UID: 0 PID: 138 Comm: poc_put_cmsg Not tainted 6.12.57 #7
[    5.384903] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[    5.384903] RIP: 0010:usercopy_abort+0x6c/0x80
[    5.384903] Code: 1a 86 51 48 c7 c2 40 15 1a 86 41 52 48 c7 c7 c0 15 1a 86 48 0f 45 d6 48 c7 c6 80 15 1a 86 48 89 c1 49 0f 45 f3 e8 84 27 88 ff <0f> 0b 490
[    5.384903] RSP: 0018:ffffc900006f77a8 EFLAGS: 00010246
[    5.384903] RAX: 000000000000006f RBX: ffff88800f0ad2a8 RCX: 1ffffffff0f72e74
[    5.384903] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff87b973a0
[    5.384903] RBP: 0000000000000010 R08: 0000000000000000 R09: fffffbfff0f72e74
[    5.384903] R10: 0000000000000003 R11: 79706f6372657375 R12: 0000000000000001
[    5.384903] R13: ffff88800f0ad2b8 R14: ffffea00003c2b40 R15: ffffea00003c2b00
[    5.384903] FS:  0000000011bc4380(0000) GS:ffff8880bf100000(0000) knlGS:0000000000000000
[    5.384903] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    5.384903] CR2: 000056aa3b8e5fe4 CR3: 000000000ea26004 CR4: 0000000000770ef0
[    5.384903] PKRU: 55555554
[    5.384903] Call Trace:
[    5.384903]  <TASK>
[    5.384903]  __check_heap_object+0x9a/0xd0
[    5.384903]  __check_object_size+0x46c/0x690
[    5.384903]  put_cmsg+0x129/0x5e0
[    5.384903]  sock_recv_errqueue+0x22f/0x380
[    5.384903]  tls_sw_recvmsg+0x7ed/0x1960
[    5.384903]  ? srso_alias_return_thunk+0x5/0xfbef5
[    5.384903]  ? schedule+0x6d/0x270
[    5.384903]  ? srso_alias_return_thunk+0x5/0xfbef5
[    5.384903]  ? mutex_unlock+0x81/0xd0
[    5.384903]  ? __pfx_mutex_unlock+0x10/0x10
[    5.384903]  ? __pfx_tls_sw_recvmsg+0x10/0x10
[    5.384903]  ? _raw_spin_lock_irqsave+0x8f/0xf0
[    5.384903]  ? _raw_read_unlock_irqrestore+0x20/0x40
[    5.384903]  ? srso_alias_return_thunk+0x5/0xfbef5

The crash offset 296 corresponds to skb2.cb within struct sk_buff_fclones
(values from this x86_64 build):
  - sizeof(struct sk_buff) = 232
  - offsetof(struct sk_buff, cb) = 40
  - offset of skb2.cb in sk_buff_fclones = 232 + 40 = 272
  - crash offset 296 = 272 + 24, the start of sock_exterr_skb.ee; the
    16-byte copy matches sizeof(struct sock_extended_err)

Fix this by creating skbuff_fclone_cache with kmem_cache_create_usercopy(),
following the approach of commit 79a8a642bf05 ("net: Whitelist the
skbuff_head_cache "cb" field") [6]. Because struct sk_buff_fclones holds
two sk_buffs, the whitelisted region starts at skb1.cb and extends
through the end of skb2.cb, so that the cb[] of both skbs is covered.

[1] https://elixir.bootlin.com/linux/v6.12.62/source/net/ipv4/tcp.c#L885
[2] https://elixir.bootlin.com/linux/v6.12.62/source/net/core/skbuff.c#L5104
[3] https://elixir.bootlin.com/linux/v6.12.62/source/net/core/skbuff.c#L5566
[4] https://elixir.bootlin.com/linux/v6.12.62/source/net/core/skbuff.c#L5491
[5] https://elixir.bootlin.com/linux/v6.12.62/source/mm/slub.c#L5719
[6] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=79a8a642bf05c

Fixes: 6d07d1cd300f ("usercopy: Restrict non-usercopy caches to size 0")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
---
v2: Fix the commit message
v3: Add the "From:" address, fix the "To:" and "Cc:" addresses
v4: Fix the patch code

 net/core/skbuff.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a00808f7be6a..89c98ce6106a 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5157,10 +5157,12 @@ void __init skb_init(void)
 					      NULL);
 	skbuff_cache_size = kmem_cache_size(net_hotdata.skbuff_cache);
 
-	net_hotdata.skbuff_fclone_cache = kmem_cache_create("skbuff_fclone_cache",
+	net_hotdata.skbuff_fclone_cache = kmem_cache_create_usercopy("skbuff_fclone_cache",
 						sizeof(struct sk_buff_fclones),
 						0,
 						SLAB_HWCACHE_ALIGN|SLAB_PANIC,
+						offsetof(struct sk_buff, cb),
+						sizeof(struct sk_buff) + sizeof_field(struct sk_buff, cb),
 						NULL);
 	/* usercopy should only access first SKB_SMALL_HEAD_HEADROOM bytes.
 	 * struct skb_shared_info is located at the end of skb->head,
-- 
2.43.0



* Re: [PATCH net v4] net: skbuff: add usercopy region to skbuff_fclone_cache
From: Simon Horman @ 2025-12-16 11:37 UTC
  To: bestswngs
  Cc: security, davem, edumazet, kuba, pabeni, netdev, linux-kernel,
	xmei5

On Tue, Dec 16, 2025 at 04:44:53PM +0800, bestswngs@gmail.com wrote:
> From: Weiming Shi <bestswngs@gmail.com>
> 
> skbuff_fclone_cache was created without defining a usercopy region, [1]
> unlike skbuff_head_cache which properly whitelists the cb[] field.  [2]
> This causes a usercopy BUG() when CONFIG_HARDENED_USERCOPY is enabled
> and the kernel attempts to copy sk_buff.cb data to userspace via
> sock_recv_errqueue() -> put_cmsg().

...

Hi Weiming Shi,

Please slow down.

When posting patches to the Netdev ML, please allow at least 24h to elapse
between versions. This allows time for review and reduces load on the
shared CI infrastructure.

See: https://docs.kernel.org/process/maintainer-netdev.html

Also, I do not believe it is appropriate to involve security@kernel.org
in reports that are made public, as there is nothing left for the security
officers to do.

See:
- https://lore.kernel.org/netdev/CANn89i+3_50FX1RWutvipTMROD3FnK-nBeG4L+br86W85fzRdQ@mail.gmail.com/
- https://www.kernel.org/doc/Documentation/process/security-bugs.rst

Thanks!


* Re: [PATCH net v4] net: skbuff: add usercopy region to skbuff_fclone_cache
From: Eric Dumazet @ 2025-12-17 18:24 UTC
  To: bestswngs
  Cc: security, davem, kuba, pabeni, horms, netdev, linux-kernel, xmei5,
	Willem de Bruijn

On Tue, Dec 16, 2025 at 9:51 AM <bestswngs@gmail.com> wrote:
>
> From: Weiming Shi <bestswngs@gmail.com>
>
> skbuff_fclone_cache was created without defining a usercopy region, [1]
> unlike skbuff_head_cache which properly whitelists the cb[] field.  [2]
> This causes a usercopy BUG() when CONFIG_HARDENED_USERCOPY is enabled
> and the kernel attempts to copy sk_buff.cb data to userspace via
> sock_recv_errqueue() -> put_cmsg().
>
> ...
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index a00808f7be6a..89c98ce6106a 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -5157,10 +5157,12 @@ void __init skb_init(void)
>                                               NULL);
>         skbuff_cache_size = kmem_cache_size(net_hotdata.skbuff_cache);
>
> -       net_hotdata.skbuff_fclone_cache = kmem_cache_create("skbuff_fclone_cache",
> +       net_hotdata.skbuff_fclone_cache = kmem_cache_create_usercopy("skbuff_fclone_cache",
>                                                 sizeof(struct sk_buff_fclones),
>                                                 0,
>                                                 SLAB_HWCACHE_ALIGN|SLAB_PANIC,
> +                                               offsetof(struct sk_buff, cb),
> +                                               sizeof(struct sk_buff) + sizeof_field(struct sk_buff, cb),
>                                                 NULL);

I have a bad feeling about this patch.

Really, we should not put a fast clone skb in the error queue in the
first place, because we cannot control how long the (possibly large)
skb will stay there.

Things like skb_fclone_busy() would need a fix otherwise.


* Re: [PATCH net v4] net: skbuff: add usercopy region to skbuff_fclone_cache
From: Paolo Abeni @ 2025-12-23 10:34 UTC
  To: bestswngs, security
  Cc: davem, edumazet, kuba, horms, netdev, linux-kernel, xmei5

On 12/16/25 9:44 AM, bestswngs@gmail.com wrote:
> From: Weiming Shi <bestswngs@gmail.com>
> 
> skbuff_fclone_cache was created without defining a usercopy region, [1]
> unlike skbuff_head_cache which properly whitelists the cb[] field.  [2]
> This causes a usercopy BUG() when CONFIG_HARDENED_USERCOPY is enabled
> and the kernel attempts to copy sk_buff.cb data to userspace via
> sock_recv_errqueue() -> put_cmsg().
> 
> ...
> 
> Fixes: 6d07d1cd300f ("usercopy: Restrict non-usercopy caches to size 0")
> Reported-by: Xiang Mei <xmei5@asu.edu>
> Signed-off-by: Weiming Shi <bestswngs@gmail.com>

Rephrasing Eric's comment (and hoping I have not misread it), you
should fix the issue differently: catch fclones before adding
them to the error queue and try to unclone them.

Thanks,

Paolo



* Re: [PATCH net v4] net: skbuff: add usercopy region to skbuff_fclone_cache
From: Eric Dumazet @ 2025-12-23 10:59 UTC
  To: Paolo Abeni
  Cc: bestswngs, security, davem, kuba, horms, netdev, linux-kernel,
	xmei5

On Tue, Dec 23, 2025 at 11:34 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 12/16/25 9:44 AM, bestswngs@gmail.com wrote:
> > ...
>
> Rephrasing Eric's comment (and hoping I have not misread it), you
> should fix the issue differently: catch fclones before adding
> them to the error queue and try to unclone them.

Instead of opening/weakening skbuff_clone to wide user copies, I would rather
use what we did in:

commit 2558b8039d059342197610498c8749ad294adee5
Author: Eric Dumazet <edumazet@google.com>
Date:   Mon Feb 13 16:00:59 2023 +0000

    net: use a bounce buffer for copying skb->mark

ie :

diff --git a/net/core/sock.c b/net/core/sock.c
index 45c98bf524b2..a1c8b47b0d56 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3896,7 +3896,7 @@ void sock_enable_timestamp(struct sock *sk, enum sock_flags flag)
 int sock_recv_errqueue(struct sock *sk, struct msghdr *msg, int len,
                       int level, int type)
 {
-       struct sock_exterr_skb *serr;
+       struct sock_extended_err ee;
        struct sk_buff *skb;
        int copied, err;

@@ -3916,8 +3916,9 @@ int sock_recv_errqueue(struct sock *sk, struct msghdr *msg, int len,

        sock_recv_timestamp(msg, sk, skb);

-       serr = SKB_EXT_ERR(skb);
-       put_cmsg(msg, level, type, sizeof(serr->ee), &serr->ee);
+       /* We must use a bounce buffer for CONFIG_HARDENED_USERCOPY=y */
+       ee = SKB_EXT_ERR(skb)->ee;
+       put_cmsg(msg, level, type, sizeof(ee), &ee);

        msg->msg_flags |= MSG_ERRQUEUE;
        err = copied;


* Re: [PATCH net v4] net: skbuff: add usercopy region to skbuff_fclone_cache
From: Weiming Shi @ 2025-12-23 18:31 UTC
  To: Eric Dumazet
  Cc: Paolo Abeni, linux-kernel, xmei5, kuba, davem, security, horms,
	netdev

On 25-12-23 18:22, Eric Dumazet wrote:
> On Tue, Dec 23, 2025 at 6:08 PM swing <bestswngs@gmail.com> wrote:
> 
> > I tested this on Linux 6.12.57. Running the PoC that previously caused the
> > issue no longer triggers the crash/panic with this patch applied.
> >
> Great, please send a V2 with it, you can be the author, I will then add my
> 'Reviewed-by:' tag.
> 
> Thanks !
Thank you for your suggestion. I am currently preparing a v5 version
patch and will need some time to conduct more comprehensive testing.

Best,
Weiming
> 
> 
> > On Tuesday, Dec 23, 2025 at 6:59 PM, Eric Dumazet <edumazet@google.com> wrote:
> > ...


Thread overview: 6+ messages
2025-12-16  8:44 [PATCH net v4] net: skbuff: add usercopy region to skbuff_fclone_cache bestswngs
2025-12-16 11:37 ` Simon Horman
2025-12-17 18:24 ` Eric Dumazet
2025-12-23 10:34 ` Paolo Abeni
2025-12-23 10:59   ` Eric Dumazet
     [not found]     ` <ea0174b4-dd9d-40a9-8206-5ae3845a5cab@Canary>
     [not found]       ` <CANn89iK-N9davqJg-BdF9K25T3+oHoabcnyAyrE+8sq1qe-7pQ@mail.gmail.com>
2025-12-23 18:31         ` Weiming Shi
