* [PATCH 0/2] net: isolate SKB data area allocations
From: Pedro Falcato @ 2026-06-02 18:31 UTC
To: Vlastimil Babka, Harry Yoo, Andrew Morton, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: linux-hardening, linux-mm, netdev, linux-kernel, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Simon Horman, Jason Xing, Kuniyuki Iwashima, Pedro Falcato

This is a rather simple series that attempts to address a possible
exploitation avenue - the allocation of skbs around the network stack,
which frequently get user-controlled contents. Found while doing some
amateur exploitation analysis for some other issue, elsewhere.

Patch 1 is a precursor patch that adds a slab allocation helper, patch 2
does the actual bucketing.

I don't know what tree should pick this up, so I just based this on
linux-next.

Pedro Falcato (2):
  mm/slab: add a node-track-caller variant for kmem buckets allocation
  net: skb: isolate skb data area allocations into a separate bucket

 include/linux/slab.h | 7 +++++--
 net/core/skbuff.c    | 5 ++++-
 2 files changed, 9 insertions(+), 3 deletions(-)

-- 
2.54.0

* [PATCH 1/2] mm/slab: add a node-track-caller variant for kmem buckets allocation
From: Pedro Falcato @ 2026-06-02 18:31 UTC
To: Vlastimil Babka, Harry Yoo, Andrew Morton, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: linux-hardening, linux-mm, netdev, linux-kernel, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Simon Horman, Jason Xing, Kuniyuki Iwashima, Pedro Falcato

This is required by users that want to use kmem buckets, but still
desire specifying the NUMA node.

Signed-off-by: Pedro Falcato <pfalcato@suse.de>
---
 include/linux/slab.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 7b46fa499b08..685a87d8f0c5 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -1153,8 +1153,11 @@ void *kmalloc_nolock(size_t size, gfp_t gfp_flags, int node);
 #define kmem_buckets_alloc(_b, _size, _flags)	\
 	alloc_hooks(__kmalloc_node_noprof(PASS_KMALLOC_PARAMS(_size, _b, __kmalloc_token(_size)), _flags, NUMA_NO_NODE))
 
-#define kmem_buckets_alloc_track_caller(_b, _size, _flags)	\
-	alloc_hooks(__kmalloc_node_track_caller_noprof(PASS_KMALLOC_PARAMS(_size, _b, __kmalloc_token(_size)), _flags, NUMA_NO_NODE, _RET_IP_))
+#define kmem_buckets_alloc_node_track_caller(_b, _size, _flags, _node)	\
+	alloc_hooks(__kmalloc_node_track_caller_noprof(PASS_KMALLOC_PARAMS(_size, _b, __kmalloc_token(_size)), _flags, _node, _RET_IP_))
+
+#define kmem_buckets_alloc_track_caller(_b, _size, _flags)	\
+	kmem_buckets_alloc_node_track_caller(_b, _size, _flags, NUMA_NO_NODE)
 
 static __always_inline __alloc_size(1) void *_kmalloc_node_noprof(size_t size, gfp_t flags, int node, kmalloc_token_t token)
 {
-- 
2.54.0

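Editor's note: the patch above makes the node-aware macro the primitive and redefines the existing no-node macro as a thin wrapper that passes NUMA_NO_NODE. The same layering can be sketched in userspace C. This is only an illustration under stated assumptions: demo_alloc_node(), demo_alloc(), DEMO_NO_NODE and demo_last_node are hypothetical names invented here, and plain malloc() stands in for the slab allocator, so the node hint is recorded but otherwise ignored.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's NUMA_NO_NODE ("no preference"). */
#define DEMO_NO_NODE (-1)

/* Records the node hint of the most recent allocation, for inspection only. */
static int demo_last_node;

/*
 * Hypothetical node-aware primitive. A real allocator would use the hint to
 * pick memory close to that node; malloc() here simply ignores it.
 */
static void *demo_alloc_node(size_t size, int node)
{
	demo_last_node = node;
	return malloc(size);
}

/* The no-node variant is just the node-aware one with the default hint. */
#define demo_alloc(size) demo_alloc_node((size), DEMO_NO_NODE)
```

Existing callers of the no-node form keep working unchanged, while new callers can pass an explicit node, which is the property patch 2/2 relies on.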
* Re: [PATCH 1/2] mm/slab: add a node-track-caller variant for kmem buckets allocation
From: Harry Yoo @ 2026-06-04 5:19 UTC
To: Pedro Falcato, Vlastimil Babka, Andrew Morton, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: linux-hardening, linux-mm, netdev, linux-kernel, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Simon Horman, Jason Xing, Kuniyuki Iwashima

On 6/3/26 3:31 AM, Pedro Falcato wrote:
> This is required by users that want to use kmem buckets, but still
> desire specifying the NUMA node.
>
> Signed-off-by: Pedro Falcato <pfalcato@suse.de>
> ---

Acked-by: Harry Yoo (Oracle) <harry@kernel.org>

-- 
Cheers,
Harry / Hyeonggon

* Re: [PATCH 1/2] mm/slab: add a node-track-caller variant for kmem buckets allocation
From: Pedro Falcato @ 2026-06-04 19:12 UTC
To: Harry Yoo
Cc: Vlastimil Babka, Andrew Morton, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-hardening, linux-mm, netdev, linux-kernel, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Simon Horman, Jason Xing, Kuniyuki Iwashima

On Thu, Jun 04, 2026 at 02:19:30PM +0900, Harry Yoo wrote:
>
>
> On 6/3/26 3:31 AM, Pedro Falcato wrote:
> > This is required by users that want to use kmem buckets, but still
> > desire specifying the NUMA node.
> >
> > Signed-off-by: Pedro Falcato <pfalcato@suse.de>
> > ---
>
> Acked-by: Harry Yoo (Oracle) <harry@kernel.org>

Thanks for the review!

-- 
Pedro

* Re: [PATCH 1/2] mm/slab: add a node-track-caller variant for kmem buckets allocation
From: Vlastimil Babka (SUSE) @ 2026-06-05 11:59 UTC
To: Pedro Falcato, Harry Yoo, Andrew Morton, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: linux-hardening, linux-mm, netdev, linux-kernel, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Simon Horman, Jason Xing, Kuniyuki Iwashima

On 6/2/26 20:31, Pedro Falcato wrote:
> This is required by users that want to use kmem buckets, but still
> desire specifying the NUMA node.
>
> Signed-off-by: Pedro Falcato <pfalcato@suse.de>

Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>

As the main change is in net code, probably should go in the net tree?

> ---
>  include/linux/slab.h | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 7b46fa499b08..685a87d8f0c5 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -1153,8 +1153,11 @@ void *kmalloc_nolock(size_t size, gfp_t gfp_flags, int node);
>  #define kmem_buckets_alloc(_b, _size, _flags)	\
>  	alloc_hooks(__kmalloc_node_noprof(PASS_KMALLOC_PARAMS(_size, _b, __kmalloc_token(_size)), _flags, NUMA_NO_NODE))
>
> -#define kmem_buckets_alloc_track_caller(_b, _size, _flags)	\
> -	alloc_hooks(__kmalloc_node_track_caller_noprof(PASS_KMALLOC_PARAMS(_size, _b, __kmalloc_token(_size)), _flags, NUMA_NO_NODE, _RET_IP_))
> +#define kmem_buckets_alloc_node_track_caller(_b, _size, _flags, _node)	\
> +	alloc_hooks(__kmalloc_node_track_caller_noprof(PASS_KMALLOC_PARAMS(_size, _b, __kmalloc_token(_size)), _flags, _node, _RET_IP_))
> +
> +#define kmem_buckets_alloc_track_caller(_b, _size, _flags)	\
> +	kmem_buckets_alloc_node_track_caller(_b, _size, _flags, NUMA_NO_NODE)
>
>  static __always_inline __alloc_size(1) void *_kmalloc_node_noprof(size_t size, gfp_t flags, int node, kmalloc_token_t token)
>  {

* Re: [PATCH 1/2] mm/slab: add a node-track-caller variant for kmem buckets allocation
From: Kees Cook @ 2026-06-05 18:08 UTC
To: Pedro Falcato
Cc: Vlastimil Babka, Harry Yoo, Andrew Morton, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-hardening, linux-mm, netdev, linux-kernel, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Simon Horman, Jason Xing, Kuniyuki Iwashima

On Tue, Jun 02, 2026 at 07:31:21PM +0100, Pedro Falcato wrote:
> This is required by users that want to use kmem buckets, but still
> desire specifying the NUMA node.
>
> Signed-off-by: Pedro Falcato <pfalcato@suse.de>

Nice!

Reviewed-by: Kees Cook <kees@kernel.org>

-Kees

-- 
Kees Cook

* [PATCH 2/2] net: skb: isolate skb data area allocations into a separate bucket
From: Pedro Falcato @ 2026-06-02 18:31 UTC
To: Vlastimil Babka, Harry Yoo, Andrew Morton, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: linux-hardening, linux-mm, netdev, linux-kernel, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Simon Horman, Jason Xing, Kuniyuki Iwashima, Pedro Falcato

SKB data area allocations (as done from alloc_skb()) use kmalloc().
These allocations can be variably sized and their contents can be more
or less controlled from userspace, which makes them useful for attackers
that want to overwrite a use-after-free'd object from the same kmalloc slab
(which often just requires the sizes to roughly match into the same kmalloc
bucket). [0] is an easy example of an exploit that uses netlink skb
allocation to target another similarly-sized accidentally freed object.

While other mitigations like CONFIG_RANDOM_KMALLOC_CACHES exist, these are
probabilistic. Use the existing kmem buckets API to further isolate these
allocations in a guaranteed fashion, when CONFIG_SLAB_BUCKETS=y.

Link: https://github.com/google/security-research/blob/master/pocs/linux/kernelctf/CVE-2023-4207_lts_cos_mitigation_2/docs/exploit.md [0]
Signed-off-by: Pedro Falcato <pfalcato@suse.de>
---
 net/core/skbuff.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 44a7f8401468..1f6c6b531ece 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -594,6 +594,8 @@ static void *kmalloc_pfmemalloc(size_t obj_size, gfp_t flags, int node)
 	return kmalloc_node_track_caller(obj_size, flags, node);
 }
 
+static kmem_buckets *skb_data_buckets __ro_after_init;
+
 /*
  * kmalloc_reserve is a wrapper around kmalloc_node_track_caller that tells
  * the caller if emergency pfmemalloc reserves are being used. If it is and
@@ -632,7 +634,7 @@ static void *kmalloc_reserve(unsigned int *size, gfp_t flags, int node,
 	 * Try a regular allocation, when that fails and we're not entitled
 	 * to the reserves, fail.
 	 */
-	obj = kmalloc_node_track_caller(obj_size,
+	obj = kmem_buckets_alloc_node_track_caller(skb_data_buckets, obj_size,
 					flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
 					node);
 	if (likely(obj))
@@ -5213,6 +5215,7 @@ void __init skb_init(void)
 						0,
 						SKB_SMALL_HEAD_HEADROOM,
 						NULL);
+	skb_data_buckets = kmem_buckets_create("skb_data", SLAB_PANIC, 0, INT_MAX, NULL);
 	skb_extensions_init();
 }
 
-- 
2.54.0

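Editor's note: the commit message argues that a dedicated bucket set gives guaranteed rather than probabilistic isolation. A toy userspace model may make the reasoning concrete. It is a sketch under loud assumptions, not the slab allocator: struct demo_buckets, demo_size_class(), demo_bucket_alloc() and demo_bucket_free() are hypothetical names invented here, the four size classes are arbitrary, and a simple LIFO freelist per class stands in for real slab caches.

```c
#include <stddef.h>
#include <stdlib.h>

#define DEMO_NR_CLASSES 4	/* toy size classes: 64, 128, 256, 512 bytes */

struct demo_free_obj {
	struct demo_free_obj *next;
};

/* One freelist per size class; a hypothetical stand-in for a bucket set. */
struct demo_buckets {
	struct demo_free_obj *free[DEMO_NR_CLASSES];
};

/* Map a request size to its class index, or -1 if it is too large. */
static int demo_size_class(size_t size)
{
	size_t cap = 64;

	for (int i = 0; i < DEMO_NR_CLASSES; i++, cap <<= 1)
		if (size <= cap)
			return i;
	return -1;
}

/* Allocate from bucket set b, reusing the most recently freed object first. */
static void *demo_bucket_alloc(struct demo_buckets *b, size_t size)
{
	int idx = demo_size_class(size);

	if (idx < 0)
		return NULL;
	if (b->free[idx]) {
		struct demo_free_obj *obj = b->free[idx];

		b->free[idx] = obj->next;
		return obj;
	}
	return malloc((size_t)64 << idx);
}

/* Return an object to the freelist of the bucket set it came from. */
static void demo_bucket_free(struct demo_buckets *b, void *p, size_t size)
{
	int idx = demo_size_class(size);
	struct demo_free_obj *obj = p;

	if (idx < 0 || !p)
		return;
	obj->next = b->free[idx];
	b->free[idx] = obj;
}
```

In this model, an object freed into one bucket set can only be handed back by an allocation from that same set. A similarly sized request against a second set (playing the role of skb_data_buckets) never receives it, whatever its size, which is the property the patch is after.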
[parent not found: <6d70757a-a849-4828-89e7-f3d51bf8c9f8@kernel.org>]
* Re: [PATCH 2/2] net: skb: isolate skb data area allocations into a separate bucket
From: Pedro Falcato @ 2026-06-04 19:12 UTC
To: Harry Yoo
Cc: Vlastimil Babka, Andrew Morton, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-hardening, linux-mm, netdev, linux-kernel, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Simon Horman, Jason Xing, Kuniyuki Iwashima

On Thu, Jun 04, 2026 at 02:30:34PM +0900, Harry Yoo wrote:
>
>
> On 6/3/26 3:31 AM, Pedro Falcato wrote:
> > SKB data area allocations (as done from alloc_skb()) use kmalloc().
> > These allocations can be variably sized and their contents can be more
> > or less controlled from userspace, which makes them useful for attackers
> > that want to overwrite a use-after-free'd object from the same kmalloc slab
> > (which often just requires the sizes to roughly match into the same kmalloc
> > bucket). [0] is an easy example of an exploit that uses netlink skb
> > allocation to target another similarly-sized accidentally freed object.
> >
> > While other mitigations like CONFIG_RANDOM_KMALLOC_CACHES exist, these are
> > probabilistic. Use the existing kmem buckets API to further isolate these
> > allocations in a guaranteed fashion, when CONFIG_SLAB_BUCKETS=y.
> >
> > Link: https://github.com/google/security-research/blob/master/pocs/linux/kernelctf/CVE-2023-4207_lts_cos_mitigation_2/docs/exploit.md [0]
> > Signed-off-by: Pedro Falcato <pfalcato@suse.de>
> > ---
> >  net/core/skbuff.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> > index 44a7f8401468..1f6c6b531ece 100644
> > --- a/net/core/skbuff.c
> > +++ b/net/core/skbuff.c
> > @@ -594,6 +594,8 @@ static void *kmalloc_pfmemalloc(size_t obj_size, gfp_t flags, int node)
> >  	return kmalloc_node_track_caller(obj_size, flags, node);
> >  }
> >
> > +static kmem_buckets *skb_data_buckets __ro_after_init;
> > +
> >  /*
> >   * kmalloc_reserve is a wrapper around kmalloc_node_track_caller that tells
> >   * the caller if emergency pfmemalloc reserves are being used. If it is and
> > @@ -632,7 +634,7 @@ static void *kmalloc_reserve(unsigned int *size, gfp_t flags, int node,
> >  	 * Try a regular allocation, when that fails and we're not entitled
> >  	 * to the reserves, fail.
> >  	 */
> > -	obj = kmalloc_node_track_caller(obj_size,
> > +	obj = kmem_buckets_alloc_node_track_caller(skb_data_buckets, obj_size,
> >  					flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
> >  					node);
> >  	if (likely(obj))
>
> What about kmalloc_pfmemalloc()?

Good point, that looks free as well.

Sidenote: isolating kmem_cache_alloc for possibly-aliasing caches could also
be useful. skb allocation has net_hotdata.skb_small_head_cache. It doesn't merge
with anything for $raisins (odd size, plus I don't think usercopy caches are
getting merged?) but it feels too... accidental?

Maybe passing something like SLAB_NO_MERGE and making the size
standard-looking would be nice. I have a size of 704 bytes per object, and
this probably causes some weird wastage for each slab.

-- 
Pedro

* Re: [PATCH 2/2] net: skb: isolate skb data area allocations into a separate bucket
From: Harry Yoo @ 2026-06-05 5:45 UTC
To: Pedro Falcato
Cc: Vlastimil Babka, Andrew Morton, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-hardening, linux-mm, netdev, linux-kernel, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Simon Horman, Jason Xing, Kuniyuki Iwashima

On 6/5/26 4:12 AM, Pedro Falcato wrote:
> On Thu, Jun 04, 2026 at 02:30:34PM +0900, Harry Yoo wrote:
>>
>>
>> On 6/3/26 3:31 AM, Pedro Falcato wrote:
>>> SKB data area allocations (as done from alloc_skb()) use kmalloc().
>>> These allocations can be variably sized and their contents can be more
>>> or less controlled from userspace, which makes them useful for attackers
>>> that want to overwrite a use-after-free'd object from the same kmalloc slab
>>> (which often just requires the sizes to roughly match into the same kmalloc
>>> bucket). [0] is an easy example of an exploit that uses netlink skb
>>> allocation to target another similarly-sized accidentally freed object.
>>>
>>> While other mitigations like CONFIG_RANDOM_KMALLOC_CACHES exist, these are
>>> probabilistic. Use the existing kmem buckets API to further isolate these
>>> allocations in a guaranteed fashion, when CONFIG_SLAB_BUCKETS=y.
>>>
>>> Link: https://github.com/google/security-research/blob/master/pocs/linux/kernelctf/CVE-2023-4207_lts_cos_mitigation_2/docs/exploit.md [0]
>>> Signed-off-by: Pedro Falcato <pfalcato@suse.de>
>>> ---
>>>  net/core/skbuff.c | 5 ++++-
>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>> index 44a7f8401468..1f6c6b531ece 100644
>>> --- a/net/core/skbuff.c
>>> +++ b/net/core/skbuff.c
>>> @@ -594,6 +594,8 @@ static void *kmalloc_pfmemalloc(size_t obj_size, gfp_t flags, int node)
>>>  	return kmalloc_node_track_caller(obj_size, flags, node);
>>>  }
>>>
>>> +static kmem_buckets *skb_data_buckets __ro_after_init;
>>> +
>>>  /*
>>>   * kmalloc_reserve is a wrapper around kmalloc_node_track_caller that tells
>>>   * the caller if emergency pfmemalloc reserves are being used. If it is and
>>> @@ -632,7 +634,7 @@ static void *kmalloc_reserve(unsigned int *size, gfp_t flags, int node,
>>>  	 * Try a regular allocation, when that fails and we're not entitled
>>>  	 * to the reserves, fail.
>>>  	 */
>>> -	obj = kmalloc_node_track_caller(obj_size,
>>> +	obj = kmem_buckets_alloc_node_track_caller(skb_data_buckets, obj_size,
>>>  					flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
>>>  					node);
>>>  	if (likely(obj))
>>
>> What about kmalloc_pfmemalloc()?
>
> Good point, that looks free as well.
>
> Sidenote: isolating kmem_cache_alloc for possibly-aliasing caches could also
> be useful. skb allocation has net_hotdata.skb_small_head_cache. It doesn't merge
> with anything for $raisins (odd size, plus I don't think usercopy caches are
> getting merged?) but it feels too... accidental?

Right, we never merge caches with useroffset/usersize.

Hmm...

/* SKB_SMALL_HEAD_CACHE_SIZE is the size used for the skbuff_small_head
 * kmem_cache. The non-power-of-2 padding is kept for historical reasons and
 * to avoid potential collisions with generic kmalloc bucket sizes.
 */
#define SKB_SMALL_HEAD_CACHE_SIZE					\
	(is_power_of_2(SKB_SMALL_HEAD_SIZE) ?				\
		(SKB_SMALL_HEAD_SIZE + L1_CACHE_BYTES) :		\
		SKB_SMALL_HEAD_SIZE)

What are "historical reasons" other than avoiding collisions with
kmalloc caches?

> Maybe passing something like SLAB_NO_MERGE and making the size
> standard-looking would be nice. I have a size of 704 bytes per object, and
> this probably causes some weird wastage for each slab.

Yes, unless the "historical reasons" make it infeasible to do that.

And I wonder if net/core/skbuff.c intends to always prevent merging, or
only with hardening configs like SLAB_BUCKETS.

-- 
Cheers,
Harry / Hyeonggon

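Editor's note: the size logic of the quoted macro, and the per-slab wastage Pedro suspects for a 704-byte object, can be worked through with a little arithmetic. The sketch below is a userspace illustration under stated assumptions: the demo_* names are hypothetical, a 64-byte cache line is assumed for DEMO_L1_CACHE_BYTES, and the waste helper only counts the unused tail of a slab, ignoring per-object metadata, alignment, and whichever slab order the allocator actually picks.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_L1_CACHE_BYTES 64	/* assumption: 64-byte cache lines */

static bool demo_is_power_of_2(size_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/*
 * Same decision as the quoted macro: a power-of-2 head size is padded by one
 * cache line so that it no longer matches a generic kmalloc size; any other
 * size is used as-is.
 */
static size_t demo_small_head_cache_size(size_t head_size)
{
	return demo_is_power_of_2(head_size) ?
		head_size + DEMO_L1_CACHE_BYTES : head_size;
}

/* Bytes left over when packing obj_size objects into a slab of slab_bytes. */
static size_t demo_slab_tail_waste(size_t slab_bytes, size_t obj_size)
{
	return slab_bytes - (slab_bytes / obj_size) * obj_size;
}
```

Under these assumptions a 512-byte head would become a 576-byte object, while 704 stays 704. Five 704-byte objects fit in a 4096-byte slab, leaving 576 bytes unused; an 8192-byte slab holds eleven with 448 bytes left over, and a 16384-byte slab holds twenty-three with 192 bytes left over.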
* Re: [PATCH 2/2] net: skb: isolate skb data area allocations into a separate bucket
From: Eric Dumazet @ 2026-06-05 7:25 UTC
To: Harry Yoo
Cc: Pedro Falcato, Vlastimil Babka, Andrew Morton, David S. Miller, Jakub Kicinski, Paolo Abeni, linux-hardening, linux-mm, netdev, linux-kernel, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Simon Horman, Jason Xing, Kuniyuki Iwashima

On Thu, Jun 4, 2026 at 10:45 PM Harry Yoo <harry@kernel.org> wrote:
>
>
>
> On 6/5/26 4:12 AM, Pedro Falcato wrote:
> > On Thu, Jun 04, 2026 at 02:30:34PM +0900, Harry Yoo wrote:
> >>
> >>
> >> On 6/3/26 3:31 AM, Pedro Falcato wrote:
> >>> SKB data area allocations (as done from alloc_skb()) use kmalloc().
> >>> These allocations can be variably sized and their contents can be more
> >>> or less controlled from userspace, which makes them useful for attackers
> >>> that want to overwrite a use-after-free'd object from the same kmalloc slab
> >>> (which often just requires the sizes to roughly match into the same kmalloc
> >>> bucket). [0] is an easy example of an exploit that uses netlink skb
> >>> allocation to target another similarly-sized accidentally freed object.
> >>>
> >>> While other mitigations like CONFIG_RANDOM_KMALLOC_CACHES exist, these are
> >>> probabilistic. Use the existing kmem buckets API to further isolate these
> >>> allocations in a guaranteed fashion, when CONFIG_SLAB_BUCKETS=y.
> >>>
> >>> Link: https://github.com/google/security-research/blob/master/pocs/linux/kernelctf/CVE-2023-4207_lts_cos_mitigation_2/docs/exploit.md [0]
> >>> Signed-off-by: Pedro Falcato <pfalcato@suse.de>
> >>> ---
> >>>  net/core/skbuff.c | 5 ++++-
> >>>  1 file changed, 4 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> >>> index 44a7f8401468..1f6c6b531ece 100644
> >>> --- a/net/core/skbuff.c
> >>> +++ b/net/core/skbuff.c
> >>> @@ -594,6 +594,8 @@ static void *kmalloc_pfmemalloc(size_t obj_size, gfp_t flags, int node)
> >>>  	return kmalloc_node_track_caller(obj_size, flags, node);
> >>>  }
> >>>
> >>> +static kmem_buckets *skb_data_buckets __ro_after_init;
> >>> +
> >>>  /*
> >>>   * kmalloc_reserve is a wrapper around kmalloc_node_track_caller that tells
> >>>   * the caller if emergency pfmemalloc reserves are being used. If it is and
> >>> @@ -632,7 +634,7 @@ static void *kmalloc_reserve(unsigned int *size, gfp_t flags, int node,
> >>>  	 * Try a regular allocation, when that fails and we're not entitled
> >>>  	 * to the reserves, fail.
> >>>  	 */
> >>> -	obj = kmalloc_node_track_caller(obj_size,
> >>> +	obj = kmem_buckets_alloc_node_track_caller(skb_data_buckets, obj_size,
> >>>  					flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
> >>>  					node);
> >>>  	if (likely(obj))
> >>
> >> What about kmalloc_pfmemalloc()?
> >
> > Good point, that looks free as well.
> >
> > Sidenote: isolating kmem_cache_alloc for possibly-aliasing caches could also
> > be useful. skb allocation has net_hotdata.skb_small_head_cache. It doesn't merge
> > with anything for $raisins (odd size, plus I don't think usercopy caches are
> > getting merged?) but it feels too... accidental?
>
> Right, we never merge caches with useroffset/usersize.
>
> Hmm...
>
> /* SKB_SMALL_HEAD_CACHE_SIZE is the size used for the skbuff_small_head
>  * kmem_cache. The non-power-of-2 padding is kept for historical reasons and
>  * to avoid potential collisions with generic kmalloc bucket sizes.
>  */
> #define SKB_SMALL_HEAD_CACHE_SIZE					\
> 	(is_power_of_2(SKB_SMALL_HEAD_SIZE) ?				\
> 		(SKB_SMALL_HEAD_SIZE + L1_CACHE_BYTES) :		\
> 		SKB_SMALL_HEAD_SIZE)
>
>
> What are "historical reasons" other than avoiding collisions with
> kmalloc caches?

git log/blame might help :)

commit 0f42e3f4fe2a58394e37241d02d9ca6ab7b7d516
Author: Jiayuan Chen <jiayuan.chen@linux.dev>
Date:   Fri Apr 3 09:45:12 2026 +0800

    net: skb: fix cross-cache free of KFENCE-allocated skb head

Note that MAX_SKB_FRAGS can be tuned.

config MAX_SKB_FRAGS
	int "Maximum number of fragments per skb_shared_info"
	range 17 45
	default 17
	help
	  Having more fragments per skb_shared_info can help GRO efficiency.
	  This helps BIG TCP workloads, but might expose bugs in some
	  legacy drivers.
	  This also increases memory overhead of small packets,
	  and in drivers using build_skb().
	  If unsure, say 17.

>
> > Maybe passing something like SLAB_NO_MERGE and making the size
> > standard-looking would be nice. I have a size of 704 bytes per object, and
> > this probably causes some weird wastage for each slab.
>
> Yes, unless the "historical reasons" make it infeasible to do that.
>
> And I wonder if net/core/skbuff.c intends to always prevent merging, or
> only with hardening configs like SLAB_BUCKETS.

We no longer care whether it merges or not.

* Re: [PATCH 2/2] net: skb: isolate skb data area allocations into a separate bucket
From: Jakub Kicinski @ 2026-06-05 1:52 UTC
To: Pedro Falcato
Cc: Vlastimil Babka, Harry Yoo, Andrew Morton, David S. Miller, Eric Dumazet, Paolo Abeni, linux-hardening, linux-mm, netdev, linux-kernel, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Simon Horman, Jason Xing, Kuniyuki Iwashima

On Tue, 2 Jun 2026 19:31:22 +0100 Pedro Falcato wrote:
> SKB data area allocations (as done from alloc_skb()) use kmalloc().
> These allocations can be variably sized and their contents can be more
> or less controlled from userspace, which makes them useful for attackers
> that want to overwrite a use-after-free'd object from the same kmalloc slab
> (which often just requires the sizes to roughly match into the same kmalloc
> bucket). [0] is an easy example of an exploit that uses netlink skb
> allocation to target another similarly-sized accidentally freed object.
>
> While other mitigations like CONFIG_RANDOM_KMALLOC_CACHES exist, these are
> probabilistic. Use the existing kmem buckets API to further isolate these
> allocations in a guaranteed fashion, when CONFIG_SLAB_BUCKETS=y.

No idea on the merits but from networking point of view:

Acked-by: Jakub Kicinski <kuba@kernel.org>

* Re: [PATCH 2/2] net: skb: isolate skb data area allocations into a separate bucket
From: Kees Cook @ 2026-06-05 18:09 UTC
To: Pedro Falcato
Cc: Vlastimil Babka, Harry Yoo, Andrew Morton, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-hardening, linux-mm, netdev, linux-kernel, Hao Li, Christoph Lameter, David Rientjes, Roman Gushchin, Simon Horman, Jason Xing, Kuniyuki Iwashima

On Tue, Jun 02, 2026 at 07:31:22PM +0100, Pedro Falcato wrote:
> SKB data area allocations (as done from alloc_skb()) use kmalloc().
> These allocations can be variably sized and their contents can be more
> or less controlled from userspace, which makes them useful for attackers
> that want to overwrite a use-after-free'd object from the same kmalloc slab
> (which often just requires the sizes to roughly match into the same kmalloc
> bucket). [0] is an easy example of an exploit that uses netlink skb
> allocation to target another similarly-sized accidentally freed object.
>
> While other mitigations like CONFIG_RANDOM_KMALLOC_CACHES exist, these are
> probabilistic. Use the existing kmem buckets API to further isolate these
> allocations in a guaranteed fashion, when CONFIG_SLAB_BUCKETS=y.
>
> Link: https://github.com/google/security-research/blob/master/pocs/linux/kernelctf/CVE-2023-4207_lts_cos_mitigation_2/docs/exploit.md [0]
> Signed-off-by: Pedro Falcato <pfalcato@suse.de>

Great! This is exactly what the bucket API was made for. :)

Reviewed-by: Kees Cook <kees@kernel.org>

-- 
Kees Cook

Thread overview: 12+ messages
2026-06-02 18:31 [PATCH 0/2] net: isolate SKB data area allocations Pedro Falcato
2026-06-02 18:31 ` [PATCH 1/2] mm/slab: add a node-track-caller variant for kmem buckets allocation Pedro Falcato
2026-06-04  5:19   ` Harry Yoo
2026-06-04 19:12     ` Pedro Falcato
2026-06-05 11:59   ` Vlastimil Babka (SUSE)
2026-06-05 18:08   ` Kees Cook
2026-06-02 18:31 ` [PATCH 2/2] net: skb: isolate skb data area allocations into a separate bucket Pedro Falcato
     [not found]   ` <6d70757a-a849-4828-89e7-f3d51bf8c9f8@kernel.org>
2026-06-04 19:12     ` Pedro Falcato
2026-06-05  5:45       ` Harry Yoo
2026-06-05  7:25         ` Eric Dumazet
2026-06-05  1:52   ` Jakub Kicinski
2026-06-05 18:09   ` Kees Cook