From: Harry Yoo <harry.yoo@oracle.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>,
Alexander Potapenko <glider@google.com>,
Roman Gushchin <roman.gushchin@linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
Vincenzo Frascino <vincenzo.frascino@arm.com>,
Andrey Ryabinin <ryabinin.a.a@gmail.com>,
Feng Tang <feng.79.tang@gmail.com>,
Christoph Lameter <cl@gentwo.org>,
Dmitry Vyukov <dvyukov@google.com>,
Andrey Konovalov <andreyknvl@gmail.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
kasan-dev@googlegroups.com, stable@vger.kernel.org
Subject: Re: [PATCH] mm/slab: ensure all metadata in slab object are word-aligned
Date: Fri, 24 Oct 2025 09:40:55 +0900 [thread overview]
Message-ID: <aPrLF0OUK651M4dk@hyeyoo> (raw)
In-Reply-To: <20251023131600.1103431-1-harry.yoo@oracle.com>
On Thu, Oct 23, 2025 at 10:16:00PM +0900, Harry Yoo wrote:
> When the SLAB_STORE_USER debug flag is used, any metadata placed after
> the original kmalloc request size (orig_size) is not properly aligned
> on 64-bit architectures because its type is unsigned int. When both KASAN
> and SLAB_STORE_USER are enabled, kasan_alloc_meta is misaligned.
>
> Because not all architectures support unaligned memory accesses,
> ensure that all metadata (track, orig_size, kasan_{alloc,free}_meta)
> in a slab object are word-aligned. struct track, kasan_{alloc,free}_meta
> are aligned by adding __aligned(sizeof(unsigned long)).
>
> For orig_size, use ALIGN(sizeof(unsigned int), sizeof(unsigned long)) to
> make clear that its size remains unsigned int but it must be aligned to
> a word boundary. On 64-bit architectures, this reserves 8 bytes for
> orig_size, which is acceptable since kmalloc's original request size
> tracking is intended for debugging rather than production use.
>
> Cc: <stable@vger.kernel.org>
> Fixes: 6edf2576a6cc ("mm/slub: enable debugging memory wasting of kmalloc")
> Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> ---
Adding more details on how I discovered this and why I care:
I was developing a feature that uses unused bytes in s->size as the
slabobj_ext metadata. Unlike other metadata where slab disables KASAN
when accessing it, this should be unpoisoned to avoid adding complexity
and overhead when accessing it.
So I wrote a code snippet to unpoison the metdata on slab allocation,
and then encountered a warning from KASAN:
[ 4.951021] WARNING: CPU: 0 PID: 1 at mm/kasan/shadow.c:174 kasan_unpoison+0x6d/0x80
[ 4.952021] Modules linked in:
[ 4.953021] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.17.0-rc3-slab-memopt-debug+ #22 PREEMPT(voluntary)
[ 4.954021] Tainted: [W]=WARN
[ 4.954495] Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 4.955021] RIP: 0010:kasan_unpoison+0x6d/0x80
[ 4.955724] Code: 84 02 4c 89 e0 83 e0 07 74 0b 4c 01 e3 48 c1 eb 03 42 88 04 2b 5b 41 5c 41 5d 5d 31 c0 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc <0f> 0b 31 c0 31 d20
[ 4.956021] RSP: 0018:ffff888007c57a90 EFLAGS: 00010202
[ 4.957021] RAX: 0000000000000000 RBX: 0000000000000074 RCX: 0000000000000080
[ 4.958021] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff888007d7ae74
[ 4.959021] RBP: ffff888007c57a98 R08: 0000000000000000 R09: 0000000000000000
[ 4.960021] R10: ffffed1000faf400 R11: 0000000000000000 R12: ffffea00001f5e80
[ 4.961021] R13: ffff888007466dc0 R14: ffff888007d7ae00 R15: ffff888007d7ae74
[ 4.962023] FS: 0000000000000000(0000) GS:ffff8880e1a15000(0000) knlGS:0000000000000000
[ 4.963021] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4.964021] CR2: ffff888007201000 CR3: 00000000056a0000 CR4: 00000000000006f0
[ 4.965023] Call Trace:
[ 4.965413] <TASK>
[ 4.966021] ? __kasan_unpoison_range+0x26/0x50
[ 4.966759] alloc_slab_obj_exts_early.constprop.0+0x136/0x240
[ 4.967021] allocate_slab+0x107/0x4b0
[ 4.968021] ___slab_alloc+0x8f6/0xec0
[ 4.969021] ? kstrdup_const+0x2c/0x40
[ 4.969615] ? __xa_alloc+0x227/0x320
[ 4.970021] __slab_alloc.isra.0+0x35/0x90
[ 4.970663] __kmalloc_node_track_caller_noprof+0x4e2/0x7a0
[ 4.971021] ? kstrdup_const+0x2c/0x40
[ 4.972021] kstrdup+0x48/0xf0
[ 4.972505] ? kstrdup+0x48/0xf0
[ 4.973021] kstrdup_const+0x2c/0x40
[ 4.973589] alloc_vfsmnt+0xd5/0x680
[ 4.974021] vfs_create_mount.part.0+0x42/0x3e0
[ 4.975021] vfs_kern_mount.part.0+0x10c/0x150
[ 4.975722] vfs_kern_mount+0x13/0x40
[ 4.976021] devtmpfs_init+0xa8/0x430
[ 4.977021] ? __percpu_counter_init_many+0x199/0x360
[ 4.977812] ? __pfx_devtmpfs_init+0x10/0x10
[ 4.978021] ? page_offline_thaw+0x5/0x20
[ 4.979021] ? __kasan_check_write+0x14/0x30
[ 4.979694] driver_init+0x1a/0x60
[ 4.980021] kernel_init_freeable+0x7de/0xeb0
[ 4.981021] ? __pfx_kernel_init+0x10/0x10
[ 4.981667] kernel_init+0x1f/0x220
[ 4.982021] ? __pfx_kernel_init+0x10/0x10
[ 4.983021] ret_from_fork+0x2b8/0x3b0
[ 4.983618] ? __pfx_kernel_init+0x10/0x10
[ 4.984021] ret_from_fork_asm+0x1a/0x30
[ 4.984639] RIP: 2e66:0x0
[ 4.985021] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[ 4.986021] RSP: 0084:0000000000000000 EFLAGS: 841f0f2e660000 ORIG_RAX: 2e66000000000084
[ 4.987021] RAX: 0000000000000000 RBX: 2e66000000000084 RCX: 0000000000841f0f
[ 4.988021] RDX: 000000841f0f2e66 RSI: 00841f0f2e660000 RDI: 1f0f2e6600000000
[ 4.989021] RBP: 1f0f2e6600000000 R08: 1f0f2e6600000000 R09: 00841f0f2e660000
[ 4.990021] R10: 000000841f0f2e66 R11: 0000000000841f0f R12: 00841f0f2e660000
[ 4.991021] R13: 000000841f0f2e66 R14: 0000000000841f0f R15: 2e66000000000084
[ 4.992022] </TASK>
[ 4.992372] ---[ end trace 0000000000000000 ]---
This warning is from kasan_unpoison():
if (WARN_ON((unsigned long)addr & KASAN_GRANULE_MASK))
return;
on x86_64, the address passed to kasan_{poison,unpoison}() should be at
least aligned with 8 bytes.
After manual investigation it turns out when the SLAB_STORE_USER flag is
specified, any metadata after the original kmalloc request size is
misaligned.
Questions:
- Could it cause any issues other than the one described above?
- Does KASAN even support architectures that have issues with unaligned
accesses?
- How come we haven't seen any issues regarding this so far? :/
> mm/kasan/kasan.h | 4 ++--
> mm/slub.c | 16 +++++++++++-----
> 2 files changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
> index 129178be5e64..d4ea7ecc20c3 100644
> --- a/mm/kasan/kasan.h
> +++ b/mm/kasan/kasan.h
> @@ -265,7 +265,7 @@ struct kasan_alloc_meta {
> struct kasan_track alloc_track;
> /* Free track is stored in kasan_free_meta. */
> depot_stack_handle_t aux_stack[2];
> -};
> +} __aligned(sizeof(unsigned long));
>
> struct qlist_node {
> struct qlist_node *next;
> @@ -289,7 +289,7 @@ struct qlist_node {
> struct kasan_free_meta {
> struct qlist_node quarantine_link;
> struct kasan_track free_track;
> -};
> +} __aligned(sizeof(unsigned long));
>
> #endif /* CONFIG_KASAN_GENERIC */
>
> diff --git a/mm/slub.c b/mm/slub.c
> index a585d0ac45d4..b921f91723c2 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -344,7 +344,7 @@ struct track {
> int cpu; /* Was running on cpu */
> int pid; /* Pid context */
> unsigned long when; /* When did the operation occur */
> -};
> +} __aligned(sizeof(unsigned long));
>
> enum track_item { TRACK_ALLOC, TRACK_FREE };
>
> @@ -1196,7 +1196,7 @@ static void print_trailer(struct kmem_cache *s, struct slab *slab, u8 *p)
> off += 2 * sizeof(struct track);
>
> if (slub_debug_orig_size(s))
> - off += sizeof(unsigned int);
> + off += ALIGN(sizeof(unsigned int), sizeof(unsigned long));
>
> off += kasan_metadata_size(s, false);
>
> @@ -1392,7 +1392,8 @@ static int check_pad_bytes(struct kmem_cache *s, struct slab *slab, u8 *p)
> off += 2 * sizeof(struct track);
>
> if (s->flags & SLAB_KMALLOC)
> - off += sizeof(unsigned int);
> + off += ALIGN(sizeof(unsigned int),
> + sizeof(unsigned long));
> }
>
> off += kasan_metadata_size(s, false);
> @@ -7820,9 +7821,14 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
> */
> size += 2 * sizeof(struct track);
>
> - /* Save the original kmalloc request size */
> + /*
> + * Save the original kmalloc request size.
> + * Although the request size is an unsigned int,
> + * make sure that is aligned to word boundary.
> + */
> if (flags & SLAB_KMALLOC)
> - size += sizeof(unsigned int);
> + size += ALIGN(sizeof(unsigned int),
> + sizeof(unsigned long));
> }
> #endif
>
> --
> 2.43.0
>
--
Cheers,
Harry / Hyeonggon
next prev parent reply other threads:[~2025-10-24 0:41 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-23 13:16 [PATCH] mm/slab: ensure all metadata in slab object are word-aligned Harry Yoo
2025-10-24 0:40 ` Harry Yoo [this message]
2025-10-24 1:19 ` Andrey Konovalov
2025-10-24 1:35 ` Andrey Konovalov
2025-10-24 1:56 ` Andrey Konovalov
2025-10-24 7:55 ` Harry Yoo
2025-10-24 8:35 ` Harry Yoo
2025-10-24 14:17 ` Andrey Konovalov
2025-10-24 1:19 ` Andrey Konovalov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aPrLF0OUK651M4dk@hyeyoo \
--to=harry.yoo@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=andreyknvl@gmail.com \
--cc=cl@gentwo.org \
--cc=dvyukov@google.com \
--cc=feng.79.tang@gmail.com \
--cc=glider@google.com \
--cc=kasan-dev@googlegroups.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=rientjes@google.com \
--cc=roman.gushchin@linux.dev \
--cc=ryabinin.a.a@gmail.com \
--cc=stable@vger.kernel.org \
--cc=vbabka@suse.cz \
--cc=vincenzo.frascino@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.