public inbox for ceph-devel@vger.kernel.org
* [PATCH] fs/ceph: Fix kernel oops due invalid pointer for kfree() in parse_longname()
@ 2025-12-20 14:01 Daniel Vogelbacher
  2025-12-22 20:08 ` Viacheslav Dubeyko
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Daniel Vogelbacher @ 2025-12-20 14:01 UTC
  To: ceph-devel; +Cc: xiubli, idryomov

This fixes a kernel oops when reading ceph snapshot directories (.snap),
for example by simply running `ls /mnt/my_ceph/.snap`.

The bug was introduced in commit:

bb80f7618832 - parse_longname(): strrchr() expects NUL-terminated string

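str is guarded by __free(kfree), but is later advanced to skip the
initial '_' in snapshot names, so kfree() ends up being called with a
pointer that kmemdup_nul() never returned.
This patch skips the '_' while copying instead, so the pointer is never
advanced and kfree() frees the original allocation.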

Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220807
Fixes: bb80f7618832 ("parse_longname(): strrchr() expects NUL-terminated string")
Cc: stable@vger.kernel.org
Suggested-by: Helge Deller <deller@gmx.de>
Signed-off-by: Daniel Vogelbacher <daniel@chaospixel.com>
---
 fs/ceph/crypto.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
index 0ea4db650f85..3e051972e49d 100644
--- a/fs/ceph/crypto.c
+++ b/fs/ceph/crypto.c
@@ -166,12 +166,12 @@ static struct inode *parse_longname(const struct inode *parent,
 	struct ceph_vino vino = { .snap = CEPH_NOSNAP };
 	char *name_end, *inode_number;
 	int ret = -EIO;
-	/* NUL-terminate */
-	char *str __free(kfree) = kmemdup_nul(name, *name_len, GFP_KERNEL);
+	if (*name_len <= 1)
+		return ERR_PTR(-EIO);
+	/* Skip initial '_' and NUL-terminate */
+	char *str __free(kfree) = kmemdup_nul(name + 1, *name_len - 1, GFP_KERNEL);
 	if (!str)
 		return ERR_PTR(-ENOMEM);
-	/* Skip initial '_' */
-	str++;
 	name_end = strrchr(str, '_');
 	if (!name_end) {
 		doutc(cl, "failed to parse long snapshot name: %s\n", str);
-- 
2.47.3
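To make the fixed logic easier to follow, here is a user-space sketch of what parse_longname() does after the patch. It is not the kernel code: it assumes long snapshot names have the form `_<snapshot name>_<inode number>` (inferred from the '_' skipping and the `strrchr(str, '_')` call above), substitutes `strndup()`/`free()` for `kmemdup_nul()`/`kfree()`, assumes a base-10 inode number, and the function name `parse_longname_sketch()` and the sample snapshot name are hypothetical. The key point is that the leading '_' is skipped while copying, so the pointer that gets freed is exactly the one the allocator returned.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical user-space analogue of the fixed parse_longname() prologue.
 * Assumed name format: "_<snapname>_<ino>". strndup() stands in for
 * kmemdup_nul() and free() for kfree().
 */
static int parse_longname_sketch(const char *name, int name_len,
				 char *snap, size_t snap_size, uint64_t *ino)
{
	char *str, *name_end, *endp;
	int ret = -EIO;

	assert(name != NULL && snap != NULL && ino != NULL);

	/* Need at least the leading '_' plus one more character. */
	if (name_len <= 1)
		return -EIO;

	/* Skip the '_' while copying, so 'str' stays the pointer we free. */
	str = strndup(name + 1, (size_t)name_len - 1);
	if (!str)
		return -ENOMEM;

	/* The last '_' separates the snapshot name from the inode number,
	 * so snapshot names that themselves contain '_' still parse. */
	name_end = strrchr(str, '_');
	if (!name_end || name_end == str)
		goto out;

	*name_end = '\0';
	errno = 0;
	*ino = strtoull(name_end + 1, &endp, 10);
	if (errno || endp == name_end + 1 || *endp != '\0')
		goto out;

	if (strlen(str) >= snap_size)
		goto out;
	strcpy(snap, str);
	ret = 0;
out:
	free(str);	/* same pointer strndup() returned */
	return ret;
}
```

For example, the (hypothetical) name `_scheduled-2025-12-20-14_00_00_UTC_1099511627776` would yield the snapshot name `scheduled-2025-12-20-14_00_00_UTC` and inode number 1099511627776, while `_` on its own or an empty name is rejected with -EIO.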



* Re:  [PATCH] fs/ceph: Fix kernel oops due invalid pointer for kfree() in parse_longname()
  2025-12-20 14:01 [PATCH] fs/ceph: Fix kernel oops due invalid pointer for kfree() in parse_longname() Daniel Vogelbacher
@ 2025-12-22 20:08 ` Viacheslav Dubeyko
  2025-12-22 21:26   ` Daniel Vogelbacher
  2026-02-01  8:34 ` [PATCH v2] " Daniel Vogelbacher
  2026-02-03 19:40 ` [PATCH v3] " Daniel Vogelbacher
  2 siblings, 1 reply; 14+ messages in thread
From: Viacheslav Dubeyko @ 2025-12-22 20:08 UTC
  To: daniel@chaospixel.com, ceph-devel@vger.kernel.org
  Cc: Xiubo Li, idryomov@gmail.com

On Sat, 2025-12-20 at 15:01 +0100, Daniel Vogelbacher wrote:
> This fixes a kernel oops when reading ceph snapshot directories (.snap),
> for example by simply run `ls /mnt/my_ceph/.snap`.
> 

Frankly speaking, it is not at all clear how this kernel oops can happen.
Could you please explain in more detail how it happens and what the
nature of the issue is? How can the issue be reproduced?

> The bug was introduced in commit:
> 
> bb80f7618832 - parse_longname(): strrchr() expects NUL-terminated string
> 
> str is guarded by __free(kfree), but advanced later for skipping
> the initial '_' in snapshot names.
> This patch removes the need for advancing the pointer so kfree()
> could do proper memory cleanup.
> 

I cannot follow this explanation. What is wrong? Why should we fix
something here?

> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220807 
> 

Why wasn't the issue reported to the CephFS community by email or via
https://tracker.ceph.com?

Have you run xfstests for your patch?

>  
> Fixes: bb80f7618832 - parse_longname(): strrchr() expects NUL-terminated string
> 
> Cc: stable@vger.kernel.org
> Suggested-by: Helge Deller <deller@gmx.de>
> Signed-off-by: Daniel Vogelbacher <daniel@chaospixel.com>
> ---
>  fs/ceph/crypto.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
> index 0ea4db650f85..3e051972e49d 100644
> --- a/fs/ceph/crypto.c
> +++ b/fs/ceph/crypto.c
> @@ -166,12 +166,12 @@ static struct inode *parse_longname(const struct inode *parent,
>  	struct ceph_vino vino = { .snap = CEPH_NOSNAP };
>  	char *name_end, *inode_number;
>  	int ret = -EIO;
> -	/* NUL-terminate */
> -	char *str __free(kfree) = kmemdup_nul(name, *name_len, GFP_KERNEL);
> +	if (*name_len <= 1)

I believe the current logic can already handle the case *name_len <= 1.
Why do we need this fix? The commit message is really unclear for my taste.
Could you demonstrate that we really need this fix?

Thanks,
Slava.

> +		return ERR_PTR(-EIO);
> +	/* Skip initial '_' and NUL-terminate */
> +	char *str __free(kfree) = kmemdup_nul(name + 1, *name_len - 1, GFP_KERNEL);
>  	if (!str)
>  		return ERR_PTR(-ENOMEM);
> -	/* Skip initial '_' */
> -	str++;
>  	name_end = strrchr(str, '_');
>  	if (!name_end) {
>  		doutc(cl, "failed to parse long snapshot name: %s\n", str);


* Re: [PATCH] fs/ceph: Fix kernel oops due invalid pointer for kfree() in parse_longname()
  2025-12-22 20:08 ` Viacheslav Dubeyko
@ 2025-12-22 21:26   ` Daniel Vogelbacher
  2025-12-23 22:49     ` Viacheslav Dubeyko
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel Vogelbacher @ 2025-12-22 21:26 UTC
  To: Viacheslav Dubeyko, ceph-devel@vger.kernel.org
  Cc: Xiubo Li, idryomov@gmail.com

On 12/22/25 21:08, Viacheslav Dubeyko wrote:
> On Sat, 2025-12-20 at 15:01 +0100, Daniel Vogelbacher wrote:
>> This fixes a kernel oops when reading ceph snapshot directories (.snap),
>> for example by simply run `ls /mnt/my_ceph/.snap`.
>>
> 
> Frankly speaking, it's completely not clear how this kernel oops can happen.
> Could you please explain in more details how it can happen and what is the
> nature of the issue? How the issue can be reproduced?

All I need to do to reproduce the issue is run `ls .snap/` on any mounted
CephFS mount point that contains scheduled snapshots. I have one
production VM (KVM) where I hit the issue after upgrading to Debian
Trixie. To isolate it, I created a fresh Trixie VM, dropped the
distribution kernel and built a vanilla kernel, both to find the
offending commit with git-bisect and to make sure the bug was not
introduced by any Debian patches. In case it helps, it's a Squid 19.2.3
cluster.

So basically the steps are:

  * Set up a Ceph cluster running 19.2.3
  * Create a pool and a CephFS file system
  * Create a snapshot schedule for the file system
  * Mount the file system and populate it with a few files, on any kernel
    version that contains bb80f7618832, that is >= 6.12.41
  * Wait until some scheduled snapshots have been created
  * Run `ls /mnt/my/cephfs/.snap`

This should result in a kernel oops like:

[   53.703013] Oops: general protection fault, probably for 
non-canonical address 0xd0c22857c0000000: 0000 [#1] SMP PTI
[   53.703201] CPU: 11 UID: 0 PID: 360 Comm: kworker/11:2 Not tainted 
6.18.0-rc7 #41 PREEMPT(voluntary)
[   53.703281] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
1.16.2-debian-1.16.2-1 04/01/2014
[   53.703317] Workqueue: ceph-msgr ceph_con_workfn [libceph]
[   53.703424] RIP: 0010:rb_insert_color 
(/usr/src/linux/lib/rbtree.c:185 (discriminator 1) 
/usr/src/linux/lib/rbtree.c:436 (discriminator 1))
[   53.704503] Code: 76 17 48 83 e1 fc 48 3b 51 10 0f 84 b7 00 00 00 48 
89 41 08 c3 cc cc cc cc 48 89 06 c3 cc cc cc cc 48 8b 4a 10 48 85 c9 74 
05 <f6> 01 01 74 1b 48 8b 48 10 48 39 f9 74 68 48 89 c7 48 89 4a 08 48
All code
========
    0:	76 17                	jbe    0x19
    2:	48 83 e1 fc          	and    $0xfffffffffffffffc,%rcx
    6:	48 3b 51 10          	cmp    0x10(%rcx),%rdx
    a:	0f 84 b7 00 00 00    	je     0xc7
   10:	48 89 41 08          	mov    %rax,0x8(%rcx)
   14:	c3                   	ret
   15:	cc                   	int3
   16:	cc                   	int3
   17:	cc                   	int3
   18:	cc                   	int3
   19:	48 89 06             	mov    %rax,(%rsi)
   1c:	c3                   	ret
   1d:	cc                   	int3
   1e:	cc                   	int3
   1f:	cc                   	int3
   20:	cc                   	int3
   21:	48 8b 4a 10          	mov    0x10(%rdx),%rcx
   25:	48 85 c9             	test   %rcx,%rcx
   28:	74 05                	je     0x2f
   2a:*	f6 01 01             	testb  $0x1,(%rcx)		<-- trapping instruction
   2d:	74 1b                	je     0x4a
   2f:	48 8b 48 10          	mov    0x10(%rax),%rcx
   33:	48 39 f9             	cmp    %rdi,%rcx
   36:	74 68                	je     0xa0
   38:	48 89 c7             	mov    %rax,%rdi
   3b:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
   3f:	48                   	rex.W

Code starting with the faulting instruction
===========================================
    0:	f6 01 01             	testb  $0x1,(%rcx)
    3:	74 1b                	je     0x20
    5:	48 8b 48 10          	mov    0x10(%rax),%rcx
    9:	48 39 f9             	cmp    %rdi,%rcx
    c:	74 68                	je     0x76
    e:	48 89 c7             	mov    %rax,%rdi
   11:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
   15:	48                   	rex.W
[   53.704559] RSP: 0018:ffff9ab7c07579e0 EFLAGS: 00010286
[   53.704591] RAX: ffff8bd0c2285b40 RBX: ffff8bd0c2285240 RCX: 
d0c22857c0000000
[   53.704616] RDX: ffff8bd0c2285910 RSI: ffff8bd0c3e695c0 RDI: 
ffff8bd0c22855c0
[   53.704645] RBP: 0000000000002139 R08: 0000000000000000 R09: 
0000000000000000
[   53.704668] R10: 0000000000000000 R11: ffff8bd0c16244e0 R12: 
ffff8bd0c3e695b8
[   53.704691] R13: ffff8bd0c3b62000 R14: ffff8bd0c22857c0 R15: 
ffff8bd0c3e695c0
[   53.704714] FS:  0000000000000000(0000) GS:ffff8bd1815ca000(0000) 
knlGS:0000000000000000
[   53.704741] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   53.704762] CR2: 000055667ef28e10 CR3: 0000000106cc2005 CR4: 
0000000000772ef0
[   53.704790] PKRU: 55555554
[   53.704803] Call Trace:
[   53.704844]  <TASK>
[   53.704862] ceph_get_snapid_map 
(/usr/src/linux/./include/linux/spinlock.h:391 
/usr/src/linux/fs/ceph/snap.c:1255) ceph
[   53.704957] ceph_fill_inode (/usr/src/linux/fs/ceph/inode.c:1062 
(discriminator 2)) ceph
[   53.705019]  ? __pfx_ceph_set_ino_cb 
(/usr/src/linux/fs/ceph/inode.c:46) ceph
[   53.705074]  ? __pfx_ceph_ino_compare 
(/usr/src/linux/fs/ceph/super.h:595) ceph
[   53.705132] ceph_readdir_prepopulate 
(/usr/src/linux/fs/ceph/inode.c:2113) ceph
[   53.705191] mds_dispatch (/usr/src/linux/fs/ceph/mds_client.c:3993 
/usr/src/linux/fs/ceph/mds_client.c:6299) ceph
[   53.705253]  ? sock_recvmsg (/usr/src/linux/net/socket.c:1078 
(discriminator 1) /usr/src/linux/net/socket.c:1100 (discriminator 1))
[   53.705279] ceph_con_process_message 
(/usr/src/linux/net/ceph/messenger.c:1427) libceph
[   53.705347] process_message 
(/usr/src/linux/net/ceph/messenger_v2.c:2879) libceph
[   53.705406] ceph_con_v2_try_read 
(/usr/src/linux/net/ceph/messenger_v2.c:3043 
/usr/src/linux/net/ceph/messenger_v2.c:3099 
/usr/src/linux/net/ceph/messenger_v2.c:3148) libceph
[   53.705467]  ? psi_group_change (/usr/src/linux/kernel/sched/psi.c:876)
[   53.705488]  ? sched_balance_newidle 
(/usr/src/linux/kernel/sched/fair.c:12902 (discriminator 2))
[   53.705512]  ? psi_task_switch (/usr/src/linux/kernel/sched/psi.c:984 
(discriminator 2))
[   53.705532]  ? _raw_spin_unlock 
(/usr/src/linux/./arch/x86/include/asm/paravirt.h:562 
/usr/src/linux/./arch/x86/include/asm/qspinlock.h:57 
/usr/src/linux/./include/linux/spinlock.h:204 
/usr/src/linux/./include/linux/spinlock_api_smp.h:142 
/usr/src/linux/kernel/locking/spinlock.c:186)
[   53.705550]  ? finish_task_switch.isra.0 
(/usr/src/linux/./arch/x86/include/asm/paravirt.h:671 
/usr/src/linux/kernel/sched/sched.h:1559 
/usr/src/linux/kernel/sched/core.c:5073 
/usr/src/linux/kernel/sched/core.c:5191)
[   53.705575] ceph_con_workfn 
(/usr/src/linux/net/ceph/messenger.c:1578) libceph
[   53.705627]  process_one_work 
(/usr/src/linux/./arch/x86/include/asm/jump_label.h:36 
/usr/src/linux/./include/trace/events/workqueue.h:110 
/usr/src/linux/kernel/workqueue.c:3268)
[   53.705657]  worker_thread (/usr/src/linux/kernel/workqueue.c:3340 
(discriminator 2) /usr/src/linux/kernel/workqueue.c:3427 (discriminator 2))
[   53.705679]  ? __pfx_worker_thread 
(/usr/src/linux/kernel/workqueue.c:3373)
[   53.705700]  kthread (/usr/src/linux/kernel/kthread.c:463)
[   53.705717]  ? __pfx_kthread (/usr/src/linux/kernel/kthread.c:412)
[   53.705734]  ? __pfx_kthread (/usr/src/linux/kernel/kthread.c:412)
[   53.705752]  ret_from_fork (/usr/src/linux/arch/x86/kernel/process.c:164)
[   53.705776]  ? __pfx_kthread (/usr/src/linux/kernel/kthread.c:412)
[   53.705793]  ret_from_fork_asm 
(/usr/src/linux/arch/x86/entry/entry_64.S:255)
[   53.705826]  </TASK>
[   53.705842] Modules linked in: ceph netfs libceph cfg80211 rfkill 
8021q garp stp mrp llc binfmt_misc intel_rapl_msr intel_rapl_common 
intel_uncore_frequency_common kvm_intel virtio_gpu joydev kvm 
drm_client_lib virtio_dma_buf evdev drm_shmem_helper sg drm_kms_helper 
virtio_balloon button irqbypass ghash_clmulni_intel aesni_intel rapl 
pcspkr drm configfs efi_pstore nfnetlink vsock_loopback 
vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vmw_vmci 
vsock qemu_fw_cfg virtio_rng autofs4 ext4 crc16 mbcache jbd2 hid_generic 
usbhid hid sr_mod cdrom dm_mod ahci libahci libata xhci_pci iTCO_wdt 
intel_pmc_bxt xhci_hcd iTCO_vendor_support scsi_mod psmouse virtio_net 
i2c_i801 watchdog serio_raw i2c_smbus lpc_ich scsi_common usbcore 
net_failover failover virtio_blk usb_common
[   53.708740] ---[ end trace 0000000000000000 ]---
[   53.709462] RIP: 0010:rb_insert_color 
(/usr/src/linux/lib/rbtree.c:185 (discriminator 1) 
/usr/src/linux/lib/rbtree.c:436 (discriminator 1))
[   53.710118] Code: 76 17 48 83 e1 fc 48 3b 51 10 0f 84 b7 00 00 00 48 
89 41 08 c3 cc cc cc cc 48 89 06 c3 cc cc cc cc 48 8b 4a 10 48 85 c9 74 
05 <f6> 01 01 74 1b 48 8b 48 10 48 39 f9 74 68 48 89 c7 48 89 4a 08 48
All code
========
    0:	76 17                	jbe    0x19
    2:	48 83 e1 fc          	and    $0xfffffffffffffffc,%rcx
    6:	48 3b 51 10          	cmp    0x10(%rcx),%rdx
    a:	0f 84 b7 00 00 00    	je     0xc7
   10:	48 89 41 08          	mov    %rax,0x8(%rcx)
   14:	c3                   	ret
   15:	cc                   	int3
   16:	cc                   	int3
   17:	cc                   	int3
   18:	cc                   	int3
   19:	48 89 06             	mov    %rax,(%rsi)
   1c:	c3                   	ret
   1d:	cc                   	int3
   1e:	cc                   	int3
   1f:	cc                   	int3
   20:	cc                   	int3
   21:	48 8b 4a 10          	mov    0x10(%rdx),%rcx
   25:	48 85 c9             	test   %rcx,%rcx
   28:	74 05                	je     0x2f
   2a:*	f6 01 01             	testb  $0x1,(%rcx)		<-- trapping instruction
   2d:	74 1b                	je     0x4a
   2f:	48 8b 48 10          	mov    0x10(%rax),%rcx
   33:	48 39 f9             	cmp    %rdi,%rcx
   36:	74 68                	je     0xa0
   38:	48 89 c7             	mov    %rax,%rdi
   3b:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
   3f:	48                   	rex.W

Code starting with the faulting instruction
===========================================
    0:	f6 01 01             	testb  $0x1,(%rcx)
    3:	74 1b                	je     0x20
    5:	48 8b 48 10          	mov    0x10(%rax),%rcx
    9:	48 39 f9             	cmp    %rdi,%rcx
    c:	74 68                	je     0x76
    e:	48 89 c7             	mov    %rax,%rdi
   11:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
   15:	48                   	rex.W
[   53.711453] RSP: 0018:ffff9ab7c07579e0 EFLAGS: 00010286
[   53.712112] RAX: ffff8bd0c2285b40 RBX: ffff8bd0c2285240 RCX: 
d0c22857c0000000
[   53.712798] RDX: ffff8bd0c2285910 RSI: ffff8bd0c3e695c0 RDI: 
ffff8bd0c22855c0
[   53.713423] RBP: 0000000000002139 R08: 0000000000000000 R09: 
0000000000000000
[   53.714061] R10: 0000000000000000 R11: ffff8bd0c16244e0 R12: 
ffff8bd0c3e695b8
[   53.714696] R13: ffff8bd0c3b62000 R14: ffff8bd0c22857c0 R15: 
ffff8bd0c3e695c0
[   53.715321] FS:  0000000000000000(0000) GS:ffff8bd1815ca000(0000) 
knlGS:0000000000000000
[   53.715956] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   53.716651] CR2: 000055667ef28e10 CR3: 0000000106cc2005 CR4: 
0000000000772ef0
[   53.717295] PKRU: 55555554
[   53.717918] note: kworker/11:2[360] exited with preempt_count 1


>> The bug was introduced in commit:
>>
>> bb80f7618832 - parse_longname(): strrchr() expects NUL-terminated string
>>
>> str is guarded by __free(kfree), but advanced later for skipping
>> the initial '_' in snapshot names.
>> This patch removes the need for advancing the pointer so kfree()
>> could do proper memory cleanup.
>>
> 
> I cannot follow of this explanation. What is the wrong? Why should we fix
> something here?

In bb80f7618832, the pointer in variable "str" is guarded by
__free(kfree), which means the pointer returned by kmemdup_nul() is
automatically freed when "str" goes out of scope. kfree() must receive
the same pointer that kmemdup_nul() returned, but this is not the case,
because the pointer is advanced by one. For example, kmemdup_nul() may
return 0x1234000, but kfree() is then called with 0x1234001. I don't know
the exact behavior of kfree() in that case, but I assume calling kfree()
with a pointer it never handed out leads to undefined behavior?
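The mechanism can be illustrated in user space. The kernel's __free() is built on the GCC/Clang `cleanup` variable attribute: when the variable goes out of scope, the cleanup function is handed the address of the variable and acts on whatever pointer it holds at that moment. In this sketch the cleanup handler only records that pointer instead of freeing it, so the pre-fix pattern can be observed without actually invoking undefined behavior. The names `record_cleanup()`, `buggy_pattern()` and `fixed_pattern()` are hypothetical, and `strdup()` stands in for `kmemdup_nul()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Pointer most recently handed to the cleanup handler, for inspection. */
static char *last_cleanup_ptr;

/*
 * Stand-in for the kfree() cleanup: it only records the pointer it would
 * free, so the buggy variant can be observed safely.
 */
static void record_cleanup(char **p)
{
	last_cleanup_ptr = *p;
}

/* Pre-fix pattern: the guarded pointer is advanced past the leading '_'. */
static char *buggy_pattern(const char *name)
{
	char *str __attribute__((cleanup(record_cleanup))) = strdup(name);
	char *allocated = str;

	if (!str)
		return NULL;
	str++;			/* skip initial '_' */
	return allocated;	/* cleanup then sees allocated + 1 */
}

/*
 * Post-fix pattern: skip the '_' while copying, so the guarded pointer is
 * never modified. Assumes 'name' is at least one character long (the
 * patch's length check).
 */
static char *fixed_pattern(const char *name)
{
	char *str __attribute__((cleanup(record_cleanup))) = strdup(name + 1);

	return str;		/* cleanup sees exactly what strdup() returned */
}
```

With `buggy_pattern("_snap_1")`, the cleanup handler receives the allocation plus one byte, which is the value kfree() would have been given before the fix; with `fixed_pattern("_snap_1")` it receives the original allocation, which holds `snap_1`.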

>> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220807
>>
> 
> Why the issue had not been reported to CephFS community through email or by
> means of https://tracker.ceph.com?
It's a kernel bug, not related to any Ceph packages, so I reported it
to the kernel bug tracker.

> Have you run xfstests for your patch?
No, I'm not aware of it. How is XFS related to CephFS?


>> Fixes: bb80f7618832 - parse_longname(): strrchr() expects NUL-terminated string
>>
>> Cc: stable@vger.kernel.org
>> Suggested-by: Helge Deller <deller@gmx.de>
>> Signed-off-by: Daniel Vogelbacher <daniel@chaospixel.com>
>> ---
>>   fs/ceph/crypto.c | 8 ++++----
>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
>> index 0ea4db650f85..3e051972e49d 100644
>> --- a/fs/ceph/crypto.c
>> +++ b/fs/ceph/crypto.c
>> @@ -166,12 +166,12 @@ static struct inode *parse_longname(const struct inode *parent,
>>   	struct ceph_vino vino = { .snap = CEPH_NOSNAP };
>>   	char *name_end, *inode_number;
>>   	int ret = -EIO;
>> -	/* NUL-terminate */
>> -	char *str __free(kfree) = kmemdup_nul(name, *name_len, GFP_KERNEL);
>> +	if (*name_len <= 1)
> 
> I believe that even if we have *name_len <= 1, then current logic can manage it.
> Why do we need this fix? The commit message sounds really unclear for my taste.
> Could you prove that we really need this fix?

I added this check because otherwise I would be doing pointer arithmetic
without any bounds check. I can't give you a better reason :) I can
simply remove it if you prefer.
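The effect of the `*name_len <= 1` guard can be shown with plain integer arithmetic. kmemdup_nul() takes its length as a `size_t`; this sketch assumes, hypothetically, that the name length arrives as a signed `int`, and the helper names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Length that would be passed to a size_t parameter (such as the len
 * argument of kmemdup_nul()) after skipping the leading '_', WITHOUT the
 * guard. Assumes the name length is a signed int.
 */
static size_t unguarded_copy_len(int name_len)
{
	return (size_t)(name_len - 1);	/* a length of 0 wraps to SIZE_MAX */
}

/* Same computation with the guard from the patch; -1 means "reject (-EIO)". */
static long long guarded_copy_len(int name_len)
{
	if (name_len <= 1)
		return -1;
	return (long long)(name_len - 1);
}
```

With a length of 1 the unguarded copy is zero bytes, and the subsequent `strrchr()` finds no '_', so that case already ends in -EIO. A length of 0, however, would wrap to SIZE_MAX, a value no copy routine should be handed, and `name + 1` would already point past the input. If callers only ever pass names starting with '_' (an assumption this sketch does not verify), the guard is purely defensive.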


-- 
Best regards / Mit freundlichen Grüßen
Daniel Vogelbacher


* RE: [PATCH] fs/ceph: Fix kernel oops due invalid pointer for kfree() in parse_longname()
  2025-12-22 21:26   ` Daniel Vogelbacher
@ 2025-12-23 22:49     ` Viacheslav Dubeyko
  2026-01-20 13:42       ` Daniel Vogelbacher
  0 siblings, 1 reply; 14+ messages in thread
From: Viacheslav Dubeyko @ 2025-12-23 22:49 UTC
  To: daniel@chaospixel.com, ceph-devel@vger.kernel.org
  Cc: Xiubo Li, idryomov@gmail.com

On Mon, 2025-12-22 at 22:26 +0100, Daniel Vogelbacher wrote:
> On 12/22/25 21:08, Viacheslav Dubeyko wrote:
> > On Sat, 2025-12-20 at 15:01 +0100, Daniel Vogelbacher wrote:
> > > This fixes a kernel oops when reading ceph snapshot directories (.snap),
> > > for example by simply run `ls /mnt/my_ceph/.snap`.
> > > 
> > 
> > Frankly speaking, it's completely not clear how this kernel oops can happen.
> > Could you please explain in more details how it can happen and what is the
> > nature of the issue? How the issue can be reproduced?
> 
> All I need to reproduce the issue is to run `ls .snap/` on any mounted 
> cephfs mountpoint that contains scheduled snapshots. I've one prod VM 
> (KVM) where I hit the issue after a Debian Trixie upgrade. To isolate 
> it, I've created a fresh Trixie VM, dropped the distribution kernel and 
> built a vanilla kernel to isolate the buggy commit by using git-bisect - 
> and to ensure the bug was not introduced by any Debian patches. If that 
> helps, it's a Squid 19.2.3 cluster.
> 
> So basically the steps are:
> 
>   * Setup a Ceph cluster with 19.2.3
>   * Create a pool and cephfs
>   * Create schedule snapshots for the fs
>   * Mount the fs and populate it with a few files on any kernel version 
> that contains bb80f7618832, that is >=6.12.41
>   * Wait until there are scheduled snapshots created
>   * run `ls /mnt/my/cephfs/.snap`

It would be good to see the exact commands that anyone can run to reproduce
the issue. You don't need to share the commands for setting up the Ceph
cluster or creating the pool and the CephFS instance. But the remaining steps
are really important, because the mount options and the details of the
commands you run can change everything.

> 
> This should result in a kernel oops like:

The commit message could include the oops details.

> 
> [... oops trace snipped ...]
> 
> 
> > > The bug was introduced in commit:
> > > 
> > > bb80f7618832 - parse_longname(): strrchr() expects NUL-terminated string
> > > 
> > > str is guarded by __free(kfree), but advanced later for skipping
> > > the initial '_' in snapshot names.
> > > This patch removes the need for advancing the pointer so kfree()
> > > could do proper memory cleanup.
> > > 
> > 
> > I cannot follow of this explanation. What is the wrong? Why should we fix
> > something here?
> 
> In bb80f7618832, the pointer in variable "str" is guarded by 
> __free(kfree), which means the pointer returned by kmemdup_nul() is 
> automatically freed. kfree() should receive the same pointer as returned 
> by kmemdump_nul(), but this is not the case, as the pointer is advanced 
> by one. kmemdup_nul() may return for example 0x1234000, but kfree() is 
> called with 0x1234001. I don't know the exact behavior of kfree(), but I 
> assume calling kfree() with random pointers leads to UB?

Please see my comments below.

> 
> > > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220807  
> > > 
> > 
> > Why the issue had not been reported to CephFS community through email or by
> > means of https://tracker.ceph.com?  
> It's a kernel bug and not related to any ceph packages, so I've reported 
> it to the kernel issue tracking system.
> 
> > Have you run xfstests for your patch?
> No, not aware of it. How is xfs related to cephfs?

xfstests is the regression test suite used for testing all Linux file systems
(CephFS included). If you are not a file system developer, though, it's fine
that you didn't run xfstests.

> 
> 
> > > Fixes: bb80f7618832 - parse_longname(): strrchr() expects NUL-terminated string
> > > 
> > > Cc: stable@vger.kernel.org
> > > Suggested-by: Helge Deller <deller@gmx.de>
> > > Signed-off-by: Daniel Vogelbacher <daniel@chaospixel.com>
> > > ---
> > >   fs/ceph/crypto.c | 8 ++++----
> > >   1 file changed, 4 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
> > > index 0ea4db650f85..3e051972e49d 100644
> > > --- a/fs/ceph/crypto.c
> > > +++ b/fs/ceph/crypto.c
> > > @@ -166,12 +166,12 @@ static struct inode *parse_longname(const struct inode *parent,
> > >   	struct ceph_vino vino = { .snap = CEPH_NOSNAP };
> > >   	char *name_end, *inode_number;
> > >   	int ret = -EIO;
> > > -	/* NUL-terminate */
> > > -	char *str __free(kfree) = kmemdup_nul(name, *name_len, GFP_KERNEL);
> > > +	if (*name_len <= 1)
> > 
> > I believe that even if we have *name_len <= 1, then current logic can manage it.
> > Why do we need this fix? The commit message sounds really unclear for my taste.
> > Could you prove that we really need this fix?
> 
> I've added this protection because otherwise I do pointer arithmetic 
> without checking bounds. I couldn't give you a better excuse :) I could
> simply remove it on your request.
> 

OK. Let's analyze the code again.

	char *str __free(kfree) = kmemdup_nul(name, *name_len, GFP_KERNEL);
	if (!str)
		return ERR_PTR(-ENOMEM);
	/* Skip initial '_' */
	str++;
	name_end = strrchr(str, '_');
	if (!name_end) {
		doutc(cl, "failed to parse long snapshot name: %s\n", str);
		return ERR_PTR(-EIO);
	}
	*name_len = (name_end - str);
	if (*name_len <= 0) {
		pr_err_client(cl, "failed to parse long snapshot name\n");
		return ERR_PTR(-EIO);
	}
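
For illustration, the effect of the `str++` above can be modelled in
userspace. The kernel's `__free()` (from <linux/cleanup.h>) is built on the
GCC/Clang `cleanup` attribute, so this sketch uses that attribute directly.
`dup_nul()`, `record_and_free()` and the two demo functions are hypothetical
stand-ins, not kernel APIs. The cleanup handler records the offset of the
pointer it receives relative to the allocation start, then frees the real
allocation start, because passing the advanced pointer to free() would itself
be undefined behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static char *allocated;        /* start of the most recent allocation */
static ptrdiff_t freed_offset; /* offset of the pointer cleanup received */

/* Hypothetical stand-in for kmemdup_nul(): copy len bytes, NUL-terminate. */
static char *dup_nul(const char *src, size_t len)
{
	char *p = malloc(len + 1);

	assert(p != NULL);
	memcpy(p, src, len);
	p[len] = '\0';
	allocated = p;
	return p;
}

/* Cleanup handler in the style of __free(kfree); receives &variable. */
static void record_and_free(char **pp)
{
	freed_offset = *pp - allocated;
	free(allocated);	/* free(*pp) here would be UB when offset != 0 */
}

/* Pattern from bb80f7618832: the guarded variable itself is advanced. */
static void advance_guarded(const char *name, size_t len)
{
	char *str __attribute__((cleanup(record_and_free))) = dup_nul(name, len);

	str++;			/* skip initial '_' */
	(void)strrchr(str, '_');
}				/* cleanup runs here with allocated + 1 */

/* Pattern from the proposed patch: duplicate name + 1, never advance. */
static void dup_after_underscore(const char *name, size_t len)
{
	char *str __attribute__((cleanup(record_and_free))) =
		dup_nul(name + 1, len - 1);

	(void)strrchr(str, '_');
}				/* cleanup runs here with allocated */
```

Calling advance_guarded("_snap_42", 8) leaves freed_offset at 1, while
dup_after_underscore("_snap_42", 8) leaves it at 0. Only the second pattern
hands the cleanup the pointer that was actually allocated.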

First of all, we try to create a NUL-terminated string from unterminated data.
If name_len == 0, we allocate a 1-byte string that contains only the
terminator. The allocation could also fail under memory pressure (that case is
handled by the !str check). However, it makes no sense to allocate memory in
this case at all, so a length check at the beginning makes sense:

	if (*name_len <= 0)
		return ERR_PTR(-EIO);

Next, we expect '_' at the beginning. If the provided string contains no '_'
at all, then it makes no sense to allocate memory either. I suggest calling
this next:

	name_end = strnchr(name, *name_len, '_');
	if (!name_end) {
		doutc(cl, "failed to parse long snapshot name: %s\n", str);
		return ERR_PTR(-EIO);
	} else if (name != name_end) {
		/* we expect '_' at the beginning */
		doutc(cl, "failed to parse long snapshot name: %s\n", str);
		return ERR_PTR(-EIO);
	}

If the first '_' is at the beginning of name, then it makes sense to continue
with the logic.

	if (*name_len <= 1)
		return ERR_PTR(-EIO);

And here we can continue the existing logic:

	char *str __free(kfree) = kmemdup_nul(name, *name_len, GFP_KERNEL);
	if (!str)
		return ERR_PTR(-ENOMEM);
	/* Skip initial '_' */
	str++;
	name_end = strrchr(str, '_');
	if (!name_end) {
		doutc(cl, "failed to parse long snapshot name: %s\n", str);
		return ERR_PTR(-EIO);
	}
	*name_len = (name_end - str);
	if (*name_len <= 0) {
		pr_err_client(cl, "failed to parse long snapshot name\n");
		return ERR_PTR(-EIO);
	}

Does this logic make sense to you?

However, I have started to think... Could we remove the kmemdup_nul() call
completely and operate on the original name only? I think it's possible; we
simply need a smarter string-parsing technique. What do you think? It would be
good to avoid the memory allocation here.
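
One possible shape for an allocation-free parser is sketched below in
userspace C. It assumes the long name has the form
`_<snapname>_<inode number>` (leading '_', last '_' as the separator, decimal
inode number), which matches what the existing code extracts with strrchr().
The helper name parse_longname_noalloc() is hypothetical, and the digit loop
is an assumption: a kernel version would have to follow the exact rules of
whatever conversion helper parse_longname() uses today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical allocation-free parser for "_<snapname>_<inode number>".
 * name need not be NUL-terminated; every access stays below name + len.
 * On success, <snapname> is name[1 .. 1 + *snap_len).
 */
static bool parse_longname_noalloc(const char *name, size_t len,
				   size_t *snap_len, uint64_t *ino)
{
	const char *end = name + len;
	const char *sep = NULL;
	uint64_t val = 0;

	assert(snap_len != NULL && ino != NULL);
	if (len == 0 || name[0] != '_')
		return false;

	/* Length-bounded equivalent of strrchr() on name + 1 */
	for (const char *p = end - 1; p > name; p--) {
		if (*p == '_') {
			sep = p;
			break;
		}
	}
	/* No separator, empty <snapname>, or empty inode number */
	if (!sep || sep == name + 1 || sep + 1 == end)
		return false;

	/* Decimal inode number: digits only, reject overflow */
	for (const char *p = sep + 1; p < end; p++) {
		unsigned int d;

		if (*p < '0' || *p > '9')
			return false;
		d = (unsigned int)(*p - '0');
		if (val > (UINT64_MAX - d) / 10)
			return false;
		val = val * 10 + d;
	}

	*snap_len = (size_t)(sep - (name + 1));
	*ino = val;
	return true;
}
```

Because every access is bounded by len, this works on a buffer that is not
NUL-terminated. Whether that is worth the extra code compared with one small
allocation is a judgment call.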

Thanks,
Slava.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] fs/ceph: Fix kernel oops due invalid pointer for kfree() in parse_longname()
  2025-12-23 22:49     ` Viacheslav Dubeyko
@ 2026-01-20 13:42       ` Daniel Vogelbacher
  2026-01-21 20:44         ` Viacheslav Dubeyko
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel Vogelbacher @ 2026-01-20 13:42 UTC (permalink / raw)
  To: Viacheslav Dubeyko
  Cc: ceph-devel@vger.kernel.org, Xiubo Li, idryomov@gmail.com

On Tue, Dec 23, 2025 at 10:49:36PM +0000, Viacheslav Dubeyko wrote:
> On Mon, 2025-12-22 at 22:26 +0100, Daniel Vogelbacher wrote:
> > On 12/22/25 21:08, Viacheslav Dubeyko wrote:
> > > On Sat, 2025-12-20 at 15:01 +0100, Daniel Vogelbacher wrote:
> > > > This fixes a kernel oops when reading ceph snapshot directories (.snap),
> > > > for example by simply run `ls /mnt/my_ceph/.snap`.
> > > > 
> > > 
> > > Frankly speaking, it's completely not clear how this kernel oops can happen.
> > > Could you please explain in more details how it can happen and what is the
> > > nature of the issue? How the issue can be reproduced?
> > 
> > All I need to reproduce the issue is to run `ls .snap/` on any mounted 
> > cephfs mountpoint that contains scheduled snapshots. I've one prod VM 
> > (KVM) where I hit the issue after a Debian Trixie upgrade. To isolate 
> > it, I've created a fresh Trixie VM, dropped the distribution kernel and 
> > built a vanilla kernel to isolate the buggy commit by using git-bisect - 
> > and to ensure the bug was not introduced by any Debian patches. If that 
> > helps, it's a Squid 19.2.3 cluster.
> > 
> > So basically the steps are:
> > 
> >   * Setup a Ceph cluster with 19.2.3
> >   * Create a pool and cephfs
> >   * Create schedule snapshots for the fs
> >   * Mount the fs and populate it with a few files on any kernel version 
> > that contains bb80f7618832, that is >=6.12.41
> >   * Wait until there are scheduled snapshots created
> >   * run `ls /mnt/my/cephfs/.snap`
> 
> It will be good to see the particular command that everyone can run to reproduce
> the issue. You don't need to share the command for setup Ceph cluster, creating
> pool and CephFS instance. But the rest steps are really important because mount
> options and details of command that you run can change everything.

These are the steps to reproduce on a new VM:

# echo "samba-fileserver@.files=/volumes/datapool/stuff/3461082b-ecc9-4e82-8549-3fd2590d3fb6      /mnt/test/stuff   ceph     acl,noatime,_netdev    0       0" >> /etc/fstab

Reboot the system
# systemctl reboot

Check if it's really mounted
# mount | grep stuff

List snapshots (expected 63 snapshots)
# ls /mnt/test/stuff/.snap

Now ls hangs forever and the kernel log shows the oops.

> 
> > 
> > This should result in a kernel oops like:
> 
> The commit message could include oops details.
> 
> > 
> > [   53.703013] Oops: general protection fault, probably for 
> > non-canonical address 0xd0c22857c0000000: 0000 [#1] SMP PTI
> > [   53.703201] CPU: 11 UID: 0 PID: 360 Comm: kworker/11:2 Not tainted 
> > 6.18.0-rc7 #41 PREEMPT(voluntary)
> > [   53.703281] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> > 1.16.2-debian-1.16.2-1 04/01/2014
> > [   53.703317] Workqueue: ceph-msgr ceph_con_workfn [libceph]
> > [   53.703424] RIP: 0010:rb_insert_color 
> > (/usr/src/linux/lib/rbtree.c:185 (discriminator 1) 
> > /usr/src/linux/lib/rbtree.c:436 (discriminator 1))
> > [   53.704503] Code: 76 17 48 83 e1 fc 48 3b 51 10 0f 84 b7 00 00 00 48 
> > 89 41 08 c3 cc cc cc cc 48 89 06 c3 cc cc cc cc 48 8b 4a 10 48 85 c9 74 
> > 05 <f6> 01 01 74 1b 48 8b 48 10 48 39 f9 74 68 48 89 c7 48 89 4a 08 48
> > All code
> > ========
> >     0:	76 17                	jbe    0x19
> >     2:	48 83 e1 fc          	and    $0xfffffffffffffffc,%rcx
> >     6:	48 3b 51 10          	cmp    0x10(%rcx),%rdx
> >     a:	0f 84 b7 00 00 00    	je     0xc7
> >    10:	48 89 41 08          	mov    %rax,0x8(%rcx)
> >    14:	c3                   	ret
> >    15:	cc                   	int3
> >    16:	cc                   	int3
> >    17:	cc                   	int3
> >    18:	cc                   	int3
> >    19:	48 89 06             	mov    %rax,(%rsi)
> >    1c:	c3                   	ret
> >    1d:	cc                   	int3
> >    1e:	cc                   	int3
> >    1f:	cc                   	int3
> >    20:	cc                   	int3
> >    21:	48 8b 4a 10          	mov    0x10(%rdx),%rcx
> >    25:	48 85 c9             	test   %rcx,%rcx
> >    28:	74 05                	je     0x2f
> >    2a:*	f6 01 01             	testb  $0x1,(%rcx)		<-- trapping instruction
> >    2d:	74 1b                	je     0x4a
> >    2f:	48 8b 48 10          	mov    0x10(%rax),%rcx
> >    33:	48 39 f9             	cmp    %rdi,%rcx
> >    36:	74 68                	je     0xa0
> >    38:	48 89 c7             	mov    %rax,%rdi
> >    3b:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
> >    3f:	48                   	rex.W
> > 
> > Code starting with the faulting instruction
> > ===========================================
> >     0:	f6 01 01             	testb  $0x1,(%rcx)
> >     3:	74 1b                	je     0x20
> >     5:	48 8b 48 10          	mov    0x10(%rax),%rcx
> >     9:	48 39 f9             	cmp    %rdi,%rcx
> >     c:	74 68                	je     0x76
> >     e:	48 89 c7             	mov    %rax,%rdi
> >    11:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
> >    15:	48                   	rex.W
> > [   53.704559] RSP: 0018:ffff9ab7c07579e0 EFLAGS: 00010286
> > [   53.704591] RAX: ffff8bd0c2285b40 RBX: ffff8bd0c2285240 RCX: 
> > d0c22857c0000000
> > [   53.704616] RDX: ffff8bd0c2285910 RSI: ffff8bd0c3e695c0 RDI: 
> > ffff8bd0c22855c0
> > [   53.704645] RBP: 0000000000002139 R08: 0000000000000000 R09: 
> > 0000000000000000
> > [   53.704668] R10: 0000000000000000 R11: ffff8bd0c16244e0 R12: 
> > ffff8bd0c3e695b8
> > [   53.704691] R13: ffff8bd0c3b62000 R14: ffff8bd0c22857c0 R15: 
> > ffff8bd0c3e695c0
> > [   53.704714] FS:  0000000000000000(0000) GS:ffff8bd1815ca000(0000) 
> > knlGS:0000000000000000
> > [   53.704741] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   53.704762] CR2: 000055667ef28e10 CR3: 0000000106cc2005 CR4: 
> > 0000000000772ef0
> > [   53.704790] PKRU: 55555554
> > [   53.704803] Call Trace:
> > [   53.704844]  <TASK>
> > [   53.704862] ceph_get_snapid_map 
> > (/usr/src/linux/./include/linux/spinlock.h:391 
> > /usr/src/linux/fs/ceph/snap.c:1255) ceph
> > [   53.704957] ceph_fill_inode (/usr/src/linux/fs/ceph/inode.c:1062 
> > (discriminator 2)) ceph
> > [   53.705019]  ? __pfx_ceph_set_ino_cb 
> > (/usr/src/linux/fs/ceph/inode.c:46) ceph
> > [   53.705074]  ? __pfx_ceph_ino_compare 
> > (/usr/src/linux/fs/ceph/super.h:595) ceph
> > [   53.705132] ceph_readdir_prepopulate 
> > (/usr/src/linux/fs/ceph/inode.c:2113) ceph
> > [   53.705191] mds_dispatch (/usr/src/linux/fs/ceph/mds_client.c:3993 
> > /usr/src/linux/fs/ceph/mds_client.c:6299) ceph
> > [   53.705253]  ? sock_recvmsg (/usr/src/linux/net/socket.c:1078 
> > (discriminator 1) /usr/src/linux/net/socket.c:1100 (discriminator 1))
> > [   53.705279] ceph_con_process_message 
> > (/usr/src/linux/net/ceph/messenger.c:1427) libceph
> > [   53.705347] process_message 
> > (/usr/src/linux/net/ceph/messenger_v2.c:2879) libceph
> > [   53.705406] ceph_con_v2_try_read 
> > (/usr/src/linux/net/ceph/messenger_v2.c:3043 
> > /usr/src/linux/net/ceph/messenger_v2.c:3099 
> > /usr/src/linux/net/ceph/messenger_v2.c:3148) libceph
> > [   53.705467]  ? psi_group_change (/usr/src/linux/kernel/sched/psi.c:876)
> > [   53.705488]  ? sched_balance_newidle 
> > (/usr/src/linux/kernel/sched/fair.c:12902 (discriminator 2))
> > [   53.705512]  ? psi_task_switch (/usr/src/linux/kernel/sched/psi.c:984 
> > (discriminator 2))
> > [   53.705532]  ? _raw_spin_unlock 
> > (/usr/src/linux/./arch/x86/include/asm/paravirt.h:562 
> > /usr/src/linux/./arch/x86/include/asm/qspinlock.h:57 
> > /usr/src/linux/./include/linux/spinlock.h:204 
> > /usr/src/linux/./include/linux/spinlock_api_smp.h:142 
> > /usr/src/linux/kernel/locking/spinlock.c:186)
> > [   53.705550]  ? finish_task_switch.isra.0 
> > (/usr/src/linux/./arch/x86/include/asm/paravirt.h:671 
> > /usr/src/linux/kernel/sched/sched.h:1559 
> > /usr/src/linux/kernel/sched/core.c:5073 
> > /usr/src/linux/kernel/sched/core.c:5191)
> > [   53.705575] ceph_con_workfn 
> > (/usr/src/linux/net/ceph/messenger.c:1578) libceph
> > [   53.705627]  process_one_work 
> > (/usr/src/linux/./arch/x86/include/asm/jump_label.h:36 
> > /usr/src/linux/./include/trace/events/workqueue.h:110 
> > /usr/src/linux/kernel/workqueue.c:3268)
> > [   53.705657]  worker_thread (/usr/src/linux/kernel/workqueue.c:3340 
> > (discriminator 2) /usr/src/linux/kernel/workqueue.c:3427 (discriminator 2))
> > [   53.705679]  ? __pfx_worker_thread 
> > (/usr/src/linux/kernel/workqueue.c:3373)
> > [   53.705700]  kthread (/usr/src/linux/kernel/kthread.c:463)
> > [   53.705717]  ? __pfx_kthread (/usr/src/linux/kernel/kthread.c:412)
> > [   53.705734]  ? __pfx_kthread (/usr/src/linux/kernel/kthread.c:412)
> > [   53.705752]  ret_from_fork (/usr/src/linux/arch/x86/kernel/process.c:164)
> > [   53.705776]  ? __pfx_kthread (/usr/src/linux/kernel/kthread.c:412)
> > [   53.705793]  ret_from_fork_asm 
> > (/usr/src/linux/arch/x86/entry/entry_64.S:255)
> > [   53.705826]  </TASK>
> > [   53.705842] Modules linked in: ceph netfs libceph cfg80211 rfkill 
> > 8021q garp stp mrp llc binfmt_misc intel_rapl_msr intel_rapl_common 
> > intel_uncore_frequency_common kvm_intel virtio_gpu joydev kvm 
> > drm_client_lib virtio_dma_buf evdev drm_shmem_helper sg drm_kms_helper 
> > virtio_balloon button irqbypass ghash_clmulni_intel aesni_intel rapl 
> > pcspkr drm configfs efi_pstore nfnetlink vsock_loopback 
> > vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vmw_vmci 
> > vsock qemu_fw_cfg virtio_rng autofs4 ext4 crc16 mbcache jbd2 hid_generic 
> > usbhid hid sr_mod cdrom dm_mod ahci libahci libata xhci_pci iTCO_wdt 
> > intel_pmc_bxt xhci_hcd iTCO_vendor_support scsi_mod psmouse virtio_net 
> > i2c_i801 watchdog serio_raw i2c_smbus lpc_ich scsi_common usbcore 
> > net_failover failover virtio_blk usb_common
> > [   53.708740] ---[ end trace 0000000000000000 ]---
> > [   53.709462] RIP: 0010:rb_insert_color 
> > (/usr/src/linux/lib/rbtree.c:185 (discriminator 1) 
> > /usr/src/linux/lib/rbtree.c:436 (discriminator 1))
> > [   53.710118] Code: 76 17 48 83 e1 fc 48 3b 51 10 0f 84 b7 00 00 00 48 
> > 89 41 08 c3 cc cc cc cc 48 89 06 c3 cc cc cc cc 48 8b 4a 10 48 85 c9 74 
> > 05 <f6> 01 01 74 1b 48 8b 48 10 48 39 f9 74 68 48 89 c7 48 89 4a 08 48
> > All code
> > ========
> >     0:	76 17                	jbe    0x19
> >     2:	48 83 e1 fc          	and    $0xfffffffffffffffc,%rcx
> >     6:	48 3b 51 10          	cmp    0x10(%rcx),%rdx
> >     a:	0f 84 b7 00 00 00    	je     0xc7
> >    10:	48 89 41 08          	mov    %rax,0x8(%rcx)
> >    14:	c3                   	ret
> >    15:	cc                   	int3
> >    16:	cc                   	int3
> >    17:	cc                   	int3
> >    18:	cc                   	int3
> >    19:	48 89 06             	mov    %rax,(%rsi)
> >    1c:	c3                   	ret
> >    1d:	cc                   	int3
> >    1e:	cc                   	int3
> >    1f:	cc                   	int3
> >    20:	cc                   	int3
> >    21:	48 8b 4a 10          	mov    0x10(%rdx),%rcx
> >    25:	48 85 c9             	test   %rcx,%rcx
> >    28:	74 05                	je     0x2f
> >    2a:*	f6 01 01             	testb  $0x1,(%rcx)		<-- trapping instruction
> >    2d:	74 1b                	je     0x4a
> >    2f:	48 8b 48 10          	mov    0x10(%rax),%rcx
> >    33:	48 39 f9             	cmp    %rdi,%rcx
> >    36:	74 68                	je     0xa0
> >    38:	48 89 c7             	mov    %rax,%rdi
> >    3b:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
> >    3f:	48                   	rex.W
> > 
> > Code starting with the faulting instruction
> > ===========================================
> >     0:	f6 01 01             	testb  $0x1,(%rcx)
> >     3:	74 1b                	je     0x20
> >     5:	48 8b 48 10          	mov    0x10(%rax),%rcx
> >     9:	48 39 f9             	cmp    %rdi,%rcx
> >     c:	74 68                	je     0x76
> >     e:	48 89 c7             	mov    %rax,%rdi
> >    11:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
> >    15:	48                   	rex.W
> > [   53.711453] RSP: 0018:ffff9ab7c07579e0 EFLAGS: 00010286
> > [   53.712112] RAX: ffff8bd0c2285b40 RBX: ffff8bd0c2285240 RCX: 
> > d0c22857c0000000
> > [   53.712798] RDX: ffff8bd0c2285910 RSI: ffff8bd0c3e695c0 RDI: 
> > ffff8bd0c22855c0
> > [   53.713423] RBP: 0000000000002139 R08: 0000000000000000 R09: 
> > 0000000000000000
> > [   53.714061] R10: 0000000000000000 R11: ffff8bd0c16244e0 R12: 
> > ffff8bd0c3e695b8
> > [   53.714696] R13: ffff8bd0c3b62000 R14: ffff8bd0c22857c0 R15: 
> > ffff8bd0c3e695c0
> > [   53.715321] FS:  0000000000000000(0000) GS:ffff8bd1815ca000(0000) 
> > knlGS:0000000000000000
> > [   53.715956] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   53.716651] CR2: 000055667ef28e10 CR3: 0000000106cc2005 CR4: 
> > 0000000000772ef0
> > [   53.717295] PKRU: 55555554
> > [   53.717918] note: kworker/11:2[360] exited with preempt_count 1
> > 
> > 
> > > > The bug was introduced in commit:
> > > > 
> > > > bb80f7618832 - parse_longname(): strrchr() expects NUL-terminated string
> > > > 
> > > > str is guarded by __free(kfree), but advanced later for skipping
> > > > the initial '_' in snapshot names.
> > > > This patch removes the need for advancing the pointer so kfree()
> > > > could do proper memory cleanup.
> > > > 
> > > 
> > > I cannot follow of this explanation. What is the wrong? Why should we fix
> > > something here?
> > 
> > In bb80f7618832, the pointer in variable "str" is guarded by 
> > __free(kfree), which means the pointer returned by kmemdup_nul() is 
> > automatically freed. kfree() should receive the same pointer as returned 
> > by kmemdump_nul(), but this is not the case, as the pointer is advanced 
> > by one. kmemdup_nul() may return for example 0x1234000, but kfree() is 
> > called with 0x1234001. I don't know the exact behavior of kfree(), but I 
> > assume calling kfree() with random pointers leads to UB?
> 
> Please, see my comments below.
> 
> > 
> > > > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220807  
> > > > 
> > > 
> > > Why the issue had not been reported to CephFS community through email or by
> > > means of https://tracker.ceph.com?  
> > It's a kernel bug and not related to any ceph packages, so I've reported 
> > it to the kernel issue tracking system.
> > 
> > > Have you run xfstests for your patch?
> > No, not aware of it. How is xfs related to cephfs?
> 
> The xfstests is the regression testing suite that is used for testing all of
> Linux file systems (and CephFS too). But if you are not file system guy, then
> it's OK that you didn't run the xfstests.
> 
> > 
> > 
> > > > Fixes: bb80f7618832 - parse_longname(): strrchr() expects NUL-terminated string
> > > > 
> > > > Cc: stable@vger.kernel.org
> > > > Suggested-by: Helge Deller <deller@gmx.de>
> > > > Signed-off-by: Daniel Vogelbacher <daniel@chaospixel.com>
> > > > ---
> > > >   fs/ceph/crypto.c | 8 ++++----
> > > >   1 file changed, 4 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
> > > > index 0ea4db650f85..3e051972e49d 100644
> > > > --- a/fs/ceph/crypto.c
> > > > +++ b/fs/ceph/crypto.c
> > > > @@ -166,12 +166,12 @@ static struct inode *parse_longname(const struct inode *parent,
> > > >   	struct ceph_vino vino = { .snap = CEPH_NOSNAP };
> > > >   	char *name_end, *inode_number;
> > > >   	int ret = -EIO;
> > > > -	/* NUL-terminate */
> > > > -	char *str __free(kfree) = kmemdup_nul(name, *name_len, GFP_KERNEL);
> > > > +	if (*name_len <= 1)
> > > 
> > > I believe that even if we have *name_len <= 1, then current logic can manage it.
> > > Why do we need this fix? The commit message sounds really unclear for my taste.
> > > Could you prove that we really need this fix?
> > 
> > I've added this protection because otherwise I do pointer arithmetic 
> > without checking bounds. I couldn't give you a better excuse :) I could
> > simply remove it on your request.
> > 
> 
> OK. Let's analyze the code again.
> 
> 	char *str __free(kfree) = kmemdup_nul(name, *name_len, GFP_KERNEL);
> 	if (!str)
> 		return ERR_PTR(-ENOMEM);
> 	/* Skip initial '_' */
> 	str++;
> 	name_end = strrchr(str, '_');
> 	if (!name_end) {
> 		doutc(cl, "failed to parse long snapshot name: %s\n", str);
> 		return ERR_PTR(-EIO);
> 	}
> 	*name_len = (name_end - str);
> 	if (*name_len <= 0) {
> 		pr_err_client(cl, "failed to parse long snapshot name\n");
> 		return ERR_PTR(-EIO);
> 	}
> 
> First of all, we try to create a NULL-terminated string from unterminated data.
> If we provide name_len == 0, then we should allocate 1 byte/symbol string that
> contains only termination symbol. Potentially, we could not allocate memory at
> all if we are under memory pressure (this situation is managed by !str check).
> However, it doesn't make sense to try to allocate memory at that case. So, the
> length check at the beginning makes sense:
> 
> 	if (*name_len <= 0)
> 		return ERR_PTR(-EIO);
> 
> Next, we expect to have '_' at the beginning. Let's imagine that we don't have
> any '_' in the provided string, then it make sense to try to allocate memory. I
> suggest to call this next:
> 
>         name_end = strnchr(name, *name_len, '_');
>         if (!name_end) {
> 		doutc(cl, "failed to parse long snapshot name: %s\n", str);
> 		return ERR_PTR(-EIO);
> 	} else if (name != name_end) {
>                 /* we expect '_' at the beginning */
> 		doutc(cl, "failed to parse long snapshot name: %s\n", str);
> 		return ERR_PTR(-EIO);
>         }

We don't have the `str` variable here yet. I suggest simplifying all of this as follows:

@@ -166,7 +166,8 @@ static struct inode *parse_longname(const struct inode *parent,
        struct ceph_vino vino = { .snap = CEPH_NOSNAP };
        char *name_end, *inode_number;
        int ret = -EIO;
-       if (*name_len <= 1)
+       /* Snapshot name must start with an underscore */
+       if (*name_len <= 0 || name[0] != '_')
                return ERR_PTR(-EIO);
        /* Skip initial '_' and NUL-terminate */
        char *str __free(kfree) = kmemdup_nul(name + 1, *name_len - 1, GFP_KERNEL);
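
The edge cases of the simplified check can be exercised with a small
userspace model. longname_prologue() and memdup_nul() are hypothetical
stand-ins (malloc()/free() instead of kmemdup_nul()/kfree()). The function
returns the length of <snapname> or a negative errno, which only loosely
mirrors the ERR_PTR() returns of the real function.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for kmemdup_nul(): copy len bytes, NUL-terminate. */
static char *memdup_nul(const char *src, size_t len)
{
	char *p = malloc(len + 1);

	if (!p)
		return NULL;
	memcpy(p, src, len);
	p[len] = '\0';
	return p;
}

/*
 * Hypothetical model of the start of parse_longname() with the proposed
 * guard. Returns the length of <snapname> or a negative errno.
 */
static int longname_prologue(const char *name, int name_len)
{
	char *str, *name_end;
	int ret;

	assert(name_len <= 0 || name != NULL);
	/* Snapshot name must start with an underscore */
	if (name_len <= 0 || name[0] != '_')
		return -EIO;
	/* Skip initial '_' and NUL-terminate; str stays the allocation start */
	str = memdup_nul(name + 1, (size_t)name_len - 1);
	if (!str)
		return -ENOMEM;
	name_end = strrchr(str, '_');
	if (!name_end || name_end == str)
		ret = -EIO;	/* no second '_', or empty <snapname> */
	else
		ret = (int)(name_end - str);
	free(str);		/* the same pointer memdup_nul() returned */
	return ret;
}
```

With name_len == 1 the guard lets "_" through. The duplicated string is then
empty, so strrchr() finds no separator and the function still fails with
-EIO. The earlier `<= 1` check is therefore not strictly needed.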



> If we have found the first instance of '_' at the beginning of name, then it
> makes sense to continue logic.
> 
>         if (*name_len <= 1)
> 		return ERR_PTR(-EIO);

See my comment above.

> And here we can continue the existing logic:
> 
> 	char *str __free(kfree) = kmemdup_nul(name, *name_len, GFP_KERNEL);
> 	if (!str)
> 		return ERR_PTR(-ENOMEM);
> 	/* Skip initial '_' */
> 	str++;
> 	name_end = strrchr(str, '_');
> 	if (!name_end) {
> 		doutc(cl, "failed to parse long snapshot name: %s\n", str);
> 		return ERR_PTR(-EIO);
> 	}
> 	*name_len = (name_end - str);
> 	if (*name_len <= 0) {
> 		pr_err_client(cl, "failed to parse long snapshot name\n");
> 		return ERR_PTR(-EIO);
> 	}
> 
> Does this logic make sense to you?

My simplified logic comes at the cost of potentially allocating memory for a
snapshot name that has no second underscore. But as far as I understand, this
naming scheme is a convention for Ceph snapshot names, so this should not
happen in practice.

> 
> However, I have started to think... Could we completely remove the kmemdup_nul()
> and to operate with the initial name only? I think it's possible, we simply need
> to use much smarter technique of string analysis. What do you think? It will be
> good to exclude the memory allocation here.

I'm neither a kernel nor a Ceph developer, and it seems that most of the
functions used here don't have variants for non-NUL-terminated strings. I
assume it would take considerable extra work just to remove the allocation
entirely.

--
Best regards / Mit freundlichen Grüßen
Daniel Vogelbacher


* RE: [PATCH] fs/ceph: Fix kernel oops due invalid pointer for kfree() in parse_longname()
  2026-01-20 13:42       ` Daniel Vogelbacher
@ 2026-01-21 20:44         ` Viacheslav Dubeyko
  2026-01-21 21:38           ` Daniel Vogelbacher
  0 siblings, 1 reply; 14+ messages in thread
From: Viacheslav Dubeyko @ 2026-01-21 20:44 UTC (permalink / raw)
  To: daniel@chaospixel.com
  Cc: ceph-devel@vger.kernel.org, Xiubo Li, idryomov@gmail.com

On Tue, 2026-01-20 at 14:42 +0100, Daniel Vogelbacher wrote:
> On Tue, Dec 23, 2025 at 10:49:36PM +0000, Viacheslav Dubeyko wrote:
> > On Mon, 2025-12-22 at 22:26 +0100, Daniel Vogelbacher wrote:
> > > On 12/22/25 21:08, Viacheslav Dubeyko wrote:
> > > > On Sat, 2025-12-20 at 15:01 +0100, Daniel Vogelbacher wrote:
> > > > > This fixes a kernel oops when reading ceph snapshot directories (.snap),
> > > > > for example by simply run `ls /mnt/my_ceph/.snap`.
> > > > > 
> > > > 
> > > > Frankly speaking, it's completely not clear how this kernel oops can happen.
> > > > Could you please explain in more details how it can happen and what is the
> > > > nature of the issue? How the issue can be reproduced?
> > > 
> > > All I need to reproduce the issue is to run `ls .snap/` on any mounted 
> > > cephfs mountpoint that contains scheduled snapshots. I've one prod VM 
> > > (KVM) where I hit the issue after a Debian Trixie upgrade. To isolate 
> > > it, I've created a fresh Trixie VM, dropped the distribution kernel and 
> > > built a vanilla kernel to isolate the buggy commit by using git-bisect - 
> > > and to ensure the bug was not introduced by any Debian patches. If that 
> > > helps, it's a Squid 19.2.3 cluster.
> > > 
> > > So basically the steps are:
> > > 
> > >   * Setup a Ceph cluster with 19.2.3
> > >   * Create a pool and cephfs
> > >   * Create schedule snapshots for the fs
> > >   * Mount the fs and populate it with a few files on any kernel version 
> > > that contains bb80f7618832, that is >=6.12.41
> > >   * Wait until there are scheduled snapshots created
> > >   * run `ls /mnt/my/cephfs/.snap`
> > 
> > It will be good to see the particular command that everyone can run to reproduce
> > the issue. You don't need to share the command for setup Ceph cluster, creating
> > pool and CephFS instance. But the rest steps are really important because mount
> > options and details of command that you run can change everything.
> 
> These are the steps to reproduce on a new VM:
> 
> # echo "samba-fileserver@.files=/volumes/datapool/stuff/3461082b-ecc9-4e82-8549-3fd2590d3fb6      /mnt/test/stuff   ceph     acl,noatime,_netdev    0       0" >> /etc/fstab
> 
> Reboot the system
> # systemctl reboot
> 
> Check if it's really mounted
> # mount | grep stuff
> 
> List snapshots (expected 63 snapshots)
> # ls /mnt/test/stuff/.snap
> 

If I do something like this on my side, I will have no snapshots at all. How
did you create the snapshots? How many snapshots (1, 2, ..., 63) need to exist
to reproduce the issue? This part of the explanation is completely missing.

> Now ls hangs forever and the kernel log shows the oops.
> 
> > 
> > > 
> > > This should result in a kernel oops like:
> > 
> > The commit message could include oops details.
> > 
> > > 
> > > [   53.703013] Oops: general protection fault, probably for 
> > > non-canonical address 0xd0c22857c0000000: 0000 [#1] SMP PTI
> > > [   53.703201] CPU: 11 UID: 0 PID: 360 Comm: kworker/11:2 Not tainted 
> > > 6.18.0-rc7 #41 PREEMPT(voluntary)
> > > [   53.703281] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> > > 1.16.2-debian-1.16.2-1 04/01/2014
> > > [   53.703317] Workqueue: ceph-msgr ceph_con_workfn [libceph]
> > > [   53.703424] RIP: 0010:rb_insert_color 
> > > (/usr/src/linux/lib/rbtree.c:185 (discriminator 1) 
> > > /usr/src/linux/lib/rbtree.c:436 (discriminator 1))
> > > [   53.704503] Code: 76 17 48 83 e1 fc 48 3b 51 10 0f 84 b7 00 00 00 48 
> > > 89 41 08 c3 cc cc cc cc 48 89 06 c3 cc cc cc cc 48 8b 4a 10 48 85 c9 74 
> > > 05 <f6> 01 01 74 1b 48 8b 48 10 48 39 f9 74 68 48 89 c7 48 89 4a 08 48
> > > All code
> > > ========
> > >     0:	76 17                	jbe    0x19
> > >     2:	48 83 e1 fc          	and    $0xfffffffffffffffc,%rcx
> > >     6:	48 3b 51 10          	cmp    0x10(%rcx),%rdx
> > >     a:	0f 84 b7 00 00 00    	je     0xc7
> > >    10:	48 89 41 08          	mov    %rax,0x8(%rcx)
> > >    14:	c3                   	ret
> > >    15:	cc                   	int3
> > >    16:	cc                   	int3
> > >    17:	cc                   	int3
> > >    18:	cc                   	int3
> > >    19:	48 89 06             	mov    %rax,(%rsi)
> > >    1c:	c3                   	ret
> > >    1d:	cc                   	int3
> > >    1e:	cc                   	int3
> > >    1f:	cc                   	int3
> > >    20:	cc                   	int3
> > >    21:	48 8b 4a 10          	mov    0x10(%rdx),%rcx
> > >    25:	48 85 c9             	test   %rcx,%rcx
> > >    28:	74 05                	je     0x2f
> > >    2a:*	f6 01 01             	testb  $0x1,(%rcx)		<-- trapping instruction
> > >    2d:	74 1b                	je     0x4a
> > >    2f:	48 8b 48 10          	mov    0x10(%rax),%rcx
> > >    33:	48 39 f9             	cmp    %rdi,%rcx
> > >    36:	74 68                	je     0xa0
> > >    38:	48 89 c7             	mov    %rax,%rdi
> > >    3b:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
> > >    3f:	48                   	rex.W
> > > 
> > > Code starting with the faulting instruction
> > > ===========================================
> > >     0:	f6 01 01             	testb  $0x1,(%rcx)
> > >     3:	74 1b                	je     0x20
> > >     5:	48 8b 48 10          	mov    0x10(%rax),%rcx
> > >     9:	48 39 f9             	cmp    %rdi,%rcx
> > >     c:	74 68                	je     0x76
> > >     e:	48 89 c7             	mov    %rax,%rdi
> > >    11:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
> > >    15:	48                   	rex.W
> > > [   53.704559] RSP: 0018:ffff9ab7c07579e0 EFLAGS: 00010286
> > > [   53.704591] RAX: ffff8bd0c2285b40 RBX: ffff8bd0c2285240 RCX: 
> > > d0c22857c0000000
> > > [   53.704616] RDX: ffff8bd0c2285910 RSI: ffff8bd0c3e695c0 RDI: 
> > > ffff8bd0c22855c0
> > > [   53.704645] RBP: 0000000000002139 R08: 0000000000000000 R09: 
> > > 0000000000000000
> > > [   53.704668] R10: 0000000000000000 R11: ffff8bd0c16244e0 R12: 
> > > ffff8bd0c3e695b8
> > > [   53.704691] R13: ffff8bd0c3b62000 R14: ffff8bd0c22857c0 R15: 
> > > ffff8bd0c3e695c0
> > > [   53.704714] FS:  0000000000000000(0000) GS:ffff8bd1815ca000(0000) 
> > > knlGS:0000000000000000
> > > [   53.704741] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [   53.704762] CR2: 000055667ef28e10 CR3: 0000000106cc2005 CR4: 
> > > 0000000000772ef0
> > > [   53.704790] PKRU: 55555554
> > > [   53.704803] Call Trace:
> > > [   53.704844]  <TASK>
> > > [   53.704862] ceph_get_snapid_map 
> > > (/usr/src/linux/./include/linux/spinlock.h:391 
> > > /usr/src/linux/fs/ceph/snap.c:1255) ceph
> > > [   53.704957] ceph_fill_inode (/usr/src/linux/fs/ceph/inode.c:1062 
> > > (discriminator 2)) ceph
> > > [   53.705019]  ? __pfx_ceph_set_ino_cb 
> > > (/usr/src/linux/fs/ceph/inode.c:46) ceph
> > > [   53.705074]  ? __pfx_ceph_ino_compare 
> > > (/usr/src/linux/fs/ceph/super.h:595) ceph
> > > [   53.705132] ceph_readdir_prepopulate 
> > > (/usr/src/linux/fs/ceph/inode.c:2113) ceph
> > > [   53.705191] mds_dispatch (/usr/src/linux/fs/ceph/mds_client.c:3993 
> > > /usr/src/linux/fs/ceph/mds_client.c:6299) ceph
> > > [   53.705253]  ? sock_recvmsg (/usr/src/linux/net/socket.c:1078 
> > > (discriminator 1) /usr/src/linux/net/socket.c:1100 (discriminator 1))
> > > [   53.705279] ceph_con_process_message 
> > > (/usr/src/linux/net/ceph/messenger.c:1427) libceph
> > > [   53.705347] process_message 
> > > (/usr/src/linux/net/ceph/messenger_v2.c:2879) libceph
> > > [   53.705406] ceph_con_v2_try_read 
> > > (/usr/src/linux/net/ceph/messenger_v2.c:3043 
> > > /usr/src/linux/net/ceph/messenger_v2.c:3099 
> > > /usr/src/linux/net/ceph/messenger_v2.c:3148) libceph
> > > [   53.705467]  ? psi_group_change (/usr/src/linux/kernel/sched/psi.c:876)
> > > [   53.705488]  ? sched_balance_newidle 
> > > (/usr/src/linux/kernel/sched/fair.c:12902 (discriminator 2))
> > > [   53.705512]  ? psi_task_switch (/usr/src/linux/kernel/sched/psi.c:984 
> > > (discriminator 2))
> > > [   53.705532]  ? _raw_spin_unlock 
> > > (/usr/src/linux/./arch/x86/include/asm/paravirt.h:562 
> > > /usr/src/linux/./arch/x86/include/asm/qspinlock.h:57 
> > > /usr/src/linux/./include/linux/spinlock.h:204 
> > > /usr/src/linux/./include/linux/spinlock_api_smp.h:142 
> > > /usr/src/linux/kernel/locking/spinlock.c:186)
> > > [   53.705550]  ? finish_task_switch.isra.0 
> > > (/usr/src/linux/./arch/x86/include/asm/paravirt.h:671 
> > > /usr/src/linux/kernel/sched/sched.h:1559 
> > > /usr/src/linux/kernel/sched/core.c:5073 
> > > /usr/src/linux/kernel/sched/core.c:5191)
> > > [   53.705575] ceph_con_workfn 
> > > (/usr/src/linux/net/ceph/messenger.c:1578) libceph
> > > [   53.705627]  process_one_work 
> > > (/usr/src/linux/./arch/x86/include/asm/jump_label.h:36 
> > > /usr/src/linux/./include/trace/events/workqueue.h:110 
> > > /usr/src/linux/kernel/workqueue.c:3268)
> > > [   53.705657]  worker_thread (/usr/src/linux/kernel/workqueue.c:3340 
> > > (discriminator 2) /usr/src/linux/kernel/workqueue.c:3427 (discriminator 2))
> > > [   53.705679]  ? __pfx_worker_thread 
> > > (/usr/src/linux/kernel/workqueue.c:3373)
> > > [   53.705700]  kthread (/usr/src/linux/kernel/kthread.c:463)
> > > [   53.705717]  ? __pfx_kthread (/usr/src/linux/kernel/kthread.c:412)
> > > [   53.705734]  ? __pfx_kthread (/usr/src/linux/kernel/kthread.c:412)
> > > [   53.705752]  ret_from_fork (/usr/src/linux/arch/x86/kernel/process.c:164)
> > > [   53.705776]  ? __pfx_kthread (/usr/src/linux/kernel/kthread.c:412)
> > > [   53.705793]  ret_from_fork_asm 
> > > (/usr/src/linux/arch/x86/entry/entry_64.S:255)
> > > [   53.705826]  </TASK>
> > > [   53.705842] Modules linked in: ceph netfs libceph cfg80211 rfkill 
> > > 8021q garp stp mrp llc binfmt_misc intel_rapl_msr intel_rapl_common 
> > > intel_uncore_frequency_common kvm_intel virtio_gpu joydev kvm 
> > > drm_client_lib virtio_dma_buf evdev drm_shmem_helper sg drm_kms_helper 
> > > virtio_balloon button irqbypass ghash_clmulni_intel aesni_intel rapl 
> > > pcspkr drm configfs efi_pstore nfnetlink vsock_loopback 
> > > vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vmw_vmci 
> > > vsock qemu_fw_cfg virtio_rng autofs4 ext4 crc16 mbcache jbd2 hid_generic 
> > > usbhid hid sr_mod cdrom dm_mod ahci libahci libata xhci_pci iTCO_wdt 
> > > intel_pmc_bxt xhci_hcd iTCO_vendor_support scsi_mod psmouse virtio_net 
> > > i2c_i801 watchdog serio_raw i2c_smbus lpc_ich scsi_common usbcore 
> > > net_failover failover virtio_blk usb_common
> > > [   53.708740] ---[ end trace 0000000000000000 ]---
> > > [   53.709462] RIP: 0010:rb_insert_color 
> > > (/usr/src/linux/lib/rbtree.c:185 (discriminator 1) 
> > > /usr/src/linux/lib/rbtree.c:436 (discriminator 1))
> > > [   53.710118] Code: 76 17 48 83 e1 fc 48 3b 51 10 0f 84 b7 00 00 00 48 
> > > 89 41 08 c3 cc cc cc cc 48 89 06 c3 cc cc cc cc 48 8b 4a 10 48 85 c9 74 
> > > 05 <f6> 01 01 74 1b 48 8b 48 10 48 39 f9 74 68 48 89 c7 48 89 4a 08 48
> > > All code
> > > ========
> > >     0:	76 17                	jbe    0x19
> > >     2:	48 83 e1 fc          	and    $0xfffffffffffffffc,%rcx
> > >     6:	48 3b 51 10          	cmp    0x10(%rcx),%rdx
> > >     a:	0f 84 b7 00 00 00    	je     0xc7
> > >    10:	48 89 41 08          	mov    %rax,0x8(%rcx)
> > >    14:	c3                   	ret
> > >    15:	cc                   	int3
> > >    16:	cc                   	int3
> > >    17:	cc                   	int3
> > >    18:	cc                   	int3
> > >    19:	48 89 06             	mov    %rax,(%rsi)
> > >    1c:	c3                   	ret
> > >    1d:	cc                   	int3
> > >    1e:	cc                   	int3
> > >    1f:	cc                   	int3
> > >    20:	cc                   	int3
> > >    21:	48 8b 4a 10          	mov    0x10(%rdx),%rcx
> > >    25:	48 85 c9             	test   %rcx,%rcx
> > >    28:	74 05                	je     0x2f
> > >    2a:*	f6 01 01             	testb  $0x1,(%rcx)		<-- trapping instruction
> > >    2d:	74 1b                	je     0x4a
> > >    2f:	48 8b 48 10          	mov    0x10(%rax),%rcx
> > >    33:	48 39 f9             	cmp    %rdi,%rcx
> > >    36:	74 68                	je     0xa0
> > >    38:	48 89 c7             	mov    %rax,%rdi
> > >    3b:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
> > >    3f:	48                   	rex.W
> > > 
> > > Code starting with the faulting instruction
> > > ===========================================
> > >     0:	f6 01 01             	testb  $0x1,(%rcx)
> > >     3:	74 1b                	je     0x20
> > >     5:	48 8b 48 10          	mov    0x10(%rax),%rcx
> > >     9:	48 39 f9             	cmp    %rdi,%rcx
> > >     c:	74 68                	je     0x76
> > >     e:	48 89 c7             	mov    %rax,%rdi
> > >    11:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
> > >    15:	48                   	rex.W
> > > [   53.711453] RSP: 0018:ffff9ab7c07579e0 EFLAGS: 00010286
> > > [   53.712112] RAX: ffff8bd0c2285b40 RBX: ffff8bd0c2285240 RCX: 
> > > d0c22857c0000000
> > > [   53.712798] RDX: ffff8bd0c2285910 RSI: ffff8bd0c3e695c0 RDI: 
> > > ffff8bd0c22855c0
> > > [   53.713423] RBP: 0000000000002139 R08: 0000000000000000 R09: 
> > > 0000000000000000
> > > [   53.714061] R10: 0000000000000000 R11: ffff8bd0c16244e0 R12: 
> > > ffff8bd0c3e695b8
> > > [   53.714696] R13: ffff8bd0c3b62000 R14: ffff8bd0c22857c0 R15: 
> > > ffff8bd0c3e695c0
> > > [   53.715321] FS:  0000000000000000(0000) GS:ffff8bd1815ca000(0000) 
> > > knlGS:0000000000000000
> > > [   53.715956] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [   53.716651] CR2: 000055667ef28e10 CR3: 0000000106cc2005 CR4: 
> > > 0000000000772ef0
> > > [   53.717295] PKRU: 55555554
> > > [   53.717918] note: kworker/11:2[360] exited with preempt_count 1
> > > 
> > > 
> > > > > The bug was introduced in commit:
> > > > > 
> > > > > bb80f7618832 - parse_longname(): strrchr() expects NUL-terminated string
> > > > > 
> > > > > str is guarded by __free(kfree), but advanced later for skipping
> > > > > the initial '_' in snapshot names.
> > > > > This patch removes the need for advancing the pointer so kfree()
> > > > > could do proper memory cleanup.
> > > > > 
> > > > 
> > > > I cannot follow this explanation. What is wrong? Why do we need to fix
> > > > something here?
> > > 
> > > In bb80f7618832, the pointer in variable "str" is guarded by 
> > > __free(kfree), which means the pointer returned by kmemdup_nul() is 
> > > automatically freed. kfree() should receive the same pointer as returned 
> > > by kmemdup_nul(), but this is not the case, as the pointer is advanced
> > > by one. kmemdup_nul() may return for example 0x1234000, but kfree() is 
> > > called with 0x1234001. I don't know the exact behavior of kfree(), but I 
> > > assume calling kfree() with random pointers leads to UB?
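The pointer mismatch described above can be illustrated in userspace. This is a minimal sketch, not kernel code. It assumes GCC or Clang, whose cleanup attribute is the mechanism behind the kernel's __free() helper. memdup_nul(), demo_free(), buggy_variant() and fixed_variant() are hypothetical stand-ins for kmemdup_nul(), the kfree cleanup, and the two shapes of parse_longname(). Passing a shifted pointer to free() is undefined behaviour, so the cleanup handler below does not do that. It only records whether it was handed the address that was allocated, and then frees the true allocation.

```c
#include <stdlib.h>
#include <string.h>

/* Demo bookkeeping: the live allocation and whether cleanup saw it. */
static char *g_allocated;
static int g_cleanup_matched = -1;

/* Userspace stand-in for kmemdup_nul(): copy len bytes, add a NUL. */
static char *memdup_nul(const char *src, size_t len)
{
	char *buf = malloc(len + 1);

	if (!buf)
		return NULL;
	memcpy(buf, src, len);
	buf[len] = '\0';
	g_allocated = buf;
	return buf;
}

/*
 * Stand-in for the __free(kfree) cleanup. A real kfree() would receive
 * exactly *pp. The demo only records whether *pp is the allocated
 * pointer, then frees the true allocation so the program stays valid.
 */
static void demo_free(char **pp)
{
	g_cleanup_matched = (*pp == g_allocated);
	free(g_allocated);
	g_allocated = NULL;
}

#define demo_guard __attribute__((cleanup(demo_free)))

/* Shape of the code after bb80f7618832: the guarded pointer is moved. */
static int buggy_variant(const char *name, size_t len)
{
	char *str demo_guard = memdup_nul(name, len);

	if (!str)
		return -1;
	str++;			/* skip the leading '_' */
	return strrchr(str, '_') ? 0 : -1;
}

/* Shape of the proposed fix: copy from name + 1, never move the pointer. */
static int fixed_variant(const char *name, size_t len)
{
	if (len == 0 || name[0] != '_')
		return -1;

	char *str demo_guard = memdup_nul(name + 1, len - 1);

	if (!str)
		return -1;
	return strrchr(str, '_') ? 0 : -1;
}
```

In the buggy shape, the cleanup handler sees the allocation plus one byte. That is the value kfree() received after bb80f7618832. In the fixed shape, it sees the original pointer.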
> > 
> > Please, see my comments below.
> > 
> > > 
> > > > > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220807    
> > > > > 
> > > > 
> > > > Why was the issue not reported to the CephFS community through email or
> > > > by means of https://tracker.ceph.com?
> > > It's a kernel bug and not related to any ceph packages, so I've reported 
> > > it to the kernel issue tracking system.
> > > 
> > > > Have you run xfstests for your patch?
> > > No, not aware of it. How is xfs related to cephfs?
> > 
> > xfstests is the regression test suite used for testing all Linux file
> > systems (CephFS too). But if you are not a file system person, then it's
> > OK that you didn't run xfstests.
> > 
> > > 
> > > 
> > > > > Fixes: bb80f7618832 - parse_longname(): strrchr() expects NUL-terminated string
> > > > > 
> > > > > Cc: stable@vger.kernel.org
> > > > > Suggested-by: Helge Deller <deller@gmx.de>
> > > > > Signed-off-by: Daniel Vogelbacher <daniel@chaospixel.com>
> > > > > ---
> > > > >   fs/ceph/crypto.c | 8 ++++----
> > > > >   1 file changed, 4 insertions(+), 4 deletions(-)
> > > > > 
> > > > > diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
> > > > > index 0ea4db650f85..3e051972e49d 100644
> > > > > --- a/fs/ceph/crypto.c
> > > > > +++ b/fs/ceph/crypto.c
> > > > > @@ -166,12 +166,12 @@ static struct inode *parse_longname(const struct inode *parent,
> > > > >   	struct ceph_vino vino = { .snap = CEPH_NOSNAP };
> > > > >   	char *name_end, *inode_number;
> > > > >   	int ret = -EIO;
> > > > > -	/* NUL-terminate */
> > > > > -	char *str __free(kfree) = kmemdup_nul(name, *name_len, GFP_KERNEL);
> > > > > +	if (*name_len <= 1)
> > > > 
> > > > I believe the current logic can handle *name_len <= 1 even without this check.
> > > > Why do we need this fix? The commit message is really unclear for my taste.
> > > > Could you prove that we really need this fix?
> > > 
> > > I've added this protection because otherwise I would do pointer arithmetic
> > > without checking bounds. I can't give you a better justification :) I can
> > > simply remove it if you prefer.
> > > 
> > 
> > OK. Let's analyze the code again.
> > 
> > 	char *str __free(kfree) = kmemdup_nul(name, *name_len, GFP_KERNEL);
> > 	if (!str)
> > 		return ERR_PTR(-ENOMEM);
> > 	/* Skip initial '_' */
> > 	str++;
> > 	name_end = strrchr(str, '_');
> > 	if (!name_end) {
> > 		doutc(cl, "failed to parse long snapshot name: %s\n", str);
> > 		return ERR_PTR(-EIO);
> > 	}
> > 	*name_len = (name_end - str);
> > 	if (*name_len <= 0) {
> > 		pr_err_client(cl, "failed to parse long snapshot name\n");
> > 		return ERR_PTR(-EIO);
> > 	}
> > 
> > First of all, we try to create a NUL-terminated string from unterminated data.
> > If we provide name_len == 0, then we allocate a 1-byte string that contains
> > only the terminator. Under memory pressure the allocation may fail altogether
> > (this situation is handled by the !str check). However, it doesn't make sense
> > to allocate memory in that case at all. So, the length check at the beginning
> > makes sense:
> > 
> > 	if (*name_len <= 0)
> > 		return ERR_PTR(-EIO);
> > 
> > Next, we expect to have '_' at the beginning. If the provided string doesn't
> > contain any '_', then it makes no sense to allocate memory. I suggest calling
> > this next:
> > 
> >         name_end = strnchr(name, *name_len, '_');
> >         if (!name_end) {
> > 		doutc(cl, "failed to parse long snapshot name: %s\n", str);
> > 		return ERR_PTR(-EIO);
> > 	} else if (name != name_end) {
> >                 /* we expect '_' at the beginning */
> > 		doutc(cl, "failed to parse long snapshot name: %s\n", str);
> > 		return ERR_PTR(-EIO);
> >         }
> 
> We don't have the `str` variable here yet. I suggest simplifying all of this to:
> 
> @ -166,7 +166,8 @@ static struct inode *parse_longname(const struct inode *parent,
>         struct ceph_vino vino = { .snap = CEPH_NOSNAP };
>         char *name_end, *inode_number;
>         int ret = -EIO;
> -       if (*name_len <= 1)
> +       /* Snapshot name must start with an underscore */
> +       if (*name_len <= 0 || name[0] != '_')
>                 return ERR_PTR(-EIO);
>         /* Skip initial '_' and NUL-terminate */
>         char *str __free(kfree) = kmemdup_nul(name + 1, *name_len - 1, GFP_KERNEL);
> 
> 
> 
> > If the first '_' is at the beginning of name, then it makes sense to
> > continue with the rest of the logic.
> > 
> >         if (*name_len <= 1)
> > 		return ERR_PTR(-EIO);
> 
> See my comment above.
> 
> > And here we can continue the existing logic:
> > 
> > 	char *str __free(kfree) = kmemdup_nul(name, *name_len, GFP_KERNEL);
> > 	if (!str)
> > 		return ERR_PTR(-ENOMEM);
> > 	/* Skip initial '_' */
> > 	str++;
> > 	name_end = strrchr(str, '_');
> > 	if (!name_end) {
> > 		doutc(cl, "failed to parse long snapshot name: %s\n", str);
> > 		return ERR_PTR(-EIO);
> > 	}
> > 	*name_len = (name_end - str);
> > 	if (*name_len <= 0) {
> > 		pr_err_client(cl, "failed to parse long snapshot name\n");
> > 		return ERR_PTR(-EIO);
> > 	}
> > 
> > Does this logic make sense to you?
> 
> My simplified logic comes with the cost of potentially allocating memory for a
> snapshot name that has no second underscore. But from my understanding, this
> naming scheme is by convention for Ceph snapshot names, so this should not
> happen in practice.
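Counting allocations makes the trade-off described above concrete. This hypothetical userspace sketch mirrors the v2 checks quoted in this thread. It rejects an empty name or one without a leading '_', then copies from name + 1 and searches for the last '_'. memdup_nul(), g_alloc_calls and v2_parse_prefix() are illustrative stand-ins, not kernel APIs. Names rejected by the prefix check never reach the allocator. A name with a leading '_' but no second one is still copied before it is rejected.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Number of times the stand-in allocator has been called. */
static int g_alloc_calls;

/* Userspace stand-in for kmemdup_nul(): copies len bytes plus a NUL. */
static char *memdup_nul(const char *src, size_t len)
{
	char *buf;

	g_alloc_calls++;
	buf = malloc(len + 1);
	if (!buf)
		return NULL;
	memcpy(buf, src, len);
	buf[len] = '\0';
	return buf;
}

/*
 * Hypothetical mirror of the checks discussed for v2 of the patch.
 * Returns the length of the snapshot-name part, or a negative errno.
 */
static int v2_parse_prefix(const char *name, int name_len)
{
	char *str, *name_end;
	int len = 0;

	/* Snapshot name must start with an underscore: no allocation yet. */
	if (name_len <= 0 || name[0] != '_')
		return -EIO;

	/* Skip initial '_' and NUL-terminate; str itself is never advanced. */
	str = memdup_nul(name + 1, (size_t)name_len - 1);
	if (!str)
		return -ENOMEM;

	/* The last '_' ends the snapshot-name part. */
	name_end = strrchr(str, '_');
	if (name_end)
		len = (int)(name_end - str);
	free(str);		/* same pointer that memdup_nul() returned */

	return len > 0 ? len : -EIO;
}
```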

OK. I need to see the second version of the patch. I have completely lost track
of the discussion details. Could you please send the new version of the patch?
Then it will be clear whether it's already good enough or we need to continue
polishing the code.

> 
> > 
> > However, I have started to think... Could we completely remove the kmemdup_nul()
> > and operate on the initial name only? I think it's possible; we simply need
> > a smarter string-parsing technique. What do you think? It would be good to
> > avoid the memory allocation here.
> 
> I'm not a kernel nor ceph developer and it seems that most functions used here
> don't have a variant for non-NUL-terminated strings. I assume it would be a lot of
> extra work just to remove the allocation entirely.
> 
> 
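To operate on the original buffer as suggested above, strrchr() needs to be replaced with a length-bounded reverse scan, and the number needs to be parsed from a (pointer, length) pair. The function below is a hypothetical userspace sketch of that idea, not a proposed kernel patch. It rests on three assumptions. First, the long name has the form _<snapshot name>_<inode number>, as implied by skipping the leading '_' and splitting at the last '_'. Second, the tail is a plain decimal number. Third, the length is passed as a size_t, whereas the kernel function takes an int *name_len. The digit loop is a simplified stand-in for kstrtou64(), not a reimplementation of it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical allocation-free parser for a name of the assumed form
 * "_<snapshot name>_<inode number>". The buffer need not be
 * NUL-terminated: only name[0] .. name[len - 1] are read. On success,
 * the snapshot name occupies *snap_len bytes starting at name + *snap_off.
 */
static bool parse_longname_noalloc(const char *name, size_t len,
				   size_t *snap_off, size_t *snap_len,
				   uint64_t *ino)
{
	size_t i, last_us;
	uint64_t val = 0;

	/* Need the leading '_' plus at least one more byte. */
	if (len < 2 || name[0] != '_')
		return false;

	/* Bounded replacement for strrchr(): scan backwards for the last '_'. */
	for (i = len - 1; i > 0 && name[i] != '_'; i--)
		;
	if (i == 0)		/* only the leading '_' exists */
		return false;
	last_us = i;

	/* The snapshot name between the two underscores must be non-empty. */
	if (last_us == 1)
		return false;

	/* At least one digit must follow the last '_'. */
	if (last_us + 1 == len)
		return false;

	/* Simplified decimal parse with overflow check. */
	for (i = last_us + 1; i < len; i++) {
		unsigned int d;

		if (name[i] < '0' || name[i] > '9')
			return false;
		d = (unsigned int)(name[i] - '0');
		if (val > (UINT64_MAX - d) / 10)
			return false;
		val = val * 10 + d;
	}

	*snap_off = 1;
	*snap_len = last_us - 1;
	*ino = val;
	return true;
}
```

Because nothing is copied, any debug message would need to print the span with a precision argument such as "%.*s" rather than a plain "%s".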

If you suggest a fix for the CephFS kernel client, then you cannot excuse
yourself with "I'm not a kernel nor ceph developer". :) You should work on
the patch until it is good enough. :) The whole Ceph community could
benefit from your fix. ;)

Thanks,
Slava.


* Re: [PATCH] fs/ceph: Fix kernel oops due invalid pointer for kfree() in parse_longname()
  2026-01-21 20:44         ` Viacheslav Dubeyko
@ 2026-01-21 21:38           ` Daniel Vogelbacher
  0 siblings, 0 replies; 14+ messages in thread
From: Daniel Vogelbacher @ 2026-01-21 21:38 UTC (permalink / raw)
  To: Viacheslav Dubeyko
  Cc: ceph-devel@vger.kernel.org, Xiubo Li, idryomov@gmail.com

On 1/21/26 21:44, Viacheslav Dubeyko wrote:
> On Tue, 2026-01-20 at 14:42 +0100, Daniel Vogelbacher wrote:
>> On Tue, Dec 23, 2025 at 10:49:36PM +0000, Viacheslav Dubeyko wrote:
>>> On Mon, 2025-12-22 at 22:26 +0100, Daniel Vogelbacher wrote:
>>>> On 12/22/25 21:08, Viacheslav Dubeyko wrote:
>>>>> On Sat, 2025-12-20 at 15:01 +0100, Daniel Vogelbacher wrote:
>>>>>> This fixes a kernel oops when reading ceph snapshot directories (.snap),
>>>>>> for example by simply running `ls /mnt/my_ceph/.snap`.
>>>>>>
>>>>>
>>>>> Frankly speaking, it's not at all clear how this kernel oops can happen.
>>>>> Could you please explain in more detail how it can happen and what the
>>>>> nature of the issue is? How can the issue be reproduced?
>>>>
>>>> All I need to reproduce the issue is to run `ls .snap/` on any mounted
>>>> cephfs mountpoint that contains scheduled snapshots. I've one prod VM
>>>> (KVM) where I hit the issue after a Debian Trixie upgrade. To isolate
>>>> it, I've created a fresh Trixie VM, dropped the distribution kernel and
>>>> built a vanilla kernel to isolate the buggy commit by using git-bisect -
>>>> and to ensure the bug was not introduced by any Debian patches. If that
>>>> helps, it's a Squid 19.2.3 cluster.
>>>>
>>>> So basically the steps are:
>>>>
>>>>    * Setup a Ceph cluster with 19.2.3
>>>>    * Create a pool and cephfs
>>>>    * Create a snapshot schedule for the fs
>>>>    * Mount the fs and populate it with a few files on any kernel version
>>>> that contains bb80f7618832, that is >=6.12.41
>>>>    * Wait until there are scheduled snapshots created
>>>>    * run `ls /mnt/my/cephfs/.snap`
>>>
>>> It would be good to see the exact commands that everyone can run to reproduce
>>> the issue. You don't need to share the commands for setting up the Ceph cluster
>>> or creating the pool and the CephFS instance. But the remaining steps are really
>>> important because mount options and details of the commands you run can change everything.
>>
>> These are the steps to reproduce on a new VM:
>>
>> # echo "samba-fileserver@.files=/volumes/datapool/stuff/3461082b-ecc9-4e82-8549-3fd2590d3fb6      /mnt/test/stuff   ceph     acl,noatime,_netdev    0       0" >> /etc/fstab
>>
>> Reboot the system
>> # systemctl reboot
>>
>> Check if it's really mounted
>> # mount | grep stuff
>>
>> List snapshots (expected 63 snapshots)
>> # ls /mnt/test/stuff/.snap
>>
> 
> If I do something like this on my side, then I will have no snapshots at
> all. How did you create the snapshots? How many snapshots (1, 2, ..., 63)
> need to exist to reproduce the issue? This is completely missing from the explanation.

Have you created a CephFS snapshot schedule as I described in my
previous mail? I assume creating snapshots manually with the ceph CLI
should work, too.

>> Now ls hangs forever and the kernel log shows the oops.
>>
>>>
>>>>
>>>> This should result in a kernel oops like:
>>>
>>> The commit message could include oops details.
>>>
>>>>
>>>> [   53.703013] Oops: general protection fault, probably for
>>>> non-canonical address 0xd0c22857c0000000: 0000 [#1] SMP PTI
>>>> [   53.703201] CPU: 11 UID: 0 PID: 360 Comm: kworker/11:2 Not tainted
>>>> 6.18.0-rc7 #41 PREEMPT(voluntary)
>>>> [   53.703281] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
>>>> 1.16.2-debian-1.16.2-1 04/01/2014
>>>> [   53.703317] Workqueue: ceph-msgr ceph_con_workfn [libceph]
>>>> [   53.703424] RIP: 0010:rb_insert_color
>>>> (/usr/src/linux/lib/rbtree.c:185 (discriminator 1)
>>>> /usr/src/linux/lib/rbtree.c:436 (discriminator 1))
>>>> [   53.704503] Code: 76 17 48 83 e1 fc 48 3b 51 10 0f 84 b7 00 00 00 48
>>>> 89 41 08 c3 cc cc cc cc 48 89 06 c3 cc cc cc cc 48 8b 4a 10 48 85 c9 74
>>>> 05 <f6> 01 01 74 1b 48 8b 48 10 48 39 f9 74 68 48 89 c7 48 89 4a 08 48
>>>> All code
>>>> ========
>>>>      0:	76 17                	jbe    0x19
>>>>      2:	48 83 e1 fc          	and    $0xfffffffffffffffc,%rcx
>>>>      6:	48 3b 51 10          	cmp    0x10(%rcx),%rdx
>>>>      a:	0f 84 b7 00 00 00    	je     0xc7
>>>>     10:	48 89 41 08          	mov    %rax,0x8(%rcx)
>>>>     14:	c3                   	ret
>>>>     15:	cc                   	int3
>>>>     16:	cc                   	int3
>>>>     17:	cc                   	int3
>>>>     18:	cc                   	int3
>>>>     19:	48 89 06             	mov    %rax,(%rsi)
>>>>     1c:	c3                   	ret
>>>>     1d:	cc                   	int3
>>>>     1e:	cc                   	int3
>>>>     1f:	cc                   	int3
>>>>     20:	cc                   	int3
>>>>     21:	48 8b 4a 10          	mov    0x10(%rdx),%rcx
>>>>     25:	48 85 c9             	test   %rcx,%rcx
>>>>     28:	74 05                	je     0x2f
>>>>     2a:*	f6 01 01             	testb  $0x1,(%rcx)		<-- trapping instruction
>>>>     2d:	74 1b                	je     0x4a
>>>>     2f:	48 8b 48 10          	mov    0x10(%rax),%rcx
>>>>     33:	48 39 f9             	cmp    %rdi,%rcx
>>>>     36:	74 68                	je     0xa0
>>>>     38:	48 89 c7             	mov    %rax,%rdi
>>>>     3b:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
>>>>     3f:	48                   	rex.W
>>>>
>>>> Code starting with the faulting instruction
>>>> ===========================================
>>>>      0:	f6 01 01             	testb  $0x1,(%rcx)
>>>>      3:	74 1b                	je     0x20
>>>>      5:	48 8b 48 10          	mov    0x10(%rax),%rcx
>>>>      9:	48 39 f9             	cmp    %rdi,%rcx
>>>>      c:	74 68                	je     0x76
>>>>      e:	48 89 c7             	mov    %rax,%rdi
>>>>     11:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
>>>>     15:	48                   	rex.W
>>>> [   53.704559] RSP: 0018:ffff9ab7c07579e0 EFLAGS: 00010286
>>>> [   53.704591] RAX: ffff8bd0c2285b40 RBX: ffff8bd0c2285240 RCX:
>>>> d0c22857c0000000
>>>> [   53.704616] RDX: ffff8bd0c2285910 RSI: ffff8bd0c3e695c0 RDI:
>>>> ffff8bd0c22855c0
>>>> [   53.704645] RBP: 0000000000002139 R08: 0000000000000000 R09:
>>>> 0000000000000000
>>>> [   53.704668] R10: 0000000000000000 R11: ffff8bd0c16244e0 R12:
>>>> ffff8bd0c3e695b8
>>>> [   53.704691] R13: ffff8bd0c3b62000 R14: ffff8bd0c22857c0 R15:
>>>> ffff8bd0c3e695c0
>>>> [   53.704714] FS:  0000000000000000(0000) GS:ffff8bd1815ca000(0000)
>>>> knlGS:0000000000000000
>>>> [   53.704741] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [   53.704762] CR2: 000055667ef28e10 CR3: 0000000106cc2005 CR4:
>>>> 0000000000772ef0
>>>> [   53.704790] PKRU: 55555554
>>>> [   53.704803] Call Trace:
>>>> [   53.704844]  <TASK>
>>>> [   53.704862] ceph_get_snapid_map
>>>> (/usr/src/linux/./include/linux/spinlock.h:391
>>>> /usr/src/linux/fs/ceph/snap.c:1255) ceph
>>>> [   53.704957] ceph_fill_inode (/usr/src/linux/fs/ceph/inode.c:1062
>>>> (discriminator 2)) ceph
>>>> [   53.705019]  ? __pfx_ceph_set_ino_cb
>>>> (/usr/src/linux/fs/ceph/inode.c:46) ceph
>>>> [   53.705074]  ? __pfx_ceph_ino_compare
>>>> (/usr/src/linux/fs/ceph/super.h:595) ceph
>>>> [   53.705132] ceph_readdir_prepopulate
>>>> (/usr/src/linux/fs/ceph/inode.c:2113) ceph
>>>> [   53.705191] mds_dispatch (/usr/src/linux/fs/ceph/mds_client.c:3993
>>>> /usr/src/linux/fs/ceph/mds_client.c:6299) ceph
>>>> [   53.705253]  ? sock_recvmsg (/usr/src/linux/net/socket.c:1078
>>>> (discriminator 1) /usr/src/linux/net/socket.c:1100 (discriminator 1))
>>>> [   53.705279] ceph_con_process_message
>>>> (/usr/src/linux/net/ceph/messenger.c:1427) libceph
>>>> [   53.705347] process_message
>>>> (/usr/src/linux/net/ceph/messenger_v2.c:2879) libceph
>>>> [   53.705406] ceph_con_v2_try_read
>>>> (/usr/src/linux/net/ceph/messenger_v2.c:3043
>>>> /usr/src/linux/net/ceph/messenger_v2.c:3099
>>>> /usr/src/linux/net/ceph/messenger_v2.c:3148) libceph
>>>> [   53.705467]  ? psi_group_change (/usr/src/linux/kernel/sched/psi.c:876)
>>>> [   53.705488]  ? sched_balance_newidle
>>>> (/usr/src/linux/kernel/sched/fair.c:12902 (discriminator 2))
>>>> [   53.705512]  ? psi_task_switch (/usr/src/linux/kernel/sched/psi.c:984
>>>> (discriminator 2))
>>>> [   53.705532]  ? _raw_spin_unlock
>>>> (/usr/src/linux/./arch/x86/include/asm/paravirt.h:562
>>>> /usr/src/linux/./arch/x86/include/asm/qspinlock.h:57
>>>> /usr/src/linux/./include/linux/spinlock.h:204
>>>> /usr/src/linux/./include/linux/spinlock_api_smp.h:142
>>>> /usr/src/linux/kernel/locking/spinlock.c:186)
>>>> [   53.705550]  ? finish_task_switch.isra.0
>>>> (/usr/src/linux/./arch/x86/include/asm/paravirt.h:671
>>>> /usr/src/linux/kernel/sched/sched.h:1559
>>>> /usr/src/linux/kernel/sched/core.c:5073
>>>> /usr/src/linux/kernel/sched/core.c:5191)
>>>> [   53.705575] ceph_con_workfn
>>>> (/usr/src/linux/net/ceph/messenger.c:1578) libceph
>>>> [   53.705627]  process_one_work
>>>> (/usr/src/linux/./arch/x86/include/asm/jump_label.h:36
>>>> /usr/src/linux/./include/trace/events/workqueue.h:110
>>>> /usr/src/linux/kernel/workqueue.c:3268)
>>>> [   53.705657]  worker_thread (/usr/src/linux/kernel/workqueue.c:3340
>>>> (discriminator 2) /usr/src/linux/kernel/workqueue.c:3427 (discriminator 2))
>>>> [   53.705679]  ? __pfx_worker_thread
>>>> (/usr/src/linux/kernel/workqueue.c:3373)
>>>> [   53.705700]  kthread (/usr/src/linux/kernel/kthread.c:463)
>>>> [   53.705717]  ? __pfx_kthread (/usr/src/linux/kernel/kthread.c:412)
>>>> [   53.705734]  ? __pfx_kthread (/usr/src/linux/kernel/kthread.c:412)
>>>> [   53.705752]  ret_from_fork (/usr/src/linux/arch/x86/kernel/process.c:164)
>>>> [   53.705776]  ? __pfx_kthread (/usr/src/linux/kernel/kthread.c:412)
>>>> [   53.705793]  ret_from_fork_asm
>>>> (/usr/src/linux/arch/x86/entry/entry_64.S:255)
>>>> [   53.705826]  </TASK>
>>>> [   53.705842] Modules linked in: ceph netfs libceph cfg80211 rfkill
>>>> 8021q garp stp mrp llc binfmt_misc intel_rapl_msr intel_rapl_common
>>>> intel_uncore_frequency_common kvm_intel virtio_gpu joydev kvm
>>>> drm_client_lib virtio_dma_buf evdev drm_shmem_helper sg drm_kms_helper
>>>> virtio_balloon button irqbypass ghash_clmulni_intel aesni_intel rapl
>>>> pcspkr drm configfs efi_pstore nfnetlink vsock_loopback
>>>> vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vmw_vmci
>>>> vsock qemu_fw_cfg virtio_rng autofs4 ext4 crc16 mbcache jbd2 hid_generic
>>>> usbhid hid sr_mod cdrom dm_mod ahci libahci libata xhci_pci iTCO_wdt
>>>> intel_pmc_bxt xhci_hcd iTCO_vendor_support scsi_mod psmouse virtio_net
>>>> i2c_i801 watchdog serio_raw i2c_smbus lpc_ich scsi_common usbcore
>>>> net_failover failover virtio_blk usb_common
>>>> [   53.708740] ---[ end trace 0000000000000000 ]---
>>>> [   53.709462] RIP: 0010:rb_insert_color
>>>> (/usr/src/linux/lib/rbtree.c:185 (discriminator 1)
>>>> /usr/src/linux/lib/rbtree.c:436 (discriminator 1))
>>>> [   53.710118] Code: 76 17 48 83 e1 fc 48 3b 51 10 0f 84 b7 00 00 00 48
>>>> 89 41 08 c3 cc cc cc cc 48 89 06 c3 cc cc cc cc 48 8b 4a 10 48 85 c9 74
>>>> 05 <f6> 01 01 74 1b 48 8b 48 10 48 39 f9 74 68 48 89 c7 48 89 4a 08 48
>>>> All code
>>>> ========
>>>>      0:	76 17                	jbe    0x19
>>>>      2:	48 83 e1 fc          	and    $0xfffffffffffffffc,%rcx
>>>>      6:	48 3b 51 10          	cmp    0x10(%rcx),%rdx
>>>>      a:	0f 84 b7 00 00 00    	je     0xc7
>>>>     10:	48 89 41 08          	mov    %rax,0x8(%rcx)
>>>>     14:	c3                   	ret
>>>>     15:	cc                   	int3
>>>>     16:	cc                   	int3
>>>>     17:	cc                   	int3
>>>>     18:	cc                   	int3
>>>>     19:	48 89 06             	mov    %rax,(%rsi)
>>>>     1c:	c3                   	ret
>>>>     1d:	cc                   	int3
>>>>     1e:	cc                   	int3
>>>>     1f:	cc                   	int3
>>>>     20:	cc                   	int3
>>>>     21:	48 8b 4a 10          	mov    0x10(%rdx),%rcx
>>>>     25:	48 85 c9             	test   %rcx,%rcx
>>>>     28:	74 05                	je     0x2f
>>>>     2a:*	f6 01 01             	testb  $0x1,(%rcx)		<-- trapping instruction
>>>>     2d:	74 1b                	je     0x4a
>>>>     2f:	48 8b 48 10          	mov    0x10(%rax),%rcx
>>>>     33:	48 39 f9             	cmp    %rdi,%rcx
>>>>     36:	74 68                	je     0xa0
>>>>     38:	48 89 c7             	mov    %rax,%rdi
>>>>     3b:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
>>>>     3f:	48                   	rex.W
>>>>
>>>> Code starting with the faulting instruction
>>>> ===========================================
>>>>      0:	f6 01 01             	testb  $0x1,(%rcx)
>>>>      3:	74 1b                	je     0x20
>>>>      5:	48 8b 48 10          	mov    0x10(%rax),%rcx
>>>>      9:	48 39 f9             	cmp    %rdi,%rcx
>>>>      c:	74 68                	je     0x76
>>>>      e:	48 89 c7             	mov    %rax,%rdi
>>>>     11:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
>>>>     15:	48                   	rex.W
>>>> [   53.711453] RSP: 0018:ffff9ab7c07579e0 EFLAGS: 00010286
>>>> [   53.712112] RAX: ffff8bd0c2285b40 RBX: ffff8bd0c2285240 RCX:
>>>> d0c22857c0000000
>>>> [   53.712798] RDX: ffff8bd0c2285910 RSI: ffff8bd0c3e695c0 RDI:
>>>> ffff8bd0c22855c0
>>>> [   53.713423] RBP: 0000000000002139 R08: 0000000000000000 R09:
>>>> 0000000000000000
>>>> [   53.714061] R10: 0000000000000000 R11: ffff8bd0c16244e0 R12:
>>>> ffff8bd0c3e695b8
>>>> [   53.714696] R13: ffff8bd0c3b62000 R14: ffff8bd0c22857c0 R15:
>>>> ffff8bd0c3e695c0
>>>> [   53.715321] FS:  0000000000000000(0000) GS:ffff8bd1815ca000(0000)
>>>> knlGS:0000000000000000
>>>> [   53.715956] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [   53.716651] CR2: 000055667ef28e10 CR3: 0000000106cc2005 CR4:
>>>> 0000000000772ef0
>>>> [   53.717295] PKRU: 55555554
>>>> [   53.717918] note: kworker/11:2[360] exited with preempt_count 1
>>>>
>>>>
>>>>>> The bug was introduced in commit:
>>>>>>
>>>>>> bb80f7618832 - parse_longname(): strrchr() expects NUL-terminated string
>>>>>>
>>>>>> str is guarded by __free(kfree), but advanced later for skipping
>>>>>> the initial '_' in snapshot names.
>>>>>> This patch removes the need for advancing the pointer so kfree()
>>>>>> could do proper memory cleanup.
>>>>>>
>>>>>
>>>>> I cannot follow this explanation. What is wrong? Why do we need to fix
>>>>> something here?
>>>>
>>>> In bb80f7618832, the pointer in variable "str" is guarded by
>>>> __free(kfree), which means the pointer returned by kmemdup_nul() is
>>>> automatically freed. kfree() should receive the same pointer as returned
>>>> by kmemdup_nul(), but this is not the case, as the pointer is advanced
>>>> by one. kmemdup_nul() may return for example 0x1234000, but kfree() is
>>>> called with 0x1234001. I don't know the exact behavior of kfree(), but I
>>>> assume calling kfree() with random pointers leads to UB?
>>>
>>> Please, see my comments below.
>>>
>>>>
>>>>>> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220807
>>>>>>
>>>>>
>>>>> Why was the issue not reported to the CephFS community through email or
>>>>> by means of https://tracker.ceph.com?
>>>> It's a kernel bug and not related to any ceph packages, so I've reported
>>>> it to the kernel issue tracking system.
>>>>
>>>>> Have you run xfstests for your patch?
>>>> No, not aware of it. How is xfs related to cephfs?
>>>
>>> xfstests is the regression test suite used for testing all Linux file
>>> systems (CephFS too). But if you are not a file system person, then it's
>>> OK that you didn't run xfstests.
>>>
>>>>
>>>>
>>>>>> Fixes: bb80f7618832 - parse_longname(): strrchr() expects NUL-terminated string
>>>>>>
>>>>>> Cc: stable@vger.kernel.org
>>>>>> Suggested-by: Helge Deller <deller@gmx.de>
>>>>>> Signed-off-by: Daniel Vogelbacher <daniel@chaospixel.com>
>>>>>> ---
>>>>>>    fs/ceph/crypto.c | 8 ++++----
>>>>>>    1 file changed, 4 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
>>>>>> index 0ea4db650f85..3e051972e49d 100644
>>>>>> --- a/fs/ceph/crypto.c
>>>>>> +++ b/fs/ceph/crypto.c
>>>>>> @@ -166,12 +166,12 @@ static struct inode *parse_longname(const struct inode *parent,
>>>>>>    	struct ceph_vino vino = { .snap = CEPH_NOSNAP };
>>>>>>    	char *name_end, *inode_number;
>>>>>>    	int ret = -EIO;
>>>>>> -	/* NUL-terminate */
>>>>>> -	char *str __free(kfree) = kmemdup_nul(name, *name_len, GFP_KERNEL);
>>>>>> +	if (*name_len <= 1)
>>>>>
>>>>> I believe the current logic can handle *name_len <= 1 even without this check.
>>>>> Why do we need this fix? The commit message is really unclear for my taste.
>>>>> Could you prove that we really need this fix?
>>>>
>>>> I've added this protection because otherwise I would do pointer arithmetic
>>>> without checking bounds. I can't give you a better justification :) I can
>>>> simply remove it if you prefer.
>>>>
>>>
>>> OK. Let's analyze the code again.
>>>
>>> 	char *str __free(kfree) = kmemdup_nul(name, *name_len, GFP_KERNEL);
>>> 	if (!str)
>>> 		return ERR_PTR(-ENOMEM);
>>> 	/* Skip initial '_' */
>>> 	str++;
>>> 	name_end = strrchr(str, '_');
>>> 	if (!name_end) {
>>> 		doutc(cl, "failed to parse long snapshot name: %s\n", str);
>>> 		return ERR_PTR(-EIO);
>>> 	}
>>> 	*name_len = (name_end - str);
>>> 	if (*name_len <= 0) {
>>> 		pr_err_client(cl, "failed to parse long snapshot name\n");
>>> 		return ERR_PTR(-EIO);
>>> 	}
>>>
>>> First of all, we try to create a NUL-terminated string from unterminated data.
>>> If name_len == 0, we would allocate a 1-byte string that contains only the
>>> terminating NUL. Under memory pressure the allocation could also fail (this
>>> situation is handled by the !str check). However, it makes no sense to
>>> allocate memory in that case at all. So, the length check at the beginning
>>> makes sense:
>>>
>>> 	if (*name_len <= 0)
>>> 		return ERR_PTR(-EIO);
>>>
>>> Next, we expect '_' at the beginning. If the provided string contains no '_'
>>> at all, then it makes no sense to allocate memory. I suggest calling this
>>> next:
>>>
>>>          name_end = strnchr(name, *name_len, '_');
>>>          if (!name_end) {
>>> 		doutc(cl, "failed to parse long snapshot name: %s\n", str);
>>> 		return ERR_PTR(-EIO);
>>> 	} else if (name != name_end) {
>>>                  /* we expect '_' at the beginning */
>>> 		doutc(cl, "failed to parse long snapshot name: %s\n", str);
>>> 		return ERR_PTR(-EIO);
>>>          }
>>
>> We don't have the `str` variable here yet. I suggest simplifying all of this as follows:
>>
>> @@ -166,7 +166,8 @@ static struct inode *parse_longname(const struct inode *parent,
>>          struct ceph_vino vino = { .snap = CEPH_NOSNAP };
>>          char *name_end, *inode_number;
>>          int ret = -EIO;
>> -       if (*name_len <= 1)
>> +       /* Snapshot name must start with an underscore */
>> +       if (*name_len <= 0 || name[0] != '_')
>>                  return ERR_PTR(-EIO);
>>          /* Skip initial '_' and NUL-terminate */
>>          char *str __free(kfree) = kmemdup_nul(name + 1, *name_len - 1, GFP_KERNEL);
>>
>>
>>
>>> If we have found the first '_' at the beginning of name, then it makes
>>> sense to continue with the logic.
>>>
>>>          if (*name_len <= 1)
>>> 		return ERR_PTR(-EIO);
>>
>> See my comment above.
>>
>>> And here we can continue the existing logic:
>>>
>>> 	char *str __free(kfree) = kmemdup_nul(name, *name_len, GFP_KERNEL);
>>> 	if (!str)
>>> 		return ERR_PTR(-ENOMEM);
>>> 	/* Skip initial '_' */
>>> 	str++;
>>> 	name_end = strrchr(str, '_');
>>> 	if (!name_end) {
>>> 		doutc(cl, "failed to parse long snapshot name: %s\n", str);
>>> 		return ERR_PTR(-EIO);
>>> 	}
>>> 	*name_len = (name_end - str);
>>> 	if (*name_len <= 0) {
>>> 		pr_err_client(cl, "failed to parse long snapshot name\n");
>>> 		return ERR_PTR(-EIO);
>>> 	}
>>>
>>> Does this logic make sense to you?
>>
>> My simplified logic comes at the cost of potentially allocating memory for a
>> snapshot name that has no second underscore. But as far as I understand, this
>> naming scheme is a convention for Ceph snapshot names, so this should not
>> happen in practice.
> 
> OK. I need to see the second version of the patch. I have completely lost
> track of the discussion details. Could you please send the new version of the
> patch? Then it will be clear whether it's already good enough or we need to
> continue polishing the code.

I will prepare the v2 patch in the next few days.

> 
>>
>>>
>>> However, I have started to think... Could we completely remove the kmemdup_nul()
>>> and operate on the original name only? I think it's possible; we simply need
>>> a smarter string-parsing technique. What do you think? It would be
>>> good to avoid the memory allocation here.
>>
>> I'm not a kernel nor ceph developer and it seems that most functions used here
>> don't have a variant for non-NUL-terminated strings. I assume it would be a lot
>> of extra work just to remove the allocation entirely.
>>
>>
> 
> If you propose a fix for the CephFS kernel client, then you cannot excuse
> yourself with "I'm not a kernel nor ceph developer". :) You should work on
> the patch until it is good enough. :) The whole Ceph community could
> benefit from your fix. ;)
> 
> Thanks,
> Slava.


-- 
Best regards / Mit freundlichen Grüßen
Daniel Vogelbacher
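On the question of dropping kmemdup_nul() altogether: the following userspace sketch suggests the parse can be done directly on the length-bounded buffer, without allocating, by replacing strrchr() with a bounded reverse scan and parsing the digits by hand. Everything here is illustrative and not the patch under review: `parse_longname_nocopy()` and its out-parameters are hypothetical, the inode number is assumed to be a plain decimal string, and the digit loop is stricter than kstrtou64() (it rejects a leading '+' or a trailing newline).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical, allocation-free parser for a long snapshot name of the
 * form "_<snapname>_<ino>". name is NOT assumed to be NUL-terminated:
 * only the first len bytes are ever read. The inode number is assumed
 * to be plain decimal. On success the snapshot name is the span
 * name[*snap_off .. *snap_off + *snap_len - 1].
 */
static bool parse_longname_nocopy(const char *name, size_t len,
				  size_t *snap_off, size_t *snap_len,
				  uint64_t *ino)
{
	size_t i, last_us;
	uint64_t val = 0;

	/* Need the leading '_' plus at least one more byte. */
	if (len < 2 || name[0] != '_')
		return false;

	/* Bounded reverse scan: the strrchr() step, without needing a NUL. */
	for (last_us = len - 1; last_us > 0; last_us--)
		if (name[last_us] == '_')
			break;
	if (last_us == 0)
		return false;		/* no second '_' */

	*snap_off = 1;			/* skip the leading '_' */
	*snap_len = last_us - 1;
	if (*snap_len == 0)
		return false;		/* empty snapshot name */

	/* Decimal digits after the last '_', with a u64 overflow check. */
	if (last_us + 1 == len)
		return false;		/* no digits at all */
	for (i = last_us + 1; i < len; i++) {
		unsigned int d;

		if (name[i] < '0' || name[i] > '9')
			return false;
		d = (unsigned int)(name[i] - '0');
		if (val > (UINT64_MAX - d) / 10)
			return false;	/* would overflow u64 */
		val = val * 10 + d;
	}
	*ino = val;
	return true;
}

static void example_nocopy(void)
{
	/* Not NUL-terminated within len: the trailing 'X' is never read. */
	const char buf[] = { '_', 'a', '_', '4', '2', 'X' };
	const char *s = "_my_snap_1099511627776";
	size_t off, slen;
	uint64_t ino;

	assert(parse_longname_nocopy(buf, 5, &off, &slen, &ino));
	assert(off == 1 && slen == 1 && ino == 42);

	/* Underscores inside the snapshot name are fine: the LAST '_' wins. */
	assert(parse_longname_nocopy(s, strlen(s), &off, &slen, &ino));
	assert(slen == 7 && ino == 1099511627776ULL);
}
```

The trade-off is in the interface: without a copied string, a caller would have to work with an (offset, length) pair for the snapshot name instead of a NUL-terminated buffer.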


* [PATCH v2] fs/ceph: Fix kernel oops due invalid pointer for kfree() in parse_longname()
  2025-12-20 14:01 [PATCH] fs/ceph: Fix kernel oops due invalid pointer for kfree() in parse_longname() Daniel Vogelbacher
  2025-12-22 20:08 ` Viacheslav Dubeyko
@ 2026-02-01  8:34 ` Daniel Vogelbacher
  2026-02-02 19:13   ` Viacheslav Dubeyko
  2026-02-03 19:40 ` [PATCH v3] " Daniel Vogelbacher
  2 siblings, 1 reply; 14+ messages in thread
From: Daniel Vogelbacher @ 2026-02-01  8:34 UTC (permalink / raw)
  To: ceph-devel; +Cc: Slava.Dubeyko, xiubli, idryomov

This fixes a kernel oops when reading ceph snapshot directories (.snap),
for example by simply running `ls /mnt/my_ceph/.snap`.

The bug was introduced in commit:

bb80f7618832 - parse_longname(): strrchr() expects NUL-terminated string

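The variable str is guarded by __free(kfree), but it is advanced by one
to skip the initial '_' in snapshot names. Thus, kfree() is called with
a pointer one byte past the start of the allocation.
This patch duplicates the name starting after the '_' instead, so str is
never advanced and kfree() receives the pointer returned by kmemdup_nul().
Names that are empty or do not start with '_' are now rejected up front.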

The full trace is:

[   53.703013] Oops: general protection fault, probably for non-canonical address 0xd0c22857c0000000: 0000 [#1] SMP PTI
[   53.703201] CPU: 11 UID: 0 PID: 360 Comm: kworker/11:2 Not tainted 6.18.0-rc7 #41 PREEMPT(voluntary)
[   53.703281] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[   53.703317] Workqueue: ceph-msgr ceph_con_workfn [libceph]
[   53.703424] RIP: 0010:rb_insert_color (/usr/src/linux/lib/rbtree.c:185 (discriminator 1) /usr/src/linux/lib/rbtree.c:436 (discriminator 1))
[   53.704503] Code: 76 17 48 83 e1 fc 48 3b 51 10 0f 84 b7 00 00 00 48 89 41 08 c3 cc cc cc cc 48 89 06 c3 cc cc cc cc 48 8b 4a 10 48 85 c9 74 05 <f6> 01 01 74 1b 48 8b 48 10 48 39 f9 74 68 48 89 c7 48 89 4a 08 48
All code
========
   0:	76 17                	jbe    0x19
   2:	48 83 e1 fc          	and    $0xfffffffffffffffc,%rcx
   6:	48 3b 51 10          	cmp    0x10(%rcx),%rdx
   a:	0f 84 b7 00 00 00    	je     0xc7
  10:	48 89 41 08          	mov    %rax,0x8(%rcx)
  14:	c3                   	ret
  15:	cc                   	int3
  16:	cc                   	int3
  17:	cc                   	int3
  18:	cc                   	int3
  19:	48 89 06             	mov    %rax,(%rsi)
  1c:	c3                   	ret
  1d:	cc                   	int3
  1e:	cc                   	int3
  1f:	cc                   	int3
  20:	cc                   	int3
  21:	48 8b 4a 10          	mov    0x10(%rdx),%rcx
  25:	48 85 c9             	test   %rcx,%rcx
  28:	74 05                	je     0x2f
  2a:*	f6 01 01             	testb  $0x1,(%rcx)		<-- trapping instruction
  2d:	74 1b                	je     0x4a
  2f:	48 8b 48 10          	mov    0x10(%rax),%rcx
  33:	48 39 f9             	cmp    %rdi,%rcx
  36:	74 68                	je     0xa0
  38:	48 89 c7             	mov    %rax,%rdi
  3b:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
  3f:	48                   	rex.W

Code starting with the faulting instruction
===========================================
   0:	f6 01 01             	testb  $0x1,(%rcx)
   3:	74 1b                	je     0x20
   5:	48 8b 48 10          	mov    0x10(%rax),%rcx
   9:	48 39 f9             	cmp    %rdi,%rcx
   c:	74 68                	je     0x76
   e:	48 89 c7             	mov    %rax,%rdi
  11:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
  15:	48                   	rex.W
[   53.704559] RSP: 0018:ffff9ab7c07579e0 EFLAGS: 00010286
[   53.704591] RAX: ffff8bd0c2285b40 RBX: ffff8bd0c2285240 RCX: d0c22857c0000000
[   53.704616] RDX: ffff8bd0c2285910 RSI: ffff8bd0c3e695c0 RDI: ffff8bd0c22855c0
[   53.704645] RBP: 0000000000002139 R08: 0000000000000000 R09: 0000000000000000
[   53.704668] R10: 0000000000000000 R11: ffff8bd0c16244e0 R12: ffff8bd0c3e695b8
[   53.704691] R13: ffff8bd0c3b62000 R14: ffff8bd0c22857c0 R15: ffff8bd0c3e695c0
[   53.704714] FS:  0000000000000000(0000) GS:ffff8bd1815ca000(0000) knlGS:0000000000000000
[   53.704741] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   53.704762] CR2: 000055667ef28e10 CR3: 0000000106cc2005 CR4: 0000000000772ef0
[   53.704790] PKRU: 55555554
[   53.704803] Call Trace:
[   53.704844]  <TASK>
[   53.704862] ceph_get_snapid_map (/usr/src/linux/./include/linux/spinlock.h:391 /usr/src/linux/fs/ceph/snap.c:1255) ceph
[   53.704957] ceph_fill_inode (/usr/src/linux/fs/ceph/inode.c:1062 (discriminator 2)) ceph
[   53.705019]  ? __pfx_ceph_set_ino_cb (/usr/src/linux/fs/ceph/inode.c:46) ceph
[   53.705074]  ? __pfx_ceph_ino_compare (/usr/src/linux/fs/ceph/super.h:595) ceph
[   53.705132] ceph_readdir_prepopulate (/usr/src/linux/fs/ceph/inode.c:2113) ceph
[   53.705191] mds_dispatch (/usr/src/linux/fs/ceph/mds_client.c:3993 /usr/src/linux/fs/ceph/mds_client.c:6299) ceph
[   53.705253]  ? sock_recvmsg (/usr/src/linux/net/socket.c:1078 (discriminator 1) /usr/src/linux/net/socket.c:1100 (discriminator 1))
[   53.705279] ceph_con_process_message (/usr/src/linux/net/ceph/messenger.c:1427) libceph
[   53.705347] process_message (/usr/src/linux/net/ceph/messenger_v2.c:2879) libceph
[   53.705406] ceph_con_v2_try_read (/usr/src/linux/net/ceph/messenger_v2.c:3043 /usr/src/linux/net/ceph/messenger_v2.c:3099 /usr/src/linux/net/ceph/messenger_v2.c:3148) libceph
[   53.705467]  ? psi_group_change (/usr/src/linux/kernel/sched/psi.c:876)
[   53.705488]  ? sched_balance_newidle (/usr/src/linux/kernel/sched/fair.c:12902 (discriminator 2))
[   53.705512]  ? psi_task_switch (/usr/src/linux/kernel/sched/psi.c:984 (discriminator 2))
[   53.705532]  ? _raw_spin_unlock (/usr/src/linux/./arch/x86/include/asm/paravirt.h:562 /usr/src/linux/./arch/x86/include/asm/qspinlock.h:57 /usr/src/linux/./include/linux/spinlock.h:204 /usr/src/linux/./include/linux/spinlock_api_smp.h:142 /usr/src/linux/kernel/locking/spinlock.c:186)
[   53.705550]  ? finish_task_switch.isra.0 (/usr/src/linux/./arch/x86/include/asm/paravirt.h:671 /usr/src/linux/kernel/sched/sched.h:1559 /usr/src/linux/kernel/sched/core.c:5073 /usr/src/linux/kernel/sched/core.c:5191)
[   53.705575] ceph_con_workfn (/usr/src/linux/net/ceph/messenger.c:1578) libceph
[   53.705627]  process_one_work (/usr/src/linux/./arch/x86/include/asm/jump_label.h:36 /usr/src/linux/./include/trace/events/workqueue.h:110 /usr/src/linux/kernel/workqueue.c:3268)
[   53.705657]  worker_thread (/usr/src/linux/kernel/workqueue.c:3340 (discriminator 2) /usr/src/linux/kernel/workqueue.c:3427 (discriminator 2))
[   53.705679]  ? __pfx_worker_thread (/usr/src/linux/kernel/workqueue.c:3373)
[   53.705700]  kthread (/usr/src/linux/kernel/kthread.c:463)
[   53.705717]  ? __pfx_kthread (/usr/src/linux/kernel/kthread.c:412)
[   53.705734]  ? __pfx_kthread (/usr/src/linux/kernel/kthread.c:412)
[   53.705752]  ret_from_fork (/usr/src/linux/arch/x86/kernel/process.c:164)
[   53.705776]  ? __pfx_kthread (/usr/src/linux/kernel/kthread.c:412)
[   53.705793]  ret_from_fork_asm (/usr/src/linux/arch/x86/entry/entry_64.S:255)
[   53.705826]  </TASK>
[   53.705842] Modules linked in: ceph netfs libceph cfg80211 rfkill 8021q garp stp mrp llc binfmt_misc intel_rapl_msr intel_rapl_common intel_uncore_frequency_common kvm_intel virtio_gpu joydev kvm drm_client_lib virtio_dma_buf evdev drm_shmem_helper sg drm_kms_helper virtio_balloon button irqbypass ghash_clmulni_intel aesni_intel rapl pcspkr drm configfs efi_pstore nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vmw_vmci vsock qemu_fw_cfg virtio_rng autofs4 ext4 crc16 mbcache jbd2 hid_generic usbhid hid sr_mod cdrom dm_mod ahci libahci libata xhci_pci iTCO_wdt intel_pmc_bxt xhci_hcd iTCO_vendor_support scsi_mod psmouse virtio_net i2c_i801 watchdog serio_raw i2c_smbus lpc_ich scsi_common usbcore net_failover failover virtio_blk usb_common
[   53.708740] ---[ end trace 0000000000000000 ]---
[   53.709462] RIP: 0010:rb_insert_color (/usr/src/linux/lib/rbtree.c:185 (discriminator 1) /usr/src/linux/lib/rbtree.c:436 (discriminator 1))
[   53.710118] Code: 76 17 48 83 e1 fc 48 3b 51 10 0f 84 b7 00 00 00 48 89 41 08 c3 cc cc cc cc 48 89 06 c3 cc cc cc cc 48 8b 4a 10 48 85 c9 74 05 <f6> 01 01 74 1b 48 8b 48 10 48 39 f9 74 68 48 89 c7 48 89 4a 08 48
All code
========
   0:	76 17                	jbe    0x19
   2:	48 83 e1 fc          	and    $0xfffffffffffffffc,%rcx
   6:	48 3b 51 10          	cmp    0x10(%rcx),%rdx
   a:	0f 84 b7 00 00 00    	je     0xc7
  10:	48 89 41 08          	mov    %rax,0x8(%rcx)
  14:	c3                   	ret
  15:	cc                   	int3
  16:	cc                   	int3
  17:	cc                   	int3
  18:	cc                   	int3
  19:	48 89 06             	mov    %rax,(%rsi)
  1c:	c3                   	ret
  1d:	cc                   	int3
  1e:	cc                   	int3
  1f:	cc                   	int3
  20:	cc                   	int3
  21:	48 8b 4a 10          	mov    0x10(%rdx),%rcx
  25:	48 85 c9             	test   %rcx,%rcx
  28:	74 05                	je     0x2f
  2a:*	f6 01 01             	testb  $0x1,(%rcx)		<-- trapping instruction
  2d:	74 1b                	je     0x4a
  2f:	48 8b 48 10          	mov    0x10(%rax),%rcx
  33:	48 39 f9             	cmp    %rdi,%rcx
  36:	74 68                	je     0xa0
  38:	48 89 c7             	mov    %rax,%rdi
  3b:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
  3f:	48                   	rex.W

Code starting with the faulting instruction
===========================================
   0:	f6 01 01             	testb  $0x1,(%rcx)
   3:	74 1b                	je     0x20
   5:	48 8b 48 10          	mov    0x10(%rax),%rcx
   9:	48 39 f9             	cmp    %rdi,%rcx
   c:	74 68                	je     0x76
   e:	48 89 c7             	mov    %rax,%rdi
  11:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
  15:	48                   	rex.W
[   53.711453] RSP: 0018:ffff9ab7c07579e0 EFLAGS: 00010286
[   53.712112] RAX: ffff8bd0c2285b40 RBX: ffff8bd0c2285240 RCX: d0c22857c0000000
[   53.712798] RDX: ffff8bd0c2285910 RSI: ffff8bd0c3e695c0 RDI: ffff8bd0c22855c0
[   53.713423] RBP: 0000000000002139 R08: 0000000000000000 R09: 0000000000000000
[   53.714061] R10: 0000000000000000 R11: ffff8bd0c16244e0 R12: ffff8bd0c3e695b8
[   53.714696] R13: ffff8bd0c3b62000 R14: ffff8bd0c22857c0 R15: ffff8bd0c3e695c0
[   53.715321] FS:  0000000000000000(0000) GS:ffff8bd1815ca000(0000) knlGS:0000000000000000
[   53.715956] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   53.716651] CR2: 000055667ef28e10 CR3: 0000000106cc2005 CR4: 0000000000772ef0
[   53.717295] PKRU: 55555554
[   53.717918] note: kworker/11:2[360] exited with preempt_count 1


Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220807
Fixes: bb80f7618832 ("parse_longname(): strrchr() expects NUL-terminated string")

Cc: stable@vger.kernel.org
Suggested-by: Helge Deller <deller@gmx.de>
Signed-off-by: Daniel Vogelbacher <daniel@chaospixel.com>
---
 fs/ceph/crypto.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
index 0ea4db650f85..9a115282f67d 100644
--- a/fs/ceph/crypto.c
+++ b/fs/ceph/crypto.c
@@ -166,12 +166,13 @@ static struct inode *parse_longname(const struct inode *parent,
 	struct ceph_vino vino = { .snap = CEPH_NOSNAP };
 	char *name_end, *inode_number;
 	int ret = -EIO;
-	/* NUL-terminate */
-	char *str __free(kfree) = kmemdup_nul(name, *name_len, GFP_KERNEL);
+	/* Snapshot name must start with an underscore */
+	if (*name_len <= 0 || name[0] != '_')
+		return ERR_PTR(-EIO);
+	/* Skip initial '_' and NUL-terminate */
+	char *str __free(kfree) = kmemdup_nul(name + 1, *name_len - 1, GFP_KERNEL);
 	if (!str)
 		return ERR_PTR(-ENOMEM);
-	/* Skip initial '_' */
-	str++;
 	name_end = strrchr(str, '_');
 	if (!name_end) {
 		doutc(cl, "failed to parse long snapshot name: %s\n", str);
-- 
2.47.3
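The v2 logic can be exercised outside the kernel with a small userspace sketch. It is an approximation under stated assumptions: GCC/Clang's `__attribute__((cleanup))` stands in for `__free(kfree)`, `strndup()` stands in for `kmemdup_nul()`, and the helper `snap_name_len_v2()` with its int/negative-errno return is hypothetical (the real parse_longname() goes on to parse the inode number and look up the directory). What it shows is that `str` stays the pointer the allocator returned, so the cleanup handler frees exactly that.

```c
#define _POSIX_C_SOURCE 200809L	/* for strndup() */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in (assumption) for the kernel's __free(kfree). */
static void free_charp(char **p)
{
	free(*p);
}
#define AUTO_FREE __attribute__((cleanup(free_charp)))

/*
 * Hypothetical mirror of the v2 prologue of parse_longname(): returns
 * the snapshot-name length, or a negative errno. strndup() stands in
 * for kmemdup_nul() (unlike kmemdup_nul(), it stops at an embedded NUL).
 */
static int snap_name_len_v2(const char *name, int name_len)
{
	char *name_end;

	/* Snapshot name must start with an underscore */
	if (name_len <= 0 || name[0] != '_')
		return -EIO;
	/* Skip initial '_' and NUL-terminate: str stays the allocation base */
	char *str AUTO_FREE = strndup(name + 1, (size_t)(name_len - 1));
	if (!str)
		return -ENOMEM;
	/*
	 * v1 did "str++" at this point, so the cleanup handler later freed
	 * a pointer one byte past what the allocator had returned.
	 */
	name_end = strrchr(str, '_');
	if (!name_end || name_end == str)
		return -EIO;	/* no second '_', or empty snapshot name */
	return (int)(name_end - str);
}

static void example_v2(void)
{
	/* The byte after name_len ('Z') is never copied. */
	const char buf[] = { '_', 'a', '_', '7', 'Z' };

	assert(snap_name_len_v2(buf, 4) == 1);
}
```

With name_len == 1 the duplicated string is empty, so strrchr() returns NULL and the sketch returns -EIO; this is consistent with the reviewer's remark that very short names were already rejected, and the up-front check mainly adds the '_' validation and skips the allocation for an empty name.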



* Re:  [PATCH v2] fs/ceph: Fix kernel oops due invalid pointer for kfree() in parse_longname()
  2026-02-01  8:34 ` [PATCH v2] " Daniel Vogelbacher
@ 2026-02-02 19:13   ` Viacheslav Dubeyko
  2026-02-03 19:23     ` Viacheslav Dubeyko
  0 siblings, 1 reply; 14+ messages in thread
From: Viacheslav Dubeyko @ 2026-02-02 19:13 UTC (permalink / raw)
  To: daniel@chaospixel.com, ceph-devel@vger.kernel.org
  Cc: Xiubo Li, idryomov@gmail.com

On Sun, 2026-02-01 at 09:34 +0100, Daniel Vogelbacher wrote:
> This fixes a kernel oops when reading ceph snapshot directories (.snap),
> for example by simply running `ls /mnt/my_ceph/.snap`.
> 
> The bug was introduced in commit:
> 
> bb80f7618832 - parse_longname(): strrchr() expects NUL-terminated string
> 
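> The variable str is guarded by __free(kfree), but it is advanced by one
> to skip the initial '_' in snapshot names. Thus, kfree() is called with
> a pointer one byte past the start of the allocation.
> This patch duplicates the name starting after the '_' instead, so str is
> never advanced and kfree() receives the pointer returned by kmemdup_nul().
> Names that are empty or do not start with '_' are now rejected up front.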
> 
> The full trace is:
> 
> [... full oops trace snipped ...]
> 
> 
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220807
> Fixes: bb80f7618832 ("parse_longname(): strrchr() expects NUL-terminated string")
> 
> Cc: stable@vger.kernel.org
> Suggested-by: Helge Deller <deller@gmx.de>
> Signed-off-by: Daniel Vogelbacher <daniel@chaospixel.com>
> ---
>  fs/ceph/crypto.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
> index 0ea4db650f85..9a115282f67d 100644
> --- a/fs/ceph/crypto.c
> +++ b/fs/ceph/crypto.c
> @@ -166,12 +166,13 @@ static struct inode *parse_longname(const struct inode *parent,
>  	struct ceph_vino vino = { .snap = CEPH_NOSNAP };
>  	char *name_end, *inode_number;
>  	int ret = -EIO;
> -	/* NUL-terminate */
> -	char *str __free(kfree) = kmemdup_nul(name, *name_len, GFP_KERNEL);
> +	/* Snapshot name must start with an underscore */
> +	if (*name_len <= 0 || name[0] != '_')
> +		return ERR_PTR(-EIO);
> +	/* Skip initial '_' and NUL-terminate */
> +	char *str __free(kfree) = kmemdup_nul(name + 1, *name_len - 1, GFP_KERNEL);
>  	if (!str)
>  		return ERR_PTR(-ENOMEM);
> -	/* Skip initial '_' */
> -	str++;
>  	name_end = strrchr(str, '_');
>  	if (!name_end) {
>  		doutc(cl, "failed to parse long snapshot name: %s\n", str);

Looks good.

Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>

Let me run xfstests on your patch. I'll get back to you with the results ASAP.

Thanks,
Slava.


* RE:  [PATCH v2] fs/ceph: Fix kernel oops due invalid pointer for kfree() in parse_longname()
  2026-02-02 19:13   ` Viacheslav Dubeyko
@ 2026-02-03 19:23     ` Viacheslav Dubeyko
  2026-02-03 19:41       ` Daniel Vogelbacher
  0 siblings, 1 reply; 14+ messages in thread
From: Viacheslav Dubeyko @ 2026-02-03 19:23 UTC (permalink / raw)
  To: daniel@chaospixel.com, ceph-devel@vger.kernel.org
  Cc: Xiubo Li, idryomov@gmail.com

On Mon, 2026-02-02 at 19:13 +0000, Viacheslav Dubeyko wrote:
> On Sun, 2026-02-01 at 09:34 +0100, Daniel Vogelbacher wrote:
> > This fixes a kernel oops when reading ceph snapshot directories (.snap),
> > for example by simply running `ls /mnt/my_ceph/.snap`.
> > 
> > The bug was introduced in commit:
> > 
> > bb80f7618832 - parse_longname(): strrchr() expects NUL-terminated string
> > 
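> > The variable str is guarded by __free(kfree), but it is advanced by one
> > to skip the initial '_' in snapshot names. Thus, kfree() is called with
> > a pointer one byte past the start of the allocation.
> > This patch duplicates the name starting after the '_' instead, so str is
> > never advanced and kfree() receives the pointer returned by kmemdup_nul().
> > Names that are empty or do not start with '_' are now rejected up front.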
> > 
> > The full trace is:
> > 
> > [   53.703013] Oops: general protection fault, probably for non-canonical address 0xd0c22857c0000000: 0000 [#1] SMP PTI
> > [   53.703201] CPU: 11 UID: 0 PID: 360 Comm: kworker/11:2 Not tainted 6.18.0-rc7 #41 PREEMPT(voluntary)
> > [   53.703281] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > [   53.703317] Workqueue: ceph-msgr ceph_con_workfn [libceph]
> > [   53.703424] RIP: 0010:rb_insert_color (/usr/src/linux/lib/rbtree.c:185 (discriminator 1) /usr/src/linux/lib/rbtree.c:436 (discriminator 1))
> > [   53.704503] Code: 76 17 48 83 e1 fc 48 3b 51 10 0f 84 b7 00 00 00 48 89 41 08 c3 cc cc cc cc 48 89 06 c3 cc cc cc cc 48 8b 4a 10 48 85 c9 74 05 <f6> 01 01 74 1b 48 8b 48 10 48 39 f9 74 68 48 89 c7 48 89 4a 08 48
> > All code
> > ========
> >    0:	76 17                	jbe    0x19
> >    2:	48 83 e1 fc          	and    $0xfffffffffffffffc,%rcx
> >    6:	48 3b 51 10          	cmp    0x10(%rcx),%rdx
> >    a:	0f 84 b7 00 00 00    	je     0xc7
> >   10:	48 89 41 08          	mov    %rax,0x8(%rcx)
> >   14:	c3                   	ret
> >   15:	cc                   	int3
> >   16:	cc                   	int3
> >   17:	cc                   	int3
> >   18:	cc                   	int3
> >   19:	48 89 06             	mov    %rax,(%rsi)
> >   1c:	c3                   	ret
> >   1d:	cc                   	int3
> >   1e:	cc                   	int3
> >   1f:	cc                   	int3
> >   20:	cc                   	int3
> >   21:	48 8b 4a 10          	mov    0x10(%rdx),%rcx
> >   25:	48 85 c9             	test   %rcx,%rcx
> >   28:	74 05                	je     0x2f
> >   2a:*	f6 01 01             	testb  $0x1,(%rcx)		<-- trapping instruction
> >   2d:	74 1b                	je     0x4a
> >   2f:	48 8b 48 10          	mov    0x10(%rax),%rcx
> >   33:	48 39 f9             	cmp    %rdi,%rcx
> >   36:	74 68                	je     0xa0
> >   38:	48 89 c7             	mov    %rax,%rdi
> >   3b:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
> >   3f:	48                   	rex.W
> > 
> > Code starting with the faulting instruction
> > ===========================================
> >    0:	f6 01 01             	testb  $0x1,(%rcx)
> >    3:	74 1b                	je     0x20
> >    5:	48 8b 48 10          	mov    0x10(%rax),%rcx
> >    9:	48 39 f9             	cmp    %rdi,%rcx
> >    c:	74 68                	je     0x76
> >    e:	48 89 c7             	mov    %rax,%rdi
> >   11:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
> >   15:	48                   	rex.W
> > [   53.704559] RSP: 0018:ffff9ab7c07579e0 EFLAGS: 00010286
> > [   53.704591] RAX: ffff8bd0c2285b40 RBX: ffff8bd0c2285240 RCX: d0c22857c0000000
> > [   53.704616] RDX: ffff8bd0c2285910 RSI: ffff8bd0c3e695c0 RDI: ffff8bd0c22855c0
> > [   53.704645] RBP: 0000000000002139 R08: 0000000000000000 R09: 0000000000000000
> > [   53.704668] R10: 0000000000000000 R11: ffff8bd0c16244e0 R12: ffff8bd0c3e695b8
> > [   53.704691] R13: ffff8bd0c3b62000 R14: ffff8bd0c22857c0 R15: ffff8bd0c3e695c0
> > [   53.704714] FS:  0000000000000000(0000) GS:ffff8bd1815ca000(0000) knlGS:0000000000000000
> > [   53.704741] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   53.704762] CR2: 000055667ef28e10 CR3: 0000000106cc2005 CR4: 0000000000772ef0
> > [   53.704790] PKRU: 55555554
> > [   53.704803] Call Trace:
> > [   53.704844]  <TASK>
> > [   53.704862] ceph_get_snapid_map (/usr/src/linux/./include/linux/spinlock.h:391 /usr/src/linux/fs/ceph/snap.c:1255) ceph
> > [   53.704957] ceph_fill_inode (/usr/src/linux/fs/ceph/inode.c:1062 (discriminator 2)) ceph
> > [   53.705019]  ? __pfx_ceph_set_ino_cb (/usr/src/linux/fs/ceph/inode.c:46) ceph
> > [   53.705074]  ? __pfx_ceph_ino_compare (/usr/src/linux/fs/ceph/super.h:595) ceph
> > [   53.705132] ceph_readdir_prepopulate (/usr/src/linux/fs/ceph/inode.c:2113) ceph
> > [   53.705191] mds_dispatch (/usr/src/linux/fs/ceph/mds_client.c:3993 /usr/src/linux/fs/ceph/mds_client.c:6299) ceph
> > [   53.705253]  ? sock_recvmsg (/usr/src/linux/net/socket.c:1078 (discriminator 1) /usr/src/linux/net/socket.c:1100 (discriminator 1))
> > [   53.705279] ceph_con_process_message (/usr/src/linux/net/ceph/messenger.c:1427) libceph
> > [   53.705347] process_message (/usr/src/linux/net/ceph/messenger_v2.c:2879) libceph
> > [   53.705406] ceph_con_v2_try_read (/usr/src/linux/net/ceph/messenger_v2.c:3043 /usr/src/linux/net/ceph/messenger_v2.c:3099 /usr/src/linux/net/ceph/messenger_v2.c:3148) libceph
> > [   53.705467]  ? psi_group_change (/usr/src/linux/kernel/sched/psi.c:876)
> > [   53.705488]  ? sched_balance_newidle (/usr/src/linux/kernel/sched/fair.c:12902 (discriminator 2))
> > [   53.705512]  ? psi_task_switch (/usr/src/linux/kernel/sched/psi.c:984 (discriminator 2))
> > [   53.705532]  ? _raw_spin_unlock (/usr/src/linux/./arch/x86/include/asm/paravirt.h:562 /usr/src/linux/./arch/x86/include/asm/qspinlock.h:57 /usr/src/linux/./include/linux/spinlock.h:204 /usr/src/linux/./include/linux/spinlock_api_smp.h:142 /usr/src/linux/kernel/locking/spinlock.c:186)
> > [   53.705550]  ? finish_task_switch.isra.0 (/usr/src/linux/./arch/x86/include/asm/paravirt.h:671 /usr/src/linux/kernel/sched/sched.h:1559 /usr/src/linux/kernel/sched/core.c:5073 /usr/src/linux/kernel/sched/core.c:5191)
> > [   53.705575] ceph_con_workfn (/usr/src/linux/net/ceph/messenger.c:1578) libceph
> > [   53.705627]  process_one_work (/usr/src/linux/./arch/x86/include/asm/jump_label.h:36 /usr/src/linux/./include/trace/events/workqueue.h:110 /usr/src/linux/kernel/workqueue.c:3268)
> > [   53.705657]  worker_thread (/usr/src/linux/kernel/workqueue.c:3340 (discriminator 2) /usr/src/linux/kernel/workqueue.c:3427 (discriminator 2))
> > [   53.705679]  ? __pfx_worker_thread (/usr/src/linux/kernel/workqueue.c:3373)
> > [   53.705700]  kthread (/usr/src/linux/kernel/kthread.c:463)
> > [   53.705717]  ? __pfx_kthread (/usr/src/linux/kernel/kthread.c:412)
> > [   53.705734]  ? __pfx_kthread (/usr/src/linux/kernel/kthread.c:412)
> > [   53.705752]  ret_from_fork (/usr/src/linux/arch/x86/kernel/process.c:164)
> > [   53.705776]  ? __pfx_kthread (/usr/src/linux/kernel/kthread.c:412)
> > [   53.705793]  ret_from_fork_asm (/usr/src/linux/arch/x86/entry/entry_64.S:255)
> > [   53.705826]  </TASK>
> > [   53.705842] Modules linked in: ceph netfs libceph cfg80211 rfkill 8021q garp stp mrp llc binfmt_misc intel_rapl_msr intel_rapl_common intel_uncore_frequency_common kvm_intel virtio_gpu joydev kvm drm_client_lib virtio_dma_buf evdev drm_shmem_helper sg drm_kms_helper virtio_balloon button irqbypass ghash_clmulni_intel aesni_intel rapl pcspkr drm configfs efi_pstore nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vmw_vmci vsock qemu_fw_cfg virtio_rng autofs4 ext4 crc16 mbcache jbd2 hid_generic usbhid hid sr_mod cdrom dm_mod ahci libahci libata xhci_pci iTCO_wdt intel_pmc_bxt xhci_hcd iTCO_vendor_support scsi_mod psmouse virtio_net i2c_i801 watchdog serio_raw i2c_smbus lpc_ich scsi_common usbcore net_failover failover virtio_blk usb_common
> > [   53.708740] ---[ end trace 0000000000000000 ]---
> > [   53.709462] RIP: 0010:rb_insert_color (/usr/src/linux/lib/rbtree.c:185 (discriminator 1) /usr/src/linux/lib/rbtree.c:436 (discriminator 1))
> > [   53.710118] Code: 76 17 48 83 e1 fc 48 3b 51 10 0f 84 b7 00 00 00 48 89 41 08 c3 cc cc cc cc 48 89 06 c3 cc cc cc cc 48 8b 4a 10 48 85 c9 74 05 <f6> 01 01 74 1b 48 8b 48 10 48 39 f9 74 68 48 89 c7 48 89 4a 08 48
> > All code
> > ========
> >    0:	76 17                	jbe    0x19
> >    2:	48 83 e1 fc          	and    $0xfffffffffffffffc,%rcx
> >    6:	48 3b 51 10          	cmp    0x10(%rcx),%rdx
> >    a:	0f 84 b7 00 00 00    	je     0xc7
> >   10:	48 89 41 08          	mov    %rax,0x8(%rcx)
> >   14:	c3                   	ret
> >   15:	cc                   	int3
> >   16:	cc                   	int3
> >   17:	cc                   	int3
> >   18:	cc                   	int3
> >   19:	48 89 06             	mov    %rax,(%rsi)
> >   1c:	c3                   	ret
> >   1d:	cc                   	int3
> >   1e:	cc                   	int3
> >   1f:	cc                   	int3
> >   20:	cc                   	int3
> >   21:	48 8b 4a 10          	mov    0x10(%rdx),%rcx
> >   25:	48 85 c9             	test   %rcx,%rcx
> >   28:	74 05                	je     0x2f
> >   2a:*	f6 01 01             	testb  $0x1,(%rcx)		<-- trapping instruction
> >   2d:	74 1b                	je     0x4a
> >   2f:	48 8b 48 10          	mov    0x10(%rax),%rcx
> >   33:	48 39 f9             	cmp    %rdi,%rcx
> >   36:	74 68                	je     0xa0
> >   38:	48 89 c7             	mov    %rax,%rdi
> >   3b:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
> >   3f:	48                   	rex.W
> > 
> > Code starting with the faulting instruction
> > ===========================================
> >    0:	f6 01 01             	testb  $0x1,(%rcx)
> >    3:	74 1b                	je     0x20
> >    5:	48 8b 48 10          	mov    0x10(%rax),%rcx
> >    9:	48 39 f9             	cmp    %rdi,%rcx
> >    c:	74 68                	je     0x76
> >    e:	48 89 c7             	mov    %rax,%rdi
> >   11:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
> >   15:	48                   	rex.W
> > [   53.711453] RSP: 0018:ffff9ab7c07579e0 EFLAGS: 00010286
> > [   53.712112] RAX: ffff8bd0c2285b40 RBX: ffff8bd0c2285240 RCX: d0c22857c0000000
> > [   53.712798] RDX: ffff8bd0c2285910 RSI: ffff8bd0c3e695c0 RDI: ffff8bd0c22855c0
> > [   53.713423] RBP: 0000000000002139 R08: 0000000000000000 R09: 0000000000000000
> > [   53.714061] R10: 0000000000000000 R11: ffff8bd0c16244e0 R12: ffff8bd0c3e695b8
> > [   53.714696] R13: ffff8bd0c3b62000 R14: ffff8bd0c22857c0 R15: ffff8bd0c3e695c0
> > [   53.715321] FS:  0000000000000000(0000) GS:ffff8bd1815ca000(0000) knlGS:0000000000000000
> > [   53.715956] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   53.716651] CR2: 000055667ef28e10 CR3: 0000000106cc2005 CR4: 0000000000772ef0
> > [   53.717295] PKRU: 55555554
> > [   53.717918] note: kworker/11:2[360] exited with preempt_count 1
> > 
> > 
> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220807
> > Fixes: bb80f7618832 - parse_longname(): strrchr() expects NUL-terminated string
> > 
> > Cc: stable@vger.kernel.org
> > Suggested-by: Helge Deller <deller@gmx.de>
> > Signed-off-by: Daniel Vogelbacher <daniel@chaospixel.com>
> > ---
> >  fs/ceph/crypto.c | 9 +++++----
> >  1 file changed, 5 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
> > index 0ea4db650f85..9a115282f67d 100644
> > --- a/fs/ceph/crypto.c
> > +++ b/fs/ceph/crypto.c
> > @@ -166,12 +166,13 @@ static struct inode *parse_longname(const struct inode *parent,
> >  	struct ceph_vino vino = { .snap = CEPH_NOSNAP };
> >  	char *name_end, *inode_number;
> >  	int ret = -EIO;
> > -	/* NUL-terminate */
> > -	char *str __free(kfree) = kmemdup_nul(name, *name_len, GFP_KERNEL);
> > +	/* Snapshot name must start with an underscore */
> > +	if (*name_len <= 0 || name[0] != '_')
> > +		return ERR_PTR(-EIO);
> > +	/* Skip initial '_' and NUL-terminate */
> > +	char *str __free(kfree) = kmemdup_nul(name + 1, *name_len - 1, GFP_KERNEL);
> >  	if (!str)
> >  		return ERR_PTR(-ENOMEM);
> > -	/* Skip initial '_' */
> > -	str++;
> >  	name_end = strrchr(str, '_');
> >  	if (!name_end) {
> >  		doutc(cl, "failed to parse long snapshot name: %s\n", str);
> 
> Looks good.
> 
> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
> 
> Let me run the xfstests for your patch. I'll be back with the result ASAP.
> 
> 

The xfstests run has been successful. I don't see any new issues.

If I remember correctly, you shared the reproduction steps during our
discussion. Why didn't you add this information to the commit message?
Could you please add these details to the commit message? :)

Thanks,
Slava.


* [PATCH v3] fs/ceph: Fix kernel oops due invalid pointer for kfree() in parse_longname()
  2025-12-20 14:01 [PATCH] fs/ceph: Fix kernel oops due invalid pointer for kfree() in parse_longname() Daniel Vogelbacher
  2025-12-22 20:08 ` Viacheslav Dubeyko
  2026-02-01  8:34 ` [PATCH v2] " Daniel Vogelbacher
@ 2026-02-03 19:40 ` Daniel Vogelbacher
  2026-02-03 20:16   ` Viacheslav Dubeyko
  2 siblings, 1 reply; 14+ messages in thread
From: Daniel Vogelbacher @ 2026-02-03 19:40 UTC
  To: ceph-devel; +Cc: Slava.Dubeyko, xiubli, idryomov

This fixes a kernel oops when reading ceph snapshot directories (.snap),
for example by simply running `ls /mnt/my_ceph/.snap`.

The bug was introduced in commit:

bb80f7618832 - parse_longname(): strrchr() expects NUL-terminated string

The variable str is guarded by __free(kfree), but it is advanced by one
to skip the initial '_' in snapshot names. As a result, kfree() is
called with a pointer one byte past the start of the allocation, i.e.
an invalid pointer.
This patch removes the need to advance the pointer, so kfree() is
called with the pointer originally returned by kmemdup_nul().
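The ownership rule behind the fix can be illustrated in userspace. The kernel's __free() helper is built on the GCC/Clang cleanup attribute, so the cleanup handler receives whatever value str holds when it goes out of scope; after `str++` that is no longer the pointer that was allocated. The sketch below is a hypothetical userspace analogue, not the kernel code: free() stands in for kfree(), memdup_nul() and parse_longname_sketch() are made-up names, and it assumes the long snapshot name has the form `_<SNAPSHOT-NAME>_<INODE-NUMBER>` and is not NUL-terminated. As in the patch, the '_' is skipped while copying, so the guarded variable never moves.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Userspace analogue of __free(kfree): the handler gets the address of
 * the variable and frees whatever pointer it holds at scope exit.
 */
static void free_charp(char **p)
{
	free(*p);
}
#define AUTO_FREE __attribute__((cleanup(free_charp)))

/* Hypothetical stand-in for kmemdup_nul(): copy len bytes, append NUL. */
static char *memdup_nul(const char *s, size_t len)
{
	char *buf = malloc(len + 1);

	if (!buf)
		return NULL;
	memcpy(buf, s, len);
	buf[len] = '\0';
	return buf;
}

/*
 * Hypothetical sketch, assuming names of the form
 * "_<SNAPSHOT-NAME>_<INODE-NUMBER>" (not NUL-terminated).
 * Writes the snapshot name to out and returns the inode number,
 * or -1 if the name is malformed or allocation fails.
 */
static long long parse_longname_sketch(const char *name, int name_len,
				       char *out, size_t out_len)
{
	char *name_end;
	char *endp;
	long long ino;

	/* Reject empty names and names without the leading '_'. */
	if (name_len <= 0 || name[0] != '_')
		return -1;

	/*
	 * Skip the '_' while copying: str keeps pointing at the start of
	 * the allocation, so the cleanup handler frees the right pointer.
	 */
	char *str AUTO_FREE = memdup_nul(name + 1, (size_t)name_len - 1);
	if (!str)
		return -1;

	/* The last '_' separates the snapshot name from the inode number. */
	name_end = strrchr(str, '_');
	if (!name_end || name_end == str)
		return -1;
	*name_end = '\0';

	ino = strtoll(name_end + 1, &endp, 0);
	if (endp == name_end + 1 || *endp != '\0')
		return -1;

	snprintf(out, out_len, "%s", str);
	return ino;
}
```

For example, given the buffer `"_a_42XYZ"` with a length of 5, only `_a_42` is considered, so the sketch yields the name `a` and inode number 42.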

Steps to reproduce:

1. Create snapshots on a CephFS volume (I have 63 snapshots in my test case)

2. Add the CephFS mount to /etc/fstab
$ echo "samba-fileserver@.files=/volumes/datapool/stuff/3461082b-ecc9-4e82-8549-3fd2590d3fb6      /mnt/test/stuff   ceph     acl,noatime,_netdev    0       0" >> /etc/fstab

3. Reboot the system
$ systemctl reboot

4. Check that it is actually mounted
$ mount | grep stuff

5. List the snapshots (63 expected on my system)
$ ls /mnt/test/stuff/.snap

Now ls hangs forever and the kernel log shows the oops.
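Step 5 can also be driven from a small C program, for example in an automated reproducer. This is a hypothetical helper (count_dir_entries() is a made-up name), not part of the patch. It uses only POSIX opendir()/readdir(), which on Linux read the directory via getdents64 just as ls does, so on an affected kernel it would presumably hang in the same way.

```c
#include <dirent.h>
#include <string.h>

/*
 * Count the entries of a directory, skipping "." and "..".
 * Returns -1 if the directory cannot be opened.
 */
static long count_dir_entries(const char *path)
{
	DIR *dir = opendir(path);
	struct dirent *de;
	long n = 0;

	if (!dir)
		return -1;
	while ((de = readdir(dir)) != NULL) {
		if (strcmp(de->d_name, ".") == 0 ||
		    strcmp(de->d_name, "..") == 0)
			continue;
		n++;
	}
	closedir(dir);
	return n;
}
```

On a fixed kernel, `count_dir_entries("/mnt/test/stuff/.snap")` would be expected to return the number of snapshots (63 in the setup above, assuming the mount point from step 2).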

The full trace is:

[   53.703013] Oops: general protection fault, probably for non-canonical address 0xd0c22857c0000000: 0000 [#1] SMP PTI
[   53.703201] CPU: 11 UID: 0 PID: 360 Comm: kworker/11:2 Not tainted 6.18.0-rc7 #41 PREEMPT(voluntary)
[   53.703281] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[   53.703317] Workqueue: ceph-msgr ceph_con_workfn [libceph]
[   53.703424] RIP: 0010:rb_insert_color (/usr/src/linux/lib/rbtree.c:185 (discriminator 1) /usr/src/linux/lib/rbtree.c:436 (discriminator 1))
[   53.704503] Code: 76 17 48 83 e1 fc 48 3b 51 10 0f 84 b7 00 00 00 48 89 41 08 c3 cc cc cc cc 48 89 06 c3 cc cc cc cc 48 8b 4a 10 48 85 c9 74 05 <f6> 01 01 74 1b 48 8b 48 10 48 39 f9 74 68 48 89 c7 48 89 4a 08 48
All code
========
   0:	76 17                	jbe    0x19
   2:	48 83 e1 fc          	and    $0xfffffffffffffffc,%rcx
   6:	48 3b 51 10          	cmp    0x10(%rcx),%rdx
   a:	0f 84 b7 00 00 00    	je     0xc7
  10:	48 89 41 08          	mov    %rax,0x8(%rcx)
  14:	c3                   	ret
  15:	cc                   	int3
  16:	cc                   	int3
  17:	cc                   	int3
  18:	cc                   	int3
  19:	48 89 06             	mov    %rax,(%rsi)
  1c:	c3                   	ret
  1d:	cc                   	int3
  1e:	cc                   	int3
  1f:	cc                   	int3
  20:	cc                   	int3
  21:	48 8b 4a 10          	mov    0x10(%rdx),%rcx
  25:	48 85 c9             	test   %rcx,%rcx
  28:	74 05                	je     0x2f
  2a:*	f6 01 01             	testb  $0x1,(%rcx)		<-- trapping instruction
  2d:	74 1b                	je     0x4a
  2f:	48 8b 48 10          	mov    0x10(%rax),%rcx
  33:	48 39 f9             	cmp    %rdi,%rcx
  36:	74 68                	je     0xa0
  38:	48 89 c7             	mov    %rax,%rdi
  3b:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
  3f:	48                   	rex.W

Code starting with the faulting instruction
===========================================
   0:	f6 01 01             	testb  $0x1,(%rcx)
   3:	74 1b                	je     0x20
   5:	48 8b 48 10          	mov    0x10(%rax),%rcx
   9:	48 39 f9             	cmp    %rdi,%rcx
   c:	74 68                	je     0x76
   e:	48 89 c7             	mov    %rax,%rdi
  11:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
  15:	48                   	rex.W
[   53.704559] RSP: 0018:ffff9ab7c07579e0 EFLAGS: 00010286
[   53.704591] RAX: ffff8bd0c2285b40 RBX: ffff8bd0c2285240 RCX: d0c22857c0000000
[   53.704616] RDX: ffff8bd0c2285910 RSI: ffff8bd0c3e695c0 RDI: ffff8bd0c22855c0
[   53.704645] RBP: 0000000000002139 R08: 0000000000000000 R09: 0000000000000000
[   53.704668] R10: 0000000000000000 R11: ffff8bd0c16244e0 R12: ffff8bd0c3e695b8
[   53.704691] R13: ffff8bd0c3b62000 R14: ffff8bd0c22857c0 R15: ffff8bd0c3e695c0
[   53.704714] FS:  0000000000000000(0000) GS:ffff8bd1815ca000(0000) knlGS:0000000000000000
[   53.704741] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   53.704762] CR2: 000055667ef28e10 CR3: 0000000106cc2005 CR4: 0000000000772ef0
[   53.704790] PKRU: 55555554
[   53.704803] Call Trace:
[   53.704844]  <TASK>
[   53.704862] ceph_get_snapid_map (/usr/src/linux/./include/linux/spinlock.h:391 /usr/src/linux/fs/ceph/snap.c:1255) ceph
[   53.704957] ceph_fill_inode (/usr/src/linux/fs/ceph/inode.c:1062 (discriminator 2)) ceph
[   53.705019]  ? __pfx_ceph_set_ino_cb (/usr/src/linux/fs/ceph/inode.c:46) ceph
[   53.705074]  ? __pfx_ceph_ino_compare (/usr/src/linux/fs/ceph/super.h:595) ceph
[   53.705132] ceph_readdir_prepopulate (/usr/src/linux/fs/ceph/inode.c:2113) ceph
[   53.705191] mds_dispatch (/usr/src/linux/fs/ceph/mds_client.c:3993 /usr/src/linux/fs/ceph/mds_client.c:6299) ceph
[   53.705253]  ? sock_recvmsg (/usr/src/linux/net/socket.c:1078 (discriminator 1) /usr/src/linux/net/socket.c:1100 (discriminator 1))
[   53.705279] ceph_con_process_message (/usr/src/linux/net/ceph/messenger.c:1427) libceph
[   53.705347] process_message (/usr/src/linux/net/ceph/messenger_v2.c:2879) libceph
[   53.705406] ceph_con_v2_try_read (/usr/src/linux/net/ceph/messenger_v2.c:3043 /usr/src/linux/net/ceph/messenger_v2.c:3099 /usr/src/linux/net/ceph/messenger_v2.c:3148) libceph
[   53.705467]  ? psi_group_change (/usr/src/linux/kernel/sched/psi.c:876)
[   53.705488]  ? sched_balance_newidle (/usr/src/linux/kernel/sched/fair.c:12902 (discriminator 2))
[   53.705512]  ? psi_task_switch (/usr/src/linux/kernel/sched/psi.c:984 (discriminator 2))
[   53.705532]  ? _raw_spin_unlock (/usr/src/linux/./arch/x86/include/asm/paravirt.h:562 /usr/src/linux/./arch/x86/include/asm/qspinlock.h:57 /usr/src/linux/./include/linux/spinlock.h:204 /usr/src/linux/./include/linux/spinlock_api_smp.h:142 /usr/src/linux/kernel/locking/spinlock.c:186)
[   53.705550]  ? finish_task_switch.isra.0 (/usr/src/linux/./arch/x86/include/asm/paravirt.h:671 /usr/src/linux/kernel/sched/sched.h:1559 /usr/src/linux/kernel/sched/core.c:5073 /usr/src/linux/kernel/sched/core.c:5191)
[   53.705575] ceph_con_workfn (/usr/src/linux/net/ceph/messenger.c:1578) libceph
[   53.705627]  process_one_work (/usr/src/linux/./arch/x86/include/asm/jump_label.h:36 /usr/src/linux/./include/trace/events/workqueue.h:110 /usr/src/linux/kernel/workqueue.c:3268)
[   53.705657]  worker_thread (/usr/src/linux/kernel/workqueue.c:3340 (discriminator 2) /usr/src/linux/kernel/workqueue.c:3427 (discriminator 2))
[   53.705679]  ? __pfx_worker_thread (/usr/src/linux/kernel/workqueue.c:3373)
[   53.705700]  kthread (/usr/src/linux/kernel/kthread.c:463)
[   53.705717]  ? __pfx_kthread (/usr/src/linux/kernel/kthread.c:412)
[   53.705734]  ? __pfx_kthread (/usr/src/linux/kernel/kthread.c:412)
[   53.705752]  ret_from_fork (/usr/src/linux/arch/x86/kernel/process.c:164)
[   53.705776]  ? __pfx_kthread (/usr/src/linux/kernel/kthread.c:412)
[   53.705793]  ret_from_fork_asm (/usr/src/linux/arch/x86/entry/entry_64.S:255)
[   53.705826]  </TASK>
[   53.705842] Modules linked in: ceph netfs libceph cfg80211 rfkill 8021q garp stp mrp llc binfmt_misc intel_rapl_msr intel_rapl_common intel_uncore_frequency_common kvm_intel virtio_gpu joydev kvm drm_client_lib virtio_dma_buf evdev drm_shmem_helper sg drm_kms_helper virtio_balloon button irqbypass ghash_clmulni_intel aesni_intel rapl pcspkr drm configfs efi_pstore nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vmw_vmci vsock qemu_fw_cfg virtio_rng autofs4 ext4 crc16 mbcache jbd2 hid_generic usbhid hid sr_mod cdrom dm_mod ahci libahci libata xhci_pci iTCO_wdt intel_pmc_bxt xhci_hcd iTCO_vendor_support scsi_mod psmouse virtio_net i2c_i801 watchdog serio_raw i2c_smbus lpc_ich scsi_common usbcore net_failover failover virtio_blk usb_common
[   53.708740] ---[ end trace 0000000000000000 ]---
[   53.709462] RIP: 0010:rb_insert_color (/usr/src/linux/lib/rbtree.c:185 (discriminator 1) /usr/src/linux/lib/rbtree.c:436 (discriminator 1))
[   53.710118] Code: 76 17 48 83 e1 fc 48 3b 51 10 0f 84 b7 00 00 00 48 89 41 08 c3 cc cc cc cc 48 89 06 c3 cc cc cc cc 48 8b 4a 10 48 85 c9 74 05 <f6> 01 01 74 1b 48 8b 48 10 48 39 f9 74 68 48 89 c7 48 89 4a 08 48
All code
========
   0:	76 17                	jbe    0x19
   2:	48 83 e1 fc          	and    $0xfffffffffffffffc,%rcx
   6:	48 3b 51 10          	cmp    0x10(%rcx),%rdx
   a:	0f 84 b7 00 00 00    	je     0xc7
  10:	48 89 41 08          	mov    %rax,0x8(%rcx)
  14:	c3                   	ret
  15:	cc                   	int3
  16:	cc                   	int3
  17:	cc                   	int3
  18:	cc                   	int3
  19:	48 89 06             	mov    %rax,(%rsi)
  1c:	c3                   	ret
  1d:	cc                   	int3
  1e:	cc                   	int3
  1f:	cc                   	int3
  20:	cc                   	int3
  21:	48 8b 4a 10          	mov    0x10(%rdx),%rcx
  25:	48 85 c9             	test   %rcx,%rcx
  28:	74 05                	je     0x2f
  2a:*	f6 01 01             	testb  $0x1,(%rcx)		<-- trapping instruction
  2d:	74 1b                	je     0x4a
  2f:	48 8b 48 10          	mov    0x10(%rax),%rcx
  33:	48 39 f9             	cmp    %rdi,%rcx
  36:	74 68                	je     0xa0
  38:	48 89 c7             	mov    %rax,%rdi
  3b:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
  3f:	48                   	rex.W

Code starting with the faulting instruction
===========================================
   0:	f6 01 01             	testb  $0x1,(%rcx)
   3:	74 1b                	je     0x20
   5:	48 8b 48 10          	mov    0x10(%rax),%rcx
   9:	48 39 f9             	cmp    %rdi,%rcx
   c:	74 68                	je     0x76
   e:	48 89 c7             	mov    %rax,%rdi
  11:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
  15:	48                   	rex.W
[   53.711453] RSP: 0018:ffff9ab7c07579e0 EFLAGS: 00010286
[   53.712112] RAX: ffff8bd0c2285b40 RBX: ffff8bd0c2285240 RCX: d0c22857c0000000
[   53.712798] RDX: ffff8bd0c2285910 RSI: ffff8bd0c3e695c0 RDI: ffff8bd0c22855c0
[   53.713423] RBP: 0000000000002139 R08: 0000000000000000 R09: 0000000000000000
[   53.714061] R10: 0000000000000000 R11: ffff8bd0c16244e0 R12: ffff8bd0c3e695b8
[   53.714696] R13: ffff8bd0c3b62000 R14: ffff8bd0c22857c0 R15: ffff8bd0c3e695c0
[   53.715321] FS:  0000000000000000(0000) GS:ffff8bd1815ca000(0000) knlGS:0000000000000000
[   53.715956] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   53.716651] CR2: 000055667ef28e10 CR3: 0000000106cc2005 CR4: 0000000000772ef0
[   53.717295] PKRU: 55555554
[   53.717918] note: kworker/11:2[360] exited with preempt_count 1


Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220807
Fixes: bb80f7618832 ("parse_longname(): strrchr() expects NUL-terminated string")
Cc: stable@vger.kernel.org
Suggested-by: Helge Deller <deller@gmx.de>
Signed-off-by: Daniel Vogelbacher <daniel@chaospixel.com>
---
 fs/ceph/crypto.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
index 0ea4db650f85..9a115282f67d 100644
--- a/fs/ceph/crypto.c
+++ b/fs/ceph/crypto.c
@@ -166,12 +166,13 @@ static struct inode *parse_longname(const struct inode *parent,
 	struct ceph_vino vino = { .snap = CEPH_NOSNAP };
 	char *name_end, *inode_number;
 	int ret = -EIO;
-	/* NUL-terminate */
-	char *str __free(kfree) = kmemdup_nul(name, *name_len, GFP_KERNEL);
+	/* Snapshot name must start with an underscore */
+	if (*name_len <= 0 || name[0] != '_')
+		return ERR_PTR(-EIO);
+	/* Skip initial '_' and NUL-terminate */
+	char *str __free(kfree) = kmemdup_nul(name + 1, *name_len - 1, GFP_KERNEL);
 	if (!str)
 		return ERR_PTR(-ENOMEM);
-	/* Skip initial '_' */
-	str++;
 	name_end = strrchr(str, '_');
 	if (!name_end) {
 		doutc(cl, "failed to parse long snapshot name: %s\n", str);
-- 
2.47.3



* Re: [PATCH v2] fs/ceph: Fix kernel oops due invalid pointer for kfree() in parse_longname()
  2026-02-03 19:23     ` Viacheslav Dubeyko
@ 2026-02-03 19:41       ` Daniel Vogelbacher
  0 siblings, 0 replies; 14+ messages in thread
From: Daniel Vogelbacher @ 2026-02-03 19:41 UTC
  To: Viacheslav Dubeyko, ceph-devel@vger.kernel.org
  Cc: Xiubo Li, idryomov@gmail.com

On 2/3/26 20:23, Viacheslav Dubeyko wrote:
> On Mon, 2026-02-02 at 19:13 +0000, Viacheslav Dubeyko wrote:
>> On Sun, 2026-02-01 at 09:34 +0100, Daniel Vogelbacher wrote:
>>> This fixes a kernel oops when reading ceph snapshot directories (.snap),
>>> for example by simply running `ls /mnt/my_ceph/.snap`.
>>>
>>> The bug was introduced in commit:
>>>
>>> bb80f7618832 - parse_longname(): strrchr() expects NUL-terminated string
>>>
>>> The variable str is guarded by __free(kfree), but it is advanced by one
>>> to skip the initial '_' in snapshot names. As a result, kfree() is
>>> called with an invalid pointer.
>>> This patch removes the need to advance the pointer, so kfree()
>>> is called with the correct memory pointer.
>>>
>>> The full trace is:
>>>
>>> [   53.703013] Oops: general protection fault, probably for non-canonical address 0xd0c22857c0000000: 0000 [#1] SMP PTI
>>> [   53.703201] CPU: 11 UID: 0 PID: 360 Comm: kworker/11:2 Not tainted 6.18.0-rc7 #41 PREEMPT(voluntary)
>>> [   53.703281] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
>>> [   53.703317] Workqueue: ceph-msgr ceph_con_workfn [libceph]
>>> [   53.703424] RIP: 0010:rb_insert_color (/usr/src/linux/lib/rbtree.c:185 (discriminator 1) /usr/src/linux/lib/rbtree.c:436 (discriminator 1))
>>> [   53.704503] Code: 76 17 48 83 e1 fc 48 3b 51 10 0f 84 b7 00 00 00 48 89 41 08 c3 cc cc cc cc 48 89 06 c3 cc cc cc cc 48 8b 4a 10 48 85 c9 74 05 <f6> 01 01 74 1b 48 8b 48 10 48 39 f9 74 68 48 89 c7 48 89 4a 08 48
>>> All code
>>> ========
>>>     0:	76 17                	jbe    0x19
>>>     2:	48 83 e1 fc          	and    $0xfffffffffffffffc,%rcx
>>>     6:	48 3b 51 10          	cmp    0x10(%rcx),%rdx
>>>     a:	0f 84 b7 00 00 00    	je     0xc7
>>>    10:	48 89 41 08          	mov    %rax,0x8(%rcx)
>>>    14:	c3                   	ret
>>>    15:	cc                   	int3
>>>    16:	cc                   	int3
>>>    17:	cc                   	int3
>>>    18:	cc                   	int3
>>>    19:	48 89 06             	mov    %rax,(%rsi)
>>>    1c:	c3                   	ret
>>>    1d:	cc                   	int3
>>>    1e:	cc                   	int3
>>>    1f:	cc                   	int3
>>>    20:	cc                   	int3
>>>    21:	48 8b 4a 10          	mov    0x10(%rdx),%rcx
>>>    25:	48 85 c9             	test   %rcx,%rcx
>>>    28:	74 05                	je     0x2f
>>>    2a:*	f6 01 01             	testb  $0x1,(%rcx)		<-- trapping instruction
>>>    2d:	74 1b                	je     0x4a
>>>    2f:	48 8b 48 10          	mov    0x10(%rax),%rcx
>>>    33:	48 39 f9             	cmp    %rdi,%rcx
>>>    36:	74 68                	je     0xa0
>>>    38:	48 89 c7             	mov    %rax,%rdi
>>>    3b:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
>>>    3f:	48                   	rex.W
>>>
>>> Code starting with the faulting instruction
>>> ===========================================
>>>     0:	f6 01 01             	testb  $0x1,(%rcx)
>>>     3:	74 1b                	je     0x20
>>>     5:	48 8b 48 10          	mov    0x10(%rax),%rcx
>>>     9:	48 39 f9             	cmp    %rdi,%rcx
>>>     c:	74 68                	je     0x76
>>>     e:	48 89 c7             	mov    %rax,%rdi
>>>    11:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
>>>    15:	48                   	rex.W
>>> [   53.704559] RSP: 0018:ffff9ab7c07579e0 EFLAGS: 00010286
>>> [   53.704591] RAX: ffff8bd0c2285b40 RBX: ffff8bd0c2285240 RCX: d0c22857c0000000
>>> [   53.704616] RDX: ffff8bd0c2285910 RSI: ffff8bd0c3e695c0 RDI: ffff8bd0c22855c0
>>> [   53.704645] RBP: 0000000000002139 R08: 0000000000000000 R09: 0000000000000000
>>> [   53.704668] R10: 0000000000000000 R11: ffff8bd0c16244e0 R12: ffff8bd0c3e695b8
>>> [   53.704691] R13: ffff8bd0c3b62000 R14: ffff8bd0c22857c0 R15: ffff8bd0c3e695c0
>>> [   53.704714] FS:  0000000000000000(0000) GS:ffff8bd1815ca000(0000) knlGS:0000000000000000
>>> [   53.704741] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [   53.704762] CR2: 000055667ef28e10 CR3: 0000000106cc2005 CR4: 0000000000772ef0
>>> [   53.704790] PKRU: 55555554
>>> [   53.704803] Call Trace:
>>> [   53.704844]  <TASK>
>>> [   53.704862] ceph_get_snapid_map (/usr/src/linux/./include/linux/spinlock.h:391 /usr/src/linux/fs/ceph/snap.c:1255) ceph
>>> [   53.704957] ceph_fill_inode (/usr/src/linux/fs/ceph/inode.c:1062 (discriminator 2)) ceph
>>> [   53.705019]  ? __pfx_ceph_set_ino_cb (/usr/src/linux/fs/ceph/inode.c:46) ceph
>>> [   53.705074]  ? __pfx_ceph_ino_compare (/usr/src/linux/fs/ceph/super.h:595) ceph
>>> [   53.705132] ceph_readdir_prepopulate (/usr/src/linux/fs/ceph/inode.c:2113) ceph
>>> [   53.705191] mds_dispatch (/usr/src/linux/fs/ceph/mds_client.c:3993 /usr/src/linux/fs/ceph/mds_client.c:6299) ceph
>>> [   53.705253]  ? sock_recvmsg (/usr/src/linux/net/socket.c:1078 (discriminator 1) /usr/src/linux/net/socket.c:1100 (discriminator 1))
>>> [   53.705279] ceph_con_process_message (/usr/src/linux/net/ceph/messenger.c:1427) libceph
>>> [   53.705347] process_message (/usr/src/linux/net/ceph/messenger_v2.c:2879) libceph
>>> [   53.705406] ceph_con_v2_try_read (/usr/src/linux/net/ceph/messenger_v2.c:3043 /usr/src/linux/net/ceph/messenger_v2.c:3099 /usr/src/linux/net/ceph/messenger_v2.c:3148) libceph
>>> [   53.705467]  ? psi_group_change (/usr/src/linux/kernel/sched/psi.c:876)
>>> [   53.705488]  ? sched_balance_newidle (/usr/src/linux/kernel/sched/fair.c:12902 (discriminator 2))
>>> [   53.705512]  ? psi_task_switch (/usr/src/linux/kernel/sched/psi.c:984 (discriminator 2))
>>> [   53.705532]  ? _raw_spin_unlock (/usr/src/linux/./arch/x86/include/asm/paravirt.h:562 /usr/src/linux/./arch/x86/include/asm/qspinlock.h:57 /usr/src/linux/./include/linux/spinlock.h:204 /usr/src/linux/./include/linux/spinlock_api_smp.h:142 /usr/src/linux/kernel/locking/spinlock.c:186)
>>> [   53.705550]  ? finish_task_switch.isra.0 (/usr/src/linux/./arch/x86/include/asm/paravirt.h:671 /usr/src/linux/kernel/sched/sched.h:1559 /usr/src/linux/kernel/sched/core.c:5073 /usr/src/linux/kernel/sched/core.c:5191)
>>> [   53.705575] ceph_con_workfn (/usr/src/linux/net/ceph/messenger.c:1578) libceph
>>> [   53.705627]  process_one_work (/usr/src/linux/./arch/x86/include/asm/jump_label.h:36 /usr/src/linux/./include/trace/events/workqueue.h:110 /usr/src/linux/kernel/workqueue.c:3268)
>>> [   53.705657]  worker_thread (/usr/src/linux/kernel/workqueue.c:3340 (discriminator 2) /usr/src/linux/kernel/workqueue.c:3427 (discriminator 2))
>>> [   53.705679]  ? __pfx_worker_thread (/usr/src/linux/kernel/workqueue.c:3373)
>>> [   53.705700]  kthread (/usr/src/linux/kernel/kthread.c:463)
>>> [   53.705717]  ? __pfx_kthread (/usr/src/linux/kernel/kthread.c:412)
>>> [   53.705734]  ? __pfx_kthread (/usr/src/linux/kernel/kthread.c:412)
>>> [   53.705752]  ret_from_fork (/usr/src/linux/arch/x86/kernel/process.c:164)
>>> [   53.705776]  ? __pfx_kthread (/usr/src/linux/kernel/kthread.c:412)
>>> [   53.705793]  ret_from_fork_asm (/usr/src/linux/arch/x86/entry/entry_64.S:255)
>>> [   53.705826]  </TASK>
>>> [   53.705842] Modules linked in: ceph netfs libceph cfg80211 rfkill 8021q garp stp mrp llc binfmt_misc intel_rapl_msr intel_rapl_common intel_uncore_frequency_common kvm_intel virtio_gpu joydev kvm drm_client_lib virtio_dma_buf evdev drm_shmem_helper sg drm_kms_helper virtio_balloon button irqbypass ghash_clmulni_intel aesni_intel rapl pcspkr drm configfs efi_pstore nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vmw_vmci vsock qemu_fw_cfg virtio_rng autofs4 ext4 crc16 mbcache jbd2 hid_generic usbhid hid sr_mod cdrom dm_mod ahci libahci libata xhci_pci iTCO_wdt intel_pmc_bxt xhci_hcd iTCO_vendor_support scsi_mod psmouse virtio_net i2c_i801 watchdog serio_raw i2c_smbus lpc_ich scsi_common usbcore net_failover failover virtio_blk usb_common
>>> [   53.708740] ---[ end trace 0000000000000000 ]---
>>> [   53.709462] RIP: 0010:rb_insert_color (/usr/src/linux/lib/rbtree.c:185 (discriminator 1) /usr/src/linux/lib/rbtree.c:436 (discriminator 1))
>>> [   53.710118] Code: 76 17 48 83 e1 fc 48 3b 51 10 0f 84 b7 00 00 00 48 89 41 08 c3 cc cc cc cc 48 89 06 c3 cc cc cc cc 48 8b 4a 10 48 85 c9 74 05 <f6> 01 01 74 1b 48 8b 48 10 48 39 f9 74 68 48 89 c7 48 89 4a 08 48
>>> All code
>>> ========
>>>     0:	76 17                	jbe    0x19
>>>     2:	48 83 e1 fc          	and    $0xfffffffffffffffc,%rcx
>>>     6:	48 3b 51 10          	cmp    0x10(%rcx),%rdx
>>>     a:	0f 84 b7 00 00 00    	je     0xc7
>>>    10:	48 89 41 08          	mov    %rax,0x8(%rcx)
>>>    14:	c3                   	ret
>>>    15:	cc                   	int3
>>>    16:	cc                   	int3
>>>    17:	cc                   	int3
>>>    18:	cc                   	int3
>>>    19:	48 89 06             	mov    %rax,(%rsi)
>>>    1c:	c3                   	ret
>>>    1d:	cc                   	int3
>>>    1e:	cc                   	int3
>>>    1f:	cc                   	int3
>>>    20:	cc                   	int3
>>>    21:	48 8b 4a 10          	mov    0x10(%rdx),%rcx
>>>    25:	48 85 c9             	test   %rcx,%rcx
>>>    28:	74 05                	je     0x2f
>>>    2a:*	f6 01 01             	testb  $0x1,(%rcx)		<-- trapping instruction
>>>    2d:	74 1b                	je     0x4a
>>>    2f:	48 8b 48 10          	mov    0x10(%rax),%rcx
>>>    33:	48 39 f9             	cmp    %rdi,%rcx
>>>    36:	74 68                	je     0xa0
>>>    38:	48 89 c7             	mov    %rax,%rdi
>>>    3b:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
>>>    3f:	48                   	rex.W
>>>
>>> Code starting with the faulting instruction
>>> ===========================================
>>>     0:	f6 01 01             	testb  $0x1,(%rcx)
>>>     3:	74 1b                	je     0x20
>>>     5:	48 8b 48 10          	mov    0x10(%rax),%rcx
>>>     9:	48 39 f9             	cmp    %rdi,%rcx
>>>     c:	74 68                	je     0x76
>>>     e:	48 89 c7             	mov    %rax,%rdi
>>>    11:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
>>>    15:	48                   	rex.W
>>> [   53.711453] RSP: 0018:ffff9ab7c07579e0 EFLAGS: 00010286
>>> [   53.712112] RAX: ffff8bd0c2285b40 RBX: ffff8bd0c2285240 RCX: d0c22857c0000000
>>> [   53.712798] RDX: ffff8bd0c2285910 RSI: ffff8bd0c3e695c0 RDI: ffff8bd0c22855c0
>>> [   53.713423] RBP: 0000000000002139 R08: 0000000000000000 R09: 0000000000000000
>>> [   53.714061] R10: 0000000000000000 R11: ffff8bd0c16244e0 R12: ffff8bd0c3e695b8
>>> [   53.714696] R13: ffff8bd0c3b62000 R14: ffff8bd0c22857c0 R15: ffff8bd0c3e695c0
>>> [   53.715321] FS:  0000000000000000(0000) GS:ffff8bd1815ca000(0000) knlGS:0000000000000000
>>> [   53.715956] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [   53.716651] CR2: 000055667ef28e10 CR3: 0000000106cc2005 CR4: 0000000000772ef0
>>> [   53.717295] PKRU: 55555554
>>> [   53.717918] note: kworker/11:2[360] exited with preempt_count 1
>>>
>>>
>>> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220807
>>> Fixes: bb80f7618832 ("parse_longname(): strrchr() expects NUL-terminated string")
>>>
>>> Cc: stable@vger.kernel.org
>>> Suggested-by: Helge Deller <deller@gmx.de>
>>> Signed-off-by: Daniel Vogelbacher <daniel@chaospixel.com>
>>> ---
>>>   fs/ceph/crypto.c | 9 +++++----
>>>   1 file changed, 5 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
>>> index 0ea4db650f85..9a115282f67d 100644
>>> --- a/fs/ceph/crypto.c
>>> +++ b/fs/ceph/crypto.c
>>> @@ -166,12 +166,13 @@ static struct inode *parse_longname(const struct inode *parent,
>>>   	struct ceph_vino vino = { .snap = CEPH_NOSNAP };
>>>   	char *name_end, *inode_number;
>>>   	int ret = -EIO;
>>> -	/* NUL-terminate */
>>> -	char *str __free(kfree) = kmemdup_nul(name, *name_len, GFP_KERNEL);
>>> +	/* Snapshot name must start with an underscore */
>>> +	if (*name_len <= 0 || name[0] != '_')
>>> +		return ERR_PTR(-EIO);
>>> +	/* Skip initial '_' and NUL-terminate */
>>> +	char *str __free(kfree) = kmemdup_nul(name + 1, *name_len - 1, GFP_KERNEL);
>>>   	if (!str)
>>>   		return ERR_PTR(-ENOMEM);
>>> -	/* Skip initial '_' */
>>> -	str++;
>>>   	name_end = strrchr(str, '_');
>>>   	if (!name_end) {
>>>   		doutc(cl, "failed to parse long snapshot name: %s\n", str);
>>
>> Looks good.
>>
>> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
>>
>> Let me run the xfstests for your patch. I'll be back with the result ASAP.
>>
>>
> 
> The xfstests run has been successful. I don't see any new issues.
> 
> If I remember correctly, you shared the steps to reproduce the issue during
> our discussion. But why haven't you added this information to the commit message?
> Could you please add these details to the commit message? :)

Sure, please see patch v3.

> Thanks,
> Slava.


-- 
Best regards
Daniel Vogelbacher


* Re: [PATCH v3] fs/ceph: Fix kernel oops due invalid pointer for kfree() in parse_longname()
  2026-02-03 19:40 ` [PATCH v3] " Daniel Vogelbacher
@ 2026-02-03 20:16   ` Viacheslav Dubeyko
  2026-02-03 20:22     ` Ilya Dryomov
  0 siblings, 1 reply; 14+ messages in thread
From: Viacheslav Dubeyko @ 2026-02-03 20:16 UTC
  To: daniel@chaospixel.com, ceph-devel@vger.kernel.org
  Cc: Xiubo Li, idryomov@gmail.com

On Tue, 2026-02-03 at 20:40 +0100, Daniel Vogelbacher wrote:
> This fixes a kernel oops when reading Ceph snapshot directories (.snap),
> for example simply by running `ls /mnt/my_ceph/.snap`.
> 
> The bug was introduced in commit:
> 
> bb80f7618832 - parse_longname(): strrchr() expects NUL-terminated string
> 
> The variable str is guarded by __free(kfree), but it is advanced by one
> to skip the initial '_' in snapshot names. Thus, kfree() is called
> with a pointer one byte past the start of the allocation.
> This patch removes the need to advance the pointer, so kfree()
> is called with the pointer returned by kmemdup_nul().
> 
> Steps to reproduce:
> 
> 1. Create snapshots on a CephFS volume (63 snapshots in my test case)
> 
> 2. Add the CephFS mount to /etc/fstab
> $ echo "samba-fileserver@.files=/volumes/datapool/stuff/3461082b-ecc9-4e82-8549-3fd2590d3fb6      /mnt/test/stuff   ceph     acl,noatime,_netdev    0       0" >> /etc/fstab
> 
> 3. Reboot the system
> $ systemctl reboot
> 
> 4. Check that it is actually mounted
> $ mount | grep stuff
> 
> 5. List the snapshots (63 are expected on my system)
> $ ls /mnt/test/stuff/.snap
> 
> ls now hangs indefinitely and the kernel log shows the oops.
> 
> The full trace is:
> 
> [   53.703013] Oops: general protection fault, probably for non-canonical address 0xd0c22857c0000000: 0000 [#1] SMP PTI
> [   53.703201] CPU: 11 UID: 0 PID: 360 Comm: kworker/11:2 Not tainted 6.18.0-rc7 #41 PREEMPT(voluntary)
> [   53.703281] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [   53.703317] Workqueue: ceph-msgr ceph_con_workfn [libceph]
> [   53.703424] RIP: 0010:rb_insert_color (/usr/src/linux/lib/rbtree.c:185 (discriminator 1) /usr/src/linux/lib/rbtree.c:436 (discriminator 1))
> [   53.704503] Code: 76 17 48 83 e1 fc 48 3b 51 10 0f 84 b7 00 00 00 48 89 41 08 c3 cc cc cc cc 48 89 06 c3 cc cc cc cc 48 8b 4a 10 48 85 c9 74 05 <f6> 01 01 74 1b 48 8b 48 10 48 39 f9 74 68 48 89 c7 48 89 4a 08 48
> All code
> ========
>    0:	76 17                	jbe    0x19
>    2:	48 83 e1 fc          	and    $0xfffffffffffffffc,%rcx
>    6:	48 3b 51 10          	cmp    0x10(%rcx),%rdx
>    a:	0f 84 b7 00 00 00    	je     0xc7
>   10:	48 89 41 08          	mov    %rax,0x8(%rcx)
>   14:	c3                   	ret
>   15:	cc                   	int3
>   16:	cc                   	int3
>   17:	cc                   	int3
>   18:	cc                   	int3
>   19:	48 89 06             	mov    %rax,(%rsi)
>   1c:	c3                   	ret
>   1d:	cc                   	int3
>   1e:	cc                   	int3
>   1f:	cc                   	int3
>   20:	cc                   	int3
>   21:	48 8b 4a 10          	mov    0x10(%rdx),%rcx
>   25:	48 85 c9             	test   %rcx,%rcx
>   28:	74 05                	je     0x2f
>   2a:*	f6 01 01             	testb  $0x1,(%rcx)		<-- trapping instruction
>   2d:	74 1b                	je     0x4a
>   2f:	48 8b 48 10          	mov    0x10(%rax),%rcx
>   33:	48 39 f9             	cmp    %rdi,%rcx
>   36:	74 68                	je     0xa0
>   38:	48 89 c7             	mov    %rax,%rdi
>   3b:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
>   3f:	48                   	rex.W
> 
> Code starting with the faulting instruction
> ===========================================
>    0:	f6 01 01             	testb  $0x1,(%rcx)
>    3:	74 1b                	je     0x20
>    5:	48 8b 48 10          	mov    0x10(%rax),%rcx
>    9:	48 39 f9             	cmp    %rdi,%rcx
>    c:	74 68                	je     0x76
>    e:	48 89 c7             	mov    %rax,%rdi
>   11:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
>   15:	48                   	rex.W
> [   53.704559] RSP: 0018:ffff9ab7c07579e0 EFLAGS: 00010286
> [   53.704591] RAX: ffff8bd0c2285b40 RBX: ffff8bd0c2285240 RCX: d0c22857c0000000
> [   53.704616] RDX: ffff8bd0c2285910 RSI: ffff8bd0c3e695c0 RDI: ffff8bd0c22855c0
> [   53.704645] RBP: 0000000000002139 R08: 0000000000000000 R09: 0000000000000000
> [   53.704668] R10: 0000000000000000 R11: ffff8bd0c16244e0 R12: ffff8bd0c3e695b8
> [   53.704691] R13: ffff8bd0c3b62000 R14: ffff8bd0c22857c0 R15: ffff8bd0c3e695c0
> [   53.704714] FS:  0000000000000000(0000) GS:ffff8bd1815ca000(0000) knlGS:0000000000000000
> [   53.704741] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   53.704762] CR2: 000055667ef28e10 CR3: 0000000106cc2005 CR4: 0000000000772ef0
> [   53.704790] PKRU: 55555554
> [   53.704803] Call Trace:
> [   53.704844]  <TASK>
> [   53.704862] ceph_get_snapid_map (/usr/src/linux/./include/linux/spinlock.h:391 /usr/src/linux/fs/ceph/snap.c:1255) ceph
> [   53.704957] ceph_fill_inode (/usr/src/linux/fs/ceph/inode.c:1062 (discriminator 2)) ceph
> [   53.705019]  ? __pfx_ceph_set_ino_cb (/usr/src/linux/fs/ceph/inode.c:46) ceph
> [   53.705074]  ? __pfx_ceph_ino_compare (/usr/src/linux/fs/ceph/super.h:595) ceph
> [   53.705132] ceph_readdir_prepopulate (/usr/src/linux/fs/ceph/inode.c:2113) ceph
> [   53.705191] mds_dispatch (/usr/src/linux/fs/ceph/mds_client.c:3993 /usr/src/linux/fs/ceph/mds_client.c:6299) ceph
> [   53.705253]  ? sock_recvmsg (/usr/src/linux/net/socket.c:1078 (discriminator 1) /usr/src/linux/net/socket.c:1100 (discriminator 1))
> [   53.705279] ceph_con_process_message (/usr/src/linux/net/ceph/messenger.c:1427) libceph
> [   53.705347] process_message (/usr/src/linux/net/ceph/messenger_v2.c:2879) libceph
> [   53.705406] ceph_con_v2_try_read (/usr/src/linux/net/ceph/messenger_v2.c:3043 /usr/src/linux/net/ceph/messenger_v2.c:3099 /usr/src/linux/net/ceph/messenger_v2.c:3148) libceph
> [   53.705467]  ? psi_group_change (/usr/src/linux/kernel/sched/psi.c:876)
> [   53.705488]  ? sched_balance_newidle (/usr/src/linux/kernel/sched/fair.c:12902 (discriminator 2))
> [   53.705512]  ? psi_task_switch (/usr/src/linux/kernel/sched/psi.c:984 (discriminator 2))
> [   53.705532]  ? _raw_spin_unlock (/usr/src/linux/./arch/x86/include/asm/paravirt.h:562 /usr/src/linux/./arch/x86/include/asm/qspinlock.h:57 /usr/src/linux/./include/linux/spinlock.h:204 /usr/src/linux/./include/linux/spinlock_api_smp.h:142 /usr/src/linux/kernel/locking/spinlock.c:186)
> [   53.705550]  ? finish_task_switch.isra.0 (/usr/src/linux/./arch/x86/include/asm/paravirt.h:671 /usr/src/linux/kernel/sched/sched.h:1559 /usr/src/linux/kernel/sched/core.c:5073 /usr/src/linux/kernel/sched/core.c:5191)
> [   53.705575] ceph_con_workfn (/usr/src/linux/net/ceph/messenger.c:1578) libceph
> [   53.705627]  process_one_work (/usr/src/linux/./arch/x86/include/asm/jump_label.h:36 /usr/src/linux/./include/trace/events/workqueue.h:110 /usr/src/linux/kernel/workqueue.c:3268)
> [   53.705657]  worker_thread (/usr/src/linux/kernel/workqueue.c:3340 (discriminator 2) /usr/src/linux/kernel/workqueue.c:3427 (discriminator 2))
> [   53.705679]  ? __pfx_worker_thread (/usr/src/linux/kernel/workqueue.c:3373)
> [   53.705700]  kthread (/usr/src/linux/kernel/kthread.c:463)
> [   53.705717]  ? __pfx_kthread (/usr/src/linux/kernel/kthread.c:412)
> [   53.705734]  ? __pfx_kthread (/usr/src/linux/kernel/kthread.c:412)
> [   53.705752]  ret_from_fork (/usr/src/linux/arch/x86/kernel/process.c:164)
> [   53.705776]  ? __pfx_kthread (/usr/src/linux/kernel/kthread.c:412)
> [   53.705793]  ret_from_fork_asm (/usr/src/linux/arch/x86/entry/entry_64.S:255)
> [   53.705826]  </TASK>
> [   53.705842] Modules linked in: ceph netfs libceph cfg80211 rfkill 8021q garp stp mrp llc binfmt_misc intel_rapl_msr intel_rapl_common intel_uncore_frequency_common kvm_intel virtio_gpu joydev kvm drm_client_lib virtio_dma_buf evdev drm_shmem_helper sg drm_kms_helper virtio_balloon button irqbypass ghash_clmulni_intel aesni_intel rapl pcspkr drm configfs efi_pstore nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vmw_vmci vsock qemu_fw_cfg virtio_rng autofs4 ext4 crc16 mbcache jbd2 hid_generic usbhid hid sr_mod cdrom dm_mod ahci libahci libata xhci_pci iTCO_wdt intel_pmc_bxt xhci_hcd iTCO_vendor_support scsi_mod psmouse virtio_net i2c_i801 watchdog serio_raw i2c_smbus lpc_ich scsi_common usbcore net_failover failover virtio_blk usb_common
> [   53.708740] ---[ end trace 0000000000000000 ]---
> [   53.709462] RIP: 0010:rb_insert_color (/usr/src/linux/lib/rbtree.c:185 (discriminator 1) /usr/src/linux/lib/rbtree.c:436 (discriminator 1))
> [   53.710118] Code: 76 17 48 83 e1 fc 48 3b 51 10 0f 84 b7 00 00 00 48 89 41 08 c3 cc cc cc cc 48 89 06 c3 cc cc cc cc 48 8b 4a 10 48 85 c9 74 05 <f6> 01 01 74 1b 48 8b 48 10 48 39 f9 74 68 48 89 c7 48 89 4a 08 48
> All code
> ========
>    0:	76 17                	jbe    0x19
>    2:	48 83 e1 fc          	and    $0xfffffffffffffffc,%rcx
>    6:	48 3b 51 10          	cmp    0x10(%rcx),%rdx
>    a:	0f 84 b7 00 00 00    	je     0xc7
>   10:	48 89 41 08          	mov    %rax,0x8(%rcx)
>   14:	c3                   	ret
>   15:	cc                   	int3
>   16:	cc                   	int3
>   17:	cc                   	int3
>   18:	cc                   	int3
>   19:	48 89 06             	mov    %rax,(%rsi)
>   1c:	c3                   	ret
>   1d:	cc                   	int3
>   1e:	cc                   	int3
>   1f:	cc                   	int3
>   20:	cc                   	int3
>   21:	48 8b 4a 10          	mov    0x10(%rdx),%rcx
>   25:	48 85 c9             	test   %rcx,%rcx
>   28:	74 05                	je     0x2f
>   2a:*	f6 01 01             	testb  $0x1,(%rcx)		<-- trapping instruction
>   2d:	74 1b                	je     0x4a
>   2f:	48 8b 48 10          	mov    0x10(%rax),%rcx
>   33:	48 39 f9             	cmp    %rdi,%rcx
>   36:	74 68                	je     0xa0
>   38:	48 89 c7             	mov    %rax,%rdi
>   3b:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
>   3f:	48                   	rex.W
> 
> Code starting with the faulting instruction
> ===========================================
>    0:	f6 01 01             	testb  $0x1,(%rcx)
>    3:	74 1b                	je     0x20
>    5:	48 8b 48 10          	mov    0x10(%rax),%rcx
>    9:	48 39 f9             	cmp    %rdi,%rcx
>    c:	74 68                	je     0x76
>    e:	48 89 c7             	mov    %rax,%rdi
>   11:	48 89 4a 08          	mov    %rcx,0x8(%rdx)
>   15:	48                   	rex.W
> [   53.711453] RSP: 0018:ffff9ab7c07579e0 EFLAGS: 00010286
> [   53.712112] RAX: ffff8bd0c2285b40 RBX: ffff8bd0c2285240 RCX: d0c22857c0000000
> [   53.712798] RDX: ffff8bd0c2285910 RSI: ffff8bd0c3e695c0 RDI: ffff8bd0c22855c0
> [   53.713423] RBP: 0000000000002139 R08: 0000000000000000 R09: 0000000000000000
> [   53.714061] R10: 0000000000000000 R11: ffff8bd0c16244e0 R12: ffff8bd0c3e695b8
> [   53.714696] R13: ffff8bd0c3b62000 R14: ffff8bd0c22857c0 R15: ffff8bd0c3e695c0
> [   53.715321] FS:  0000000000000000(0000) GS:ffff8bd1815ca000(0000) knlGS:0000000000000000
> [   53.715956] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   53.716651] CR2: 000055667ef28e10 CR3: 0000000106cc2005 CR4: 0000000000772ef0
> [   53.717295] PKRU: 55555554
> [   53.717918] note: kworker/11:2[360] exited with preempt_count 1
> 
> 
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220807
> Fixes: bb80f7618832 ("parse_longname(): strrchr() expects NUL-terminated string")
> 
> Cc: stable@vger.kernel.org
> Suggested-by: Helge Deller <deller@gmx.de>
> Signed-off-by: Daniel Vogelbacher <daniel@chaospixel.com>
> ---
>  fs/ceph/crypto.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/ceph/crypto.c b/fs/ceph/crypto.c
> index 0ea4db650f85..9a115282f67d 100644
> --- a/fs/ceph/crypto.c
> +++ b/fs/ceph/crypto.c
> @@ -166,12 +166,13 @@ static struct inode *parse_longname(const struct inode *parent,
>  	struct ceph_vino vino = { .snap = CEPH_NOSNAP };
>  	char *name_end, *inode_number;
>  	int ret = -EIO;
> -	/* NUL-terminate */
> -	char *str __free(kfree) = kmemdup_nul(name, *name_len, GFP_KERNEL);
> +	/* Snapshot name must start with an underscore */
> +	if (*name_len <= 0 || name[0] != '_')
> +		return ERR_PTR(-EIO);
> +	/* Skip initial '_' and NUL-terminate */
> +	char *str __free(kfree) = kmemdup_nul(name + 1, *name_len - 1, GFP_KERNEL);
>  	if (!str)
>  		return ERR_PTR(-ENOMEM);
> -	/* Skip initial '_' */
> -	str++;
>  	name_end = strrchr(str, '_');
>  	if (!name_end) {
>  		doutc(cl, "failed to parse long snapshot name: %s\n", str);

Looks good.

Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>

Thanks,
Slava.


* Re: [PATCH v3] fs/ceph: Fix kernel oops due invalid pointer for kfree() in parse_longname()
  2026-02-03 20:16   ` Viacheslav Dubeyko
@ 2026-02-03 20:22     ` Ilya Dryomov
  0 siblings, 0 replies; 14+ messages in thread
From: Ilya Dryomov @ 2026-02-03 20:22 UTC
  To: Viacheslav Dubeyko
  Cc: daniel@chaospixel.com, ceph-devel@vger.kernel.org, Xiubo Li

On Tue, Feb 3, 2026 at 9:16 PM Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> wrote:
>
> On Tue, 2026-02-03 at 20:40 +0100, Daniel Vogelbacher wrote:
> > [...]
>
> Looks good.
>
> Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>

Applied.

Thanks,

                Ilya

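The bug described in the v3 commit message comes down to a scope-based cleanup handler freeing a pointer that no longer points at the start of its allocation. The following userspace sketch illustrates the fixed pattern. It assumes GCC or Clang, whose `__attribute__((cleanup))` is the mechanism the kernel's `__free()` helper builds on; `free()` stands in for `kfree()`, and `memdup_nul()`, `free_charp()`, `AUTO_FREE` and `snap_name_prefix_len()` are hypothetical names introduced only for this illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical cleanup handler: called with &var when var leaves scope. */
static void free_charp(char **p)
{
	free(*p);
}

/* Hypothetical userspace analogue of the kernel's __free(kfree). */
#define AUTO_FREE __attribute__((cleanup(free_charp)))

/* Hypothetical stand-in for kmemdup_nul(): copy len bytes, NUL-terminate. */
static char *memdup_nul(const char *src, size_t len)
{
	char *dst = malloc(len + 1);

	if (dst) {
		memcpy(dst, src, len);
		dst[len] = '\0';
	}
	return dst;
}

/*
 * Fixed pattern: check and skip the leading '_' *before* duplicating, so
 * str always holds exactly the pointer malloc() returned and the cleanup
 * handler frees it correctly. Returns the offset of the last '_' in the
 * remainder, or -1 on malformed input.
 */
static int snap_name_prefix_len(const char *name, int name_len)
{
	if (name_len <= 0 || name[0] != '_')
		return -1;

	/* Duplicate from name + 1; str itself is never advanced. */
	char *str AUTO_FREE = memdup_nul(name + 1, (size_t)name_len - 1);
	if (!str)
		return -1;

	const char *name_end = strrchr(str, '_');
	if (!name_end)
		return -1;
	return (int)(name_end - str);
}
```

With the old ordering (duplicate `name`, then `str++`), the cleanup handler would pass `str + 1` to `free()`, which is undefined behaviour; per the commit message, the equivalent invalid `kfree()` in parse_longname() is what this patch fixes.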

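The patch's new `name[0] != '_'` check and the existing `strrchr(str, '_')` call suggest that a long snapshot name has the form `_<snapshot name>_<inode number>`. Assuming that layout, the sketch below splits such a name in userspace. `split_snap_longname()` is a hypothetical helper, not the kernel's parse_longname(); parsing the inode number in base 10, rejecting an empty snapshot name, and requiring a leading digit are choices of this sketch.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper. Assumes longname is NUL-terminated and has the
 * form "_<snapshot name>_<inode number>". Copies the snapshot name into
 * out_name (which must hold at least strlen(longname) bytes) and stores
 * the inode number in *ino. Returns 0 on success, -EIO on malformed input.
 */
static int split_snap_longname(const char *longname, char *out_name,
			       uint64_t *ino)
{
	const char *start, *sep;
	char *endp;
	unsigned long long val;

	if (longname[0] != '_')
		return -EIO;
	start = longname + 1;		/* skip the leading '_' */
	sep = strrchr(start, '_');	/* last '_' precedes the inode number */

	/* Reject a missing separator, an empty name, or a non-digit start
	 * (strtoull() would otherwise accept whitespace or a minus sign). */
	if (!sep || sep == start || !isdigit((unsigned char)sep[1]))
		return -EIO;

	errno = 0;
	val = strtoull(sep + 1, &endp, 10);
	if (errno != 0 || *endp != '\0')
		return -EIO;

	memcpy(out_name, start, (size_t)(sep - start));
	out_name[sep - start] = '\0';
	*ino = (uint64_t)val;
	return 0;
}
```

For example, `_daily_1099511627776` yields the name `daily` and inode number 1099511627776. Searching for the last '_' rather than the first lets snapshot names that themselves contain underscores, such as `_a_b_42`, still split correctly.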

Thread overview: 14+ messages
2025-12-20 14:01 [PATCH] fs/ceph: Fix kernel oops due invalid pointer for kfree() in parse_longname() Daniel Vogelbacher
2025-12-22 20:08 ` Viacheslav Dubeyko
2025-12-22 21:26   ` Daniel Vogelbacher
2025-12-23 22:49     ` Viacheslav Dubeyko
2026-01-20 13:42       ` Daniel Vogelbacher
2026-01-21 20:44         ` Viacheslav Dubeyko
2026-01-21 21:38           ` Daniel Vogelbacher
2026-02-01  8:34 ` [PATCH v2] " Daniel Vogelbacher
2026-02-02 19:13   ` Viacheslav Dubeyko
2026-02-03 19:23     ` Viacheslav Dubeyko
2026-02-03 19:41       ` Daniel Vogelbacher
2026-02-03 19:40 ` [PATCH v3] " Daniel Vogelbacher
2026-02-03 20:16   ` Viacheslav Dubeyko
2026-02-03 20:22     ` Ilya Dryomov
