* cifs_ses_add_channel() can race with itself
@ 2026-03-25 13:20 David Howells
2026-03-25 20:44 ` Henrique Carvalho
0 siblings, 1 reply; 2+ messages in thread
From: David Howells @ 2026-03-25 13:20 UTC
To: Steve French; +Cc: dhowells, Paulo Alcantara, linux-cifs
Hi Steve,
Whilst running xfstests against cifs, I managed to encounter what I think must
be due to a race between cifs_ses_add_channel() and itself - presumably by
concurrent mount type things that share a session.
So what I saw was this:
BUG: kernel NULL pointer dereference, address: 0000000000000358
...
RIP: 0010:cifs_alloc_hash+0x5/0xd0
...
RSP: 0018:ffff88814eb4bdb0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff888148804800 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000358 RDI: ffffffff82c66915
RBP: ffff88810ad52000 R08: 0000000000000000 R09: ffff88840fa1bea0
R10: 0000000000000006 R11: 00000000000002eb R12: ffff8881488048a0
R13: 000000000015681b R14: ffff888148786c00 R15: ffff888148804860
FS: 0000000000000000(0000) GS:ffff88848be03000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000358 CR3: 0000000002e3a002 CR4: 00000000001706f0
Call Trace:
<TASK>
cifs_ses_add_channel+0x39b/0x530
cifs_try_adding_channels+0x201/0x2e0
mchan_mount_work_fn+0x1d/0x30
process_one_work+0x189/0x2b0
process_scheduled_works+0x3a/0x50
worker_thread+0x13b/0x1d0
Now, the RIP location corresponds to the deref of *sdesc in cifs_alloc_hash(),
so sdesc (which is 0x358 in RSI) must be based on a NULL pointer. Looking at
cifs_ses_add_channel(), this corresponds to an inlined call to
smb3_crypto_shash_allocate(). 0x358 matches server->secmech.aes_cmac if
server is NULL:
(gdb) p &((struct TCP_Server_Info *)0)->secmech.aes_cmac
$2 = (struct shash_desc **) 0x358
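The same arithmetic can be sketched in userspace C. This is only an illustration: struct demo_server and struct demo_secmech are hypothetical stand-ins, not the real struct TCP_Server_Info, and the padding is chosen by hand so that the member lands at 0x358 on a 64-bit build. It shows why taking the address of a member through a NULL base pointer yields a small non-zero address equal to the member's offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT the real kernel structures. */
struct demo_secmech {
	void *other;	/* some earlier member */
	void *aes_cmac;	/* the member whose address is taken */
};

struct demo_server {
	char padding[0x350];	/* chosen so aes_cmac sits at 0x358 (64-bit) */
	struct demo_secmech secmech;
};

/* Address that &server->secmech.aes_cmac evaluates to for a given base. */
static uintptr_t demo_member_address(uintptr_t base)
{
	return base + offsetof(struct demo_server, secmech)
		    + offsetof(struct demo_secmech, aes_cmac);
}
```

With a base of 0 the helper returns the bare offset, which is the shape of the faulting address in the oops; with a valid base it returns an ordinary pointer value.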
Looking further at cifs_ses_add_channel(), that means chan->server must be
NULL - but that would seem unlikely, given:
chan_server = cifs_get_tcp_session(ctx, ses->server);
a few lines above:
rc = smb3_crypto_shash_allocate(chan->server);
chan_server got checked for error, but not NULLness, though it doesn't look
like it could be NULL.
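As a reminder of why an error check alone says nothing about NULL, here is a minimal userspace sketch of the kernel's error-pointer convention. It assumes the check in question is the usual IS_ERR() style; the demo_* names are hypothetical re-implementations for illustration, not the kernel macros themselves.

```c
#include <stddef.h>
#include <stdint.h>

/* The kernel reserves the top 4095 addresses for encoded errno values. */
#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno as a pointer, in the style of ERR_PTR(). */
static void *demo_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* True only for pointers in the reserved error range, in the style of IS_ERR(). */
static int demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)(-(intptr_t)DEMO_MAX_ERRNO);
}
```

NULL is address 0, which is far below the reserved range, so it passes an error-pointer check just like a valid pointer does.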
However, I note that there's no locking that spans the two lines. The first
happens under ses->chan_lock and the second under ses->session_mutex, but
there's nothing to stop another cifs_ses_add_channel() jumping in in the
meantime and zapping chan->server as there's a gap, be it ever so small, where
no lock is held.
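The window can be modelled deterministically in userspace. The sketch below is single-threaded and entirely hypothetical: the demo_* types stand in for the session and channel, the "lock" is just a flag, and a callback plays the part of whatever other path runs while no lock is held. It is not a proposed patch, only an illustration of re-reading a shared pointer after the lock has been dropped versus continuing to use the local pointer obtained earlier.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct demo_server { int id; };

struct demo_ses {
	int chan_lock_held;			/* models ses->chan_lock */
	struct demo_server *chan_server;	/* models chan->server */
};

static void demo_lock(struct demo_ses *ses)
{
	assert(!ses->chan_lock_held);
	ses->chan_lock_held = 1;
}

static void demo_unlock(struct demo_ses *ses)
{
	assert(ses->chan_lock_held);
	ses->chan_lock_held = 0;
}

/* Models some other path clearing the channel's server pointer. */
static void demo_clear_server(struct demo_ses *ses)
{
	demo_lock(ses);
	ses->chan_server = NULL;
	demo_unlock(ses);
}

/* Callback run in the window where no lock is held. */
typedef void (*demo_gap_fn)(struct demo_ses *ses);

/* Publishes the server under the lock, then re-reads it after the gap. */
static struct demo_server *demo_add_reread(struct demo_ses *ses,
					   struct demo_server *new_server,
					   demo_gap_fn in_gap)
{
	demo_lock(ses);
	ses->chan_server = new_server;
	demo_unlock(ses);

	if (in_gap)
		in_gap(ses);		/* unlocked window */

	return ses->chan_server;	/* may have been cleared meanwhile */
}

/* Same sequence, but keeps using the pointer obtained before the gap. */
static struct demo_server *demo_add_local(struct demo_ses *ses,
					  struct demo_server *new_server,
					  demo_gap_fn in_gap)
{
	demo_lock(ses);
	ses->chan_server = new_server;
	demo_unlock(ses);

	if (in_gap)
		in_gap(ses);

	return new_server;
}
```

Passing demo_clear_server as the callback makes demo_add_reread() return NULL, while demo_add_local() still returns the original pointer. Whether using the local pointer would be safe in the real code depends on the reference held on the server, which this model does not capture.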
Looking further up the stack, this is launched from mount into a workqueue, so
it's not necessarily inside of any of the mount locking either. It would seem
that cifs_try_adding_channels() probably should use some locking, but does not
seem to use anything authoritative.
Now this is a lot of surmise. I've only seen this once in a lot of xfstests
runs and other testing on cifs, so it's not especially reproducible,
unfortunately.
Let me know if you have a fix for this!
Thanks,
David
* Re: cifs_ses_add_channel() can race with itself
2026-03-25 13:20 cifs_ses_add_channel() can race with itself David Howells
@ 2026-03-25 20:44 ` Henrique Carvalho
0 siblings, 0 replies; 2+ messages in thread
From: Henrique Carvalho @ 2026-03-25 20:44 UTC
To: David Howells; +Cc: Steve French, Paulo Alcantara, linux-cifs
Hi David,
On Wed, Mar 25, 2026 at 01:20:14PM +0000, David Howells wrote:
> Hi Steve,
>
> Whilst running xfstests against cifs, I managed to encounter what I think must
> be due to a race between cifs_ses_add_channel() and itself - presumably by
> concurrent mount type things that share a session.
>
> So what I saw was this:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000358
> ...
> RIP: 0010:cifs_alloc_hash+0x5/0xd0
> ...
> RSP: 0018:ffff88814eb4bdb0 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffff888148804800 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000358 RDI: ffffffff82c66915
> RBP: ffff88810ad52000 R08: 0000000000000000 R09: ffff88840fa1bea0
> R10: 0000000000000006 R11: 00000000000002eb R12: ffff8881488048a0
> R13: 000000000015681b R14: ffff888148786c00 R15: ffff888148804860
> FS: 0000000000000000(0000) GS:ffff88848be03000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000358 CR3: 0000000002e3a002 CR4: 00000000001706f0
> Call Trace:
> <TASK>
> cifs_ses_add_channel+0x39b/0x530
> cifs_try_adding_channels+0x201/0x2e0
> mchan_mount_work_fn+0x1d/0x30
> process_one_work+0x189/0x2b0
> process_scheduled_works+0x3a/0x50
> worker_thread+0x13b/0x1d0
>
> Now, the RIP location corresponds to the deref of *sdesc in cifs_alloc_hash(),
> so sdesc (which is 0x358 in RSI) must be based on a NULL pointer. Looking at
> cifs_ses_add_channel(), this corresponds to an inlined call to
> smb3_crypto_shash_allocate(). 0x358 matches server->secmech.aes_cmac if
> server is NULL:
>
> (gdb) p &((struct TCP_Server_Info *)0)->secmech.aes_cmac
> $2 = (struct shash_desc **) 0x358
>
> Looking further at cifs_ses_add_channel(), that means chan->server must be
> NULL - but that would seem unlikely, given:
>
> chan_server = cifs_get_tcp_session(ctx, ses->server);
>
> a few lines above:
>
> rc = smb3_crypto_shash_allocate(chan->server);
>
> chan_server got checked for error, but not NULLness, though it doesn't look
> like it could be NULL.
>
> However, I note that there's no locking that spans the two lines. The first
> happens under ses->chan_lock and the second under ses->session_mutex, but
> there's nothing to stop another cifs_ses_add_channel() jumping in in the
> meantime and zapping chan->server as there's a gap, be it ever so small, where
> no lock is held.
>
Do you happen to be doing remount?
One possibility is that chan->server gets set to NULL after the spinlock
is dropped and before the code reaches smb3_crypto_shash_allocate().
This could happen (a) in the put session path, which I'm assuming cannot
happen because we hold a ref to the session, or (b) in the channel
*removal* path rather than inside another cifs_ses_add_channel().
In particular, inside cifs_chan_skip_or_disable(), which gets called if
ses->chan_count > ses->chan_max.
This condition can happen if ses->chan_max gets overwritten by a lower
max. One place that does lower ses->chan_max is smb3_reconfigure(),
which updates ses->chan_max and then calls channel scaling/shrinking.
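To make the scenario concrete, here is a rough userspace model of it. Everything in it is an assumption for illustration: the demo_* names are hypothetical, the channel table is a fixed array, and the shrink loop only loosely mirrors what the removal path does. It shows how lowering the maximum leaves chan_count above chan_max, and how trimming the excess channels clears their server pointers.

```c
#include <stddef.h>

#define DEMO_MAX_CHANNELS 4

/* Hypothetical stand-ins, not the kernel's session or server types. */
struct demo_server { int id; };

struct demo_ses {
	size_t chan_count;
	size_t chan_max;
	struct demo_server *chan_server[DEMO_MAX_CHANNELS];
};

/* Drop channels from the end while chan_count exceeds chan_max. */
static size_t demo_shrink_channels(struct demo_ses *ses)
{
	size_t removed = 0;

	while (ses->chan_count > ses->chan_max) {
		ses->chan_count--;
		ses->chan_server[ses->chan_count] = NULL;	/* pointer cleared */
		removed++;
	}
	return removed;
}

/* Models a reconfigure that installs a new maximum and then rescales. */
static size_t demo_reconfigure(struct demo_ses *ses, size_t new_max)
{
	ses->chan_max = new_max;
	return demo_shrink_channels(ses);
}
```

If an add path is between its two locked regions for one of the trailing channels when the reconfigure runs, it would then observe a NULL server for that channel.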
--
Henrique
SUSE Labs