cifs_ses_add_channel() can race with itself

public inbox for linux-cifs@vger.kernel.org

* cifs_ses_add_channel() can race with itself
@ 2026-03-25 13:20 David Howells
From: David Howells @ 2026-03-25 13:20 UTC (permalink / raw)
  To: Steve French; +Cc: dhowells, Paulo Alcantara, linux-cifs

Hi Steve,

Whilst running xfstests against cifs, I managed to encounter what I think must
be due to a race between cifs_ses_add_channel() and itself - presumably by
concurrent mount type things that share a session.

So, what I saw was this:

    BUG: kernel NULL pointer dereference, address: 0000000000000358
    ...
    RIP: 0010:cifs_alloc_hash+0x5/0xd0
    ...
    RSP: 0018:ffff88814eb4bdb0 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: ffff888148804800 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000358 RDI: ffffffff82c66915
    RBP: ffff88810ad52000 R08: 0000000000000000 R09: ffff88840fa1bea0
    R10: 0000000000000006 R11: 00000000000002eb R12: ffff8881488048a0
    R13: 000000000015681b R14: ffff888148786c00 R15: ffff888148804860
    FS:  0000000000000000(0000) GS:ffff88848be03000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000358 CR3: 0000000002e3a002 CR4: 00000000001706f0
    Call Trace:
     <TASK>
     cifs_ses_add_channel+0x39b/0x530
     cifs_try_adding_channels+0x201/0x2e0
     mchan_mount_work_fn+0x1d/0x30
     process_one_work+0x189/0x2b0
     process_scheduled_works+0x3a/0x50
     worker_thread+0x13b/0x1d0

Now, the RIP location corresponds to the deref of *sdesc in cifs_alloc_hash(),
so sdesc (which is 0x358 in RSI) must be based on a NULL pointer.  Looking at
cifs_ses_add_channel(), this corresponds to an inlined call to
smb3_crypto_shash_allocate().  0x358 matches server->secmech.aes_cmac if
server is NULL:

	(gdb) p &((struct TCP_Server_Info *)0)->secmech.aes_cmac
	$2 = (struct shash_desc **) 0x358

Looking further at cifs_ses_add_channel(), that means chan->server must be
NULL - but that would seem unlikely, given that

	chan_server = cifs_get_tcp_session(ctx, ses->server);

appears (with the result then stored in chan->server) a few lines above:

	rc = smb3_crypto_shash_allocate(chan->server);

chan_server is checked for an error, but not for NULL, though it doesn't look
as if it could be NULL anyway.

However, I note that there's no locking that spans the two lines.  The first
happens under ses->chan_lock and the second under ses->session_mutex, but
there's nothing to stop another cifs_ses_add_channel() jumping in in the
meantime and zapping chan->server as there's a gap, be it ever so small, where
no lock is held.

Looking further up the stack, this is launched from mount onto a workqueue, so
it isn't necessarily covered by any of the mount locking either.  It would seem
that cifs_try_adding_channels() ought to take some lock, but it doesn't appear
to use anything authoritative.

Now, this is a lot of surmise.  It's the only time I've seen this across a lot
of xfstests runs and other testing on cifs, so it's not especially
reproducible, unfortunately.

Let me know if you have a fix for this!

Thanks,
David


* Re: cifs_ses_add_channel() can race with itself
@ 2026-03-25 20:44 ` Henrique Carvalho
From: Henrique Carvalho @ 2026-03-25 20:44 UTC (permalink / raw)
  To: David Howells; +Cc: Steve French, Paulo Alcantara, linux-cifs

Hi David,

On Wed, Mar 25, 2026 at 01:20:14PM +0000, David Howells wrote:
> However, I note that there's no locking that spans the two lines.  The first
> happens under ses->chan_lock and the second under ses->session_mutex, but
> there's nothing to stop another cifs_ses_add_channel() jumping in in the
> meantime and zapping chan->server as there's a gap, be it ever so small, where
> no lock is held.
> 

Do you happen to be doing a remount?

One possibility is that chan->server gets NULL'd after we drop the
spinlock and before we reach smb3_crypto_shash_allocate().

This could happen (a) in the put session path, which I'm assuming cannot
happen because we hold a ref to the session, or (b) in the channel
*removal* path rather than inside another cifs_ses_add_channel().

In particular, inside cifs_chan_skip_or_disable(), which gets called if
ses->chan_count > ses->chan_max.

This condition can arise if ses->chan_max is overwritten with a lower
value. One place that does lower ses->chan_max is smb3_reconfigure(),
which updates ses->chan_max and then calls into channel scaling/shrinking.

-- 
Henrique
SUSE Labs


