Linux CIFS filesystem development
* Deadlock in Ubuntu 5.4 kernel
@ 2022-12-29  3:26 Shyam Prasad N
  2022-12-29 12:16 ` Paulo Alcantara
  0 siblings, 1 reply; 3+ messages in thread
From: Shyam Prasad N @ 2022-12-29  3:26 UTC
  To: Paulo Alcantara, CIFS, Enzo Matsumiya

Hi Paulo/Enzo,

A customer reported this deadlock in a Kubernetes setup running on Ubuntu 18.04.
The kernel is 5.4-based (5.4.0-1091-azure, per the log below), built from this tree:
https://git.launchpad.net/~canonical-kernel/ubuntu/+source/linux-azure/+git/bionic

Based on the stack, it appears to be a hang in the DFS reconnect code path,
while trying to acquire the DFS cache lock in dfs_cache_update_vol().

Can you tell if this is a known issue that has since been fixed?
If so, should Ubuntu backport the fix to 5.4?
I could not find this function in the mainline codebase.

dmesg:
Dec 10 09:07:48 aks-corew26-13626357-vmss00000E kernel: [5653610.066765] INFO: task cifsd:981715 blocked for more than 604 seconds.
Dec 10 09:07:48 aks-corew26-13626357-vmss00000E kernel: [5653610.073365]       Not tainted 5.4.0-1091-azure #96~18.04.1-Ubuntu
Dec 10 09:07:48 aks-corew26-13626357-vmss00000E kernel: [5653610.080279] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 10 09:07:48 aks-corew26-13626357-vmss00000E
node-problem-detector-startup.sh[562959]: I1210 09:07:48.142691
562992 log_monitor.go:160] New status generated:
&{Source:kernel-monitor Events:[{Severity:warn Timestamp:2022-12-10
09:07:47.636272174 +0000 UTC m=+63284.134685121 Reason:TaskHung
Message:INFO: task cifsd:981715 blocked for more than 604 seconds.}]
Conditions:[{Type:KernelDeadlock Status:False Transition:2022-12-09
15:33:03.569676476 +0000 UTC m=+0.068089323 Reason:KernelHasNoDeadlock
Message:kernel has no deadlock} {Type:ReadonlyFilesystem Status:False
Transition:2022-12-09 15:33:03.569676576 +0000 UTC m=+0.068089423
Reason:FilesystemIsNotReadOnly Message:Filesystem is not read-only}]}
Dec 10 09:07:48 aks-corew26-13626357-vmss00000E kernel: [5653610.086811] cifsd           D    0 981715      2 0x80004002
Dec 10 09:07:48 aks-corew26-13626357-vmss00000E kernel: [5653610.086814] Call Trace:
Dec 10 09:07:48 aks-corew26-13626357-vmss00000E kernel: [5653610.086826]  __schedule+0x277/0x710
Dec 10 09:07:48 aks-corew26-13626357-vmss00000E kernel: [5653610.086829]  ? __next_timer_interrupt+0xe0/0xe0
Dec 10 09:07:48 aks-corew26-13626357-vmss00000E kernel: [5653610.086836]  schedule+0x33/0xa0
Dec 10 09:07:48 aks-corew26-13626357-vmss00000E kernel: [5653610.086838]  schedule_preempt_disabled+0xe/0x10
Dec 10 09:07:48 aks-corew26-13626357-vmss00000E kernel: [5653610.086840]  __mutex_lock.isra.10+0x24c/0x4a0
Dec 10 09:07:48 aks-corew26-13626357-vmss00000E kernel: [5653610.086870]  ? do_dfs_cache_find+0x1be/0xea0 [cifs]
Dec 10 09:07:48 aks-corew26-13626357-vmss00000E kernel: [5653610.086873]  __mutex_lock_slowpath+0x13/0x20
Dec 10 09:07:48 aks-corew26-13626357-vmss00000E kernel: [5653610.086874]  ? __mutex_lock_slowpath+0x13/0x20
Dec 10 09:07:48 aks-corew26-13626357-vmss00000E kernel: [5653610.086875]  mutex_lock+0x2f/0x40
Dec 10 09:07:48 aks-corew26-13626357-vmss00000E kernel: [5653610.086891]  dfs_cache_update_vol+0x4a/0x290 [cifs]
Dec 10 09:07:48 aks-corew26-13626357-vmss00000E kernel: [5653610.086904]  cifs_reconnect+0x597/0xd50 [cifs]
Dec 10 09:07:48 aks-corew26-13626357-vmss00000E kernel: [5653610.086916]  cifs_handle_standard+0x198/0x1c0 [cifs]
Dec 10 09:07:48 aks-corew26-13626357-vmss00000E kernel: [5653610.086928]  cifs_demultiplex_thread+0x9ed/0xc70 [cifs]
Dec 10 09:07:48 aks-corew26-13626357-vmss00000E kernel: [5653610.086931]  kthread+0x121/0x140
Dec 10 09:07:48 aks-corew26-13626357-vmss00000E kernel: [5653610.086942]  ? cifs_handle_standard+0x1c0/0x1c0 [cifs]
Dec 10 09:07:48 aks-corew26-13626357-vmss00000E kernel: [5653610.086944]  ? kthread_park+0x90/0x90
Dec 10 09:07:48 aks-corew26-13626357-vmss00000E kernel: [5653610.086946]  ret_from_fork+0x35/0x40
Dec 10 09:07:48 aks-corew26-13626357-vmss00000E kernel: [5653610.087014] INFO: task kworker/0:2:2562230 blocked for more than 604 seconds.
Dec 10 09:07:48 aks-corew26-13626357-vmss00000E kernel: [5653610.092927]       Not tainted 5.4.0-1091-azure #96~18.04.1-Ubuntu


-- 
Regards,
Shyam


* Re: Deadlock in Ubuntu 5.4 kernel
From: Paulo Alcantara @ 2022-12-29 12:16 UTC
  To: Shyam Prasad N, CIFS, Enzo Matsumiya

Shyam Prasad N <nspmangalore@gmail.com> writes:

> A customer reported this deadlock in a Kubernetes setup running on Ubuntu-18.04.
> This must be a 5.4 kernel, running this code:
> https://git.launchpad.net/~canonical-kernel/ubuntu/+source/linux-azure/+git/bionic
>
> Based on the stack, it appears to be a hang in DFS reconnect codepath,
> trying to access the DFS cache lock in dfs_cache_update_vol.
>
> Can you tell if this is a known issue that has been fixed since?

Looks like this has been fixed by

        06d57378bcc9 ("cifs: Fix potential deadlock when updating vol in cifs_reconnect()")

> And if Ubuntu should backport any fix to 5.4?

I would say so. It would probably also require other DFS-related
patches to be backported in addition to the above.

> I could not find the function in the mainline codebase.

Yes, it has changed a lot.


* Re: Deadlock in Ubuntu 5.4 kernel
From: Shyam Prasad N @ 2022-12-30  3:33 UTC
  To: Paulo Alcantara; +Cc: CIFS, Enzo Matsumiya

On Thu, Dec 29, 2022 at 5:47 PM Paulo Alcantara <pc@cjr.nz> wrote:
>
> Shyam Prasad N <nspmangalore@gmail.com> writes:
>
> > A customer reported this deadlock in a Kubernetes setup running on Ubuntu-18.04.
> > This must be a 5.4 kernel, running this code:
> > https://git.launchpad.net/~canonical-kernel/ubuntu/+source/linux-azure/+git/bionic
> >
> > Based on the stack, it appears to be a hang in DFS reconnect codepath,
> > trying to access the DFS cache lock in dfs_cache_update_vol.
> >
> > Can you tell if this is a known issue that has been fixed since?
>
> Looks like this has been fixed by
>
>         06d57378bcc9 ("cifs: Fix potential deadlock when updating vol in cifs_reconnect()")
>

Thanks for this.

> > And if Ubuntu should backport any fix to 5.4?
>
> I would say so. It would probably also require other DFS-related
> patches to be backported in addition to the above.

If you could point me to a list of other patches that could be
backported to a 5.4 kernel, that would be great.

>
> > I could not find the function in the mainline codebase.
>
> Yes, it has changed a lot.

-- 
Regards,
Shyam

