[BUG REPORT] cifs: Deadlock due to network reconnection during file writing

Linux CIFS filesystem development

* [BUG REPORT] cifs: Deadlock due to network reconnection during file writing
@ 2025-01-26  7:36 Wang Zhaolong
  2025-02-07  1:08 ` Wang Zhaolong
  2025-02-07  1:30 ` Steve French
  0 siblings, 2 replies; 8+ messages in thread
From: Wang Zhaolong @ 2025-01-26  7:36 UTC
  To: stable, sfrench; +Cc: linux-cifs, yangerkun, yi zhang

In the LTS branches currently being maintained (linux-5.4 through
linux-6.6), a deadlock occurs in the network reconnection scenario when
multiple processes or threads write to the same file concurrently.

Take the code of linux-5.10 as an example. The simplified deadlock process
is as follows:

```
Process 1                           Process 2
lock_page() - [1]
   wait_on_page_writeback() - [2] Waiting for writeback, blocked by [4]

                                     lock_page()     - [3] Blocked by [1]
                                       end_page_writeback() - [4] Won't execute
```
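
The ordering in the diagram above can be reproduced in a small userspace
model. This is only a sketch, not kernel code: the page lock and the
writeback flag are plain fields, the two processes are step lists run
round-robin, and every name in it (page_model, try_step, run_model,
reported_ordering_deadlocks) is hypothetical. It assumes the page is already
under writeback when Process 1 takes the lock.

```c
#include <stdbool.h>

/* Toy userspace model of one page; this is not kernel code. */
struct page_model {
	int  lock_owner;  /* process holding the modelled page lock, -1 if free */
	bool writeback;   /* models PG_writeback */
};

enum step { STEP_LOCK, STEP_WAIT_WRITEBACK, STEP_END_WRITEBACK,
	    STEP_UNLOCK, STEP_DONE };

/* Try to run the next step of process `id`; return true if it advanced. */
bool try_step(struct page_model *pg, const enum step *steps, int *pc, int id)
{
	switch (steps[*pc]) {
	case STEP_LOCK:
		if (pg->lock_owner != -1)
			return false;           /* blocked on the page lock */
		pg->lock_owner = id;
		break;
	case STEP_WAIT_WRITEBACK:
		if (pg->writeback)
			return false;           /* blocked on writeback */
		break;
	case STEP_END_WRITEBACK:
		pg->writeback = false;
		break;
	case STEP_UNLOCK:
		if (pg->lock_owner == id)
			pg->lock_owner = -1;
		break;
	case STEP_DONE:
		return false;                   /* nothing left to run */
	}
	(*pc)++;
	return true;
}

/* Round-robin both processes; return true if they stop before finishing. */
bool run_model(const enum step *p1, const enum step *p2)
{
	struct page_model pg = { .lock_owner = -1, .writeback = true };
	const enum step *prog[2] = { p1, p2 };
	int pc[2] = { 0, 0 };
	bool progress = true;

	while (progress) {
		progress = false;
		for (int i = 0; i < 2; i++)
			if (try_step(&pg, prog[i], &pc[i], i))
				progress = true;
	}
	return prog[0][pc[0]] != STEP_DONE || prog[1][pc[1]] != STEP_DONE;
}

/* The ordering from the diagram: [1] lock, [2] wait, [3] lock, [4] end. */
bool reported_ordering_deadlocks(void)
{
	static const enum step p1[] = { STEP_LOCK, STEP_WAIT_WRITEBACK,
					STEP_UNLOCK, STEP_DONE };
	static const enum step p2[] = { STEP_LOCK, STEP_END_WRITEBACK,
					STEP_UNLOCK, STEP_DONE };
	return run_model(p1, p2);
}
```

In this model reported_ordering_deadlocks() returns true: Process 1 stops at
the writeback wait while holding the lock, and Process 2 never gets past its
lock step, so the step that would clear writeback is never reached.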

Based on my research, I'm going to use two detailed scenarios to illustrate
the issue.

Scenario 1:

```
P1 (dd)                 P2 (cifsd)          P3 (cifsiod)

cifs_writepages
   wdata_prepare_pages
     lock_page - [1]
      wait_on_page_writeback - [2] Waiting for writeback, blocked by [4]
       wait_on_page_bit
                       cifs_demultiplex_thread
                       cifs_read_from_socket
                       cifs_readv_from_socket
                       - If another process triggers reconnect at this point
                         cifs_reconnect
                         - mid->mid_state updated to MID_RETRY_NEEDED
                         smb2_writev_callback mid_entry->callback()
                          - mid_state leads to wdata->result = -EAGAIN
                          wdata->result = -EAGAIN
                          queue_work(cifsiod_wq, &wdata->work);
                                          cifs_writev_complete - worker function
                                            - wdata->result == -EAGAIN Condition satisfied
                                            cifs_writev_requeue
                                              lock_page - [3] Blocked by [1]
                                            end_page_writeback
                                            - [4] Won't execute
     unlock_page
```

Mainline refactoring commit d08089f649a0 ("cifs: Change the I/O paths to use
an iterator rather than a page list") unlocks the folio while waiting for
writeback to complete. That commit was introduced in v6.3-rc1, so scenario 1
only affects LTS versions from linux-5.4 to linux-6.1.

Call stack trace:

```
     # cat /proc/34/stack
     [<0>] __lock_page+0x147/0x3a0
     [<0>] cifs_writev_requeue.cold+0x185/0x28e
     [<0>] process_one_work+0x1df/0x3b0
     [<0>] worker_thread+0x4a/0x3c0
     [<0>] kthread+0x125/0x160
     [<0>] ret_from_fork+0x22/0x30

     # cat /proc/465/stack
     [<0>] wait_on_page_bit+0x106/0x2e0
     [<0>] wait_on_page_writeback+0x25/0xd0
     [<0>] cifs_writepages+0x5ee/0xf60
     [<0>] do_writepages+0x43/0xe0
     [<0>] __filemap_fdatawrite_range+0xcd/0x110
     [<0>] file_write_and_wait_range+0x40/0x90
     [<0>] cifs_strict_fsync+0x35/0x470
     [<0>] do_fsync+0x38/0x70
     [<0>] __x64_sys_fsync+0x10/0x20
     [<0>] do_syscall_64+0x33/0x40
     [<0>] entry_SYSCALL_64_after_hwframe+0x67/0xd1

     [  369.826215] INFO: task kworker/1:1:34 blocked for more than 122 seconds.
     [  369.828964]       Not tainted 5.10.0+ #164
     [  369.830623] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
     [  369.835104] task:kworker/1:1     state:D stack:13472 pid:   34 ppid:     2 flags:0x00004000
     [  369.838448] Workqueue: cifsiod cifs_writev_complete
     [  369.840242] Call Trace:
     [  369.841219]  __schedule+0x401/0x8e0
     [  369.842568]  schedule+0x49/0x130
     [  369.843785]  io_schedule+0x12/0x40
     [  369.845079]  __lock_page+0x147/0x3a0
     [  369.846444]  ? add_to_page_cache_lru+0x180/0x180
     [  369.847963]  cifs_writev_requeue.cold+0x185/0x28e
     [  369.849193]  process_one_work+0x1df/0x3b0
     [  369.850248]  worker_thread+0x4a/0x3c0
     [  369.851216]  ? process_one_work+0x3b0/0x3b0
     [  369.852308]  kthread+0x125/0x160
     [  369.853167]  ? kthread_park+0x90/0x90
     [  369.854142]  ret_from_fork+0x22/0x30
     [  369.855054] INFO: task kworker/u8:3:96 blocked for more than 122 seconds.
     [  369.856781]       Not tainted 5.10.0+ #164
     [  369.857851] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
     [  369.859419] task:kworker/u8:3    state:D stack:12744 pid:   96 ppid:     2 flags:0x00004000
     [  369.861041] Workqueue: writeback wb_workfn (flush-cifs-2)
     [  369.862095] Call Trace:
     [  369.862583]  __schedule+0x401/0x8e0
     [  369.863280]  schedule+0x49/0x130
     [  369.863912]  io_schedule+0x12/0x40
     [  369.864604]  __lock_page+0x147/0x3a0
     [  369.865322]  ? add_to_page_cache_lru+0x180/0x180
     [  369.866246]  cifs_writepages+0x620/0xf60
     [  369.867005]  do_writepages+0x43/0xe0
     [  369.867737]  ? __blk_mq_try_issue_directly+0x121/0x1c0
     [  369.868750]  __writeback_single_inode+0x3d/0x320
     [  369.869589]  writeback_sb_inodes+0x20d/0x480
     [  369.870367]  __writeback_inodes_wb+0x4c/0xe0
     [  369.871148]  wb_writeback+0x201/0x2f0
     [  369.871797]  wb_workfn+0x38a/0x4e0
     [  369.872427]  ? check_preempt_curr+0x47/0x70
     [  369.873191]  ? ttwu_do_wakeup.isra.0+0x17/0x170
     [  369.873999]  process_one_work+0x1df/0x3b0
     [  369.874741]  worker_thread+0x4a/0x3c0
     [  369.875421]  ? process_one_work+0x3b0/0x3b0
     [  369.876180]  kthread+0x125/0x160
     [  369.876761]  ? kthread_park+0x90/0x90
     [  369.877431]  ret_from_fork+0x22/0x30
     [  369.878106] INFO: task a.out:465 blocked for more than 122 seconds.
     [  369.879225]       Not tainted 5.10.0+ #164
     [  369.879945] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
     [  369.881316] task:a.out           state:D stack:12752 pid:  465 ppid:   386 flags:0x00000002
     [  369.882791] Call Trace:
     [  369.883263]  __schedule+0x401/0x8e0
     [  369.883884]  schedule+0x49/0x130
     [  369.884447]  io_schedule+0x12/0x40
     [  369.885054]  wait_on_page_bit+0x106/0x2e0
     [  369.885795]  ? add_to_page_cache_lru+0x180/0x180
     [  369.886631]  wait_on_page_writeback+0x25/0xd0
     [  369.887427]  cifs_writepages+0x5ee/0xf60
     [  369.888151]  do_writepages+0x43/0xe0
     [  369.888789]  ? __generic_file_write_iter+0xfd/0x1d0
     [  369.889663]  __filemap_fdatawrite_range+0xcd/0x110
     [  369.890523]  file_write_and_wait_range+0x40/0x90
     [  369.891360]  cifs_strict_fsync+0x35/0x470
     [  369.892094]  do_fsync+0x38/0x70
     [  369.892657]  __x64_sys_fsync+0x10/0x20
     [  369.893336]  do_syscall_64+0x33/0x40
     [  369.893978]  entry_SYSCALL_64_after_hwframe+0x67/0xd1
     [  369.894883] RIP: 0033:0x7f660e208950
     [  369.895538] RSP: 002b:00007fff52b27b78 EFLAGS: 00000202 ORIG_RAX: 000000000000004a
     [  369.896882] RAX: ffffffffffffffda RBX: 00007fff52b28cb8 RCX: 00007f660e208950
     [  369.898139] RDX: 0000000000001000 RSI: 00007fff52b27b80 RDI: 0000000000000003
     [  369.899395] RBP: 00007fff52b28ba0 R08: 0000000000000410 R09: 0000000000000001
     [  369.900661] R10: 00007f660e11c400 R11: 0000000000000202 R12: 0000000000000000
     [  369.901925] R13: 00007fff52b28cc8 R14: 00007f660e328000 R15: 000055b5aeb6fdd8
     [  369.903202] INFO: task sync:468 blocked for more than 122 seconds.
     [  369.904311]       Not tainted 5.10.0+ #164
     [  369.905034] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
     [  369.906457] task:sync            state:D stack:13632 pid:  468 ppid:   386 flags:0x00004002
     [  369.907930] Call Trace:
     [  369.908369]  __schedule+0x401/0x8e0
     [  369.908984]  schedule+0x49/0x130
     [  369.909582]  io_schedule+0x12/0x40
     [  369.910208]  wait_on_page_bit+0x106/0x2e0
     [  369.910918]  ? add_to_page_cache_lru+0x180/0x180
     [  369.911758]  wait_on_page_writeback+0x25/0xd0
     [  369.912560]  __filemap_fdatawait_range+0x83/0x110
     [  369.913408]  ? __add_pages+0x6f/0x1b0
     [  369.914089]  filemap_fdatawait_keep_errors+0x1a/0x50
     [  369.914957]  sync_inodes_sb+0x208/0x2a0
     [  369.915666]  ? __x64_sys_tee+0xd0/0xd0
     [  369.916344]  iterate_supers+0x90/0xe0
     [  369.916983]  ksys_sync+0x40/0xb0
     [  369.917590]  __do_sys_sync+0xa/0x20
     [  369.918240]  do_syscall_64+0x33/0x40
     [  369.918884]  entry_SYSCALL_64_after_hwframe+0x67/0xd1
     [  369.919800] RIP: 0033:0x7f746d820987
     [  369.920451] RSP: 002b:00007ffce853fd78 EFLAGS: 00000206 ORIG_RAX: 00000000000000a2
     [  369.921798] RAX: ffffffffffffffda RBX: 00007ffce853fed8 RCX: 00007f746d820987
     [  369.923063] RDX: 00007f746d8f4801 RSI: 00007ffce8541f71 RDI: 00007f746d8b05ad
     [  369.924339] RBP: 0000000000000001 R08: 000000000000ffff R09: 0000000000000000
     [  369.925605] R10: 00007f746d7308a0 R11: 0000000000000206 R12: 000055b8487470fb
     [  369.926866] R13: 0000000000000000 R14: 0000000000000000 R15: 000055b848749ce0
     [  369.928138] Kernel panic - not syncing: hung_task: blocked tasks
     [  369.929191] CPU: 3 PID: 35 Comm: khungtaskd Not tainted 5.10.0+ #164
     [  369.952450] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014
     [  369.956984] Call Trace:
     [  369.957973]  dump_stack+0x57/0x6e
     [  369.959273]  panic+0x115/0x2f1
     [  369.960476]  watchdog.cold+0xb5/0xb5
     [  369.961884]  ? hungtask_pm_notify+0x40/0x40
     [  369.963310]  kthread+0x125/0x160
     [  369.964354]  ? kthread_park+0x90/0x90
     [  369.965551]  ret_from_fork+0x22/0x30
     [  369.967673] Kernel Offset: 0xd600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
     [  369.971025] ---[ end Kernel panic - not syncing: hung_task: blocked tasks ]---
```

Scenario 2:

   Scenario 2 occurs in strict cache mode.

```
P1 (dd)                 P2 (cifsd)          P3 (cifsiod)

cifs_strict_writev
  cifs_zap_mapping - If something breaks the oplock
   cifs_revalidate_mapping
    cifs_invalidate_mapping
     invalidate_inode_pages2
      invalidate_inode_pages2_range
       lock_page - [1]
       wait_on_page_writeback - [2] Waiting for writeback, blocked by [4]
        wait_on_page_bit
                       cifs_demultiplex_thread
                       cifs_read_from_socket
                       cifs_readv_from_socket
                       - If another process triggers reconnect at this point
                         cifs_reconnect
                         - mid->mid_state updated to MID_RETRY_NEEDED
                         smb2_writev_callback mid_entry->callback()
                          - mid_state leads to wdata->result = -EAGAIN
                          wdata->result = -EAGAIN
                          queue_work(cifsiod_wq, &wdata->work);
                                          cifs_writev_complete - worker function
                                            - wdata->result == -EAGAIN Condition satisfied
                                            cifs_writev_requeue
                                              lock_page - [3] Blocked by [1]
                                            end_page_writeback
                                            - [4] Won't execute
       unlock_page
```

Mainline refactoring commit 3ee1a1fc3981 ("cifs: Cut over to using netfslib")
terminates the file write directly instead of resending the data when
smb2_writev_callback() detects a write failure, thus avoiding this problem.
That commit was introduced in v6.10-rc1, so scenario 2 affects LTS versions
from linux-5.4 to linux-6.6.

```
cat /proc/522/stack
[<0>] wait_on_page_bit+0x106/0x150
[<0>] invalidate_inode_pages2_range+0x2cc/0x580
[<0>] cifs_invalidate_mapping+0x2c/0x50 [cifs]
[<0>] cifs_revalidate_mapping+0x4c/0x90 [cifs]
[<0>] cifs_strict_writev+0x17a/0x250 [cifs]
[<0>] __vfs_write+0x14f/0x1b0
[<0>] vfs_write+0xb6/0x1a0
[<0>] ksys_write+0x57/0xd0
[<0>] do_syscall_64+0x63/0x250
[<0>] entry_SYSCALL_64_after_hwframe+0x5c/0xc1
[<0>] 0xffffffffffffffff

cat /proc/33/stack
[<0>] __lock_page+0x10c/0x160
[<0>] cifs_writev_requeue.cold+0x17e/0x239 [cifs]
[<0>] process_one_work+0x1a9/0x3f0
[<0>] worker_thread+0x50/0x3c0
[<0>] kthread+0x117/0x130
[<0>] ret_from_fork+0x35/0x40
[<0>] 0xffffffffffffffff
```


The root cause of the deadlock is that the page/folio is locked again in
cifs_writev_requeue(). To fix it safely on the LTS branches, I would
like to clarify the following questions:

1. Is resending necessary? If retransmission is not required, simply
terminating the write would avoid this problem. Is this an acceptable solution?

2. Is it necessary to lock the page/folio in cifs_writev_requeue()? Based on
my reading of the code (I may have missed something), no path appears to modify
a page while it is marked PG_writeback. Therefore, the page does not need to be
locked during wait_on_page_writeback().
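
The two questions above, and the mainline behaviour described for scenario 1,
can be compared in a toy userspace model. This is a sketch under stated
assumptions, not kernel code and not a proposed patch: model_deadlocks() and
its step letters are hypothetical, the page starts under writeback, and the
two programs are run round-robin with the waiter going first. It only shows
which orderings can complete; it says nothing about whether skipping the lock
or dropping the resend is safe for data integrity in the real code.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model, not kernel code. Each process is a string of steps:
 *   'L' lock page, 'U' unlock page, 'W' wait for writeback to clear,
 *   'E' end writeback (clear the flag).
 * The page starts unlocked with writeback set, as in the report.
 */
bool model_deadlocks(const char *p1, const char *p2)
{
	const char *prog[2] = { p1, p2 };
	size_t pc[2] = { 0, 0 };
	int owner = -1;        /* holder of the modelled page lock, -1 if free */
	bool writeback = true; /* modelled PG_writeback */
	bool progress = true;

	while (progress) {
		progress = false;
		for (int i = 0; i < 2; i++) {
			char op = prog[i][pc[i]];

			if (op == '\0')
				continue;       /* this process has finished */
			if (op == 'L') {
				if (owner != -1)
					continue;   /* blocked on the page lock */
				owner = i;
			} else if (op == 'W') {
				if (writeback)
					continue;   /* blocked on writeback */
			} else if (op == 'E') {
				writeback = false;
			} else if (op == 'U') {
				if (owner == i)
					owner = -1;
			}
			pc[i]++;
			progress = true;
		}
	}
	/* Deadlock if either process stopped before the end of its program. */
	return prog[0][pc[0]] != '\0' || prog[1][pc[1]] != '\0';
}
```

With these assumptions, model_deadlocks("LWU", "LEU") is the reported ordering
and returns true. model_deadlocks("LWU", "E") stands for a completion path
that ends writeback without taking the page lock (the shape both questions
lead to in this model) and returns false. model_deadlocks("LUWLU", "LEU")
stands for a waiter that drops the lock while waiting for writeback and also
returns false.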


* Re: [BUG REPORT] cifs: Deadlock due to network reconnection during file writing
  2025-01-26  7:36 [BUG REPORT] cifs: Deadlock due to network reconnection during file writing Wang Zhaolong
@ 2025-02-07  1:08 ` Wang Zhaolong
  2025-02-07  1:30 ` Steve French
  1 sibling, 0 replies; 8+ messages in thread
From: Wang Zhaolong @ 2025-02-07  1:08 UTC
  To: stable@vger.kernel.org, sfrench@samba.org
  Cc: linux-cifs@vger.kernel.org, yangerkun, zhangyi (F)

Friendly ping.

Best Regards,
Wang Zhaolong


* Re: [BUG REPORT] cifs: Deadlock due to network reconnection during file writing
  2025-01-26  7:36 [BUG REPORT] cifs: Deadlock due to network reconnection during file writing Wang Zhaolong
  2025-02-07  1:08 ` Wang Zhaolong
@ 2025-02-07  1:30 ` Steve French
  2025-02-10 13:05   ` David Howells
  1 sibling, 1 reply; 8+ messages in thread
From: Steve French @ 2025-02-07  1:30 UTC
  To: Wang Zhaolong, David Howells
  Cc: stable, linux-cifs, yangerkun, yi zhang, Paulo Alcantara

Adding David Howells in case he has opinions.


-- 
Thanks,

Steve


* Re: [BUG REPORT] cifs: Deadlock due to network reconnection during file writing
  2025-02-07  1:30 ` Steve French
@ 2025-02-10 13:05   ` David Howells
  2025-02-18  1:05     ` Wang Zhaolong
  0 siblings, 1 reply; 8+ messages in thread
From: David Howells @ 2025-02-10 13:05 UTC
  To: Steve French
  Cc: dhowells, Wang Zhaolong, stable, linux-cifs, yangerkun, yi zhang,
	Paulo Alcantara

This is before cifs moved over to using netfslib (v6.9) and netfslib took over
all the dealing with the VFS/VM for I/O and the handling of pages/folios.  Do
you know if the same problem occurs after that point?

David


* Re: [BUG REPORT] cifs: Deadlock due to network reconnection during file writing
  2025-02-10 13:05   ` David Howells
@ 2025-02-18  1:05     ` Wang Zhaolong
  2025-03-18 13:50       ` Wang Zhaolong
  0 siblings, 1 reply; 8+ messages in thread
From: Wang Zhaolong @ 2025-02-18  1:05 UTC
  To: David Howells, Steve French
  Cc: stable, linux-cifs, yangerkun, yi zhang, Paulo Alcantara

Friendly ping.

Best regards,
Wang Zhaolong


* Re: [BUG REPORT] cifs: Deadlock due to network reconnection during file writing
  2025-02-18  1:05     ` Wang Zhaolong
@ 2025-03-18 13:50       ` Wang Zhaolong
  2025-03-18 14:37         ` Greg KH
  0 siblings, 1 reply; 8+ messages in thread
From: Wang Zhaolong @ 2025-03-18 13:50 UTC
  To: David Howells, Steve French
  Cc: stable, linux-cifs, yangerkun, yi zhang, Paulo Alcantara

Friendly ping.

Best regards,
Wang Zhaolong


* Re: [BUG REPORT] cifs: Deadlock due to network reconnection during file writing
  2025-03-18 13:50       ` Wang Zhaolong
@ 2025-03-18 14:37         ` Greg KH
  2025-03-19  1:25           ` Wang Zhaolong
  0 siblings, 1 reply; 8+ messages in thread
From: Greg KH @ 2025-03-18 14:37 UTC
  To: Wang Zhaolong
  Cc: David Howells, Steve French, stable, linux-cifs, yangerkun,
	yi zhang, Paulo Alcantara

On Tue, Mar 18, 2025 at 09:50:25PM +0800, Wang Zhaolong wrote:
> Friendly ping.

Empty pings with no context are not good :(


* Re: [BUG REPORT] cifs: Deadlock due to network reconnection during file writing
  2025-03-18 14:37         ` Greg KH
@ 2025-03-19  1:25           ` Wang Zhaolong
  0 siblings, 0 replies; 8+ messages in thread
From: Wang Zhaolong @ 2025-03-19  1:25 UTC
  To: Greg KH
  Cc: David Howells, Steve French, stable, linux-cifs, yangerkun,
	yi zhang, Paulo Alcantara

Apologies for the earlier context-less ping 🙏. Here's the
situation:

I have been tracking the latest progress on fixing an issue
involving a deadlock in the CIFS write file process caused by
a network interruption. This problem affects LTS Linux kernel
versions 5.4.y through 6.6.y. The reason it is limited to LTS
versions is that the issue was avoided in the mainline 6.9
version due to the netfslib-based code restructuring in CIFS.

In my previous email, I provided the code call flow of the issue,
as well as the invasive method to modify the kernel for reproduction.
If there is anything else I can provide to help move this
forward, please let me know.

Thank you for your time and support!

Best regards,
Wang Zhaolong

> On Tue, Mar 18, 2025 at 09:50:25PM +0800, Wang Zhaolong wrote:
>> Friendly ping.
> 
> Empty pings with no context are not good :(
