linux-block.vger.kernel.org archive mirror
* [bug report] blktests nvme/tcp nvme/060 hang
@ 2025-08-06  1:57 Yi Zhang
  2025-08-06  6:16 ` Maurizio Lombardi
  2025-08-06 10:52 ` Hannes Reinecke
  0 siblings, 2 replies; 9+ messages in thread
From: Yi Zhang @ 2025-08-06  1:57 UTC (permalink / raw)
  To: linux-block, open list:NVM EXPRESS DRIVER
  Cc: Shinichiro Kawasaki, Maurizio Lombardi

Hello,
I hit this issue while running blktests nvme/tcp nvme/060 on the
latest linux-block/for-next with RT enabled. Please help check it and
let me know if you need any info/testing for it. Thanks.

[  195.356104] run blktests nvme/060 at 2025-08-06 01:50:45
[  195.578659] loop0: detected capacity change from 0 to 2097152
[  195.620986] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[  195.681508] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[  195.904991] nvmet: Created nvm controller 1 for subsystem
blktests-subsystem-1 for NQN
nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[  195.919199] nvme nvme1: creating 80 I/O queues.
[  196.035295] nvme nvme1: mapped 80/0/0 default/read/poll queues.
[  196.175655] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr
127.0.0.1:4420, hostnqn:
nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[  196.195731] nvme (3558) used greatest stack depth: 19248 bytes left
[  197.786460] nvmet: ctrl 1 fatal error occurred!
[  197.789337] nvme nvme1: starting error recovery
[  197.932751] nvme nvme1: Reconnecting in 1 seconds...
[  198.980348] nvmet: Created nvm controller 1 for subsystem
blktests-subsystem-1 for NQN
nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[  198.985848] nvme nvme1: creating 80 I/O queues.
[  199.318924] nvme nvme1: mapped 80/0/0 default/read/poll queues.
[  199.423818] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
[  199.556572] nvme nvme1: Failed reconnect attempt 1/600
[  200.075342] systemd-udevd (3501) used greatest stack depth: 17888 bytes left
[  200.496781] nvme nvme1: Property Set error: 880, offset 0x14
[  200.977968] nvmet: Created nvm controller 1 for subsystem
blktests-subsystem-1 for NQN
nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[  200.985964] nvme nvme1: creating 80 I/O queues.
[  201.076602] nvme nvme1: mapped 80/0/0 default/read/poll queues.
[  201.189249] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr
127.0.0.1:4420, hostnqn:
nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[  201.206511] nvme (3741) used greatest stack depth: 17712 bytes left
[  201.867978] nvmet: ctrl 1 fatal error occurred!
[  201.868127] nvme nvme1: starting error recovery
[  201.986566] nvme nvme1: Reconnecting in 1 seconds...
[  203.049519] nvmet: Created nvm controller 1 for subsystem
blktests-subsystem-1 for NQN
nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[  203.056878] nvme nvme1: creating 80 I/O queues.
[  203.399239] nvme nvme1: mapped 80/0/0 default/read/poll queues.
[  203.569429] nvme nvme1: Successfully reconnected (attempt 1/600)
[  203.918129] nvmet: ctrl 1 fatal error occurred!
[  203.918261] nvme nvme1: starting error recovery
[  204.026795] nvme nvme1: Reconnecting in 1 seconds...
[  204.423104] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
[  205.336507] nvme nvme1: Property Set error: 880, offset 0x14
[  205.855722] nvmet: Created nvm controller 1 for subsystem
blktests-subsystem-1 for NQN
nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[  205.861038] nvme nvme1: creating 80 I/O queues.
[  205.944537] nvme nvme1: mapped 80/0/0 default/read/poll queues.
[  205.965862] nvmet: ctrl 1 fatal error occurred!
[  206.053046] nvme nvme1: starting error recovery
[  206.147337] nvme nvme1: Failed to configure AEN (cfg 900)
[  206.147495] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr
127.0.0.1:4420, hostnqn:
nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[  206.176163] nvme nvme1: Reconnecting in 1 seconds...
[  207.208170] nvmet: Created nvm controller 1 for subsystem
blktests-subsystem-1 for NQN
nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[  207.211901] nvme nvme1: creating 80 I/O queues.
[  207.336799] nvme nvme1: mapped 80/0/0 default/read/poll queues.
[  207.349663] nvme nvme1: Successfully reconnected (attempt 1/600)
[  208.017068] nvmet: ctrl 1 fatal error occurred!
[  208.017201] nvme nvme1: starting error recovery
[  208.126889] nvme nvme1: Reconnecting in 1 seconds...
[  209.199440] nvmet: Created nvm controller 1 for subsystem
blktests-subsystem-1 for NQN
nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[  209.203685] nvme nvme1: creating 80 I/O queues.
[  209.382909] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
[  209.503420] nvme nvme1: mapped 80/0/0 default/read/poll queues.
[  209.706412] nvme nvme1: Failed reconnect attempt 1/600
[  210.617622] nvme nvme1: Property Set error: 880, offset 0x14
[  211.158138] nvmet: Created nvm controller 1 for subsystem
blktests-subsystem-1 for NQN
nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[  211.163794] nvme nvme1: creating 80 I/O queues.
[  211.245927] nvme nvme1: mapped 80/0/0 default/read/poll queues.
[  211.350477] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr
127.0.0.1:4420, hostnqn:
nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[  212.097987] nvmet: ctrl 1 fatal error occurred!
[  212.098122] nvme nvme1: starting error recovery
[  212.217345] nvme nvme1: Reconnecting in 1 seconds...
[  213.280774] nvmet: Created nvm controller 1 for subsystem
blktests-subsystem-1 for NQN
nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[  213.284617] nvme nvme1: creating 80 I/O queues.
[  213.625320] nvme nvme1: mapped 80/0/0 default/read/poll queues.
[  213.787167] nvme nvme1: Successfully reconnected (attempt 1/600)
[  214.147984] nvmet: ctrl 1 fatal error occurred!
[  214.148113] nvme nvme1: starting error recovery
[  214.248455] nvme nvme1: Reconnecting in 1 seconds...
[  214.604818] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
[  215.431717] nvme nvme1: Property Set error: 880, offset 0x14
[  216.014626] nvmet: Created nvm controller 1 for subsystem
blktests-subsystem-1 for NQN
nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[  216.020186] nvme nvme1: creating 80 I/O queues.
[  216.101613] nvme nvme1: mapped 80/0/0 default/read/poll queues.
[  216.195783] nvmet: ctrl 1 fatal error occurred!
[  216.536582] nvme nvme1: failed to send request -32
[  216.537991] nvme nvme1: failed nvme_keep_alive_end_io error=4
[  280.196270] nvme nvme1: I/O tag 0 (0000) type 4 opcode 0x7f
(Fabrics Cmd) QID 80 timeout
[  280.198058] nvme nvme1: Connect command failed, error wo/DNR bit: 881
[  280.198353] nvme nvme1: failed to connect queue: 80 ret=881
[  285.369827] nvmet: Created nvm controller 1 for subsystem
blktests-subsystem-1 for NQN
nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[  285.375603] nvme nvme1: creating 80 I/O queues.
[  285.449676] nvme nvme1: mapped 80/0/0 default/read/poll queues.
[  285.554119] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr
127.0.0.1:4420, hostnqn:
nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[  287.344206] nvmet: ctrl 1 fatal error occurred!
[  287.344349] nvme nvme1: starting error recovery
[  287.466244] nvme nvme1: Reconnecting in 1 seconds...
[  288.479300] nvmet: Created nvm controller 1 for subsystem
blktests-subsystem-1 for NQN
nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[  288.483283] nvme nvme1: creating 80 I/O queues.
[  288.785415] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
[  288.856766] nvme nvme1: mapped 80/0/0 default/read/poll queues.
[  289.066215] nvme nvme1: Failed reconnect attempt 1/600
[  290.116265] nvme nvme1: Property Set error: 880, offset 0x14
[  320.802045] loop: module loaded
[  321.017366] run blktests nvme/060 at 2025-08-06 01:52:51
[  321.220243] loop0: detected capacity change from 0 to 2097152
[  321.262034] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[  321.322296] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[  321.511316] nvmet: Created nvm controller 1 for subsystem
blktests-subsystem-1 for NQN
nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[  321.524993] nvme nvme1: creating 80 I/O queues.
[  321.636336] nvme nvme1: mapped 80/0/0 default/read/poll queues.
[  321.754764] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr
127.0.0.1:4420, hostnqn:
nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
[  323.431491] nvmet: ctrl 1 fatal error occurred!
[  323.434219] nvme nvme1: starting error recovery
[  323.566948] nvme nvme1: Reconnecting in 1 seconds...
[  324.653738] nvmet: Created nvm controller 1 for subsystem
blktests-subsystem-1 for NQN
nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
[  324.662270] nvme nvme1: creating 80 I/O queues.
[  324.993712] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
[  325.052085] nvme nvme1: mapped 80/0/0 default/read/poll queues.
[  325.481710] nvmet: ctrl 1 fatal error occurred!
[  327.286111] nvmet: ctrl 1 keep-alive timer (1 seconds) expired!
[  390.435380] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  390.441500] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-15):
P4784/2:b..l P1049/8:b..l
[  390.449771] rcu: (detected by 13, t=6503 jiffies, g=8845, q=2243 ncpus=80)
[  390.456724] task:kworker/u329:1  state:D stack:19264 pid:1049
tgid:1049  ppid:2      task_flags:0x4208060 flags:0x00000010
[  390.467850] Workqueue: nvme-wq nvme_tcp_reconnect_ctrl_work [nvme_tcp]
[  390.474378] Call trace:
[  390.476813]  __switch_to+0x1d8/0x330 (T)
[  390.480731]  __schedule+0x860/0x1c30
[  390.484297]  schedule_rtlock+0x30/0x70
[  390.488037]  rtlock_slowlock_locked+0x464/0xf60
[  390.492559]  rt_read_lock+0x2bc/0x3e0
[  390.496211]  nvmet_tcp_listen_data_ready+0x3c/0x118 [nvmet_tcp]
[  390.502125]  nvmet_tcp_data_ready+0x88/0x198 [nvmet_tcp]
[  390.507429]  tcp_data_ready+0xdc/0x3e0
[  390.511171]  tcp_data_queue+0xe30/0x1e08
[  390.515084]  tcp_rcv_established+0x710/0x2328
[  390.519432]  tcp_v4_do_rcv+0x554/0x940
[  390.523172]  tcp_v4_rcv+0x1ec4/0x3078
[  390.526825]  ip_protocol_deliver_rcu+0xc0/0x3f0
[  390.531347]  ip_local_deliver_finish+0x2d4/0x5c0
[  390.535954]  ip_local_deliver+0x17c/0x3c0
[  390.539953]  ip_rcv_finish+0x148/0x238
[  390.543693]  ip_rcv+0xd0/0x2e0
[  390.546737]  __netif_receive_skb_one_core+0x100/0x180
[  390.551780]  __netif_receive_skb+0x2c/0x160
[  390.555953]  process_backlog+0x230/0x6f8
[  390.559866]  __napi_poll.constprop.0+0x9c/0x3e8
[  390.564386]  net_rx_action+0x808/0xb50
[  390.568125]  handle_softirqs.constprop.0+0x23c/0xca0
[  390.573082]  __local_bh_enable_ip+0x260/0x4f0
[  390.577429]  __dev_queue_xmit+0x6f4/0x1bd8
[  390.581515]  neigh_hh_output+0x190/0x2c0
[  390.585429]  ip_finish_output2+0x7c8/0x1788
[  390.589602]  __ip_finish_output+0x2c4/0x4f8
[  390.593776]  ip_finish_output+0x3c/0x2a8
[  390.597689]  ip_output+0x154/0x418
[  390.601081]  __ip_queue_xmit+0x580/0x1108
[  390.605081]  ip_queue_xmit+0x4c/0x78
[  390.608647]  __tcp_transmit_skb+0xfac/0x24e8
[  390.612908]  tcp_write_xmit+0xbec/0x3078
[  390.616821]  __tcp_push_pending_frames+0x90/0x2b8
[  390.621515]  tcp_send_fin+0x108/0x9a8
[  390.625167]  tcp_shutdown+0xd8/0xf8
[  390.628646]  inet_shutdown+0x14c/0x2e8
[  390.632385]  kernel_sock_shutdown+0x5c/0x98
[  390.636560]  __nvme_tcp_stop_queue+0x44/0x220 [nvme_tcp]
[  390.641865]  nvme_tcp_stop_queue_nowait+0x130/0x200 [nvme_tcp]
[  390.647691]  nvme_tcp_setup_ctrl+0x3bc/0x728 [nvme_tcp]
[  390.652909]  nvme_tcp_reconnect_ctrl_work+0x78/0x290 [nvme_tcp]
[  390.658822]  process_one_work+0x80c/0x1a78
[  390.662910]  worker_thread+0x6d0/0xaa8
[  390.666650]  kthread+0x304/0x3a0
[  390.669869]  ret_from_fork+0x10/0x20
[  390.673437] task:kworker/u322:77 state:D stack:27184 pid:4784
tgid:4784  ppid:2      task_flags:0x4208060 flags:0x00000210
[  390.684562] Workqueue: nvmet-wq nvmet_tcp_release_queue_work [nvmet_tcp]
[  390.691256] Call trace:
[  390.693692]  __switch_to+0x1d8/0x330 (T)
[  390.697605]  __schedule+0x860/0x1c30
[  390.701171]  schedule_rtlock+0x30/0x70
[  390.704911]  rwbase_write_lock.constprop.0.isra.0+0x2fc/0xb30
[  390.710646]  rt_write_lock+0x9c/0x138
[  390.714299]  nvmet_tcp_release_queue_work+0x168/0xb48 [nvmet_tcp]
[  390.720384]  process_one_work+0x80c/0x1a78
[  390.724470]  worker_thread+0x6d0/0xaa8
[  390.728210]  kthread+0x304/0x3a0
[  390.731428]  ret_from_fork+0x10/0x20


-- 
Best Regards,
  Yi Zhang


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [bug report] blktests nvme/tcp nvme/060 hang
  2025-08-06  1:57 [bug report] blktests nvme/tcp nvme/060 hang Yi Zhang
@ 2025-08-06  6:16 ` Maurizio Lombardi
  2025-08-06  6:22   ` Maurizio Lombardi
  2025-08-06 10:52 ` Hannes Reinecke
  1 sibling, 1 reply; 9+ messages in thread
From: Maurizio Lombardi @ 2025-08-06  6:16 UTC (permalink / raw)
  To: Yi Zhang, linux-block, open list:NVM EXPRESS DRIVER
  Cc: Shinichiro Kawasaki, Maurizio Lombardi

On Wed Aug 6, 2025 at 3:57 AM CEST, Yi Zhang wrote:
> Hello
> I hit this issue when I was running blktests nvme/tcp nvme/060 on the
> latest linux-block/for-next with rt enabled, please help check it and
> let me know if you need any info/testing for it, thanks.
>
> [  390.474378] Call trace:
> [  390.476813]  __switch_to+0x1d8/0x330 (T)
> [  390.480731]  __schedule+0x860/0x1c30
> [  390.484297]  schedule_rtlock+0x30/0x70
> [  390.488037]  rtlock_slowlock_locked+0x464/0xf60
> [  390.492559]  rt_read_lock+0x2bc/0x3e0
> [  390.496211]  nvmet_tcp_listen_data_ready+0x3c/0x118 [nvmet_tcp]
> [  390.502125]  nvmet_tcp_data_ready+0x88/0x198 [nvmet_tcp]

I think the problem is that nvmet_tcp_data_ready() calls the
queue->data_ready() callback with the sk_callback_lock held.
The data_ready callback points to nvmet_tcp_listen_data_ready(),
which tries to take the same sk_callback_lock, hence the deadlock.

Maybe it can be fixed by deferring the call to queue->data_ready() to
a workqueue.

Maurizio


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [bug report] blktests nvme/tcp nvme/060 hang
  2025-08-06  6:16 ` Maurizio Lombardi
@ 2025-08-06  6:22   ` Maurizio Lombardi
  2025-08-06  6:44     ` Maurizio Lombardi
  0 siblings, 1 reply; 9+ messages in thread
From: Maurizio Lombardi @ 2025-08-06  6:22 UTC (permalink / raw)
  To: Maurizio Lombardi, Yi Zhang, linux-block,
	open list:NVM EXPRESS DRIVER
  Cc: Shinichiro Kawasaki, Maurizio Lombardi

On Wed Aug 6, 2025 at 8:16 AM CEST, Maurizio Lombardi wrote:
> On Wed Aug 6, 2025 at 3:57 AM CEST, Yi Zhang wrote:
>> Hello
>> I hit this issue when I was running blktests nvme/tcp nvme/060 on the
>> latest linux-block/for-next with rt enabled, please help check it and
>> let me know if you need any info/testing for it, thanks.
>>
>> [  390.474378] Call trace:
>> [  390.476813]  __switch_to+0x1d8/0x330 (T)
>> [  390.480731]  __schedule+0x860/0x1c30
>> [  390.484297]  schedule_rtlock+0x30/0x70
>> [  390.488037]  rtlock_slowlock_locked+0x464/0xf60
>> [  390.492559]  rt_read_lock+0x2bc/0x3e0
>> [  390.496211]  nvmet_tcp_listen_data_ready+0x3c/0x118 [nvmet_tcp]
>> [  390.502125]  nvmet_tcp_data_ready+0x88/0x198 [nvmet_tcp]
>
> I think that the problem is due to the fact that nvmet_tcp_data_ready()
> calls the queue->data_ready() callback with the sk_callback_lock
> locked.
> The data_ready callback points to nvmet_tcp_listen_data_ready()
> which tries to lock the same sk_callback_lock, hence the deadlock.
>
> Maybe it can be fixed by deferring the call to queue->data_ready() by
> using a workqueue.
>

Oops, sorry, those are two read locks; the real problem then is that
something is holding the write lock.

Maurizio

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [bug report] blktests nvme/tcp nvme/060 hang
  2025-08-06  6:22   ` Maurizio Lombardi
@ 2025-08-06  6:44     ` Maurizio Lombardi
  2025-08-06  9:30       ` Maurizio Lombardi
  2025-08-06 10:54       ` Hannes Reinecke
  0 siblings, 2 replies; 9+ messages in thread
From: Maurizio Lombardi @ 2025-08-06  6:44 UTC (permalink / raw)
  To: Maurizio Lombardi, Yi Zhang, linux-block,
	open list:NVM EXPRESS DRIVER
  Cc: Shinichiro Kawasaki, Maurizio Lombardi

On Wed Aug 6, 2025 at 8:22 AM CEST, Maurizio Lombardi wrote:
> On Wed Aug 6, 2025 at 8:16 AM CEST, Maurizio Lombardi wrote:
>>
>> I think that the problem is due to the fact that nvmet_tcp_data_ready()
>> calls the queue->data_ready() callback with the sk_callback_lock
>> locked.
>> The data_ready callback points to nvmet_tcp_listen_data_ready()
>> which tries to lock the same sk_callback_lock, hence the deadlock.
>>
>> Maybe it can be fixed by deferring the call to queue->data_ready() by
>> using a workqueue.
>>
>
> Ops sorry they are two read locks, the real problem then is that
> something is holding the write lock.

Ok, I think I understand what happens now.

The thread that calls nvmet_tcp_data_ready() (which takes the read
lock twice) and the one running nvmet_tcp_release_queue_work() (which
tries to take the write lock) are blocking each other.
So I still think that deferring the call to queue->data_ready() to a
workqueue should fix it.
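
Something along these lines, just a rough sketch of the deferral idea
(the struct and function names below are made up for the example, this
is not actual nvmet_tcp code):

#include <linux/net.h>
#include <linux/workqueue.h>
#include <net/sock.h>

/*
 * Illustration only: instead of invoking the saved data_ready callback
 * directly from softirq context while sk_callback_lock is read-held,
 * just schedule a work item and run the callback from process context.
 */
struct example_queue {
	struct socket		*sock;
	void			(*saved_data_ready)(struct sock *sk);
	struct work_struct	data_ready_work;
};

static void example_data_ready_work(struct work_struct *w)
{
	struct example_queue *queue =
		container_of(w, struct example_queue, data_ready_work);

	/* Runs later, outside sk_callback_lock. */
	queue->saved_data_ready(queue->sock->sk);
}

static void example_data_ready(struct sock *sk)
{
	struct example_queue *queue;

	read_lock_bh(&sk->sk_callback_lock);
	queue = sk->sk_user_data;
	if (queue)
		queue_work(system_wq, &queue->data_ready_work);
	read_unlock_bh(&sk->sk_callback_lock);
}

/* At queue setup: INIT_WORK(&queue->data_ready_work, example_data_ready_work); */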

Maurizio



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [bug report] blktests nvme/tcp nvme/060 hang
  2025-08-06  6:44     ` Maurizio Lombardi
@ 2025-08-06  9:30       ` Maurizio Lombardi
  2025-08-06 10:54       ` Hannes Reinecke
  1 sibling, 0 replies; 9+ messages in thread
From: Maurizio Lombardi @ 2025-08-06  9:30 UTC (permalink / raw)
  To: Maurizio Lombardi, Yi Zhang, linux-block,
	open list:NVM EXPRESS DRIVER
  Cc: Shinichiro Kawasaki, Maurizio Lombardi

On Wed Aug 6, 2025 at 8:44 AM CEST, Maurizio Lombardi wrote:
> On Wed Aug 6, 2025 at 8:22 AM CEST, Maurizio Lombardi wrote:
>>
>> Ops sorry they are two read locks, the real problem then is that
>> something is holding the write lock.
>
> Ok, I think I get what happens now.
>
> The threads that call nvmet_tcp_data_ready() (takes the read lock 2
> times) and
> nvmet_tcp_release_queue_work() (tries to take the write lock)
> are blocking each other.
> So I still think that deferring the call to queue->data_ready() by
> using a workqueue should fix it.
>

I reproduced the issue by creating a reader thread that tries to take
the read lock twice and a writer thread that takes the write lock
between the two calls to read_lock().
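
A minimal sketch of such a reproducer module (reconstructed here for
illustration, names hypothetical, not the exact code I ran):

#include <linux/module.h>
#include <linux/kthread.h>
#include <linux/delay.h>
#include <linux/spinlock.h>

static DEFINE_RWLOCK(test_lock);
static struct task_struct *reader, *writer;

static int reader_thread_fn(void *data)
{
	pr_info("[Reader] Thread started.\n");
	read_lock(&test_lock);
	pr_info("[Reader] Acquired read_lock successfully.\n");
	pr_info("[Reader] Sleeping for a while to allow writer to block...\n");
	mdelay(8000);	/* busy-wait: a spinlock-type read lock is held */
	pr_info("[Reader] Attempting to acquire a second read_lock... THIS SHOULD BLOCK.\n");
	read_lock(&test_lock);	/* deadlocks once a writer is queued */
	read_unlock(&test_lock);
	read_unlock(&test_lock);
	return 0;
}

static int writer_thread_fn(void *data)
{
	pr_info("[Writer] Thread started, waiting for reader to get lock...\n");
	msleep(20);
	pr_info("[Writer] Reader has the lock. Attempting to acquire write_lock... THIS SHOULD BLOCK.\n");
	write_lock(&test_lock);
	write_unlock(&test_lock);
	return 0;
}

static int __init rwlock_repro_init(void)
{
	reader = kthread_run(reader_thread_fn, NULL, "reader_thread");
	writer = kthread_run(writer_thread_fn, NULL, "writer_thread");
	return 0;
}
module_init(rwlock_repro_init);
MODULE_LICENSE("GPL");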

[   33.398311] [Reader] Thread started.
[   33.398410] [Writer] Thread started, waiting for reader to get lock...
[   33.398577] [Reader] Acquired read_lock successfully.
[   33.399391] [Reader] Sleeping for a while to allow writer to block...
[   33.418697] [Writer] Reader has the lock. Attempting to acquire write_lock... THIS SHOULD BLOCK.
[   41.288105] [Reader] Attempting to acquire a second read_lock... THIS SHOULD BLOCK.
[   93.388349] rcu: INFO: rcu_preempt self-detected stall on CPU
[   93.388758] rcu:     7-....: (5999 ticks this GP) idle=9db4/1/0x4000000000000000 softirq=1846/1846 fqs=2444
[   93.389390] rcu:     (t=6001 jiffies g=1917 q=4319 ncpus=8)
[   93.389745] CPU: 7 UID: 0 PID: 1784 Comm: reader_thread Kdump: loaded Tainted: G           OEL    -------  ---  6.12.0-116.el10.aarch64 #1 PREEMPT(voluntary)
[   93.389749] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE, [L]=SOFTLOCKUP
[   93.389749] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[   93.389750] pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   93.389752] pc : queued_spin_lock_slowpath+0x78/0x460
[   93.389757] lr : queued_read_lock_slowpath+0x21c/0x228
[   93.389759] sp : ffff80008bd6bdd0
[   93.389760] x29: ffff80008bd6bdd0 x28: 0000000000000000 x27: 0000000000000000
[   93.389762] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
[   93.389764] x23: ffffb1c374605008 x22: ffff0000ca9342c0 x21: ffff80008bafb960
[   93.389766] x20: ffff0000c4735e40 x19: ffffb1c37460701c x18: 0000000000000006
[   93.389767] x17: 444c554f48532053 x16: ffffb1c3ee73ab48 x15: 636f6c5f64616572
[   93.389769] x14: 20646e6f63657320 x13: 2e4b434f4c422044 x12: ffffb1c3eff5ec10
[   93.389771] x11: ffffb1c3efc9ec68 x10: ffffb1c3eff5ec68 x9 : ffffb1c3ee73b4c4
[   93.389772] x8 : 0000000000000001 x7 : 00000000000bffe8 x6 : c0000000ffff7fff
[   93.389774] x5 : ffff00112ebe05c8 x4 : 0000000000000000 x3 : 0000000000000000
[   93.389776] x2 : 0000000000000001 x1 : 0000000000000000 x0 : 0000000000000001
[   93.389778] Call trace:
[   93.389779]  queued_spin_lock_slowpath+0x78/0x460 (P)
[   93.389782]  queued_read_lock_slowpath+0x21c/0x228
[   93.389785]  _raw_read_lock+0x60/0x80
[   93.389787]  reader_thread_fn+0x7c/0xc0 [dead]
[   93.389791]  kthread+0x110/0x130
[   93.389794]  ret_from_fork+0x10/0x20


So apparently, in case of contention, writers take precedence.

Note that the same problem may also affect nvmet_tcp_write_space().

Maurizio

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [bug report] blktests nvme/tcp nvme/060 hang
  2025-08-06  1:57 [bug report] blktests nvme/tcp nvme/060 hang Yi Zhang
  2025-08-06  6:16 ` Maurizio Lombardi
@ 2025-08-06 10:52 ` Hannes Reinecke
  2025-08-07  1:57   ` Yi Zhang
  2025-08-11 14:10   ` Maurizio Lombardi
  1 sibling, 2 replies; 9+ messages in thread
From: Hannes Reinecke @ 2025-08-06 10:52 UTC (permalink / raw)
  To: Yi Zhang, linux-block, open list:NVM EXPRESS DRIVER
  Cc: Shinichiro Kawasaki, Maurizio Lombardi

On 8/6/25 03:57, Yi Zhang wrote:
> Hello
> I hit this issue when I was running blktests nvme/tcp nvme/060 on the
> latest linux-block/for-next with rt enabled, please help check it and
> let me know if you need any info/testing for it, thanks.
> 
> [  195.356104] run blktests nvme/060 at 2025-08-06 01:50:45
> [  195.578659] loop0: detected capacity change from 0 to 2097152
> [  195.620986] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
> [  195.681508] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
> [  195.904991] nvmet: Created nvm controller 1 for subsystem
> blktests-subsystem-1 for NQN
> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> [  195.919199] nvme nvme1: creating 80 I/O queues.
> [  196.035295] nvme nvme1: mapped 80/0/0 default/read/poll queues.
> [  196.175655] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr
> 127.0.0.1:4420, hostnqn:
> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
> [  196.195731] nvme (3558) used greatest stack depth: 19248 bytes left
> [  197.786460] nvmet: ctrl 1 fatal error occurred!
> [  197.789337] nvme nvme1: starting error recovery
> [  197.932751] nvme nvme1: Reconnecting in 1 seconds...
> [  198.980348] nvmet: Created nvm controller 1 for subsystem
> blktests-subsystem-1 for NQN
> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> [  198.985848] nvme nvme1: creating 80 I/O queues.
> [  199.318924] nvme nvme1: mapped 80/0/0 default/read/poll queues.
> [  199.423818] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
> [  199.556572] nvme nvme1: Failed reconnect attempt 1/600
> [  200.075342] systemd-udevd (3501) used greatest stack depth: 17888 bytes left
> [  200.496781] nvme nvme1: Property Set error: 880, offset 0x14
> [  200.977968] nvmet: Created nvm controller 1 for subsystem
> blktests-subsystem-1 for NQN
> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> [  200.985964] nvme nvme1: creating 80 I/O queues.
> [  201.076602] nvme nvme1: mapped 80/0/0 default/read/poll queues.
> [  201.189249] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr
> 127.0.0.1:4420, hostnqn:
> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
> [  201.206511] nvme (3741) used greatest stack depth: 17712 bytes left
> [  201.867978] nvmet: ctrl 1 fatal error occurred!
> [  201.868127] nvme nvme1: starting error recovery
> [  201.986566] nvme nvme1: Reconnecting in 1 seconds...
> [  203.049519] nvmet: Created nvm controller 1 for subsystem
> blktests-subsystem-1 for NQN
> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> [  203.056878] nvme nvme1: creating 80 I/O queues.
> [  203.399239] nvme nvme1: mapped 80/0/0 default/read/poll queues.
> [  203.569429] nvme nvme1: Successfully reconnected (attempt 1/600)
> [  203.918129] nvmet: ctrl 1 fatal error occurred!
> [  203.918261] nvme nvme1: starting error recovery
> [  204.026795] nvme nvme1: Reconnecting in 1 seconds...
> [  204.423104] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
> [  205.336507] nvme nvme1: Property Set error: 880, offset 0x14
> [  205.855722] nvmet: Created nvm controller 1 for subsystem
> blktests-subsystem-1 for NQN
> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> [  205.861038] nvme nvme1: creating 80 I/O queues.
> [  205.944537] nvme nvme1: mapped 80/0/0 default/read/poll queues.
> [  205.965862] nvmet: ctrl 1 fatal error occurred!
> [  206.053046] nvme nvme1: starting error recovery
> [  206.147337] nvme nvme1: Failed to configure AEN (cfg 900)
> [  206.147495] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr
> 127.0.0.1:4420, hostnqn:
> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
> [  206.176163] nvme nvme1: Reconnecting in 1 seconds...
> [  207.208170] nvmet: Created nvm controller 1 for subsystem
> blktests-subsystem-1 for NQN
> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> [  207.211901] nvme nvme1: creating 80 I/O queues.
> [  207.336799] nvme nvme1: mapped 80/0/0 default/read/poll queues.
> [  207.349663] nvme nvme1: Successfully reconnected (attempt 1/600)
> [  208.017068] nvmet: ctrl 1 fatal error occurred!
> [  208.017201] nvme nvme1: starting error recovery
> [  208.126889] nvme nvme1: Reconnecting in 1 seconds...
> [  209.199440] nvmet: Created nvm controller 1 for subsystem
> blktests-subsystem-1 for NQN
> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> [  209.203685] nvme nvme1: creating 80 I/O queues.
> [  209.382909] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
> [  209.503420] nvme nvme1: mapped 80/0/0 default/read/poll queues.
> [  209.706412] nvme nvme1: Failed reconnect attempt 1/600
> [  210.617622] nvme nvme1: Property Set error: 880, offset 0x14
> [  211.158138] nvmet: Created nvm controller 1 for subsystem
> blktests-subsystem-1 for NQN
> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> [  211.163794] nvme nvme1: creating 80 I/O queues.
> [  211.245927] nvme nvme1: mapped 80/0/0 default/read/poll queues.
> [  211.350477] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr
> 127.0.0.1:4420, hostnqn:
> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
> [  212.097987] nvmet: ctrl 1 fatal error occurred!
> [  212.098122] nvme nvme1: starting error recovery
> [  212.217345] nvme nvme1: Reconnecting in 1 seconds...
> [  213.280774] nvmet: Created nvm controller 1 for subsystem
> blktests-subsystem-1 for NQN
> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> [  213.284617] nvme nvme1: creating 80 I/O queues.
> [  213.625320] nvme nvme1: mapped 80/0/0 default/read/poll queues.
> [  213.787167] nvme nvme1: Successfully reconnected (attempt 1/600)
> [  214.147984] nvmet: ctrl 1 fatal error occurred!
> [  214.148113] nvme nvme1: starting error recovery
> [  214.248455] nvme nvme1: Reconnecting in 1 seconds...
> [  214.604818] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
> [  215.431717] nvme nvme1: Property Set error: 880, offset 0x14
> [  216.014626] nvmet: Created nvm controller 1 for subsystem
> blktests-subsystem-1 for NQN
> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> [  216.020186] nvme nvme1: creating 80 I/O queues.
> [  216.101613] nvme nvme1: mapped 80/0/0 default/read/poll queues.
> [  216.195783] nvmet: ctrl 1 fatal error occurred!
> [  216.536582] nvme nvme1: failed to send request -32
> [  216.537991] nvme nvme1: failed nvme_keep_alive_end_io error=4
> [  280.196270] nvme nvme1: I/O tag 0 (0000) type 4 opcode 0x7f
> (Fabrics Cmd) QID 80 timeout
> [  280.198058] nvme nvme1: Connect command failed, error wo/DNR bit: 881
> [  280.198353] nvme nvme1: failed to connect queue: 80 ret=881
> [  285.369827] nvmet: Created nvm controller 1 for subsystem
> blktests-subsystem-1 for NQN
> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> [  285.375603] nvme nvme1: creating 80 I/O queues.
> [  285.449676] nvme nvme1: mapped 80/0/0 default/read/poll queues.
> [  285.554119] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr
> 127.0.0.1:4420, hostnqn:
> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
> [  287.344206] nvmet: ctrl 1 fatal error occurred!
> [  287.344349] nvme nvme1: starting error recovery
> [  287.466244] nvme nvme1: Reconnecting in 1 seconds...
> [  288.479300] nvmet: Created nvm controller 1 for subsystem
> blktests-subsystem-1 for NQN
> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> [  288.483283] nvme nvme1: creating 80 I/O queues.
> [  288.785415] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
> [  288.856766] nvme nvme1: mapped 80/0/0 default/read/poll queues.
> [  289.066215] nvme nvme1: Failed reconnect attempt 1/600
> [  290.116265] nvme nvme1: Property Set error: 880, offset 0x14
> [  320.802045] loop: module loaded
> [  321.017366] run blktests nvme/060 at 2025-08-06 01:52:51
> [  321.220243] loop0: detected capacity change from 0 to 2097152
> [  321.262034] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
> [  321.322296] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
> [  321.511316] nvmet: Created nvm controller 1 for subsystem
> blktests-subsystem-1 for NQN
> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> [  321.524993] nvme nvme1: creating 80 I/O queues.
> [  321.636336] nvme nvme1: mapped 80/0/0 default/read/poll queues.
> [  321.754764] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr
> 127.0.0.1:4420, hostnqn:
> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
> [  323.431491] nvmet: ctrl 1 fatal error occurred!
> [  323.434219] nvme nvme1: starting error recovery
> [  323.566948] nvme nvme1: Reconnecting in 1 seconds...
> [  324.653738] nvmet: Created nvm controller 1 for subsystem
> blktests-subsystem-1 for NQN
> nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> [  324.662270] nvme nvme1: creating 80 I/O queues.
> [  324.993712] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
> [  325.052085] nvme nvme1: mapped 80/0/0 default/read/poll queues.
> [  325.481710] nvmet: ctrl 1 fatal error occurred!
> [  327.286111] nvmet: ctrl 1 keep-alive timer (1 seconds) expired!
> [  390.435380] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [  390.441500] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-15):
> P4784/2:b..l P1049/8:b..l
> [  390.449771] rcu: (detected by 13, t=6503 jiffies, g=8845, q=2243 ncpus=80)
> [  390.456724] task:kworker/u329:1  state:D stack:19264 pid:1049
> tgid:1049  ppid:2      task_flags:0x4208060 flags:0x00000010
> [  390.467850] Workqueue: nvme-wq nvme_tcp_reconnect_ctrl_work [nvme_tcp]
> [  390.474378] Call trace:
> [  390.476813]  __switch_to+0x1d8/0x330 (T)
> [  390.480731]  __schedule+0x860/0x1c30
> [  390.484297]  schedule_rtlock+0x30/0x70
> [  390.488037]  rtlock_slowlock_locked+0x464/0xf60
> [  390.492559]  rt_read_lock+0x2bc/0x3e0
> [  390.496211]  nvmet_tcp_listen_data_ready+0x3c/0x118 [nvmet_tcp]
> [  390.502125]  nvmet_tcp_data_ready+0x88/0x198 [nvmet_tcp]
> [  390.507429]  tcp_data_ready+0xdc/0x3e0
> [  390.511171]  tcp_data_queue+0xe30/0x1e08
> [  390.515084]  tcp_rcv_established+0x710/0x2328
> [  390.519432]  tcp_v4_do_rcv+0x554/0x940
> [  390.523172]  tcp_v4_rcv+0x1ec4/0x3078
> [  390.526825]  ip_protocol_deliver_rcu+0xc0/0x3f0
> [  390.531347]  ip_local_deliver_finish+0x2d4/0x5c0
> [  390.535954]  ip_local_deliver+0x17c/0x3c0
> [  390.539953]  ip_rcv_finish+0x148/0x238
> [  390.543693]  ip_rcv+0xd0/0x2e0
> [  390.546737]  __netif_receive_skb_one_core+0x100/0x180
> [  390.551780]  __netif_receive_skb+0x2c/0x160
> [  390.555953]  process_backlog+0x230/0x6f8
> [  390.559866]  __napi_poll.constprop.0+0x9c/0x3e8
> [  390.564386]  net_rx_action+0x808/0xb50
> [  390.568125]  handle_softirqs.constprop.0+0x23c/0xca0
> [  390.573082]  __local_bh_enable_ip+0x260/0x4f0
> [  390.577429]  __dev_queue_xmit+0x6f4/0x1bd8
> [  390.581515]  neigh_hh_output+0x190/0x2c0
> [  390.585429]  ip_finish_output2+0x7c8/0x1788
> [  390.589602]  __ip_finish_output+0x2c4/0x4f8
> [  390.593776]  ip_finish_output+0x3c/0x2a8
> [  390.597689]  ip_output+0x154/0x418
> [  390.601081]  __ip_queue_xmit+0x580/0x1108
> [  390.605081]  ip_queue_xmit+0x4c/0x78
> [  390.608647]  __tcp_transmit_skb+0xfac/0x24e8
> [  390.612908]  tcp_write_xmit+0xbec/0x3078
> [  390.616821]  __tcp_push_pending_frames+0x90/0x2b8
> [  390.621515]  tcp_send_fin+0x108/0x9a8
> [  390.625167]  tcp_shutdown+0xd8/0xf8
> [  390.628646]  inet_shutdown+0x14c/0x2e8
> [  390.632385]  kernel_sock_shutdown+0x5c/0x98
> [  390.636560]  __nvme_tcp_stop_queue+0x44/0x220 [nvme_tcp]
> [  390.641865]  nvme_tcp_stop_queue_nowait+0x130/0x200 [nvme_tcp]
> [  390.647691]  nvme_tcp_setup_ctrl+0x3bc/0x728 [nvme_tcp]
> [  390.652909]  nvme_tcp_reconnect_ctrl_work+0x78/0x290 [nvme_tcp]
> [  390.658822]  process_one_work+0x80c/0x1a78
> [  390.662910]  worker_thread+0x6d0/0xaa8
> [  390.666650]  kthread+0x304/0x3a0
> [  390.669869]  ret_from_fork+0x10/0x20
> [  390.673437] task:kworker/u322:77 state:D stack:27184 pid:4784
> tgid:4784  ppid:2      task_flags:0x4208060 flags:0x00000210
> [  390.684562] Workqueue: nvmet-wq nvmet_tcp_release_queue_work [nvmet_tcp]
> [  390.691256] Call trace:
> [  390.693692]  __switch_to+0x1d8/0x330 (T)
> [  390.697605]  __schedule+0x860/0x1c30
> [  390.701171]  schedule_rtlock+0x30/0x70
> [  390.704911]  rwbase_write_lock.constprop.0.isra.0+0x2fc/0xb30
> [  390.710646]  rt_write_lock+0x9c/0x138
> [  390.714299]  nvmet_tcp_release_queue_work+0x168/0xb48 [nvmet_tcp]
> [  390.720384]  process_one_work+0x80c/0x1a78
> [  390.724470]  worker_thread+0x6d0/0xaa8
> [  390.728210]  kthread+0x304/0x3a0
> [  390.731428]  ret_from_fork+0x10/0x20
> 
> 
Can you check if this fixes the problem?

diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 688033b88d38..23bdce8dcdf0 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -1991,15 +1991,13 @@ static void nvmet_tcp_listen_data_ready(struct sock *sk)
         struct nvmet_tcp_port *port;

         trace_sk_data_ready(sk);
+       if (sk->sk_state != TCP_LISTEN)
+               return;

         read_lock_bh(&sk->sk_callback_lock);
         port = sk->sk_user_data;
-       if (!port)
-               goto out;
-
-       if (sk->sk_state == TCP_LISTEN)
+       if (port)
                 queue_work(nvmet_wq, &port->accept_work);
-out:
         read_unlock_bh(&sk->sk_callback_lock);
  }

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [bug report] blktests nvme/tcp nvme/060 hang
  2025-08-06  6:44     ` Maurizio Lombardi
  2025-08-06  9:30       ` Maurizio Lombardi
@ 2025-08-06 10:54       ` Hannes Reinecke
  1 sibling, 0 replies; 9+ messages in thread
From: Hannes Reinecke @ 2025-08-06 10:54 UTC (permalink / raw)
  To: Maurizio Lombardi, Yi Zhang, linux-block,
	open list:NVM EXPRESS DRIVER
  Cc: Shinichiro Kawasaki, Maurizio Lombardi

On 8/6/25 08:44, Maurizio Lombardi wrote:
> On Wed Aug 6, 2025 at 8:22 AM CEST, Maurizio Lombardi wrote:
>> On Wed Aug 6, 2025 at 8:16 AM CEST, Maurizio Lombardi wrote:
>>>
>>> I think that the problem is due to the fact that nvmet_tcp_data_ready()
>>> calls the queue->data_ready() callback with the sk_callback_lock
>>> locked.
>>> The data_ready callback points to nvmet_tcp_listen_data_ready()
>>> which tries to lock the same sk_callback_lock, hence the deadlock.
>>>
>>> Maybe it can be fixed by deferring the call to queue->data_ready() by
>>> using a workqueue.
>>>
>>
>> Ops sorry they are two read locks, the real problem then is that
>> something is holding the write lock.
> 
> Ok, I think I get what happens now.
> 
> The threads that call nvmet_tcp_data_ready() (takes the read lock 2
> times) and
> nvmet_tcp_release_queue_work() (tries to take the write lock)
> are blocking each other.
> So I still think that deferring the call to queue->data_ready() by
> using a workqueue should fix it.
> 
It's nvmet_tcp_listen_data_ready() which is the problem; thing is, we
only need to take the lock to access 'sk_user_data' (as this might be
changed while the callback is running). But the 'sk_state' value can be
accessed without the lock, and as we need to look at sk_user_data only
if the socket is in the TCP_LISTEN state (which I hope is not the case
during socket shutdown), we can move the check out of the lock and
avoid this issue.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@suse.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [bug report] blktests nvme/tcp nvme/060 hang
  2025-08-06 10:52 ` Hannes Reinecke
@ 2025-08-07  1:57   ` Yi Zhang
  2025-08-11 14:10   ` Maurizio Lombardi
  1 sibling, 0 replies; 9+ messages in thread
From: Yi Zhang @ 2025-08-07  1:57 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: linux-block, open list:NVM EXPRESS DRIVER, Shinichiro Kawasaki,
	Maurizio Lombardi

On Wed, Aug 6, 2025 at 6:52 PM Hannes Reinecke <hare@suse.de> wrote:
>
> On 8/6/25 03:57, Yi Zhang wrote:
> > Hello
> > I hit this issue when I was running blktests nvme/tcp nvme/060 on the
> > latest linux-block/for-next with rt enabled, please help check it and
> > let me know if you need any info/testing for it, thanks.
> >
> > [  195.356104] run blktests nvme/060 at 2025-08-06 01:50:45
> > [  195.578659] loop0: detected capacity change from 0 to 2097152
> > [  195.620986] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
> > [  195.681508] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
> > [  195.904991] nvmet: Created nvm controller 1 for subsystem
> > blktests-subsystem-1 for NQN
> > nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> > [  195.919199] nvme nvme1: creating 80 I/O queues.
> > [  196.035295] nvme nvme1: mapped 80/0/0 default/read/poll queues.
> > [  196.175655] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr
> > 127.0.0.1:4420, hostnqn:
> > nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
> > [  196.195731] nvme (3558) used greatest stack depth: 19248 bytes left
> > [  197.786460] nvmet: ctrl 1 fatal error occurred!
> > [  197.789337] nvme nvme1: starting error recovery
> > [  197.932751] nvme nvme1: Reconnecting in 1 seconds...
> > [  198.980348] nvmet: Created nvm controller 1 for subsystem
> > blktests-subsystem-1 for NQN
> > nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> > [  198.985848] nvme nvme1: creating 80 I/O queues.
> > [  199.318924] nvme nvme1: mapped 80/0/0 default/read/poll queues.
> > [  199.423818] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
> > [  199.556572] nvme nvme1: Failed reconnect attempt 1/600
> > [  200.075342] systemd-udevd (3501) used greatest stack depth: 17888 bytes left
> > [  200.496781] nvme nvme1: Property Set error: 880, offset 0x14
> > [  200.977968] nvmet: Created nvm controller 1 for subsystem
> > blktests-subsystem-1 for NQN
> > nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> > [  200.985964] nvme nvme1: creating 80 I/O queues.
> > [  201.076602] nvme nvme1: mapped 80/0/0 default/read/poll queues.
> > [  201.189249] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr
> > 127.0.0.1:4420, hostnqn:
> > nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
> > [  201.206511] nvme (3741) used greatest stack depth: 17712 bytes left
> > [  201.867978] nvmet: ctrl 1 fatal error occurred!
> > [  201.868127] nvme nvme1: starting error recovery
> > [  201.986566] nvme nvme1: Reconnecting in 1 seconds...
> > [  203.049519] nvmet: Created nvm controller 1 for subsystem
> > blktests-subsystem-1 for NQN
> > nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> > [  203.056878] nvme nvme1: creating 80 I/O queues.
> > [  203.399239] nvme nvme1: mapped 80/0/0 default/read/poll queues.
> > [  203.569429] nvme nvme1: Successfully reconnected (attempt 1/600)
> > [  203.918129] nvmet: ctrl 1 fatal error occurred!
> > [  203.918261] nvme nvme1: starting error recovery
> > [  204.026795] nvme nvme1: Reconnecting in 1 seconds...
> > [  204.423104] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
> > [  205.336507] nvme nvme1: Property Set error: 880, offset 0x14
> > [  205.855722] nvmet: Created nvm controller 1 for subsystem
> > blktests-subsystem-1 for NQN
> > nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> > [  205.861038] nvme nvme1: creating 80 I/O queues.
> > [  205.944537] nvme nvme1: mapped 80/0/0 default/read/poll queues.
> > [  205.965862] nvmet: ctrl 1 fatal error occurred!
> > [  206.053046] nvme nvme1: starting error recovery
> > [  206.147337] nvme nvme1: Failed to configure AEN (cfg 900)
> > [  206.147495] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr
> > 127.0.0.1:4420, hostnqn:
> > nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
> > [  206.176163] nvme nvme1: Reconnecting in 1 seconds...
> > [  207.208170] nvmet: Created nvm controller 1 for subsystem
> > blktests-subsystem-1 for NQN
> > nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> > [  207.211901] nvme nvme1: creating 80 I/O queues.
> > [  207.336799] nvme nvme1: mapped 80/0/0 default/read/poll queues.
> > [  207.349663] nvme nvme1: Successfully reconnected (attempt 1/600)
> > [  208.017068] nvmet: ctrl 1 fatal error occurred!
> > [  208.017201] nvme nvme1: starting error recovery
> > [  208.126889] nvme nvme1: Reconnecting in 1 seconds...
> > [  209.199440] nvmet: Created nvm controller 1 for subsystem
> > blktests-subsystem-1 for NQN
> > nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> > [  209.203685] nvme nvme1: creating 80 I/O queues.
> > [  209.382909] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
> > [  209.503420] nvme nvme1: mapped 80/0/0 default/read/poll queues.
> > [  209.706412] nvme nvme1: Failed reconnect attempt 1/600
> > [  210.617622] nvme nvme1: Property Set error: 880, offset 0x14
> > [  211.158138] nvmet: Created nvm controller 1 for subsystem
> > blktests-subsystem-1 for NQN
> > nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> > [  211.163794] nvme nvme1: creating 80 I/O queues.
> > [  211.245927] nvme nvme1: mapped 80/0/0 default/read/poll queues.
> > [  211.350477] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr
> > 127.0.0.1:4420, hostnqn:
> > nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
> > [  212.097987] nvmet: ctrl 1 fatal error occurred!
> > [  212.098122] nvme nvme1: starting error recovery
> > [  212.217345] nvme nvme1: Reconnecting in 1 seconds...
> > [  213.280774] nvmet: Created nvm controller 1 for subsystem
> > blktests-subsystem-1 for NQN
> > nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> > [  213.284617] nvme nvme1: creating 80 I/O queues.
> > [  213.625320] nvme nvme1: mapped 80/0/0 default/read/poll queues.
> > [  213.787167] nvme nvme1: Successfully reconnected (attempt 1/600)
> > [  214.147984] nvmet: ctrl 1 fatal error occurred!
> > [  214.148113] nvme nvme1: starting error recovery
> > [  214.248455] nvme nvme1: Reconnecting in 1 seconds...
> > [  214.604818] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
> > [  215.431717] nvme nvme1: Property Set error: 880, offset 0x14
> > [  216.014626] nvmet: Created nvm controller 1 for subsystem
> > blktests-subsystem-1 for NQN
> > nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> > [  216.020186] nvme nvme1: creating 80 I/O queues.
> > [  216.101613] nvme nvme1: mapped 80/0/0 default/read/poll queues.
> > [  216.195783] nvmet: ctrl 1 fatal error occurred!
> > [  216.536582] nvme nvme1: failed to send request -32
> > [  216.537991] nvme nvme1: failed nvme_keep_alive_end_io error=4
> > [  280.196270] nvme nvme1: I/O tag 0 (0000) type 4 opcode 0x7f
> > (Fabrics Cmd) QID 80 timeout
> > [  280.198058] nvme nvme1: Connect command failed, error wo/DNR bit: 881
> > [  280.198353] nvme nvme1: failed to connect queue: 80 ret=881
> > [  285.369827] nvmet: Created nvm controller 1 for subsystem
> > blktests-subsystem-1 for NQN
> > nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> > [  285.375603] nvme nvme1: creating 80 I/O queues.
> > [  285.449676] nvme nvme1: mapped 80/0/0 default/read/poll queues.
> > [  285.554119] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr
> > 127.0.0.1:4420, hostnqn:
> > nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
> > [  287.344206] nvmet: ctrl 1 fatal error occurred!
> > [  287.344349] nvme nvme1: starting error recovery
> > [  287.466244] nvme nvme1: Reconnecting in 1 seconds...
> > [  288.479300] nvmet: Created nvm controller 1 for subsystem
> > blktests-subsystem-1 for NQN
> > nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> > [  288.483283] nvme nvme1: creating 80 I/O queues.
> > [  288.785415] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
> > [  288.856766] nvme nvme1: mapped 80/0/0 default/read/poll queues.
> > [  289.066215] nvme nvme1: Failed reconnect attempt 1/600
> > [  290.116265] nvme nvme1: Property Set error: 880, offset 0x14
> > [  320.802045] loop: module loaded
> > [  321.017366] run blktests nvme/060 at 2025-08-06 01:52:51
> > [  321.220243] loop0: detected capacity change from 0 to 2097152
> > [  321.262034] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
> > [  321.322296] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
> > [  321.511316] nvmet: Created nvm controller 1 for subsystem
> > blktests-subsystem-1 for NQN
> > nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> > [  321.524993] nvme nvme1: creating 80 I/O queues.
> > [  321.636336] nvme nvme1: mapped 80/0/0 default/read/poll queues.
> > [  321.754764] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr
> > 127.0.0.1:4420, hostnqn:
> > nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349
> > [  323.431491] nvmet: ctrl 1 fatal error occurred!
> > [  323.434219] nvme nvme1: starting error recovery
> > [  323.566948] nvme nvme1: Reconnecting in 1 seconds...
> > [  324.653738] nvmet: Created nvm controller 1 for subsystem
> > blktests-subsystem-1 for NQN
> > nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349.
> > [  324.662270] nvme nvme1: creating 80 I/O queues.
> > [  324.993712] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
> > [  325.052085] nvme nvme1: mapped 80/0/0 default/read/poll queues.
> > [  325.481710] nvmet: ctrl 1 fatal error occurred!
> > [  327.286111] nvmet: ctrl 1 keep-alive timer (1 seconds) expired!
> > [  390.435380] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > [  390.441500] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-15):
> > P4784/2:b..l P1049/8:b..l
> > [  390.449771] rcu: (detected by 13, t=6503 jiffies, g=8845, q=2243 ncpus=80)
> > [  390.456724] task:kworker/u329:1  state:D stack:19264 pid:1049
> > tgid:1049  ppid:2      task_flags:0x4208060 flags:0x00000010
> > [  390.467850] Workqueue: nvme-wq nvme_tcp_reconnect_ctrl_work [nvme_tcp]
> > [  390.474378] Call trace:
> > [  390.476813]  __switch_to+0x1d8/0x330 (T)
> > [  390.480731]  __schedule+0x860/0x1c30
> > [  390.484297]  schedule_rtlock+0x30/0x70
> > [  390.488037]  rtlock_slowlock_locked+0x464/0xf60
> > [  390.492559]  rt_read_lock+0x2bc/0x3e0
> > [  390.496211]  nvmet_tcp_listen_data_ready+0x3c/0x118 [nvmet_tcp]
> > [  390.502125]  nvmet_tcp_data_ready+0x88/0x198 [nvmet_tcp]
> > [  390.507429]  tcp_data_ready+0xdc/0x3e0
> > [  390.511171]  tcp_data_queue+0xe30/0x1e08
> > [  390.515084]  tcp_rcv_established+0x710/0x2328
> > [  390.519432]  tcp_v4_do_rcv+0x554/0x940
> > [  390.523172]  tcp_v4_rcv+0x1ec4/0x3078
> > [  390.526825]  ip_protocol_deliver_rcu+0xc0/0x3f0
> > [  390.531347]  ip_local_deliver_finish+0x2d4/0x5c0
> > [  390.535954]  ip_local_deliver+0x17c/0x3c0
> > [  390.539953]  ip_rcv_finish+0x148/0x238
> > [  390.543693]  ip_rcv+0xd0/0x2e0
> > [  390.546737]  __netif_receive_skb_one_core+0x100/0x180
> > [  390.551780]  __netif_receive_skb+0x2c/0x160
> > [  390.555953]  process_backlog+0x230/0x6f8
> > [  390.559866]  __napi_poll.constprop.0+0x9c/0x3e8
> > [  390.564386]  net_rx_action+0x808/0xb50
> > [  390.568125]  handle_softirqs.constprop.0+0x23c/0xca0
> > [  390.573082]  __local_bh_enable_ip+0x260/0x4f0
> > [  390.577429]  __dev_queue_xmit+0x6f4/0x1bd8
> > [  390.581515]  neigh_hh_output+0x190/0x2c0
> > [  390.585429]  ip_finish_output2+0x7c8/0x1788
> > [  390.589602]  __ip_finish_output+0x2c4/0x4f8
> > [  390.593776]  ip_finish_output+0x3c/0x2a8
> > [  390.597689]  ip_output+0x154/0x418
> > [  390.601081]  __ip_queue_xmit+0x580/0x1108
> > [  390.605081]  ip_queue_xmit+0x4c/0x78
> > [  390.608647]  __tcp_transmit_skb+0xfac/0x24e8
> > [  390.612908]  tcp_write_xmit+0xbec/0x3078
> > [  390.616821]  __tcp_push_pending_frames+0x90/0x2b8
> > [  390.621515]  tcp_send_fin+0x108/0x9a8
> > [  390.625167]  tcp_shutdown+0xd8/0xf8
> > [  390.628646]  inet_shutdown+0x14c/0x2e8
> > [  390.632385]  kernel_sock_shutdown+0x5c/0x98
> > [  390.636560]  __nvme_tcp_stop_queue+0x44/0x220 [nvme_tcp]
> > [  390.641865]  nvme_tcp_stop_queue_nowait+0x130/0x200 [nvme_tcp]
> > [  390.647691]  nvme_tcp_setup_ctrl+0x3bc/0x728 [nvme_tcp]
> > [  390.652909]  nvme_tcp_reconnect_ctrl_work+0x78/0x290 [nvme_tcp]
> > [  390.658822]  process_one_work+0x80c/0x1a78
> > [  390.662910]  worker_thread+0x6d0/0xaa8
> > [  390.666650]  kthread+0x304/0x3a0
> > [  390.669869]  ret_from_fork+0x10/0x20
> > [  390.673437] task:kworker/u322:77 state:D stack:27184 pid:4784
> > tgid:4784  ppid:2      task_flags:0x4208060 flags:0x00000210
> > [  390.684562] Workqueue: nvmet-wq nvmet_tcp_release_queue_work [nvmet_tcp]
> > [  390.691256] Call trace:
> > [  390.693692]  __switch_to+0x1d8/0x330 (T)
> > [  390.697605]  __schedule+0x860/0x1c30
> > [  390.701171]  schedule_rtlock+0x30/0x70
> > [  390.704911]  rwbase_write_lock.constprop.0.isra.0+0x2fc/0xb30
> > [  390.710646]  rt_write_lock+0x9c/0x138
> > [  390.714299]  nvmet_tcp_release_queue_work+0x168/0xb48 [nvmet_tcp]
> > [  390.720384]  process_one_work+0x80c/0x1a78
> > [  390.724470]  worker_thread+0x6d0/0xaa8
> > [  390.728210]  kthread+0x304/0x3a0
> > [  390.731428]  ret_from_fork+0x10/0x20
> >
> >
> Can you check if this fixes the problem?

The issue is fixed with the patch applied, feel free to add:

Tested-by: Yi Zhang <yi.zhang@redhat.com>

>
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index 688033b88d38..23bdce8dcdf0 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -1991,15 +1991,13 @@ static void nvmet_tcp_listen_data_ready(struct
> sock *sk)
>          struct nvmet_tcp_port *port;
>
>          trace_sk_data_ready(sk);
> +       if (sk->sk_state != TCP_LISTEN)
> +               return;
>
>          read_lock_bh(&sk->sk_callback_lock);
>          port = sk->sk_user_data;
> -       if (!port)
> -               goto out;
> -
> -       if (sk->sk_state == TCP_LISTEN)
> +       if (port)
>                  queue_work(nvmet_wq, &port->accept_work);
> -out:
>          read_unlock_bh(&sk->sk_callback_lock);
>   }
>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke                  Kernel Storage Architect
> hare@suse.de                                +49 911 74053 688
> SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
> HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
>


--
Best Regards,
  Yi Zhang


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [bug report] blktests nvme/tcp nvme/060 hang
  2025-08-06 10:52 ` Hannes Reinecke
  2025-08-07  1:57   ` Yi Zhang
@ 2025-08-11 14:10   ` Maurizio Lombardi
  1 sibling, 0 replies; 9+ messages in thread
From: Maurizio Lombardi @ 2025-08-11 14:10 UTC (permalink / raw)
  To: Hannes Reinecke, Yi Zhang, linux-block,
	open list:NVM EXPRESS DRIVER
  Cc: Shinichiro Kawasaki, Maurizio Lombardi

On Wed Aug 6, 2025 at 12:52 PM CEST, Hannes Reinecke wrote:
> On 8/6/25 03:57, Yi Zhang wrote:
>> tgid:1049  ppid:2      task_flags:0x4208060 flags:0x00000010
>> [  390.467850] Workqueue: nvme-wq nvme_tcp_reconnect_ctrl_work [nvme_tcp]
>> [  390.474378] Call trace:
>> [  390.476813]  __switch_to+0x1d8/0x330 (T)
>> [  390.480731]  __schedule+0x860/0x1c30
>> [  390.484297]  schedule_rtlock+0x30/0x70
>> [  390.488037]  rtlock_slowlock_locked+0x464/0xf60
>> [  390.492559]  rt_read_lock+0x2bc/0x3e0
>> [  390.496211]  nvmet_tcp_listen_data_ready+0x3c/0x118 [nvmet_tcp]
>> [  390.502125]  nvmet_tcp_data_ready+0x88/0x198 [nvmet_tcp]
>> [  390.507429]  tcp_data_ready+0xdc/0x3e0
>> [  390.511171]  tcp_data_queue+0xe30/0x1e08
>> [  390.515084]  tcp_rcv_established+0x710/0x2328
>> [  390.519432]  tcp_v4_do_rcv+0x554/0x940
>> [  390.523172]  tcp_v4_rcv+0x1ec4/0x3078
>> [  390.526825]  ip_protocol_deliver_rcu+0xc0/0x3f0
>> [  390.531347]  ip_local_deliver_finish+0x2d4/0x5c0
>> [  390.535954]  ip_local_deliver+0x17c/0x3c0
>> [  390.539953]  ip_rcv_finish+0x148/0x238
>> [  390.543693]  ip_rcv+0xd0/0x2e0
>> [  390.546737]  __netif_receive_skb_one_core+0x100/0x180
>> [  390.551780]  __netif_receive_skb+0x2c/0x160
>> [  390.555953]  process_backlog+0x230/0x6f8
>> [  390.559866]  __napi_poll.constprop.0+0x9c/0x3e8
>> [  390.564386]  net_rx_action+0x808/0xb50
>> [  390.568125]  handle_softirqs.constprop.0+0x23c/0xca0
>> [  390.573082]  __local_bh_enable_ip+0x260/0x4f0
>> [  390.577429]  __dev_queue_xmit+0x6f4/0x1bd8
>> [  390.581515]  neigh_hh_output+0x190/0x2c0
>> [  390.585429]  ip_finish_output2+0x7c8/0x1788
>> [  390.589602]  __ip_finish_output+0x2c4/0x4f8
>> [  390.593776]  ip_finish_output+0x3c/0x2a8
>> [  390.597689]  ip_output+0x154/0x418
>> [  390.601081]  __ip_queue_xmit+0x580/0x1108
>> [  390.605081]  ip_queue_xmit+0x4c/0x78
>> [  390.608647]  __tcp_transmit_skb+0xfac/0x24e8
>> [  390.612908]  tcp_write_xmit+0xbec/0x3078
>> [  390.616821]  __tcp_push_pending_frames+0x90/0x2b8
>> [  390.621515]  tcp_send_fin+0x108/0x9a8
>> [  390.625167]  tcp_shutdown+0xd8/0xf8
>> [  390.628646]  inet_shutdown+0x14c/0x2e8
>> [  390.632385]  kernel_sock_shutdown+0x5c/0x98
>> [  390.636560]  __nvme_tcp_stop_queue+0x44/0x220 [nvme_tcp]
>> [  390.641865]  nvme_tcp_stop_queue_nowait+0x130/0x200 [nvme_tcp]
>> [  390.647691]  nvme_tcp_setup_ctrl+0x3bc/0x728 [nvme_tcp]
>> [  390.652909]  nvme_tcp_reconnect_ctrl_work+0x78/0x290 [nvme_tcp]
>> [  390.658822]  process_one_work+0x80c/0x1a78
>> [  390.662910]  worker_thread+0x6d0/0xaa8
>> [  390.666650]  kthread+0x304/0x3a0
>> [  390.669869]  ret_from_fork+0x10/0x20
>> [  390.673437] task:kworker/u322:77 state:D stack:27184 pid:4784
>> tgid:4784  ppid:2      task_flags:0x4208060 flags:0x00000210
>> [  390.684562] Workqueue: nvmet-wq nvmet_tcp_release_queue_work [nvmet_tcp]
>> [  390.691256] Call trace:
>> [  390.693692]  __switch_to+0x1d8/0x330 (T)
>> [  390.697605]  __schedule+0x860/0x1c30
>> [  390.701171]  schedule_rtlock+0x30/0x70
>> [  390.704911]  rwbase_write_lock.constprop.0.isra.0+0x2fc/0xb30
>> [  390.710646]  rt_write_lock+0x9c/0x138
>> [  390.714299]  nvmet_tcp_release_queue_work+0x168/0xb48 [nvmet_tcp]
>> [  390.720384]  process_one_work+0x80c/0x1a78
>> [  390.724470]  worker_thread+0x6d0/0xaa8
>> [  390.728210]  kthread+0x304/0x3a0
>> [  390.731428]  ret_from_fork+0x10/0x20
>> 
>> 
> Can you check if this fixes the problem?
>
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index 688033b88d38..23bdce8dcdf0 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -1991,15 +1991,13 @@ static void nvmet_tcp_listen_data_ready(struct 
> sock *sk)
>          struct nvmet_tcp_port *port;
>
>          trace_sk_data_ready(sk);
> +       if (sk->sk_state != TCP_LISTEN)
> +               return;
>
>          read_lock_bh(&sk->sk_callback_lock);
>          port = sk->sk_user_data;
> -       if (!port)
> -               goto out;
> -
> -       if (sk->sk_state == TCP_LISTEN)
> +       if (port)
>                  queue_work(nvmet_wq, &port->accept_work);
> -out:
>          read_unlock_bh(&sk->sk_callback_lock);
>   }


Hannes,

are you going to send a formal patch?
In case you missed it, the patch is confirmed to work.

Thanks,
Maurizio


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2025-08-11 14:10 UTC | newest]

Thread overview: 9+ messages
2025-08-06  1:57 [bug report] blktests nvme/tcp nvme/060 hang Yi Zhang
2025-08-06  6:16 ` Maurizio Lombardi
2025-08-06  6:22   ` Maurizio Lombardi
2025-08-06  6:44     ` Maurizio Lombardi
2025-08-06  9:30       ` Maurizio Lombardi
2025-08-06 10:54       ` Hannes Reinecke
2025-08-06 10:52 ` Hannes Reinecke
2025-08-07  1:57   ` Yi Zhang
2025-08-11 14:10   ` Maurizio Lombardi
