* Resets during user commands leads to hung task and controller stuck in connecting
From: Jonathan Derrick @ 2022-11-11 21:50 UTC (permalink / raw)
To: Keith Busch, hch, Sagi Grimberg, linux-nvme
Hi,
I'm (again) seeing a hung task when doing resets and formats simultaneously.
Controller state is left in 'connecting'
Using nvme.git/nvme-6.2 as of 'nvme: implement the DEAC bit for the Write Zeroes command',
but I have also repro'd with Christoph's latest reset/probe-split set
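ctrl="nvme0"
nsid=1
pci="/sys/block/${ctrl}n${nsid}/device/"
# Report tasks blocked for more than 30 seconds
echo 30 > /proc/sys/kernel/hung_task_timeout_secs
# Race namespace formats against controller resets
while true; do
    nvme format -f "/dev/${ctrl}n${nsid}" &
    echo 1 > "$pci/reset_controller" &
done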
[ 79.195862] nvme nvme0: Ignoring bogus Namespace Identifiers
[ 122.378580] INFO: task sh:7737 blocked for more than 30 seconds.
[ 122.380329] Not tainted 6.1.0-rc2+ #87
[ 122.381594] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 122.383782] task:sh state:D stack:0 pid:7737 ppid:1 flags:0x00000004
[ 122.386078] Call Trace:
[ 122.386909] <TASK>
[ 122.387659] __schedule+0x320/0xb10
[ 122.388772] ? lock_release+0x22b/0x450
[ 122.389920] ? lock_acquired+0x1a2/0x400
[ 122.391094] ? wait_for_completion+0x83/0x160
[ 122.392358] schedule+0x53/0xd0
[ 122.393337] schedule_timeout+0x310/0x3b0
[ 122.394517] ? rcu_read_lock_held_common+0xe/0x50
[ 122.395847] ? rcu_read_lock_sched_held+0x23/0x80
[ 122.397189] ? lock_release+0x22b/0x450
[ 122.398336] ? lock_acquired+0x1a2/0x400
[ 122.399484] ? wait_for_completion+0x83/0x160
[ 122.400763] wait_for_completion+0xb5/0x160
[ 122.401968] __flush_work+0x293/0x4a0
[ 122.403068] ? flush_workqueue_prep_pwqs+0x120/0x120
[ 122.404463] ? rcu_read_lock_sched_held+0x23/0x80
[ 122.405791] ? trace_hardirqs_on+0x2b/0xd0
[ 122.406977] nvme_reset_ctrl_sync+0x2a/0x40 [nvme_core]
[ 122.408473] nvme_sysfs_reset+0x12/0x30 [nvme_core]
[ 122.409856] kernfs_fop_write_iter+0x142/0x1e0
[ 122.411137] vfs_write+0x357/0x4f0
[ 122.412161] ksys_write+0x5f/0xe0
[ 122.413172] do_syscall_64+0x3a/0x90
[ 122.414238] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 122.415646] RIP: 0033:0x7f7980ced648
[ 122.416710] RSP: 002b:00007ffc01ee6778 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 122.418794] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f7980ced648
[ 122.420673] RDX: 0000000000000002 RSI: 0000563b40d93ca0 RDI: 0000000000000001
[ 122.422550] RBP: 0000563b40d93ca0 R08: 000000000000000a R09: 00007f7980d3cda0
[ 122.424385] R10: 000000000000000a R11: 0000000000000246 R12: 00007f7980fc06e0
[ 122.426232] R13: 0000000000000002 R14: 00007f7980fbb880 R15: 0000000000000002
[ 122.428067] </TASK>
[ 122.428821] INFO: lockdep is turned off.
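The stuck 'connecting' state mentioned above can be confirmed by reading the controller's state attribute from sysfs. The helper below is only a sketch: the name nvme_ctrl_state and its optional second argument are illustrative and not part of the original report, and it assumes the controller is exposed under /sys/class/nvme.

```shell
# Hypothetical helper: print an NVMe controller's state as exposed in sysfs.
# $1: controller name (e.g. nvme0)
# $2: sysfs class directory; assumed default is /sys/class/nvme,
#     overridable so the helper can be exercised without hardware
nvme_ctrl_state() {
    cat "${2:-/sys/class/nvme}/$1/state"
}

# Example (on a system that has nvme0): nvme_ctrl_state nvme0
```

A value that stays at connecting after the reset loop has been stopped would match the symptom reported here.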
* Re: Resets during user commands leads to hung task and controller stuck in connecting
From: Sagi Grimberg @ 2022-11-13 11:03 UTC (permalink / raw)
To: Jonathan Derrick, Keith Busch, hch, linux-nvme
On 11/11/22 23:50, Jonathan Derrick wrote:
> Hi,
>
> I'm (again) seeing a hung task when doing resets and formats simultaneously.
> Controller state is left in 'connecting'
>
> Using nvme.git/nvme-6.2 as of 'nvme: implement the DEAC bit for the Write Zeroes command',
> but I have also repro'd with Christoph's latest reset/probe-split set
>
>
> ctrl="nvme0"
> nsid=1
> pci="/sys/block/${ctrl}n${nsid}/device/"
> echo 30 > /proc/sys/kernel/hung_task_timeout_secs
> while true; do
> nvme format -f /dev/${ctrl}n${nsid} &
How long does it take the format to complete?
> echo 1 > $pci/reset_controller &
> done
What happens if you set io_timeout to 20 instead of 30 (given
that you bound hung tasks at 30 seconds)?
* Re: Resets during user commands leads to hung task and controller stuck in connecting
From: Jonathan Derrick @ 2022-11-14 23:09 UTC (permalink / raw)
To: Sagi Grimberg, Keith Busch, hch, linux-nvme
On 11/13/2022 4:03 AM, Sagi Grimberg wrote:
>
>
> On 11/11/22 23:50, Jonathan Derrick wrote:
>> Hi,
>>
>> I'm (again) seeing a hung task when doing resets and formats simultaneously.
>> Controller state is left in 'connecting'
>>
>> Using nvme.git/nvme-6.2 as of 'nvme: implement the DEAC bit for the Write Zeroes command',
>> but I have also repro'd with Christoph's latest reset/probe-split set
>>
>>
>> ctrl="nvme0"
>> nsid=1
>> pci="/sys/block/${ctrl}n${nsid}/device/"
>> echo 30 > /proc/sys/kernel/hung_task_timeout_secs
>> while true; do
>> nvme format -f /dev/${ctrl}n${nsid} &
>
> How long does it take the format to complete?
Well it's pretty immediate but I'm under the impression that the
nvme_dev_disable path leads to CC_EN disabling, interrupting any formats
>
>> echo 1 > $pci/reset_controller &
>> done
>
> What happens if you set io_timeout to 20 instead of 30 (given
> that you bound hung tasks at 30 seconds)?
It occurs with the standard 120s task timeout too
Also there's no I/O occurring at the moment; just admin work
I added a blktests for this:
http://lists.infradead.org/pipermail/linux-nvme/2022-November/036475.html
* Re: Resets during user commands leads to hung task and controller stuck in connecting
From: Sagi Grimberg @ 2022-11-15 7:46 UTC (permalink / raw)
To: Jonathan Derrick, Keith Busch, hch, linux-nvme
>>> I'm (again) seeing a hung task when doing resets and formats simultaneously.
>>> Controller state is left in 'connecting'
>>>
>>> Using nvme.git/nvme-6.2 as of 'nvme: implement the DEAC bit for the Write Zeroes command',
>>> but I have also repro'd with Christoph's latest reset/probe-split set
>>>
>>>
>>> ctrl="nvme0"
>>> nsid=1
>>> pci="/sys/block/${ctrl}n${nsid}/device/"
>>> echo 30 > /proc/sys/kernel/hung_task_timeout_secs
>>> while true; do
>>> nvme format -f /dev/${ctrl}n${nsid} &
>>
>> How long does it take the format to complete?
> Well it's pretty immediate but I'm under the impression that the
> nvme_dev_disable path leads to CC_EN disabling, interrupting any formats
>
>>
>>> echo 1 > $pci/reset_controller &
>>> done
>>
>> What happens if you set io_timeout to 20 instead of 30 (given
>> that you bound hung tasks at 30 seconds)?
> It occurs with the standard 120s task timeout too
> Also there's no I/O occurring at the moment; just admin work
>
> I added a blktests for this:
> http://lists.infradead.org/pipermail/linux-nvme/2022-November/036475.html
Keith?
Is this related to bc8fb906b0ff ("nvme: handle effects after freeing the
request")?
* Re: Resets during user commands leads to hung task and controller stuck in connecting
From: Keith Busch @ 2022-11-15 16:34 UTC (permalink / raw)
To: Sagi Grimberg; +Cc: Jonathan Derrick, Keith Busch, hch, linux-nvme
On Tue, Nov 15, 2022 at 09:46:45AM +0200, Sagi Grimberg wrote:
>
> > > > I'm (again) seeing a hung task when doing resets and formats simultaneously.
> > > > Controller state is left in 'connecting'
> > > >
> > > > Using nvme.git/nvme-6.2 as of 'nvme: implement the DEAC bit for the Write Zeroes command',
> > > > but I have also repro'd with Christoph's latest reset/probe-split set
> > > >
> > > >
> > > > ctrl="nvme0"
> > > > nsid=1
> > > > pci="/sys/block/${ctrl}n${nsid}/device/"
> > > > echo 30 > /proc/sys/kernel/hung_task_timeout_secs
> > > > while true; do
> > > > nvme format -f /dev/${ctrl}n${nsid} &
> > >
> > > How long does it take the format to complete?
> > Well it's pretty immediate but I'm under the impression that the
> > nvme_dev_disable path leads to CC_EN disabling, interrupting any formats
> >
> > >
> > > > echo 1 > $pci/reset_controller &
> > > > done
> > >
> > > What happens if you set io_timeout to 20 instead of 30 (given
> > > that you bound hung tasks at 30 seconds)?
> > It occurs with the standard 120s task timeout too
> > Also there's no I/O occurring at the moment; just admin work
> >
> > I added a blktests for this:
> > http://lists.infradead.org/pipermail/linux-nvme/2022-November/036475.html
>
> Keith?
>
> Is this related to bc8fb906b0ff ("nvme: handle effects after freeing the
> request") ?
Kind of. Jonathan previously reported an error with the same test, and
reported that the mentioned commit fixed it. This is yet another error
with the same test, but that commit doesn't appear to have been a factor
in this new observation.
This test could theoretically consume all admin tags with format
commands, and if the controller breaks on the format and stops
responding, then we don't have a tag available to tear down IO queues.
I'm not sure that's actually happening here, though. Is the sysfs reset
really the only stuck task reported in the kernel messages?
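One way to check for other stuck tasks is to list every task in uninterruptible sleep (state D) instead of relying only on the hung-task report. A minimal sketch, assuming a procps-style ps; the function names filter_blocked and list_blocked_tasks are made up for illustration and do not come from this thread:

```shell
# Keep only rows whose second column (process state) starts with D,
# i.e. uninterruptible sleep, and print the PID and command name.
filter_blocked() {
    awk '$2 ~ /^D/ { print $1, $3 }'
}

# List all tasks currently in state D (input columns: pid, state, command).
list_blocked_tasks() {
    ps -e -o pid= -o stat= -o comm= | filter_blocked
}
```

If sysrq is enabled, running `echo w > /proc/sysrq-trigger` as root also asks the kernel to dump the stacks of blocked tasks to the kernel log, which would show whether anything besides the sysfs reset is waiting.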