Bug Report: can't unload nvme module in case of disabled device

Linux-NVME Archive on lore.kernel.org

* Bug Report: can't unload nvme module in case of disabled device
@ 2017-08-01 12:58 Max Gurtovoy
  2017-08-10  8:59 ` Christoph Hellwig
  2017-08-10 16:45 ` Keith Busch
  0 siblings, 2 replies; 9+ messages in thread
From: Max Gurtovoy @ 2017-08-01 12:58 UTC (permalink / raw)


Hi all,

I would like to report a bug that is reproduced by the following steps (I'm
using 4.13.0-rc3+):

1. modprobe nvme
2. echo 0 >  /sys/block/nvme0n1/device/device/enable
3. nvme list (stuck for more than 1-2 mins)
4. modprobe -r nvme (stuck forever)

log:

[ 1342.388888] nvme nvme0: controller is down; will reset: CSTS=0x3, PCI_STATUS=0x10
[ 1476.021392] INFO: task kworker/u98:1:436 blocked for more than 120 seconds.
[ 1476.029072]       Not tainted 4.13.0-rc3+ #19
[ 1476.033878] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1476.042505] kworker/u98:1   D    0   436      2 0x00000000
[ 1476.048569] Workqueue: nvme-wq nvme_reset_work [nvme]
[ 1476.054133] Call Trace:
[ 1476.056862]  __schedule+0x1dc/0x780
[ 1476.060706]  schedule+0x36/0x80
[ 1476.064180]  blk_mq_freeze_queue_wait+0x4b/0xb0
[ 1476.069175]  ? remove_wait_queue+0x60/0x60
[ 1476.073693]  nvme_wait_freeze+0x33/0x50 [nvme_core]
[ 1476.079068]  nvme_reset_work+0x6b9/0xc40 [nvme]
[ 1476.084075]  ? __switch_to+0x23e/0x4a0
[ 1476.088209]  process_one_work+0x149/0x360
[ 1476.092625]  worker_thread+0x4d/0x3c0
[ 1476.096692]  kthread+0x109/0x140
[ 1476.100247]  ? rescuer_thread+0x380/0x380
[ 1476.104664]  ? kthread_park+0x60/0x60
[ 1476.108698]  ret_from_fork+0x25/0x30



[ 1598.901883] INFO: task kworker/u98:1:436 blocked for more than 120 seconds.
[ 1598.909557]       Not tainted 4.13.0-rc3+ #19
[ 1598.914362] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1598.923004] kworker/u98:1   D    0   436      2 0x00000000
[ 1598.929063] Workqueue: nvme-wq nvme_reset_work [nvme]
[ 1598.934637] Call Trace:
[ 1598.937348]  __schedule+0x1dc/0x780
[ 1598.941208]  schedule+0x36/0x80
[ 1598.944682]  blk_mq_freeze_queue_wait+0x4b/0xb0
[ 1598.949675]  ? remove_wait_queue+0x60/0x60
[ 1598.954189]  nvme_wait_freeze+0x33/0x50 [nvme_core]
[ 1598.959574]  nvme_reset_work+0x6b9/0xc40 [nvme]
[ 1598.964580]  ? __switch_to+0x23e/0x4a0
[ 1598.968723]  process_one_work+0x149/0x360
[ 1598.973192]  worker_thread+0x4d/0x3c0
[ 1598.977240]  kthread+0x109/0x140
[ 1598.980797]  ? rescuer_thread+0x380/0x380
[ 1598.985226]  ? kthread_park+0x60/0x60
[ 1598.989262]  ret_from_fork+0x25/0x30
[ 1721.782347] INFO: task kworker/u98:1:436 blocked for more than 120 seconds.
[ 1721.790026]       Not tainted 4.13.0-rc3+ #19
[ 1721.795326] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1721.804425] kworker/u98:1   D    0   436      2 0x00000000
[ 1721.810958] Workqueue: nvme-wq nvme_reset_work [nvme]
[ 1721.816999] Call Trace:
[ 1721.820161]  __schedule+0x1dc/0x780
[ 1721.824470]  schedule+0x36/0x80
[ 1721.828389]  blk_mq_freeze_queue_wait+0x4b/0xb0
[ 1721.833835]  ? remove_wait_queue+0x60/0x60
[ 1721.838781]  nvme_wait_freeze+0x33/0x50 [nvme_core]
[ 1721.844596]  nvme_reset_work+0x6b9/0xc40 [nvme]
[ 1721.850208]  ? __switch_to+0x23e/0x4a0
[ 1721.854756]  process_one_work+0x149/0x360
[ 1721.859606]  worker_thread+0x4d/0x3c0
[ 1721.864035]  kthread+0x109/0x140
[ 1721.867985]  ? rescuer_thread+0x380/0x380
[ 1721.872805]  ? kthread_park+0x60/0x60
[ 1721.877222]  ret_from_fork+0x25/0x30
[ 1721.881589] INFO: task modprobe:12986 blocked for more than 120 seconds.

Any thoughts?

-Max.


* Bug Report: can't unload nvme module in case of disabled device
  2017-08-01 12:58 Bug Report: can't unload nvme module in case of disabled device Max Gurtovoy
@ 2017-08-10  8:59 ` Christoph Hellwig
  2017-08-10 17:04   ` Max Gurtovoy
  2017-08-10 16:45 ` Keith Busch
  1 sibling, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2017-08-10  8:59 UTC (permalink / raw)


Is this a PCIe or fabrics controller?  Did you get a chance to bisect
where this behavior appeared?


* Bug Report: can't unload nvme module in case of disabled device
  2017-08-10  8:59 ` Christoph Hellwig
@ 2017-08-10 17:04   ` Max Gurtovoy
  2017-08-10 19:36     ` Keith Busch
  0 siblings, 1 reply; 9+ messages in thread
From: Max Gurtovoy @ 2017-08-10 17:04 UTC (permalink / raw)




On 8/10/2017 11:59 AM, Christoph Hellwig wrote:
> Is this a PCIe or fabrics controller?  Did you get a chance to bisect
> where this behavior appeared?
>

I'm using a PCIe controller.
With 4.13-rc4+ I couldn't even run the simpler scenario of only unloading
the nvme module (with SAMSUNG MZPLL1T6HEHP-00003 and Intel P3500/3700
devices):

[  369.997917] INFO: task modprobe:3709 blocked for more than 120 seconds.
[  370.005215]       Not tainted 4.13.0-rc4+ #21
[  370.010017] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  370.018647] modprobe        D    0  3709   3654 0x00000000
[  370.024695] Call Trace:
[  370.027400]  __schedule+0x1dc/0x780
[  370.031261]  schedule+0x36/0x80
[  370.034756]  blk_mq_freeze_queue_wait+0x4b/0xb0
[  370.039750]  ? remove_wait_queue+0x60/0x60
[  370.044263]  blk_freeze_queue+0x1a/0x20
[  370.048489]  blk_cleanup_queue+0x7f/0x150
[  370.052927]  nvme_dev_remove_admin+0x36/0x50 [nvme]
[  370.058303]  nvme_remove+0xa2/0x130 [nvme]
[  370.062820]  pci_device_remove+0x39/0xc0
[  370.067142]  device_release_driver_internal+0x141/0x200
[  370.072898]  driver_detach+0x3f/0x80
[  370.076852]  bus_remove_driver+0x55/0xd0
[  370.081186]  driver_unregister+0x2c/0x50
[  370.085521]  pci_unregister_driver+0x2a/0xa0
[  370.090227]  nvme_exit+0x10/0xb84 [nvme]
[  370.094562]  SyS_delete_module+0x171/0x250
[  370.099101]  ? exit_to_usermode_loop+0x5e/0x88
[  370.103996]  entry_SYSCALL_64_fastpath+0x1a/0xa5
[  370.109096] RIP: 0033:0x7f146b5106b7
[  370.113037] RSP: 002b:00007ffd2cae12e8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[  370.121431] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f146b5106b7
[  370.129295] RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000000000223f5e8
[  370.137167] RBP: 000000000223f580 R08: 00007f146b7d5060 R09: 00007f146b580a40
[  370.145029] R10: 00007ffd2cae1070 R11: 0000000000000206 R12: 00007ffd2cae0310
[  370.152890] R13: 0000000000000000 R14: 000000000223f5e8 R15: 0000000000000000

The new scenario:
1. modprobe nvme
2. sleep 10
3. modprobe -r nvme

This works on 4.11.0/4.12.0 but not on 4.13.0-rc4+.


* Bug Report: can't unload nvme module in case of disabled device
  2017-08-10 17:04   ` Max Gurtovoy
@ 2017-08-10 19:36     ` Keith Busch
  2017-08-13  8:29       ` Max Gurtovoy
  0 siblings, 1 reply; 9+ messages in thread
From: Keith Busch @ 2017-08-10 19:36 UTC (permalink / raw)


On Thu, Aug 10, 2017 at 08:04:13PM +0300, Max Gurtovoy wrote:
> 
> I'm using PCIe ctrl.
> Using 4.13-rc4+ I couldn't even run easier scenario of only unloading the
> nvme module (with SAMSUNG MZPLL1T6HEHP-00003 and Intel P3500/3700 devices):
> 
> [  369.997917] INFO: task modprobe:3709 blocked for more than 120 seconds.
> [  370.005215]       Not tainted 4.13.0-rc4+ #21
> [  370.010017] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [  370.018647] modprobe        D    0  3709   3654 0x00000000
> [  370.024695] Call Trace:
> [  370.027400]  __schedule+0x1dc/0x780
> [  370.031261]  schedule+0x36/0x80
> [  370.034756]  blk_mq_freeze_queue_wait+0x4b/0xb0
> [  370.039750]  ? remove_wait_queue+0x60/0x60
> [  370.044263]  blk_freeze_queue+0x1a/0x20
> [  370.048489]  blk_cleanup_queue+0x7f/0x150
> [  370.052927]  nvme_dev_remove_admin+0x36/0x50 [nvme]
> [  370.058303]  nvme_remove+0xa2/0x130 [nvme]
> [  370.062820]  pci_device_remove+0x39/0xc0
> [  370.067142]  device_release_driver_internal+0x141/0x200
> [  370.072898]  driver_detach+0x3f/0x80
> [  370.076852]  bus_remove_driver+0x55/0xd0
> [  370.081186]  driver_unregister+0x2c/0x50
> [  370.085521]  pci_unregister_driver+0x2a/0xa0
> [  370.090227]  nvme_exit+0x10/0xb84 [nvme]
> [  370.094562]  SyS_delete_module+0x171/0x250
> [  370.099101]  ? exit_to_usermode_loop+0x5e/0x88
> [  370.103996]  entry_SYSCALL_64_fastpath+0x1a/0xa5
> [  370.109096] RIP: 0033:0x7f146b5106b7
> [  370.113037] RSP: 002b:00007ffd2cae12e8 EFLAGS: 00000206 ORIG_RAX:
> 00000000000000b0
> [  370.121431] RAX: ffffffffffffffda RBX: 0000000000000003 RCX:
> 00007f146b5106b7
> [  370.129295] RDX: 0000000000000000 RSI: 0000000000000800 RDI:
> 000000000223f5e8
> [  370.137167] RBP: 000000000223f580 R08: 00007f146b7d5060 R09:
> 00007f146b580a40
> [  370.145029] R10: 00007ffd2cae1070 R11: 0000000000000206 R12:
> 00007ffd2cae0310
> [  370.152890] R13: 0000000000000000 R14: 000000000223f5e8 R15:
> 0000000000000000
> 
> the new scenario:
> 1. modprobe nvme
> 2. sleep 10
> 3. modprobe -r nvme
> 
> works on 4.11.0/4.12.0 but not on 4.13.0-rc4+.

This I'm not able to reproduce. The stack trace is saying there are
entered requests on the admin queue, but that shouldn't be possible at
this point in nvme_remove. I'll keep looking.


* Bug Report: can't unload nvme module in case of disabled device
  2017-08-10 19:36     ` Keith Busch
@ 2017-08-13  8:29       ` Max Gurtovoy
  2017-08-14 20:24         ` Keith Busch
  0 siblings, 1 reply; 9+ messages in thread
From: Max Gurtovoy @ 2017-08-13  8:29 UTC (permalink / raw)




On 8/10/2017 10:36 PM, Keith Busch wrote:
> On Thu, Aug 10, 2017 at 08:04:13PM +0300, Max Gurtovoy wrote:
>>
>> I'm using PCIe ctrl.
>> Using 4.13-rc4+ I couldn't even run easier scenario of only unloading the
>> nvme module (with SAMSUNG MZPLL1T6HEHP-00003 and Intel P3500/3700 devices):
>>
>> [  369.997917] INFO: task modprobe:3709 blocked for more than 120 seconds.
>> [  370.005215]       Not tainted 4.13.0-rc4+ #21
>> [  370.010017] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
>> this message.
>> [  370.018647] modprobe        D    0  3709   3654 0x00000000
>> [  370.024695] Call Trace:
>> [  370.027400]  __schedule+0x1dc/0x780
>> [  370.031261]  schedule+0x36/0x80
>> [  370.034756]  blk_mq_freeze_queue_wait+0x4b/0xb0
>> [  370.039750]  ? remove_wait_queue+0x60/0x60
>> [  370.044263]  blk_freeze_queue+0x1a/0x20
>> [  370.048489]  blk_cleanup_queue+0x7f/0x150
>> [  370.052927]  nvme_dev_remove_admin+0x36/0x50 [nvme]
>> [  370.058303]  nvme_remove+0xa2/0x130 [nvme]
>> [  370.062820]  pci_device_remove+0x39/0xc0
>> [  370.067142]  device_release_driver_internal+0x141/0x200
>> [  370.072898]  driver_detach+0x3f/0x80
>> [  370.076852]  bus_remove_driver+0x55/0xd0
>> [  370.081186]  driver_unregister+0x2c/0x50
>> [  370.085521]  pci_unregister_driver+0x2a/0xa0
>> [  370.090227]  nvme_exit+0x10/0xb84 [nvme]
>> [  370.094562]  SyS_delete_module+0x171/0x250
>> [  370.099101]  ? exit_to_usermode_loop+0x5e/0x88
>> [  370.103996]  entry_SYSCALL_64_fastpath+0x1a/0xa5
>> [  370.109096] RIP: 0033:0x7f146b5106b7
>> [  370.113037] RSP: 002b:00007ffd2cae12e8 EFLAGS: 00000206 ORIG_RAX:
>> 00000000000000b0
>> [  370.121431] RAX: ffffffffffffffda RBX: 0000000000000003 RCX:
>> 00007f146b5106b7
>> [  370.129295] RDX: 0000000000000000 RSI: 0000000000000800 RDI:
>> 000000000223f5e8
>> [  370.137167] RBP: 000000000223f580 R08: 00007f146b7d5060 R09:
>> 00007f146b580a40
>> [  370.145029] R10: 00007ffd2cae1070 R11: 0000000000000206 R12:
>> 00007ffd2cae0310
>> [  370.152890] R13: 0000000000000000 R14: 000000000223f5e8 R15:
>> 0000000000000000
>>
>> the new scenario:
>> 1. modprobe nvme
>> 2. sleep 10
>> 3. modprobe -r nvme
>>
>> works on 4.11.0/4.12.0 but not on 4.13.0-rc4+.
>
> This I'm not able to reproduce. The stack trace is saying there are
> entered requests on the admin queue, but that shouldn't be possible at
> this point in nvme_remove. I'll keep looking.
>

After bisecting I found that the following commit caused the simple 
load/unload nvme driver failure:

commit 1ad43c0078b79a76accd0fe64062e47b3430dc6b
Author: Ming Lei <minlei at redhat.com>
Date:   Wed Aug 2 08:01:45 2017 +0800

     blk-mq: don't leak preempt counter/q_usage_counter when allocating rq failed

Adding Ming to this thread.

I'm continuing to debug the new scenario (load nvme && sleep 10
&& unload nvme).


* Bug Report: can't unload nvme module in case of disabled device
  2017-08-13  8:29       ` Max Gurtovoy
@ 2017-08-14 20:24         ` Keith Busch
  0 siblings, 0 replies; 9+ messages in thread
From: Keith Busch @ 2017-08-14 20:24 UTC (permalink / raw)


On Sun, Aug 13, 2017 at 11:29:59AM +0300, Max Gurtovoy wrote:
> 
> After bisecting I found that the following commit caused the simple
> load/unload nvme driver failure:
> 
> commit 1ad43c0078b79a76accd0fe64062e47b3430dc6b
> Author: Ming Lei <minlei at redhat.com>
> Date:   Wed Aug 2 08:01:45 2017 +0800
> 
>     blk-mq: don't leak preempt counter/q_usage_counter when allocating rq
> failed
> 
> Adding Ming to this thread.
> 
> I'm continuing with the debug of the new scenario (load nvme && sleep 10 &&
> unload nvme).
 
I'm reviewing that commit, and it looks wrong to me. It only pairs the
blk_queue_exit with the queue enter if request allocation was successful.
That leaves the q_usage_counter unbalanced when request allocation fails,
making a queue freeze impossible. I'll send a patch.
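
To illustrate the imbalance described above, here is a minimal user-space
sketch. It is not the blk-mq code: struct toy_queue and every toy_* function
are hypothetical names invented for this example, and the model tracks only
the caller's own enter/exit pair, with a plain integer standing in for
q_usage_counter.

```c
#include <stdbool.h>

/* Toy stand-in for a request queue; only the usage count is modelled. */
struct toy_queue {
    int usage;        /* plays the role of q_usage_counter */
    bool alloc_fails; /* forces the simulated allocation to fail */
};

void toy_enter(struct toy_queue *q) { q->usage++; }
void toy_exit(struct toy_queue *q)  { q->usage--; }

/* In this model a freeze completes only when every enter has a matching exit. */
bool toy_can_freeze(const struct toy_queue *q) { return q->usage == 0; }

/* Unbalanced variant: the exit is skipped on the failure path. */
bool toy_alloc_unbalanced(struct toy_queue *q)
{
    toy_enter(q);
    if (q->alloc_fails)
        return false; /* returns while still holding the usage count */
    toy_exit(q);
    return true;
}

/* Balanced variant: the caller drops its usage count on both paths. */
bool toy_alloc_balanced(struct toy_queue *q)
{
    bool ok;

    toy_enter(q);
    ok = !q->alloc_fails;
    toy_exit(q);
    return ok;
}
```

With alloc_fails set, toy_alloc_unbalanced() leaves usage at 1 and
toy_can_freeze() stays false, while toy_alloc_balanced() brings usage back to
0. That corresponds to the blocked blk_mq_freeze_queue_wait in the traces
only under the simplifications stated above.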


* Bug Report: can't unload nvme module in case of disabled device
  2017-08-01 12:58 Bug Report: can't unload nvme module in case of disabled device Max Gurtovoy
  2017-08-10  8:59 ` Christoph Hellwig
@ 2017-08-10 16:45 ` Keith Busch
  2017-08-10 19:17   ` Keith Busch
  1 sibling, 1 reply; 9+ messages in thread
From: Keith Busch @ 2017-08-10 16:45 UTC (permalink / raw)


On Tue, Aug 01, 2017 at 03:58:10PM +0300, Max Gurtovoy wrote:
> Hi all,
> 
> I would like to report a bug that reproduced by the following steps (I'm
> using 4.13.0-rc3+):
> 
> 1. modprobe nvme
> 2. echo 0 >  /sys/block/nvme0n1/device/device/enable
> 3. nvme list (stuck for more than 1-2 mins)
> 4. modprobe -r nvme (stuck forever)
> 
> log:
> 
> [ 1342.388888] nvme nvme0: controller is down; will reset: CSTS=0x3,
> PCI_STATUS=0x10
> [ 1476.021392] INFO: task kworker/u98:1:436 blocked for more than 120
> seconds.
> [ 1476.029072]       Not tainted 4.13.0-rc3+ #19
> [ 1476.033878] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [ 1476.042505] kworker/u98:1   D    0   436      2 0x00000000
> [ 1476.048569] Workqueue: nvme-wq nvme_reset_work [nvme]
> [ 1476.054133] Call Trace:
> [ 1476.056862]  __schedule+0x1dc/0x780
> [ 1476.060706]  schedule+0x36/0x80
> [ 1476.064180]  blk_mq_freeze_queue_wait+0x4b/0xb0
> [ 1476.069175]  ? remove_wait_queue+0x60/0x60
> [ 1476.073693]  nvme_wait_freeze+0x33/0x50 [nvme_core]
> [ 1476.079068]  nvme_reset_work+0x6b9/0xc40 [nvme]
> [ 1476.084075]  ? __switch_to+0x23e/0x4a0
> [ 1476.088209]  process_one_work+0x149/0x360
> [ 1476.092625]  worker_thread+0x4d/0x3c0
> [ 1476.096692]  kthread+0x109/0x140
> [ 1476.100247]  ? rescuer_thread+0x380/0x380
> [ 1476.104664]  ? kthread_park+0x60/0x60
> [ 1476.108698]  ret_from_fork+0x25/0x30

This looks like a path does not pair the freeze start with the reset's
freeze wait. I'll have to see what the pci 'enable' sysfs entry does.


* Bug Report: can't unload nvme module in case of disabled device
  2017-08-10 16:45 ` Keith Busch
@ 2017-08-10 19:17   ` Keith Busch
  2017-08-10 19:34     ` Keith Busch
  0 siblings, 1 reply; 9+ messages in thread
From: Keith Busch @ 2017-08-10 19:17 UTC (permalink / raw)


On Thu, Aug 10, 2017 at 12:45:36PM -0400, Keith Busch wrote:
> On Tue, Aug 01, 2017 at 03:58:10PM +0300, Max Gurtovoy wrote:
> > Hi all,
> > 
> > I would like to report a bug that reproduced by the following steps (I'm
> > using 4.13.0-rc3+):
> > 
> > 1. modprobe nvme
> > 2. echo 0 >  /sys/block/nvme0n1/device/device/enable
> > 3. nvme list (stuck for more than 1-2 mins)
> > 4. modprobe -r nvme (stuck forever)
> > 
> > log:
> > 
> > [ 1342.388888] nvme nvme0: controller is down; will reset: CSTS=0x3,
> > PCI_STATUS=0x10
> > [ 1476.021392] INFO: task kworker/u98:1:436 blocked for more than 120
> > seconds.
> > [ 1476.029072]       Not tainted 4.13.0-rc3+ #19
> > [ 1476.033878] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> > this message.
> > [ 1476.042505] kworker/u98:1   D    0   436      2 0x00000000
> > [ 1476.048569] Workqueue: nvme-wq nvme_reset_work [nvme]
> > [ 1476.054133] Call Trace:
> > [ 1476.056862]  __schedule+0x1dc/0x780
> > [ 1476.060706]  schedule+0x36/0x80
> > [ 1476.064180]  blk_mq_freeze_queue_wait+0x4b/0xb0
> > [ 1476.069175]  ? remove_wait_queue+0x60/0x60
> > [ 1476.073693]  nvme_wait_freeze+0x33/0x50 [nvme_core]
> > [ 1476.079068]  nvme_reset_work+0x6b9/0xc40 [nvme]
> > [ 1476.084075]  ? __switch_to+0x23e/0x4a0
> > [ 1476.088209]  process_one_work+0x149/0x360
> > [ 1476.092625]  worker_thread+0x4d/0x3c0
> > [ 1476.096692]  kthread+0x109/0x140
> > [ 1476.100247]  ? rescuer_thread+0x380/0x380
> > [ 1476.104664]  ? kthread_park+0x60/0x60
> > [ 1476.108698]  ret_from_fork+0x25/0x30
> 
> This looks like a path does not pair the freeze start with the reset's
> freeze wait. I'll have to see what the pci 'enable' sysfs entry does.

I see how the freeze start/stops are not paired in this scenario:
nvme_dev_disable doesn't start the freeze if the pci device isn't
disabled. It uses this to know if it is disabling the device twice.

In this test, though, you are disabling the pci device without the
driver's knowledge, so that breaks that logic. In light of that, we'll
need different criteria to know when the driver should start a freeze.
I'll test some things out and send a patch.


* Bug Report: can't unload nvme module in case of disabled device
  2017-08-10 19:17   ` Keith Busch
@ 2017-08-10 19:34     ` Keith Busch
  0 siblings, 0 replies; 9+ messages in thread
From: Keith Busch @ 2017-08-10 19:34 UTC (permalink / raw)


On Thu, Aug 10, 2017 at 03:17:17PM -0400, Keith Busch wrote:
> On Thu, Aug 10, 2017 at 12:45:36PM -0400, Keith Busch wrote:
> > On Tue, Aug 01, 2017 at 03:58:10PM +0300, Max Gurtovoy wrote:
> > > Hi all,
> > > 
> > > I would like to report a bug that reproduced by the following steps (I'm
> > > using 4.13.0-rc3+):
> > > 
> > > 1. modprobe nvme
> > > 2. echo 0 >  /sys/block/nvme0n1/device/device/enable
> > > 3. nvme list (stuck for more than 1-2 mins)
> > > 4. modprobe -r nvme (stuck forever)
> > > 
> > > log:
> > > 
> > > [ 1342.388888] nvme nvme0: controller is down; will reset: CSTS=0x3,
> > > PCI_STATUS=0x10
> > > [ 1476.021392] INFO: task kworker/u98:1:436 blocked for more than 120
> > > seconds.
> > > [ 1476.029072]       Not tainted 4.13.0-rc3+ #19
> > > [ 1476.033878] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> > > this message.
> > > [ 1476.042505] kworker/u98:1   D    0   436      2 0x00000000
> > > [ 1476.048569] Workqueue: nvme-wq nvme_reset_work [nvme]
> > > [ 1476.054133] Call Trace:
> > > [ 1476.056862]  __schedule+0x1dc/0x780
> > > [ 1476.060706]  schedule+0x36/0x80
> > > [ 1476.064180]  blk_mq_freeze_queue_wait+0x4b/0xb0
> > > [ 1476.069175]  ? remove_wait_queue+0x60/0x60
> > > [ 1476.073693]  nvme_wait_freeze+0x33/0x50 [nvme_core]
> > > [ 1476.079068]  nvme_reset_work+0x6b9/0xc40 [nvme]
> > > [ 1476.084075]  ? __switch_to+0x23e/0x4a0
> > > [ 1476.088209]  process_one_work+0x149/0x360
> > > [ 1476.092625]  worker_thread+0x4d/0x3c0
> > > [ 1476.096692]  kthread+0x109/0x140
> > > [ 1476.100247]  ? rescuer_thread+0x380/0x380
> > > [ 1476.104664]  ? kthread_park+0x60/0x60
> > > [ 1476.108698]  ret_from_fork+0x25/0x30
> > 
> > This looks like a path does not pair the freeze start with the reset's
> > freeze wait. I'll have to see what the pci 'enable' sysfs entry does.
> 
> I see how the freeze start/stops are not paired in this scenario:
> nvme_dev_disable doesn't start the freeze if the pci device isn't
> disabled. It uses this to know if it is disabling the device twice.
> 
> In this test, though, you are disabling the pci device without the
> driver's knowledge, so that breaks that logic. In light of that, we'll
> need different criteria to know when the driver should start a freeze.
> I'll test some things out and send a patch.

This should fix it for your scenario, but I am not completely sure this
can't get the freeze depth higher than we need.

---
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index cd888a4..ca03980 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2006,12 +2006,14 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
 	struct pci_dev *pdev = to_pci_dev(dev->dev);
 
 	mutex_lock(&dev->shutdown_lock);
+
+	if (dev->ctrl.state == NVME_CTRL_LIVE ||
+	    dev->ctrl.state == NVME_CTRL_RESETTING)
+		nvme_start_freeze(&dev->ctrl);
+
 	if (pci_is_enabled(pdev)) {
 		u32 csts = readl(dev->bar + NVME_REG_CSTS);
 
-		if (dev->ctrl.state == NVME_CTRL_LIVE ||
-		    dev->ctrl.state == NVME_CTRL_RESETTING)
-			nvme_start_freeze(&dev->ctrl);
 		dead = !!((csts & NVME_CSTS_CFS) || !(csts & NVME_CSTS_RDY) ||
 			pdev->error_state  != pci_channel_io_normal);
 	}
--
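
The effect of moving the freeze start out of the pci_is_enabled() branch can
be sketched in user space as follows. This is a simplified model, not driver
code: struct toy_ctrl, enum toy_state and all toy_* functions are hypothetical
names, and the freeze is reduced to a depth counter on the assumption that the
reset path's freeze wait can finish only if a freeze was started.

```c
#include <stdbool.h>

/* Hypothetical controller states; only "live or resetting" matters here. */
enum toy_state { TOY_LIVE, TOY_RESETTING, TOY_DEAD };

struct toy_ctrl {
    bool pci_enabled;     /* what pci_is_enabled() would report */
    enum toy_state state;
    int freeze_depth;     /* freezes started and not yet undone */
};

bool toy_wants_freeze(const struct toy_ctrl *c)
{
    return c->state == TOY_LIVE || c->state == TOY_RESETTING;
}

/* Before the patch: the freeze starts only inside the pci-enabled branch. */
void toy_disable_before(struct toy_ctrl *c)
{
    if (c->pci_enabled) {
        if (toy_wants_freeze(c))
            c->freeze_depth++;
    }
}

/* After the patch: the freeze decision no longer depends on the PCI state. */
void toy_disable_after(struct toy_ctrl *c)
{
    if (toy_wants_freeze(c))
        c->freeze_depth++;
    if (c->pci_enabled) {
        /* the real code reads CSTS here to decide whether the device is dead */
    }
}

/* Assumption of this model: waiting for a freeze needs a started freeze. */
bool toy_freeze_wait_can_finish(const struct toy_ctrl *c)
{
    return c->freeze_depth > 0;
}
```

With pci_enabled set to false, as after writing 0 to the sysfs 'enable' file,
toy_disable_before() leaves freeze_depth at 0, while toy_disable_after()
raises it to 1. In this toy model, calling toy_disable_after() twice without
a state change raises the depth to 2, which is the kind of over-counting the
message above is unsure about.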
